#
#
# EMULAB-COPYRIGHT
# Copyright (c) 2003 University of Utah and the Flux Group.
# All rights reserved.
#
======================
Emulab Source Tree Map
======================

This file roughly documents the contents of our source tree as of
April, 2003. Some of the entries here are per-script; others cover a
group of scripts, in which case the documentation inside the
individual scripts should be sufficient explanation. The end of the
file also has an overview of the abstractions that cut across the
system.

[This file maintained by testbed-ops@emulab.net]

For the big picture and some details, read the OSDI'02 paper, in
doc/papers/netbed-osdi02* and on the Web.

Accounts
 - unix accounts
   - unix group management (per-proj and per-group)
 - ssh key distribution
 - sfs key distribution
 - account permissions (web only, ron/wa, root/non-root, etc.)
 - emulab permissions
   - control of hardware/hw config.
   - administrative control
 - hierarchical organization
   - delegation at all levels
   - trust models and their security impact

Assign (resource allocation algorithms)
 - the Testbed Mapping Problem: read draft of our upcoming CCR paper in doc/papers
   - NP Hard
   - in some ways, constraint satisfaction problem
     - but more, because not all satisfactory solutions are equal
   - time constraints: we're an interactive system, and need to
     perform on interactive timescales - a few seconds max to get a
     good answer
   - variation in wide area
     - soft matching
     - complicated more by fact that we can combine the unknown
       (wide-area link) with something we control (traffic shaping)
 - Emulab solution
   - many "valid" solutions, but difference between near-optimal and
     random valid soln. is huge and important
   - sim. annealing core
   - highly optimized
     - clever domain specific tricks
     - main purpose is to conserve scarce resources (nodes,
       interswitch bandwidth, soon special hw like GigE)
   - lots of parameters, not always clear how to tune them
 - Netbed solution
   - typically no exact match, just some that may be closer than
     others - very fuzzy matching
   - genetic algo. core
   - not as highly developed yet, but meets our needs
   - main purpose is to find a real-world overlay that matches the
     supplied topology as closely as possible
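
A minimal Python sketch of the sim. annealing idea in the Emulab
solution above (not the assign code itself). It assumes a toy cost that
only counts virtual links crossing between switches, a linear cooling
schedule, and made-up names (anneal, interswitch_cost, switch_of); the
real solver weighs many more resources and parameters.

```python
import math
import random

def interswitch_cost(mapping, links, switch_of):
    """Count virtual links whose endpoints land on different switches."""
    return sum(1 for a, b in links
               if switch_of[mapping[a]] != switch_of[mapping[b]])

def anneal(vnodes, links, pnodes, switch_of, steps=5000, t0=2.0, seed=0):
    """Toy annealer: give each virtual node its own physical node."""
    rng = random.Random(seed)
    chosen = rng.sample(pnodes, len(vnodes))
    mapping = dict(zip(vnodes, chosen))
    free = [p for p in pnodes if p not in chosen]
    cost = interswitch_cost(mapping, links, switch_of)
    best, best_cost = dict(mapping), cost
    for step in range(steps):
        temp = t0 * (1.0 - step / steps) + 1e-9   # linear cooling (assumed)
        v = rng.choice(vnodes)
        if free and rng.random() < 0.5:
            # Move v onto a currently unused physical node.
            i = rng.randrange(len(free))
            mapping[v], free[i] = free[i], mapping[v]
            undo = ("free", i)
        else:
            # Swap the physical nodes of two virtual nodes.
            w = rng.choice(vnodes)
            mapping[v], mapping[w] = mapping[w], mapping[v]
            undo = ("swap", w)
        new_cost = interswitch_cost(mapping, links, switch_of)
        delta = new_cost - cost
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            cost = new_cost                       # accept the move
            if cost < best_cost:
                best, best_cost = dict(mapping), cost
        elif undo[0] == "free":
            i = undo[1]                           # reject: put v back
            mapping[v], free[i] = free[i], mapping[v]
        else:
            w = undo[1]                           # reject: swap back
            mapping[v], mapping[w] = mapping[w], mapping[v]
    return best, best_cost
```

Worse moves are accepted with probability exp(-delta/temp), which is
what lets the search leave a merely "valid" mapping early on and settle
toward a better one as the temperature drops.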

Capture/console (node consoles - "'zero-penalty' remote research")
 - serial line consoles to nodes replace kbd/vga
 - fine-grained access control
   - changes quickly when node changes "ownership"
 - simple, secure remote access
   - ACLs, authenticated ssl tunnel program + standard telnet client

CD-ROM (remote node mgmt/robustness, adding nodes to the system)
 - simple to add a node
 - fallback boot method (CD-ROM) when disk is hosed
 - path for self-update and disk reimaging
 - goal is to reduce need for human intervention whenever possible

Database (centralized store for persistent shared system state)
 - lots o' stuff here
 - most stuff falls into one of several categories
   - semi-permanent hw setup info (wires, ifaces, nodes, outlets)
   - current hardware configs (reservations, ifaces, vlans, etc)
   - semi-permanent sw setup info (disk images, OS's, etc.)
   - current sw setup (traffic shaping, trafgen, routing, etc.)
   - virtualized expt info (topology, config, etc)
   - administrative info (users, groups, projects, etc.)
   - misc. config bits and logging
 - sw engineering issues
   - db schema must match sw build

IXP (special hw resources) [not released due to Intel license restrictions]
 - use as testbed infrastructure
   - traffic shaping
 - use for experimentation
   - shared facil. gives more people access, increases usage
   - emulab is good environment w/many tools

Event system (distributed event coordination/communication)
 - "Elvin" publish/subscribe system underneath (imported from elsewhere)
 - used in several directions
   - emulab to nodes/programs
   - nodes to emulab
   - programs on emulab server to each other
   - can be nodes to nodes too
 - delay agent
   - coordinated control of traffic shaping
   - changes can initiate anywhere
     - automatic timed changes from emulab
     - manual changes from emulab server or a node
   - allows for reactive traffic shaping, trace playback, etc.
 - nsetrafgen
   - control of NSE simulators and their traffic generation
 - program agent
   - start/stop arbitrary program
   - timed or manual, and allows reactivity
 - event scheduler
   - controls timed events
   - may be submitted a priori or during a run
 - stated uses it heavily, but is described elsewhere
 - tevc/tevd
   - simple command line client for use on any server or node
 - trafgen
   - traffic generation via TG toolkit
   - patched to allow control via events

install (emulab cluster site configuration tools)
 - for making more emulabs
 - mostly automated install process
   - FreeBSD "port"/"meta-port"-style install script
   - installs dependencies as needed
   - performs emulab-specific install tasks
 - one for configuring a "boss" node (secure server)
 - one for configuring an "ops" node (public server)

ipod/apod (node control without power control hardware)
 - "ICMP Ping-Of-Death" and big brother, "Authenticated Ping-Of-Death"
 - reboot pingable but hung node without external intervention
 - adds robustness and greater control
 - especially important where only other alternative is a human

Libraries (Software engineering?)
 - shared constants
 - common interfaces
 - database routines and abstractions
 - important for robust, maintainable software

OS tools (disk images, etc)
 - management of disk contents
 - image creation
   - imagezip
     - lots of cool tricks here - read the frisbee paper
 - image distribution/installation
   - frisbee
     - lots to say here... read the paper in USENIX'03 and doc/papers
   - growdisk - partition management on heterogeneous nodes
 - deltas
   - deprecated - dump/restore
   - with our incredible disk image tools, it is way faster to 
     just reload the disk instead of checking it first
 - tarfile installation
   - easy changes without forcing a customized disk image

PXE/DHCP - node boot process
 - automatic database-driven control of nodes
 - can't assume anything about the disk
 - node always boots off of PXE so we get control 
 - talk to the database (via bootinfo)
   - may be told to boot a tftp kernel or a specific partition
   - tftp kernels (often with Memory file systems) used for:
     - disk image creation/installation
     - NetBoot
     - OSKit kernels
 - in emulab disk images, nodes self-configure using a pull model
   - see also TMCD
 - progress monitored by stated

Security
 - always conscious of threat model
 - segregate public server (ops)
 - limited shells on secure server
   - secure server trusted by all nodes
     - emulab performs config tasks on behalf of user
 - plasticwrap/paperbag - transparently run commands on secure server
 - suexec during web execution adds extra layer of security and
   permission checks
 - lastlogs 
   - track logins on servers and nodes, report into main db
 - giving away root on the nodes causes issues 
 - passwords
   - we enforce good ones via checkpass/cracklib
   - have expirations

Sensors
 - monitor nodes
 - healthd - temperature, etc
 - slothd - activity measurements
   - detect tty, network, cpu activity and report it
   - low overhead
   - agile 
     - extremely low latency in detecting new activity in an idle node
     - higher latency okay for detecting beginning of inactivity
       - when it's active, stay out of the way...
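
One way to get the asymmetric latency described for slothd above,
sketched in Python: sample often while the node looks idle and rarely
once it is active. The class name and the 1 s / 300 s intervals are
assumptions for illustration, not values taken from slothd.

```python
class ActivityMonitor:
    """Sample quickly while idle, slowly while active."""

    def __init__(self, fast=1.0, slow=300.0):
        self.fast, self.slow = fast, slow
        self.active = False

    def observe(self, saw_activity):
        """Record one sample.

        Returns (report, delay): report is "active" or "idle" on a state
        change (else None); delay is seconds to wait before sampling again.
        """
        report = None
        if saw_activity != self.active:
            self.active = saw_activity
            report = "active" if saw_activity else "idle"
        # Idle node: short delay, so new activity is noticed quickly.
        # Active node: long delay, so the monitor stays out of the way.
        return report, (self.slow if self.active else self.fast)
```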

TBSetup 
 - core of testbed software
 - primary focus: expt config tasks
   - and auxiliary functions necessary for expt config stuff
 - assign_wrapper
   - interface between db data representation and resource allocation
     algorithms. Call the solver and use the output to set up the
     database state that runs the rest of the process.
 - batch daemon
   - core of a pretty typical batch system
   - allows for more automation
     - submit expt even when no resources are avail., runs later
 - checkports - ?
 - console reset/setup
   - control console access (see also capture section)
 - db2ns - dump our db data rep back into an ns file
 - eventsys start/control
   - start up event schedulers for each expt - see event section
 - exports setup
   - control access to files via NFS on nodes
   - create an /etc/exports file based on current node "ownership" and
     group membership
   - controls access to all home dirs, proj dirs, and group dirs
 - frisbeelauncher
   - wrapper to set up a frisbee server when trying to load a disk
 - libaudit - track requests for certain control actions
 - libtbsetup - see libraries section
 - libtestbed - see libraries section
 - mkgroup/mkproj, rmgroup/rmproj, rmuser
   - manage users, groups, and projects (sync unix world to match db)
 - named_setup
   - set up dns subdomains for each expt
   - create aliases for each node that are consistent across swapins
 - node_control - change node sw setup params (boot params, startup)
 - node_reboot
   - reboot a node as gracefully as possible
   - try 'ssh reboot', IPOD, then power cycle, as needed.
 - node_update - push mounts/accounts changes to nodes
 - nscheck - syntax check an ns file for use in emulab
 - os_load - start a frisbee disk reload
 - os_select - configure node boot params
 - os_setup 
   - major part of expt config
   - db says what nodes should be running, so make it happen
   - may load disks, then reboots nodes and waits for them to come up
 - portstats - diag. tool for switch port counters
 - power - power control program
 - ptopgen - generate description of currently available hw
 - reload_daemon
   - first-cut node manager
   - reload disks when nodes get freed
 - resetvlans - clear any vlans made up of a set of nodes
 - routecalc - generate shortest path routes for a topology
 - sched_reload - set up a disk reload for later
 - sched_reserve - set up a node to go to an expt when freed
 - setgroups - update unix groups file with current membership
 - sfskey update - sync live sfskey config with db config
 - snmpit - SNMP switch control
   - supports multiple switch types
   - configures VLANs into "links" and "LANs" in topologies
   - reads other switch data (e.g. for portstats)
 - startexp/endexp - begin/end experiments
   - wrappers called from web
   - start takes a "new" expt and an ns file
     - prerun it and swap it in, and send mail, leaving "active" expt
   - end takes an expt that is "new", "swapped", "active", or "terminated"
     - swap out if needed, and tbend it, then clean up the last bits
 - staticroutes
   - take db topology info and pass it to routecalc to generate static
     shortest-path routes for the expt. Save result in db.
 - swapexp
   - called from web - swap in, out, or restart an expt.
   - performs some checks, some locking, and calls tbswap or tbrestart
 - tbprerun
   - parse an ns file into the database, fully preparing it for swapin
 - tbswap
   - swap an expt in or out
   - performs a long list of sw/hw setup tasks
 - tbend
   - end an expt that has been swapped out
   - clean out virtual state
 - tbreport
   - dump a report of the experiment's configuration (virt and phys)
 - tbresize
   - older interface for rudimentary expt editing
   - add nodes to an expt, either unconnected or in a LAN
 - tbrestart
   - restart an expt without completely swapping out and back in
   - restart event system, reset ready/startup/boot status, port cntrs
 - vnode_setup
   - called from os_setup
   - configures multiplexed virtual nodes
   - mechanism: ssh runs a script that lives on the node's disk
 - wanassign/wanlinksolve (see assign section)
 - wanlinkinfo - display info on wide-area nodes from db
 - checkpass - see security section
 - ns2ir
   - The Parser
   - similar to/based on ns parser
   - rewrote methods to put info into database
   - performs emulab-specific checks
   - we supply a library that they use to get access to
     emulab-specific commands
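
The node_reboot entry above tries 'ssh reboot', IPOD, then a power
cycle, as needed. A minimal Python sketch of that escalation pattern
follows; reboot_node and the (name, attempt) pairs are hypothetical,
and the real script does considerably more checking.

```python
def reboot_node(node, methods):
    """Try each (name, attempt) pair in order, gentlest first.

    attempt(node) should return True on success. Returns the name of the
    method that worked, or None if every method failed.
    """
    for name, attempt in methods:
        try:
            if attempt(node):
                return name
        except Exception:
            # A method that blows up counts as a failure; escalate.
            continue
    return None
```

Callers pass the methods in order of increasing violence, e.g.
[("ssh", ...), ("ipod", ...), ("power", ...)], so that the power
controller is only touched when the softer options did not work.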

Testsuite (regression testing - software engineering?)
 - automated system runs lists of tests in different modes
   - modes are levels of reality
 - used for regression testing ("did we break something?") 
   - and development ("does this new thing work?")
 - test mode (aka frontend mode):
   - all scripts run like normal, but whenever something would have
     touched hardware, assume it succeeded, and return
   - doesn't touch nodes/switches, etc, but does all the db changes
 - full mode:
   - reserve some nodes from the testbed
   - set up "redirect" for certain critical daemons
   - set up an alternate db, make our nodes the only free ones
   - run alternate daemons (or live daemons use alt. db for our nodes)
   - entire system runs like normal, but off of a separate installed
     set of scripts
 - very flexible
   - tests can modify db, run arbitrary scripts
   - simple to use in normal case
     - check that normal expt path runs w/o errors
 - work in progress:
   - use full mode to verify accuracy/precision of traffic shaping
   - some parts may evolve into a set of tests that we run quickly
     after swapping in, before turning the expt over to the user
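
Test mode as described above (db changes happen, hardware operations
are assumed to succeed) can be sketched in Python with a decorator.
TESTMODE, touches_hardware, power_cycle, and swap_in are made-up names
for illustration; the actual scripts have their own mechanism.

```python
import functools

TESTMODE = True   # hypothetical global switch for "frontend mode"

def touches_hardware(func):
    """In test mode, skip the real operation and report success."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        if TESTMODE:
            return True
        return func(*args, **kwargs)
    return wrapper

@touches_hardware
def power_cycle(node):
    # Stand-in for an operation that would reach a real power controller.
    raise RuntimeError("would touch real hardware: " + node)

def swap_in(nodes, db):
    """The db is updated as usual; only the hardware step is stubbed."""
    for node in nodes:
        db[node] = "reserved"
        if not power_cycle(node):
            raise RuntimeError("power cycle failed for " + node)
    return db
```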

TMCD - Testbed Master Control Daemon
 - Server for node self-configuration
   - provides controlled access to the database
   - supports a pull model
   - receives various reports/messages from nodes
 - TMCC - Testbed Master Control Client
   - currently supported on FreeBSD and Linux, and ported to OpenBSD
   - tool for nodes<->emulab communication
   - part of a set of node initialization scripts
 - Node self-configuration process
   - report "I'm alive"
   - update config scripts (currently via sup)
   - run the config, which sets up:
     - interfaces, accounts, mounts, agents, startup programs, testbed
       daemons, installs tarfiles/rpms/etc, starts ntp, traffic shaping,
       virtual nodes, routing (gated/ospf and static/manual routes),
       hostname, /etc/hosts, IPOD/APOD, sfs, etc.
   - used on local nodes and widearea nodes, as well as inside jails
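
A toy Python sketch of the pull model above: the node reports in, then
asks the server for each piece of its own configuration. The request
names, table layout, addresses, and function names are all assumptions
for illustration; this is not the TMCD wire protocol, and plain
function calls stand in for the network.

```python
# Stand-in for the database; keys and fields here are made up.
NODE_CONFIG = {
    "10.1.1.2": {"hostname": "nodea.exp.proj",
                 "mounts": "/proj/proj /users/alice"},
}
NODE_STATUS = {}

def tmcd_request(client_ip, command, arg=None):
    """Server side: answer one request, scoped to the node that asks."""
    config = NODE_CONFIG.get(client_ip)
    if config is None:
        return "ERROR unknown node"
    if command == "state":              # a report from the node
        NODE_STATUS[client_ip] = arg
        return "OK"
    return config.get(command, "ERROR unknown command")

def node_self_configure(my_ip, request=tmcd_request):
    """Node side: report "I'm alive", then pull and apply each item."""
    request(my_ip, "state", "alive")
    applied = {}
    for item in ("hostname", "mounts"):
        applied[item] = request(my_ip, item)   # a real node would act on it
    request(my_ip, "state", "configured")
    return applied
```

Because the server keys every answer on who is asking, a node only
ever sees its own slice of the database.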

Tools (built for emulab, but useful outside of it too)
 - pcapper
   - traffic visualization tool
   - realtime tcl/tk graph of packets/throughput
   - categorized by traffic types

Visualization
 - graphical view of topologies in the database

Web Interface
 - Main configuration/administrative interface
 - Manage projects, groups, users    
   - edit user info, ssh keys, sfs keys, etc.
   - push account updates to nodes
 - Control nodes/experiments
   - start/end/swap expts
   - control nodes, delays, etc.
   - NetBuild GUI for creating expts/nsfiles
   - node status/monitoring
 - Get info about Emulab/Netbed
   - even download a CD, and get a key to join Netbed
   - all the documentation
     - tutorials, FAQs, etc.
   - publications, photos, some of our users, etc.
 - manage project data
   - disk images, custom OS's, etc.
 - for admins etc, also provides web db access and cvs web access

Stated ("state-dee") - node state management daemon
 - listens for node state events
   - performs triggered actions
   - watches for problems/timeouts
     - sends notifications at times
   - updates the database with current state
 - watches how nodes reboot, reload, etc
 - several "state machines" (operational modes) define what is correct
   - each node is somewhere in some state machine always
 - reports successful boots, reloads, etc.
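
A small Python sketch of the state machine idea above: each
operational mode lists which transitions are correct, and a node that
sits in one state too long is flagged. The state names, the timeout
value, and the function names are illustrative assumptions, not a copy
of what stated actually loads from the database.

```python
# Hypothetical operational mode: state -> set of valid next states.
NORMAL_MODE = {
    "SHUTDOWN": {"BOOTING"},
    "BOOTING": {"ISUP", "SHUTDOWN"},
    "ISUP": {"SHUTDOWN"},
}
# Hypothetical limit (seconds) on how long a node may stay in a state.
TIMEOUTS = {"BOOTING": 180}

class NodeState:
    def __init__(self, state="SHUTDOWN", since=0.0):
        self.state, self.since = state, since

def handle_event(node, new_state, now, machine=NORMAL_MODE):
    """Record a reported transition; return False if the mode forbids it."""
    valid = new_state in machine.get(node.state, set())
    # Track what the node reported either way, so the db stays current.
    node.state, node.since = new_state, now
    return valid

def timed_out(node, now, timeouts=TIMEOUTS):
    """True if the node has overstayed the limit for its current state."""
    limit = timeouts.get(node.state)
    return limit is not None and now - node.since > limit
```

An invalid transition or a timeout is where the triggered actions and
notifications listed above would hook in.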

Netbed Wide-area nodes
 - Most emulab abstractions have netbed wide-area counterpart
   - same methods/abstractions/tools used in LAN or WAN environment
   - easy to switch from a wide-area run to an emulated run (or simulated)
 - Boot process a little different
 - Many parallels to local area case
   - SFS instead of NFS for shared homedirs
   - Can set up links as tunnels with 192.168.* addresses
   - Accounts same (except for rootness)
   - Traffic generation

Simulated Nodes
 - many nodes simulated inside NSE on a single phys. node
 - can interact with real network
 - traffic gen can happen inside
 - links, etc. all work like normal
 - Due to NS limitations/abstractions, lots of things in the real
   world don't have a parallel here

Multiplexed Nodes
 - many nodes run on one physical node, and appear as many individual nodes
 - Implemented with "jail" on FreeBSD, or "____" on Linux
 - Goal to be as close to normal physical nodes as possible
 - creates lots of issues with multiplexing of virtual links onto
   physical links
   - routing, demultiplexing, etc

Cross-cutting Abstractions
 - Four different environments
   - Emulab/emulation (dedicated phys.) nodes, wide-area nodes, 
     simulated nodes, and multiplexed ("virtual") nodes
   - can mix and match in same expt
   - in many cases, same expt can run in any (or several) of the
     environments with few or no changes
 - Nodes
   - Emulated/emulab: dedicated physical nodes in a cluster
     - get root, can reboot, serial console, total control of node
       - including OS, disk imaging, etc.
   - Widearea: shared nodes, geographically distributed
     - get an account (non-root, typically)
     - sometimes get a jail / "virtual server"
     - less control (of OS, rebooting, etc.)
   - Simulated: nodes inside of an NS simulator
     - nodes are simulated, don't run an OS, etc.
     - functionality programmed via NS models
   - Multiplexed: jails / virtual servers on cluster nodes
     - Almost as real as emulation nodes
     - allows bigger scale, at the risk of side-effects
     - same level of control as emulation nodes
 - Links
   - Emulated/emulab:
     - completely controllable network characteristics
     - including LAN speeds or shaped links
     - isolated control network
     - very realistic, predictable, repeatable
   - Widearea:
     - network is the real/raw internet
     - tunnels are optionally configured
     - no separate control network
     - completely realistic, but unpredictable
   - Simulated:
     - links inside NSE (NS Emulator)
     - NSE does shaping
     - real and sim worlds can talk to each other
   - Multiplexed:
     - Same capabilities as normal emulated/emulab links
     - some tricks involved to get everything to work right

---EOF---