Hardware Recommendations
Purpose of this document
The purpose of this document is to share our experience in selecting the
hardware in Emulab, to aid others who are considering similar projects. Rather
than a recipe for building testbeds, this document gives a set of
recommendations, and outlines the consequences of each choice.
Nodes
- NICs
- At least 2 - one for the control net, one for the experimental
network. Our attempts to use nodes with only 1 NIC have not
met with much success. 3 interfaces let you have delay nodes,
but only linear topologies (no branching) unless you
"multiplex" the physical links. 4 let you have
(really simple) routers. We opted to go with 5.
The control net
interface should have PXE capability and need only be 100Mb.
All experimental
interfaces should be the same, unless you are purposely going
for a heterogeneous environment. The control net interface can
be different. For the control net we have used
Intel Pro100 (fxp) and Pro1000 (em) cards,
as well as builtin Broadcom (bge) and 3Com (xl) devices.
Note that the older the card/motherboard, the more likely
it is that either it will not have PXE or that the PXE
implementation will have bugs, so it's
best to get a sample first and try it out before committing to
a card (a minimal PXE/DHCP broadcast check is sketched at the
end of this Nodes section). Newer (say 2003 or newer) cards will
most likely have a correctly functioning PXE implementation.
Depending on usage, it may be OK to get a large number
of nodes with 2 interfaces to be edge nodes, and a smaller
number with more interfaces to be routers.
- Case/Chassis
- This will depend on your space requirements. The cheapest
option is to buy standard desktop machines, but these are not
very space-efficient. 3U or 4U rackmount cases generally have
plenty of space for PCI cards and CPU cooling fans, but still
may consume too much space for a large-scale testbed. Smaller
cases (1U or 2U) have fewer PCI slots (usually 2 for 2U cases,
and 1 or 2 in 1U cases), and often require custom motherboards
and/or "low profile" PCI cards.
Heat is an issue in smaller cases, as they may not have room
for CPU fans. This limits the processor speed that can be used.
For our first round of machines ("pc600"), we bought standard
motherboards and 4U cases, and assembled the machines ourselves.
For our second round of PCs ("pc850"), we opted for the (now
discontinued) Intel ISP1100 server platform,
which includes a 1U case, custom motherboard with 2 onboard
NICs and serial console redirection, and a power supply.
For the third round of PCs ("pc3000") we got Dell 2850s, which
sport 2U cases and a custom motherboard with 2 builtin NICs and
serial console redirection.
- CPU
- Take your pick. Note that small case size (ie. 1U) may
limit your options, due to heat issues. Many experiments will
not be CPU bound, but some uses (ie. cluster computing
experiments) will appreciate fast CPUs. Our latest round
of machines has 3GHz processors.
- Memory
- While many applications don't need much, memory is still
cheap and you should get at least 512MB, and preferably
1-2GB. We chose to go with ECC, since it is not much more
expensive than non-ECC, and with our large scale, ECC will help
protect against failure.
- Motherboard
- Serial console redirection is nice, and the BIOS should
have the ability to PXE boot from a card or a builtin
interface.
Health monitoring hardware (temperature,
fans, etc.) is good too, but not required.
If you are planning to have 1 or more 1000Mb interfaces, you
should be looking for PCI-X or PCI Express buses.
Onboard network interfaces allow you to get more NICs,
something especially valuable for small cases with a limited
number of expansion slots.
- Hard Drive
- Pretty much any one is OK. With a large number of nodes,
you are likely to run into failures, so they should be
reasonably reliable. As for size, whatever the current
market "sweet spot" is should be more than adequate.
Our standard images are at most 6GB. Consider SCSI if
you have money and need speed. Consider a second drive
if you have money and case room, and need a whole lot
of space for logging, etc.
- Floppy
- Handy for BIOS updates, diagnostics and possibly for
older OS installs.
You definitely need one if you don't have a CDROM drive.
- CDROM
- Recommended if you might grow your testbed large
(more than a couple hundred machines) some day.
For reliable scaling we may someday migrate away
from PXE-booting to CD-based booting,
as we already do with our widearea and wireless nodes.
- VGA
- Only needed if motherboard does not have serial console
redirection or if you want to run Windows.
Cheap is fine, cheap and builtin is better.
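
As a quick way to "try out" a sample card's PXE support (see the NICs
item above), a sketch like the following can confirm that the card's
PXE ROM at least broadcasts a BOOTP/DHCP request when the node
network-boots. This is purely illustrative and not part of our
software; it assumes Python 3 on a machine in the same broadcast
domain as the test node, run as root, with no DHCP server already
bound to port 67 on that machine.

    #!/usr/bin/env python3
    # Watch for BOOTP/DHCP broadcasts from a node's PXE ROM.
    # Run as root; assumes nothing else on this host is bound to UDP port 67.
    import socket

    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", 67))                      # BOOTP/DHCP server port

    print("waiting for BOOTP/DHCP broadcasts...")
    while True:
        data, _ = sock.recvfrom(1500)
        if len(data) < 240:                  # too short to be a BOOTP packet
            continue
        mac = ":".join("%02x" % b for b in data[28:34])   # chaddr field
        pxe = b"PXEClient" in data           # option 60 vendor class, if present
        print("BOOTP request from %s%s" % (mac, " (PXE client)" if pxe else ""))

If nothing shows up when the test node tries to network-boot, suspect
the card's PXE ROM or the BIOS boot-order settings.
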
Switches
- Control net
- Number of ports
- You'll need one port for each testbed node, plus
ports for control nodes, power controllers, etc.
100Mb ports are fine.
- VLANs
- Allow partitioning of the control network, which protects
control hardware (switches, power controllers, etc.) and
private machines from the nodes and the outside world.
Without VLANs, control hardware can instead be attached to an
additional NIC on the boss node (requires an additional cheap
switch).
- Router
- Required if VLANs are used. Must have DHCP/bootp
forwarding (Cisco calls this the 'IP Helper'). An MSFC
card in a Cisco Catalyst supervisor module works well
for us, but a PC would probably suffice.
- Firewall
- Can be the router. Without one, VLAN security is
pretty much useless, meaning that it may be possible
for unauthorized people to reboot nodes and configure
experimental network switches.
- Port MAC security
- Helps prevent nodes from impersonating each other
and control nodes. This is an unlikely attack, since
it must be attempted by an experimenter or someone who
has compromised an experimental node already.
- Multicast
- Multicast support in the switch (IGMP snooping) and in the
router (if present) is used to allow multicast loading
of disk images. Without it, images must be loaded unicast
and serially, which is an impediment to re-loading
disks after experiments (to return nodes to a clean
state). A minimal multicast test is sketched at the end of
this Switches section.
- Experimental net
- Vendor
- Our software currently supports:
Cisco Catalyst 65xx, 40xx, 29xx, and 55xx series,
Nortel 1100 and 5510,
Foundry 1500 and 9604,
and Intel 510T
switches (although the Intels don't support everything you'd
want for a large-scale production testbed). Other
switches by the same vendors will likely be easy to support -
in particular, we expect that Cisco switches supporting
the 'CISCO-VTP-MIB', and the 'CISCO-STACK-MIB' or
'CISCO-VLAN-MEMBERSHIP-MIB' are likely to 'just work'.
(You can see a list of which MIBs are
supported by which devices at Cisco's
MIB page.) If you will be using multiple Ciscos for the
experimental net, and trunking between them, your switch
should also support the 'CISCO-PAGP-MIB', though this is not
required. Switches from other vendors can theoretically be
used if they support SNMP for management, but will likely
require significant work. We have found that you can often
find good deals on new or used equipment on EBay.
For used Cisco equipment, we buy from Atantica.
- Number of ports
- You'll need as many ports as you have experimental
interfaces on your nodes. In addition, you need 1 port
to connect the experimental net switch to the control
net (for SNMP configuration.) If you're using multiple
switches, you need sufficient ports to 'stack' them
together; if your switches are 100Mb, gigabit ports
are useful for this.
- VLAN support
- Optimally, configurable via SNMP (we have no tools
to configure it otherwise). If not available, all
experimental isolation is lost, delay nodes can't be
used well, and a method for coming up with globally
unique IP addresses will be required. So, VLANs are
basically necessary. (A minimal SNMP VLAN query is
sketched at the end of this Switches section.)
- Trunking
- If multiple switches, should have a way to trunk
them (or experiments are limited to 1 switch.) Ideally,
all trunked switches should support VLAN management
(like Cisco's VTP) and a VLAN trunking protocol like
802.1Q. It's good if the trunk links are at least an
order of magnitude faster than node links, and link
aggregation (ie. EtherChannel) is desirable.
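
To sanity-check the multicast support mentioned above (IGMP snooping
on the control net) before relying on multicast disk loading, a
trivial sender/receiver pair is enough. The sketch below is only an
illustration, not our image-distribution tool; the group address and
port are arbitrary placeholders, and it assumes Python 3 on two
testbed nodes.

    #!/usr/bin/env python3
    # Minimal multicast smoke test: run "mcast_test.py recv" on one node and
    # "mcast_test.py send" on another.  If IGMP snooping (and multicast
    # routing, if a router is in the path) works, the receiver sees the data.
    # The group and port below are arbitrary placeholders for testing only.
    import socket
    import struct
    import sys

    GROUP, PORT = "239.192.1.1", 4321    # administratively scoped test group

    if sys.argv[1:] == ["send"]:
        s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        s.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL,
                     struct.pack("b", 4))
        s.sendto(b"hello from " + socket.gethostname().encode(), (GROUP, PORT))
    else:
        s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        s.bind(("", PORT))
        # Joining the group generates the IGMP membership report that a
        # snooping switch uses to decide which ports get this traffic.
        mreq = struct.pack("4sl", socket.inet_aton(GROUP), socket.INADDR_ANY)
        s.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, mreq)
        while True:
            data, addr = s.recvfrom(1500)
            print(addr[0], data.decode(errors="replace"))
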
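For a concrete feel for the SNMP access the experimental net switch
must allow (see the Vendor and VLAN support items above), the sketch
below walks a Cisco switch's access-port VLAN assignments. It is
illustrative only, not one of our tools: it assumes the third-party
pysnmp library (classic synchronous 'hlapi' interface), an SNMP read
community of 'public', a placeholder switch hostname, and the vmVlan
column of CISCO-VLAN-MEMBERSHIP-MIB at the OID shown, which you
should verify against Cisco's MIB page.

    #!/usr/bin/env python3
    # Illustrative only: walk the port-to-VLAN mapping on a Cisco switch.
    # Assumes the pysnmp library ("pip install pysnmp") and SNMP v2c read access.
    from pysnmp.hlapi import (SnmpEngine, CommunityData, UdpTransportTarget,
                              ContextData, ObjectType, ObjectIdentity, nextCmd)

    SWITCH = "expt-sw1"              # placeholder switch hostname
    COMMUNITY = "public"             # placeholder read community
    # vmVlan from CISCO-VLAN-MEMBERSHIP-MIB (assumed OID -- verify it).
    VM_VLAN = "1.3.6.1.4.1.9.9.68.1.2.2.1.2"

    for err_ind, err_stat, err_idx, var_binds in nextCmd(
            SnmpEngine(),
            CommunityData(COMMUNITY),
            UdpTransportTarget((SWITCH, 161)),
            ContextData(),
            ObjectType(ObjectIdentity(VM_VLAN)),
            lexicographicMode=False):
        if err_ind or err_stat:
            print("SNMP error:", err_ind or err_stat.prettyPrint())
            break
        for oid, vlan in var_binds:
            ifindex = oid.prettyPrint().split(".")[-1]
            print("ifIndex %s -> VLAN %s" % (ifindex, vlan))

A switch for which this kind of read (and the corresponding SNMP
writes) works is a good sign; one that only offers a web or CLI
interface will take significant extra work to support.
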
Servers
Two servers are preferable, though one could be made to work if you
are willing to invest some time and effort. The NFS server
needs enough space for /users and /proj , as well as a few extra gigs
for build trees, logs, and images. If you have more than 128 nodes,
and plan to use Cyclades or RocketPort serial ports, you need one
"tip server" machine per 128 serial lines (other serial muxes may
have similar limitations.) Tip servers do not need to be very
powerful. A 1000Mb network connection is suggested for the disk image
distribution machine (usually boss), though you can get by with 100Mb.
The database machine should have a reasonably fast CPU and plenty of RAM.
Other Hardware
- Network cables
- We use Cat5E cables, chosen because they are not much more
expensive than Cat5, and can be used in the future for gigabit
Ethernet. It has been our experience that 'boots' on cables do
more harm than good. The main problems are that they make it
difficult to disconnect the cables once connected, and that they
get in the way on densely-connected switches. Cables with
'molded strain relief' are better than cables with boots, but
are often much more expensive. We buy cables in two-foot
increments, which keeps slack low without making the order too
complicated. Our standard so far has been to make control net
cables red, experimental net cables yellow, serial cables
white, and cables for control hardware (such as power
controllers) green. We've bought all of our cables from dataaccessories.com,
and have had excellent luck with them. With prices of $3.00 to
$4.25 for premade cables, it's not too expensive to buy them
rather than make your own, and it's much easier.
- Serial cables
- We use Cat5E, but with a special pin pattern on the ends to
avoid interference between the transmit/receive pairs. We use
RJ-45 connectors on both ends, and a custom serial hood to
connect to the DB-9 serial ports on the nodes. Unfortunately,
the custom pinout is different for different serial mux
manufacturers. Contact us to get our custom cable specs
for Cyclades or RocketPort.
- Power controllers
- Without them, nodes cannot be reliably rebooted. We
started out with 8-port SNMP-controlled power controllers from
APC (a minimal sketch of SNMP outlet control appears at the
end of this section). We then switched to
the RPC-27 from BayTech, a 20-outlet, vertically mounted,
serial-controlled power controller. Currently, we use the
BayTech rackmount RPC14 series. The serial controllers are
generally cheaper, and the more ports on each controller,
the cheaper.
Take note of the power requirements of your machines before
buying power controllers! While we can support 20 old 850MHz
machines on a single 30A/120V power controller, we can only
support 7 Dell 2850s per 30A/120V controller. So we went
with 8-port controllers for the latter.
- Serial (console) ports
- Custom kernels/OSes (specifically, the OSKit) may not
support ssh, etc. Also useful if an experimenter somehow scrogs
the network. We have used the Cyclades
Cyclom Ze serial adapters, which allow up to 128 serial ports
in a single PC. But those have been discontinued and they
also require a 5V PCI slot. For our latest round of machines,
we use the
Comtrol RocketPort series of serial muxes.
In particular, we use the 32-port RocketPort Universal PCI
interfaces (pn 99126-7) and 1U rackmount panels (pn 98700-0).
As mentioned, you can only have 4 cards (or 128 ports) per
machine, so if you have more than 128 nodes, you will need
multiple machines to host the interfaces.
Note also that the 32-port RocketPort is a full-height PCI
card, so make sure you have a machine with full-height
(not low-profile) slots!
- Serial port reset controllers
- It may be possible to build (or buy) serial-port
passthroughs that are wired to the reset pins on the motherboard.
Some motherboard chipsets (eg. Intel LX+) have this feature built
in. NOT TESTED by Utah, and may not be 100% reliable (the
motherboard may be able to get into a state where the reset pins
are not functional). Theoretically nicer to the hardware than
power controllers.
- "Whack on LAN" power controllers
- One of our students developed a hardware modification to
the Wake-on-LAN mechanism to allow us to reset machines
via the network. See this abstract for a little info.
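
For reference, remotely power-cycling a node through an
SNMP-controlled controller such as the APC units mentioned above
boils down to a single SNMP SET. The sketch below is illustrative
only (our production software uses its own power-control tools): it
assumes the third-party pysnmp library, a placeholder controller
hostname and write community, and the sPDUOutletCtl object from APC's
PowerNet-MIB at the OID shown, with 3 meaning "reboot" -- verify all
of these against the MIB shipped with your unit.

    #!/usr/bin/env python3
    # Illustrative only: reboot one outlet on an SNMP-controlled APC unit.
    # Assumes the pysnmp library and SNMP write access to the controller.
    from pysnmp.hlapi import (SnmpEngine, CommunityData, UdpTransportTarget,
                              ContextData, ObjectType, ObjectIdentity,
                              Integer, setCmd)

    CONTROLLER = "power1"            # placeholder power-controller hostname
    COMMUNITY = "private"            # placeholder write community
    OUTLET = 5                       # outlet feeding the node to reboot
    # sPDUOutletCtl from APC's PowerNet-MIB; the OID and the value 3
    # ("reboot") are assumptions -- verify against your unit's MIB.
    OID = "1.3.6.1.4.1.318.1.1.4.4.2.1.3.%d" % OUTLET

    err_ind, err_stat, err_idx, var_binds = next(setCmd(
        SnmpEngine(),
        CommunityData(COMMUNITY),
        UdpTransportTarget((CONTROLLER, 161)),
        ContextData(),
        ObjectType(ObjectIdentity(OID), Integer(3))))

    if err_ind or err_stat:
        print("SNMP error:", err_ind or err_stat.prettyPrint())
    else:
        print("outlet %d rebooting" % OUTLET)

Serial-controlled units like the BayTech RPCs are driven the same way
in spirit, but by sending their command-line syntax over the console
connection instead of SNMP.
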