Commit 9c564f92 authored by Grant Ayers

Moved old setup documentation into setup-archive

parent f4122169
This file documents the process of adding a new node to the testbed.
A. Information about the node
-----------------------------
1. MAC address. For each port in the new node you need to find out the
MAC address, and which port (eth0/fxp0) it is in software. You need
to know both the Linux (eth) and BSD (fxp,xl,dc,etc.) names.
2. Wiring. We need to know which physical port on the back of the
machine maps to eth0, eth1, etc., and where each port is connected
to the cisco (get module/port, e.g. 3/21).
3. Power. Plug it into a power controller, and make note of which one
it is (name or IP) and which port you plug it into (1-8).
4. Serial line(s). When you plug in the serial lines, make note of which
ports on the serial expander they are plugged into.
B. If the node is of a new type
-------------------------------
1. You'll need specs on the nodes for the node_types and node_type_attributes
tables. For node_types, you'll just need a name for the type:
insert into `node_types` (class,type) values ("pc", "pc2800d");
For node_type_attributes you will create a row for each of several
attributes. For this, you will need a name for the processor class
(e.g. Core Duo), speed (MHz), RAM size (in MB), boot hard disk type and unit
('ad' and '0' for IDE, 'da' and '0' for SCSI, 'ad' and '4' for SATA),
boot disk size (in GB), max # of physical cards it holds (including the
motherboard as a card if it has built-in ethernet), and the approximate
amount of time it takes the machine to "power cycle" (in seconds).
You'll also need to give it a default OS id (the default OS to boot)
and image ID (disk image the default OS comes from), which port is the
control net (e.g. 4) and what its Linux name is (e.g. "eth4"), how many
links this node can delay (usually: num_experimental_links / 2), and
how many virtual nodes ("jails") the machine can support.
Example:
insert into `node_type_attributes` values
('pc2800d','processor','Pentium D','string'),
('pc2800d','frequency','2800','integer'),
('pc2800d','memory','2048','integer'),
('pc2800d','disktype','ad','string'),
('pc2800d','bootdisk_unit','4','integer'),
('pc2800d','disksize','160.00','float'),
('pc2800d','max_interfaces','4','integer'),
('pc2800d','power_delay','60','integer'),
('pc2800d','default_imageid','emulab-ops-FBSD54+FC4-STD','string'),
('pc2800d','default_osid','emulab-ops-FBSD54','string'),
('pc2800d','control_network','0','integer'),
('pc2800d','control_interface','eth0','string'),
('pc2800d','delay_capacity','2','integer'),
('pc2800d','virtnode_capacity','50','integer');
There are also assorted other attributes you need not change, just
use these:
insert into `node_type_attributes` values
('pc2800d','delay_osid','FBSD-STD','string'),
('pc2800d','jail_osid','FBSD-STD','string'),
('pc2800d','adminmfs_osid','FREEBSD-MFS','string'),
('pc2800d','diskloadmfs_osid','FRISBEE-MFS','string'),
('pc2800d','pxe_boot_path','/tftpboot/pxeboot.emu','string'),
('pc2800d','imageable','1','boolean'),
('pc2800d','rebootable','1','boolean'),
('pc2800d','simnode_capacity','0','integer'),
('pc2800d','trivlink_maxspeed','0','integer');
2. There are several scripts that limit searches to certain classes.
If the new type you have added does not have class "pc", you may need
to include this new class as appropriate.
Some of the scripts that might need to be updated are:
/db/avail.in
/db/nfree.in
/tbsetup/assign_wrapper.in
/tbsetup/batchexp.in
/tbsetup/reload_daemon.in
/tbsetup/exports_setup.in
/tbsetup/snmpit_lib.pm
/www/nodecontrol_list.php3
/www/reserved.php3
/www/showexp_list.php3
/www/tutorial/nscommands.html
/www/updown.php3
/sql/database-create.sql
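As a rough sketch of step B.1, rows for node_type_attributes could be generated
from a small table of values instead of being typed by hand. The helper name
attribute_rows, the dictionary layout, and the experimental_links count are
assumptions made for illustration; they are not part of the Emulab tools. The
sketch only builds SQL text in the same value order as the example above.

```python
# Hypothetical helper (not an Emulab tool): build an INSERT statement for
# node_type_attributes in the same value order as the example in step B.1.

def attribute_rows(node_type, attrs):
    """Return an INSERT statement; attrs maps attribute name -> (value, kind)."""
    rows = []
    for key, (value, kind) in attrs.items():
        rows.append("('%s','%s','%s','%s')" % (node_type, key, value, kind))
    return "insert into `node_type_attributes` values\n" + ",\n".join(rows) + ";"


experimental_links = 4  # assumed number of experimental links on the node
attrs = {
    "frequency": (2800, "integer"),
    "memory": (2048, "integer"),
    # Usual rule from the text: a node can delay half its experimental links.
    "delay_capacity": (experimental_links // 2, "integer"),
}
sql = attribute_rows("pc2800d", attrs)
print(sql)
```

Review the generated statement before feeding it to mysql; the values here are
only the ones shown in the example type.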
C. What to do on boss:
----------------------
1. Insert entries into interfaces table using info from A(1). Try:
insert into interfaces (node_id,card,port,MAC,IP,interface_type,iface)
values
("pcN",0,1,"00b0d0f01020",NULL,"BSDTYPE","eth0")
2. Insert entries into wires table using info from A(2). Try:
insert into wires (node_id1,card1,port1,node_id2,card2,port2) values
("pcN",0,1,"ciscoX",5,1)
For the control interface do:
insert into wires (type,node_id1,card1,port1,node_id2,card2,port2)
values ("Control","pcN",0,1,"ciscoX",5,1)
Check to make sure your cards and ports match up with what you
entered in the interfaces table.
3. Insert entry into outlets table, using info from A(3). Try:
insert into outlets (node_id, power_id, outlet) values
("pcN","powerX",Y)
4. Add entries to the nodes table for each node. Try:
insert into nodes (node_id,type,phys_nodeid,role,def_boot_osid,priority,op_mode)
values
("pcN","pc1u","pcN","testnode","FBSD45-STD",P,'NORMAL')
P (priority) determines where it gets printed out. These need to be
ascending numbers, and in the right region. See the table.
4a. Add entries into the tiplines table. The "server" field is where
the actual capture process runs:
INSERT INTO tiplines VALUES ('pc1','pc1','users.emulab.net',0,0,'');
INSERT INTO tiplines VALUES ('pc111','pc111','tipserv1.emulab.net',0,0,'');
You need to add the usual lines in /etc/remote on the machine
where the capture process runs. In addition, add a line on users
so that users can use tip to connect to a console on a remote tip
server. So, on users:
pc111|tbpc111:dv=/dev/tip/pc111:br#115200:nt:pa=none:
The device field is ignored, but something must be there.
5. Until you are ready to put it in service, reserve it to an expt,
either with nalloc or by adding an entry to the reserved table
directly. You'll probably also want to put its ports in a vlan to
enable them.
6. Add the node to the system files:
- DNS: on boss, cd /etc/namedb
co -l emulab.net.db.head
add these lines with all the others:
pcN IN A 155.101.132.N
IN MX 10 ops
IN MX 20 fast.cs.utah.edu.
ci -u emulab.net.db.head
cd reverse/
co -l 155.101.132.db
in 155.101.132.db, make these changes:
update serial number on line 10
add entry for node, like this:
N IN PTR pcN.emulab.net.
ci -u 155.101.132.db
run /usr/testbed/sbin/named_setup to update.
- DHCP: on boss, cd /usr/local/etc/
if you added a new node type, then you need to add a line
of the form:
%%nodetype=<type>
(where <type> is what the new type is called) to dhcpd.conf.template.
Then as root run:
dhcpd_makeconf dhcpd.conf.template > Ndhcpd.conf
You can diff dhcpd.conf with the new file to verify nothing
catastrophic happened. Finally:
sudo cp Ndhcpd.conf dhcpd.conf
sudo /usr/local/etc/rc.d/2.dhcpd.sh restart
- tip: on ops or tipserv1, edit /etc/remote
add a line like this:
pcN:dv=/dev/tip/pcN:br#115200:nt:pa=none:
pcN-tty:dv=/dev/cua<port #>:br#115200:nt:pa=none:
then do these:
sudo touch /var/log/tiplogs/pcN.log
sudo touch /var/log/tiplogs/pcN.run
- capture: on ops or tipserv1, edit /usr/site/etc/capture.rc:
add a line like this:
/usr/site/bin/capture -r -s 115200 pcN tty<port#> >/dev/null 2>&1 &
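The per-node lines in step C.6 all follow from the node number, so a small
script can print them for pasting into the right files. This is a minimal
sketch: the function name node_lines is hypothetical, the subnet and domain
are copied from the examples above, and the tty port name "A3" is a made-up
value standing in for <port#>.

```python
# Hypothetical helper (not an Emulab tool): produce the DNS, /etc/remote and
# capture.rc lines for node pcN, using the formats shown in step C.6.

def node_lines(n, tty_port, subnet="155.101.132", domain="emulab.net"):
    name = "pc%d" % n
    capture = ("/usr/site/bin/capture -r -s 115200 %s tty%s >/dev/null 2>&1 &"
               % (name, tty_port))
    return {
        "dns_forward": "%s IN A %s.%d" % (name, subnet, n),
        "dns_reverse": "%d IN PTR %s.%s." % (n, name, domain),
        "remote": "%s:dv=/dev/tip/%s:br#115200:nt:pa=none:" % (name, name),
        "capture": capture,
    }


lines = node_lines(42, "A3")  # "A3" is a made-up port name
for label, text in lines.items():
    print("%-12s %s" % (label, text))
```

The script only prints text; checking files out and in, updating the serial
number, and running named_setup still have to be done as described above.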
D. How to get the first image on it:
------------------------------------
1. If everything is set up right, you can use the magic PXE Flash Floppy
to put the right thing on the PXE card. Edit the BIOS to put the
boot order to Floppy, PXE, Hard Drive, then reboot it.
2. If everything goes right, you should see it PXE boot and find its
DHCP info, then contact the ProxyDHCP server to get its bootinfo
data, then it should decide according to that what to boot.
3. If step 2 went okay to that point, do an os_load to try to
install the standard testbed images for the node.
4. If it doesn't seem to be working just like the others, talk to
Leigh and Mike.
E. What next
------------
1. Test it out and see if it works well enough to put into service. If
it's ready, release it into the wild with nfree or by deleting its
entry in the reserved table.
2. Do some more tests to find any obvious problems. Fix them, if any.
3. Sit back and relax for a few minutes until the bug reports start
flowing in.
#
# EMULAB-COPYRIGHT
# Copyright (c) 2002-2006 University of Utah and the Flux Group.
# All rights reserved.
#
#####
##### Configuration suggestions for Cisco switches
#####
This file contains some configuration guidelines that we (Utah) have found
useful to improve the performance of our Cisco switches.
All commands given are to be typed at the (enable) prompt on your cisco
switches. They are for CatOS - switches that run IOS may not have these
commands.
<ports> means a list of ports, which on the CatOS command line, can include
lists and ranges, such as "3/1,3/2" or "3/1-48" or "3/1-48,4/1-48,5/1-48"
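To make the <ports> notation concrete, here is a minimal sketch that expands
such a list into individual (module, port) pairs. The function name
expand_ports is an assumption for illustration, and it handles only the forms
shown above (single ports and ranges within one module), not everything the
CatOS command line may accept.

```python
# Hypothetical helper: expand a port list like "3/1-48,4/1" into pairs.
# Only "module/port" and "module/first-last" items are handled.

def expand_ports(spec):
    """Return a list of (module, port) tuples for a CatOS-style port list."""
    ports = []
    for item in spec.split(","):
        module, _, port_range = item.strip().partition("/")
        first, _, last = port_range.partition("-")
        last = last or first  # a single port is a range of one
        for port in range(int(first), int(last) + 1):
            ports.append((int(module), port))
    return ports


print(expand_ports("3/1,3/2"))                    # [(3, 1), (3, 2)]
print(len(expand_ports("3/1-48,4/1-48,5/1-48")))  # 144
```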
##### Allowing ports to come up quicker
This one is useful on both the experimental and control nets:
set spantree portfast <ports> enable
Use this on all ports that are directly connected to nodes, servers, power
controllers - anything that is not another switch. Normally, the switch waits
a while (several seconds) when a port first comes up before forwarding traffic
from this port - it does so to prevent loops in the switch topology. The main
place you will see the benefit of this is on the control net - with portfast
disabled, the first few DHCP packets sent by booting nodes will get dropped,
causing the DHCP to take much longer than necessary.
##### Reducing stray traffic
Disable spanning tree (STP). If on, STP sends out packets approximately every
two seconds on every port. You can disable it on all VLANs with the command
set spantree disable all
There are two major consequences (for our purposes) of disabling STP:
1) You cannot have _any_ loops in your switch topology, or bad things will
happen.
2) VLAN pruning on trunks won't work, causing broadcast traffic to be
forwarded across trunks that it does not need to cross. We've added
features to snmpit to manually do STP's job in this case, so this
problem is taken care of.
You must have STP disabled on _all_ switches that are trunked together! If it
is enabled on even one, STP traffic will be seen on all of them.
The switch doesn't trust you to use portfast responsibly. So, it has a
'bpdu-guard' feature that helps guard against loops. Turn off this feature
with the command:
set spantree portfast bpdu-guard disable
Cisco uses a protocol called 'CDP' to discover other Cisco devices. This sends
out small packets every two minutes. You can disable it with:
set cdp disable <ports>
Ideally, you should only disable CDP on ports that don't have other Cisco
devices attached, but in practice, running with CDP disabled on all ports is
fine.
Switch ports will, by default, try to negotiate trunking and channeling.
Cisco provides a handy macro:
set port host <ports>
to disable both of these. It also enables portfast on the ports.
##### Setting MAC address aging time
We have found that some experimenters use applications, kernels, etc. that only
receive traffic and never send it. This presents a problem, because the switch
cannot learn which port the node is on, and so it broadcasts traffic for that
node to every port in the VLAN. This can be solved by 'priming', i.e. having
the receive-only node send some traffic (like an ARP response) at the beginning
of the experiment. However, with the default aging time of 300 seconds the
switch soon forgets the primed entry, which makes this impractical. So, we have
disabled this aging, making learned MACs permanent (until the VLAN is torn
down).
You must do this for each VLAN, with the command:
set cam agingtime <vlan> 0
For convenience, we've supplied a file (in this directory) called
'no-cam-aging.cfg' that disables aging on VLANs 2-999 (the ones potentially
used by our software). Transfer this file to the switch using the:
copy tftp config
command.
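If you need a different VLAN range, a file of this kind can be produced with a
few lines of script. This is only a sketch: the function name cam_aging_config
is hypothetical, and the exact contents of the supplied 'no-cam-aging.cfg' may
differ from what it prints (for example, it may carry extra header lines).

```python
# Hypothetical generator: one "set cam agingtime <vlan> 0" line per VLAN.

def cam_aging_config(first_vlan=2, last_vlan=999):
    """Return the per-VLAN commands that disable MAC address aging."""
    return ["set cam agingtime %d 0" % vlan
            for vlan in range(first_vlan, last_vlan + 1)]


cfg = cam_aging_config()
print("\n".join(cfg[:3]))  # show the first few lines
print("... %d lines in total" % len(cfg))
```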
We suggest that you do this on your control network as well - part of the
booting process leaves the nodes sitting dormant at a boot loader for extended
periods of time, so the switch will tend to forget their MACs. Turning off
aging is not critical, but we suggest it, because it will reduce stray traffic
while the switch re-learns MAC addresses.
##### Setting up multicast between multiple switches
If you have more than one switch on the experimental or control networks, you
may need to do a little setup to get multicast between them. The symptom of
this problem is that multicast doesn't work between two nodes on different
switches, and if you run 'show multicast groups' on each switch, some will show
the group as existing, and others will not.
Run the following command for both sides (i.e. on both switches) of every trunk
link:
set multicast router 1/1
(assuming that port 1/1 is your trunk link). If you are using EtherChannel to
bond together multiple links to form a single trunk, you only need to run this
command for the first port in the channel.
We had some problems running this command on the trunk on one of our switches:
it failed with the error:
Failed to add port 2/1 to multicast router port list.
What I finally did to resolve this was to tear down the trunk link and
EtherChannel that port was a part of, run the command on it (which succeeded
this time), and then build the EtherChannel and trunk back up.
##### Setting the clock
Since boss is an NTP server, you should set your switches to sync time with it.
On CatOS, this is accomplished with:
set ntp server 10.11.12.1
set ntp timezone MST -7
set ntp summertime MDT
set ntp summertime enable
set ntp summertime recurring
set ntp client enable
show time
Of course, you'll need to replace 10.11.12.1 with the IP address your boss node
uses to talk to the switches (usually its control-hardware interface), and
'MST', -7, and 'MDT' with the names of your timezone and its offset from GMT.
If you don't use daylight-savings time, leave out the 'summertime' steps, and
instead do:
set ntp summertime disable
Watch the output of 'show time' for a while to make sure the clock syncs up.
It may take a few minutes.
On IOS, these commands are:
configure terminal
ntp server 10.11.12.1
clock timezone MST -7
clock summer-time MDT recurring
exit
... and to see the current time, run 'show clock'
##### IOS commands
The above commands are given under the assumption that your switches are
running CatOS. If you are running IOS, here are a few notes that may help you
'translate' the above commands.
Interfaces in CatOS are named as module/port, while interfaces in IOS are named
as TypeModule/Port - For example, if module 1 has gigabit interfaces, what you
call 1/1 in CatOS is Gi1/1 in IOS. 100Mbit Ethernet is 'Fa'. (Really, these are
'GigabitEthernet' and 'FastEthernet' respectively, but you can abbreviate them.)
In order to operate on many interfaces at once, you can issue configuration
commands like this:
interface range gi1/1 - 48, gi2/1 - 48, gi3/1 - 48
... which would configure all 48 Gigabit interfaces on modules 1, 2, and 3.
The equivalent of 'set port host' (which sets portfast, disables trunking and
channeling) is:
switchport host
... applied to an interface or a range of interfaces. As in:
interface range gi1/1 - 48, gi2/1 - 48, gi3/1 - 48
switchport host
exit
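When translating a CatOS port list by hand, the 'interface range' line is the
part that is easiest to get wrong. The sketch below builds it from a list of
(module, port count) pairs. The function name ios_interface_range and its
arguments are assumptions for illustration; it always starts each range at
port 1 and uses the abbreviated 'gi' prefix unless told otherwise.

```python
# Hypothetical helper: build an IOS "interface range" line covering ports
# 1..count on each listed module.

def ios_interface_range(module_port_counts, prefix="gi"):
    """module_port_counts is a list of (module, number_of_ports) pairs."""
    parts = ["%s%d/1 - %d" % (prefix, module, count)
             for module, count in module_port_counts]
    return "interface range " + ", ".join(parts)


line = ios_interface_range([(1, 48), (2, 48), (3, 48)])
print(line)  # interface range gi1/1 - 48, gi2/1 - 48, gi3/1 - 48
```

For 100Mbit modules, pass prefix="fa" to get FastEthernet names instead.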
In order to disable spanning tree, you would use:
no spanning-tree vlan 1-1005
In order to create a VLAN and set its name:
vlan 10
name control-hardware
exit
In order to set the IP address of the interface in VLAN 10:
interface vlan 10
ip address 10.11.13.183 255.255.255.0
exit
In order to enable an interface:
interface vlan10
no shutdown
exit
In order to remove a VLAN:
no vlan 1000
To put an interface into a VLAN:
interface gi0/1
switchport access vlan 10
exit
In order to turn on trunking for an interface:
interface gi0/1
switchport mode trunk
exit
In order to turn off trunking for an interface:
interface gi0/1
switchport mode access
exit
In order to put interfaces into an EtherChannel:
interface range gi1/41 - 48
channel-group 1 mode on
exit
(Notes: If you want to make more than one channel, give each set of ports
a different channel number. And, now, you will configure the whole channel
as 'interface port-channel 1')
To set the native VLAN on a trunk:
interface gi0/1
switchport trunk native vlan 1
To set the read-write SNMP community string to 'public':
snmp-server community public rw
To globally disable the Cisco Discovery Protocol (CDP):
no cdp run
#
# EMULAB-COPYRIGHT
# Copyright (c) 2002-2007 University of Utah and the Flux Group.
# All rights reserved.
#
#####
##### Setting up tbdb for a new boss node
#####
Note: we are working on better automating many of the procedures in this
document - for now, though, a few of them are still manual.
Note: steps labeled "Local Only" are only required when setting up a testbed
with local nodes - they can be skipped in a widearea-only testbed.
##### Step 1 - Setup users, projects, and experiments
In order to proceed, you should have the following working (from the boss and
ops setup documentation):
NFS mounts between boss and ops
Root ssh keys (so that root on boss can ssh to ops without a password)
The web interface
Make sure you can log into the web interface using the 'elabman' account.
The password for the elabman account is the same as the root password on
your boss node (see, we told you to remember it!).
This account is created as a testbed administrator, but there is one thing
you will need to do in order to use your admin powers. For the same reason
that you use 'su' and/or 'sudo' on your UNIX boxes instead of logging in as
root, you must explicitly enable admin privileges after you log in. When
logged in as a user who is allowed to become an admin, you will see a green
dot on the left side of the header above the main page content. The green
dot means that although you are allowed admin powers, they are currently
turned off, and you see the same web pages that a regular user sees, and
can use the same actions. If you click on the dot, it will turn red, and you
will have full administrator privileges; we call this 'going red dot'.
If you click on the dot again, it will go back to green, thus you can
easily flip back and forth between normal privs and admin privs. Note
that most of the procedures in this file require you to be in red dot mode.
Now, we will use the elabman user to bootstrap your first real account and
project. Note that while you will use the elabman account to do this, the
elabman account should not be considered a real account; it is intended to
help bootstrap only and as such, does not have the power to perform many
actions that are required later (such as adding nodes to the testbed).
Login as user 'elabman' if you have not already done so. Go into 'red dot'
mode by clicking on the green dot. You should see the "Start a New Testbed
Project" page, with a "Create First Project" link on the left menu under
"Experimentation".
Fill in your own information in the 'Project Head Information'
section. It is important that you provide a working email address! Select
your initial Project Name in the 'Project Information' section (we call
ours 'testbed', but you can call yours whatever you call your project or
research group). Also specify a *working* URL (it is required) for the
project. Submit the form using the Submit button at the bottom of the page.
The web interface will grind along for a minute or so. DO NOT CLICK THE
STOP BUTTON! When it is done, you will see a message that invites you to
login as the user you just created. Do this now so that you can continue
with setting up your testbed. Note that the elabman account has been
deactivated during this process to avoid problems later on (and potential
security breaches).
Before we continue, let's explain a few more important items:
* Project Membership: In addition to the project you just created, you have
automatically been added to the "emulab-ops" project with trust value
"group_root". This allows you to approve new members to that project as
well as your own project.
* Admin Mode: Your new account has been given "administrator" mode, as
described above. To change that value for other users after their
accounts are created, you can do this on boss:
echo 'update users set admin=1 where uid="<username>"' | mysql tbdb
* Shell on Boss: Give yourself the special ability to login to boss;
in contrast, most (normal) users have a restricted shell on boss,
and are not allowed to log in using a password. Login to boss as root,
and edit the password file using the 'vipw' command (FreeBSD requires
some special processing on the password file after editing, which vipw
does). Give yourself a real shell (say, /bin/csh) and then exit the
editor. Then give yourself a password (in general, it is safer to have a
different password on boss than on ops!). Use this command:
passwd <your username>
NOTE: See doc/shelloboss.txt for important security issues w.r.t. giving
real shells on boss. Before you give a real shell to someone, it is a
good idea for them to read this file!
* Now logout and back in as yourself. In general, it is safer and better to
not do things as root. In fact, many testbed programs will complain if
you invoke them as root because it makes accounting and auditing more
difficult.
* Unix Group Membership: The Emulab account system manages both the
password file and the group file (/etc/group) on both boss and ops. If
you edit them directly, those changes will likely be lost. If you want to be
a member of any UNIX groups on boss, use our 'unixgroups' command. For
example, to add yourself to the "operator" group, do this on boss (as
yourself, not root):
withadminprivs unixgroups -a <username> operator
NOTE: Your initial account created above was already placed in the wheel
and tbadmin groups.
NOTE: Just as you need to go 'red dot' to use admin privileges on the web
interface, you must also explicitly enable them on the command line. To
do this, prefix the command you want to run with 'withadminprivs',
which can be abbreviated as 'wap'.
* Set your path: withadminprivs and many other admin-type commands live in
/usr/testbed/sbin - you'll want to put this and /usr/testbed/bin in your
$PATH.
Others at your site can now apply to join your project, or start their own.
##### Step "-1" - Undoing Step 1 if necessary
If something went wrong during Step 1, it can leave things in an inconsistent
state. Here's how to undo it without starting over from scratch. The goal is
to remove things that boss-install checks on, so it can be run again to put
the Emulab database and directories into initial conditions.
* Remove users, groups, and directories set up by boss-install.
pw userdel -n elabman -r
pw userdel -n elabckup -r
ssh -n ops pw userdel -n elabman -r
ssh -n ops pw userdel -n elabckup -r
ls -l /users /proj /groups
rm -r /proj/* /groups/*
mkdir /proj/cvsrepos
pw groupdel -n emulab-ops
ls -l /usr/testbed/{expwork,*/proj}
rm -r /usr/testbed/expwork/* /usr/testbed/*/proj/*
ls -l /usr/testbed/{expwork,*/proj}
* Remove user and group from the elabman "Create First Project" pages.
set me = *your-login*
set us = testbed
pw userdel -n $me -r
pw groupdel -n $us
ls -l /users /proj /groups
* Also remove any other users and groups you've created since then.
Otherwise you will get this message until you've cleared them:
'Error Creating Project: Transient error; please try again later.'
tail /etc/passwd
tail /etc/group
set him = *user*
set them = *group*
pw userdel -n $him -r
pw groupdel -n $them
* Kill the database.
mysql -e "drop database tbdb"
* Run boss-install, checking particularly on the success of these tasks:
. Setting up database
. Setting up initial user (elabman)
. Setting up checkup user (elabckup)
. Setting up system experiments
Now you can go back to Step 1 with the first login as 'elabman' and try again.
##### Step 2 - Setup web sql editor
Several of the steps below require you to add data to the database by hand. If
you're comfortable with SQL, you can do this directly with SQL queries, or you
can use the generic web-based SQL table editor provided with the testbed
software. If you plan to use the former method, you can skip this step.
********************************** WARNING **********************************
* Many tables depend on data in other tables, or depend on programs running *
* to effect a change. Thus, you should not edit tables other than the ones *
* described in this document. *
* You have been warned...... *
********************************** WARNING **********************************
First, you'll want to protect the webdb script from outside browsers. Because
of its flexibility, it would be quite dangerous if it were broken into. So, we
add an additional layer of protection by limiting the IP addresses it may be
used from. Open your httpd.conf file (located in /usr/local/etc/apache)
and find the 'Directory' directives. Add a section such as this:
<Directory /usr/testbed/www/webdb>
AllowOverride None
order deny,allow
deny from all
allow from 155.99.212.
</Directory>
If you installed the testbed tree somewhere other than /usr/testbed, fix the
directory. Change the 'allow from' line to match your IP subnet (note the '.'
on the end of the address, to match the entire subnet). You can have as
many 'allow' lines as you want. Restart apache:
sudo /usr/local/etc/rc.d/apache.sh stop
sudo /usr/local/etc/rc.d/apache.sh start
Next, you'll need to specify which users have the ability to edit the database.
This is done with the 'dbedit' column in the users table. You can turn on
a user's dbedit bit like so:
echo 'update users set dbedit=1 where uid="<username>"' | mysql tbdb
##### Step 3 - Setup switches
##### Local Only
1) Create node types for switches
Add entries to the node_types table for each type of switch you'll be
using. You can do this by talking to mysql directly, but it is more easily
accomplished using the web interface. In 'red dot' mode, go to:
https://<yourbossnode>/editnodetype.php3?new_type=1
These are the switch types currently supported:
cisco{65xx,40xx,45xx,29xx,55xx}
For example, if you had a 6509, you'd enter 'cisco6509'
If your switch runs IOS (instead of CatOS), append '-ios' to the
type.
intel
The supported type is the 510T (but just put 'intel' in the type field)
nortel1100, nortel5510
foundry1500, foundry9604
(Note: Case sensitive!)
Set the "class" to 'switch' and set "processor" to whatever you used for the
"type" field.
Most of the other columns are not important for switches (so you can set
them to 0), but putting in "Max Interfaces" (if the switch is expandable)
can be useful for your own information.
2a) Create interface types for switch interconnects (if any)
If you'll be connecting the experimental switches together, you'll add