Commit a68e6097 authored by Robert Ricci

Add snmpit-internals.txt, a spiffy new document for people who have
to port snmpit to other switches.

Remove the old snmpit_doc.txt, which contained virtually no useful
information.
parent 0296500e
######################################################################
##### Some notes on the design and operation of snmpit
##### Robert Ricci <ricci@cs.utah.edu>
######################################################################
#####
##### File organization
#####
snmpit - The command line tool. Contains most of the database accesses, does
permissions checking, formats output, and figures out which device-specific
backends it needs to invoke. Most emulab-specific knowledge is embedded in
snmpit. If you're going to add another switch backend to snmpit, the only part
of snmpit itself that you should have to change is the part that loads the
device library - search for 'cisco' in the file.
snmpit_lib.pm - functions useful to snmpit and common to multiple device
backends. The most important functions to be aware of in here are
ReadTranslationTable() and the snmpitGetWarn() and snmpitGetFatal() functions.
The latter two wrap SNMP commands, retry in case of timeout, and send mail to
the site's testbed-ops list if they fail. The first simply warns the user and
continues execution, whereas the Fatal() version exit()s.
snmpit_cisco_stack.pm - Contains the knowledge required to handle a collection
of Cisco switches which share a common set of VLANs. Does not actually do any
SNMP itself, but has the knowledge of how to deal with multiple switches. For
example, it knows that (in some configurations) one switch acts as a 'VLAN (or
VTP) server', and in order to create a VLAN, you just have to talk to it. But,
to get a list of ports in VLANs, you have to talk to all of the switches. Also
has the job of aggregating information from switches - ie. doing a listVlans()
on the stack does a listVlans() on all individual switches and then collates
the results. snmpit itself calls into the stack module, which creates a
snmpit_cisco module for each switch, and calls into that to do the actual
work.
snmpit_cisco.pm - Contains the actual SNMP commands to deal with Cisco
switches, and deals with error checking, retries, etc. Writing versions of
this for other switches will be the hardest part of porting snmpit, because it
has a lot of knowledge about the quirkiness of Cisco's SNMP implementation.
snmpit_intel_stack.pm - Similar to snmpit_cisco_stack.pm . We haven't used it
in quite some time, so it's quite possibly bitrotted.
snmpit_intel.pm - Ditto - like snmpit_cisco.pm .
snmpit_apc.pm - For controlling APC-brand SNMP-controllable power controllers.
Don't be fooled by its name - it's not actually part of snmpit anymore, it
just used to be. It's now used only by the 'power' command, and you can ignore
it.
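As a rough illustration of the Warn/Fatal wrapper pattern described for
snmpit_lib.pm above, here is a minimal Python sketch. It is not the snmpit
code (which is Perl). The names get_with_retry, SnmpTimeout, and the notify
callback are hypothetical, and the real functions send mail to testbed-ops
rather than calling a callback.

```python
class SnmpTimeout(Exception):
    """Hypothetical stand-in for an SNMP request that timed out."""


def get_with_retry(fetch, retries=3, notify=print, fatal=False):
    """Call fetch() up to `retries` times, retrying only on timeout.

    If every attempt times out, report through notify(), then either
    return None (the Warn flavor) or exit (the Fatal flavor).
    """
    for _ in range(retries):
        try:
            return fetch()
        except SnmpTimeout:
            continue
    notify(f"SNMP request failed after {retries} attempts")
    if fatal:
        raise SystemExit(1)
    return None
```

The only difference between the two flavors is what happens after the
failure has been reported, which is why a single helper with a flag is
enough for the sketch.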
#####
##### Design philosophy
#####
One of the key issues is that we trust switch state over database state. Thus,
when we go to remove a VLAN, we do not trust the database to list all the
ports in that VLAN for us - we get the list of member ports from the switch.
The idea here is twofold - first, we don't want to get too confused if people
have been manually manipulating VLANs on the switch, which does happen from
time to time, and usually happens for good reasons. Second, we don't want to
have to worry too much about getting out of sync, which can be a huge mess -
for example, if the database has to be restored from a backup, or if someone
messes around with the database by hand.
On the other hand, we don't trust the switches _too_ much - we usually verify
that set operations have succeeded before reporting success. For example, we'll set the
port speed to 100Mbps, then check the speed to make sure the change actually
took place. It's been our experience that, since we're pushing this stuff far
harder than it was intended (really, how many sites have made hundreds of
thousands of VLANs?), we do occasionally hit bugs in the switches.
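The set-then-verify habit described above can be sketched as follows. This
is an illustrative Python sketch, not code from snmpit: set_and_verify and
FakeSwitch are hypothetical names, and a real backend would issue SNMP set
and get requests where the fake device touches a dictionary.

```python
def set_and_verify(device, key, value):
    """Write a setting, then read it back before reporting success."""
    device.set(key, value)
    return device.get(key) == value


class FakeSwitch:
    """Hypothetical in-memory device; keys in `stuck` silently ignore writes."""

    def __init__(self, stuck=()):
        self.state = {}
        self.stuck = set(stuck)

    def set(self, key, value):
        if key not in self.stuck:
            self.state[key] = value

    def get(self, key):
        return self.state.get(key)
```

A switch that accepts the set but does not apply it (the kind of bug
mentioned above) shows up as a False return instead of a silent success.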
Many of the API choices were made in order to enable high performance in the
backends. For example, we can supply a list of ports to setPortVlan() (in the
stack module), rather than just a single port, because it may be faster to
affect multiple ports at once than to do them serially. Whether or not you
choose to exploit this bit of API design is up to you.
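One way a backend might exploit the list argument is to group the ports by
switch, so that each switch can be handled in a single batch. The Python
sketch below assumes ports are written as 'switch:port' strings - that
format, and the function name, are assumptions made for illustration, not
the format snmpit actually uses.

```python
from collections import defaultdict


def group_ports_by_switch(ports):
    """Group 'switch:port' strings into {switch: [port, ...]}.

    Input order is preserved within each switch.
    """
    grouped = defaultdict(list)
    for port in ports:
        switch, _, local = port.partition(":")
        grouped[switch].append(local)
    return dict(grouped)
```

A backend that cannot batch can simply loop over the same list one port at
a time; the caller does not need to know which strategy was used.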
#####
##### Switch stacks
#####
snmpit has a concept of stacks of switches. These are a set of switches on
which we can create a VLAN that spans potentially all switches. Thus, they are
connected by trunk links. Right now, snmpit does not support creating a VLAN
across multiple stacks (that would go against the definition of a stack), and
it does not support stacks that contain more than one type of switch. This
latter is something we will probably have to address soon - probably by
creating another level in the object hierarchy, above snmpit_cisco_stack which
calls into the stack modules for the various devices and has the glue to
create trunks between them.
Probably, all of your experimental-net switches will be in a single stack.
Stacks are, by convention, named after their leader. In a Cisco stack, the
leader is the one you talk to in order to create VLANs, etc.
#####
##### New switch backends
#####
Essentially, what you need to do to port snmpit to a new switch vendor is make
a new module that exports the same API as snmpit_cisco_stack.pm . It's up to
you whether you want to do this in two levels as I have with the Cisco support
or not, but I strongly recommend doing it this way. You could start out with a
stack module that's just wrappers into the backend module. Or you could start
with snmpit_cisco_stack.pm, and tweak it only where needed to conform to the
needs of your switches.
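The two-level layout recommended above might look something like this
Python sketch. It only shows the shape of the design: the class and method
names are hypothetical, the real modules are Perl, and the actual stack API
should be taken from snmpit_cisco_stack.pm.

```python
class FakeSwitchBackend:
    """Hypothetical per-switch backend; a real one would speak SNMP."""

    def __init__(self, name, vlans):
        self.name = name
        self.vlans = vlans  # {vlan_name: [port, ...]} on this switch

    def list_vlans(self):
        return self.vlans


class Stack:
    """Thin stack layer: calls every switch and collates the results."""

    def __init__(self, backends):
        self.backends = backends

    def list_vlans(self):
        merged = {}
        for backend in self.backends:
            for vlan, ports in backend.list_vlans().items():
                # Prefix each port with its switch so the merged list is unambiguous.
                merged.setdefault(vlan, []).extend(
                    f"{backend.name}:{p}" for p in ports
                )
        return merged
```

Starting with a stack layer this thin keeps the multi-switch logic out of
the per-switch module, so it can grow later without touching the SNMP code.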
Looking at snmpit_cisco.pm to figure out the basics of switch configuration
with SNMP is not a bad idea, but keep in mind that there are several things
which make this module very complicated.
First, it supports two different switch operating systems, IOS and CatOS, so
there are some special cases for each.
Second, it supports some wacky features that you (hopefully) won't have to
worry about, like 'private VLANs'. This, in particular, leads to some complex
cases in setPortVlan() and createVlan().
Third, in different MIBs, ports are 'addressed' differently. In standardized
MIBs, ports tend to be referred to by an 'ifIndex', which is just an integer.
Cisco likes to refer to ports as a 'module.port' (at least, in CatOS), since
this is the 'native' way to name ports on their modular switches. So, we have
to convert back and forth between the two formats. Keep in mind that
operations for which I convert the ports into $PORT_FORMAT_IFINDEX are ones
that _might_ be supported in your switch.
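The back-and-forth conversion between the two port formats amounts to a
two-way lookup table. Here is a minimal Python sketch, assuming the
(ifIndex, module.port) pairs have already been read from the switch. The
PortTable class and the sample values in the usage note are made up for
illustration.

```python
class PortTable:
    """Two-way map between ifIndex integers and 'module.port' strings."""

    def __init__(self, pairs):
        # pairs: iterable of (ifindex, "module.port") tuples
        self.by_ifindex = dict(pairs)
        self.by_modport = {mp: idx for idx, mp in self.by_ifindex.items()}

    def to_modport(self, ifindex):
        return self.by_ifindex[ifindex]

    def to_ifindex(self, modport):
        return self.by_modport[modport]
```

For example, PortTable([(17, "1.1"), (18, "1.2")]).to_ifindex("1.2") looks
up the ifIndex that the table was built with for port 1.2.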
You can also look at snmpit_intel.pm for examples if you wish, but keep in
mind it's not actively maintained.
We don't, at the time being, ever automatically manipulate the control net.
So, you don't need to worry about having snmpit support for the switch(es) on
your control net.
#####
##### VLAN IDs and VLAN numbers
#####
This part causes a lot of confusion, sorry. There are two different ways we
refer to VLANs, which can get confusing, 'cause they're both integers. A VLAN
ID is a made-up number that we use to identify a VLAN - it's an auto-increment
value in the vlans table. The actual VLAN on the switch also has a number
associated with it - this is the VLAN number.
Let's give an example. I create an experiment called testbed/threenodes, and I
put the three members, Dusty, Lucky, and Ned, into a LAN called 'amigos'.
Let's say this gets an ID of 314 in the VLAN table.
Now I need to create this VLAN on the switches. The snmpit_cisco_stack module
goes out and finds the list of VLAN _numbers_ that already exist on the
switch. This is the list you'd get from, say, a 'show vlan' command on a Cisco
switch. Let's say this list of VLANs is 1,2,3,4,5,7,9,10,12. The module finds
the first unused number (6), and uses that for the new VLAN. Note that there
are holes in the VLAN list, presumably from VLANs that were previously created
and then deleted.
So now we have a VLAN with ID 314 and number 6. We have to store this mapping
somewhere so that future invocations of snmpit will be able to find the VLAN.
For Ciscos, we do this by setting the VLAN's 'name' field to the ID. So now
we have a VLAN with number 6 and the name '314'. Got it? This mapping may have
to be stored somewhere else, like in the database, for other switches, if you
can't set names for VLANs.
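The example above can be worked through in a few lines of Python. This is
only a sketch of the idea: the function name is hypothetical, and the
dictionary stands in for setting the VLAN's name on the switch.

```python
def first_unused_vlan_number(existing):
    """Return the smallest positive VLAN number not already in use."""
    used = set(existing)
    number = 1
    while number in used:
        number += 1
    return number


# VLAN numbers already on the switch, as in the example above.
existing = [1, 2, 3, 4, 5, 7, 9, 10, 12]
vlan_id = 314  # the ID from the vlans table

number = first_unused_vlan_number(existing)

# Store the number -> ID mapping by using the ID as the VLAN's name.
vlan_names = {number: str(vlan_id)}
```

Here number comes out as 6 and the VLAN's name is '314', matching the text.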
The upshot is, be careful as you're looking at the API to distinguish between
functions that take VLAN names and ones that take VLAN numbers. It should be
pretty clear from the comments and/or variable names. Send me mail if you find
any that aren't clear.
#####
##### Stack options
#####
The switch_stack_types table holds options for each stack. They are:
    stack_type:       Used by snmpit to figure out which backend module to
                      load
    supports_private: Cisco-specific, don't worry about it
    single_domain:    Whether all switches in the stack share a VLAN domain
                      (such as using VTP on Ciscos), or whether you need to
                      talk to each switch individually to create VLANs. You
                      can decide if it makes sense to implement this option
                      for your switches or not.
    snmp_community:   The SNMP community string that will be used for
                      read/write access to the switches. You should support
                      this option, and default to 'public' if no community
                      is given
    min_vlan:         The minimum VLAN number (remember, not ID) that your
                      module is allowed to create on the switch. It would be
                      good to implement this if possible. This can be useful
                      if the switch is being used for more than one purpose
                      - ie. someone else could be creating VLANs in the
                      range 1-500, and your Emulab could be using VLANs 501
                      - 1000.
    max_vlan:         Like min_vlan, silly!
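A backend might carry these options around in a small record and respect
min_vlan and max_vlan when picking a VLAN number. The Python sketch below
is illustrative only: apart from the 'public' community default, which the
list above calls for, the default values are placeholders I made up, and
StackOptions and pick_vlan_number are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class StackOptions:
    stack_type: str
    supports_private: bool = False   # placeholder default
    single_domain: bool = True       # placeholder default
    snmp_community: str = "public"   # default named in the option list
    min_vlan: int = 1                # placeholder default
    max_vlan: int = 1000             # placeholder default


def pick_vlan_number(options, existing):
    """Return the first unused VLAN number within [min_vlan, max_vlan].

    Returns None if every number in the allowed range is taken.
    """
    used = set(existing)
    for number in range(options.min_vlan, options.max_vlan + 1):
        if number not in used:
            return number
    return None
```

With min_vlan=501, numbers below 501 are never handed out, even if they are
free, so someone else can keep using that range by hand.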
#####
##### Misc. things to be aware of
#####
We want to disable ports that are not currently part of an experiment. So,
when we tear down a VLAN, we have to do something to the ports that were
previously in it. On Intel switches, you can actually have ports that are not
in any VLANs, so we disable them and remove them from their VLAN.
For Ciscos, however, ports are always in a VLAN - so we move them into VLAN 1,
and set them to 'disabled'.
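The difference between the two teardown behaviors can be sketched like
this. It is a toy Python model, not snmpit code: park_ports and the
port_state dictionary are hypothetical, and must_be_in_vlan stands for "on
this kind of switch, a port always belongs to some VLAN".

```python
def park_ports(ports, port_state, must_be_in_vlan):
    """Disable ports freed by a VLAN teardown.

    Where a port must always be in a VLAN (the Cisco case), move it to
    VLAN 1; otherwise (the Intel case) leave it in no VLAN at all.
    """
    for port in ports:
        port_state[port] = {
            "vlan": 1 if must_be_in_vlan else None,
            "enabled": False,
        }
    return port_state
```

Either way the port ends up disabled; only the VLAN it is left in differs.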
In snmpit_cisco, we have two ways to create VLANs on switches - one is using
VTP, in which we create VLANs on a 'leader' and the switches do the job of
getting it created everywhere. In the other scheme, we have to talk to all
switches to create VLANs. It has some interesting properties, and depending on
what your switches support, you may have to deal with some of the same
issues. We have made a decision that, though poor from a performance
perspective, helps consistency. We create the VLAN on _all_ switches,
regardless of which ones actually have ports in it. This way, we do not have
to deal with issues that come from different switches having differing sets of
VLANs. Our locking protocol is such that when we create a VLAN, we do so on
the switches in lexicographically sorted order - when we delete a VLAN, we do
it in reverse lexicographic order. This way, we don't have different concurrent
instances of snmpit pick the same VLAN name because they are getting VLAN
lists from different switches.
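The ordering rule is simple enough to show directly. In this Python sketch
the create and delete arguments are hypothetical callbacks standing in for
the per-switch SNMP work. The point is only the order in which the switches
are visited.

```python
def create_vlan_everywhere(switches, create):
    """Create a VLAN on every switch, in lexicographically sorted order."""
    visited = []
    for name in sorted(switches):
        create(name)
        visited.append(name)
    return visited


def delete_vlan_everywhere(switches, delete):
    """Delete a VLAN from every switch, in reverse lexicographic order."""
    visited = []
    for name in sorted(switches, reverse=True):
        delete(name)
        visited.append(name)
    return visited
```

Because every instance visits the switches in the same order, the first
switch in sorted order is always the first to gain a new VLAN and the last
to lose one.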
#####
##### Where to learn about new switch vendors and SNMP
#####
There are three ways in which we've found out which SNMP commands we needed to
invoke, in what order, to get this stuff to work.
Originally, we started by using the vendor's own SNMP configuration tools and
tcpdump-ing them. This can end up being a bit confusing, but it's the most
reliable source for discovering the switches' quirks.
We've also grabbed all the MIB files from the vendor (Cisco's are at:
ftp://ftp.cisco.com/pub/mibs/v2), and looked through the OID names and
comments. Luckily, the comments in Cisco MIB files are pretty good.
Finally, some vendors may provide documentation on how to perform some actions.
I wouldn't count on this one - Cisco's documentation, for example, is very
lacking in this area. I actually had a Cisco support person tell me that it was
not possible to do through SNMP something we'd been doing for years.
You can, of course, try out the OIDs we use in snmpit_cisco and snmpit_intel -
some of them may be supported on your switch.
Documentation for snmpit
------------------------
snmpit uses modules that implement the interface to a given switch
model (e.g. Cisco Catalyst 6509, Intel EtherExpress 510T, etc). They are
responsible for all communication (typically over SNMP) to the switch.
The organization is basically that snmpit itself deals with stack objects,
which deal with (possibly multiple) switch objects on the backend. So, snmpit
makes a snmpit_cisco_stack object, and gives it a list of switches, which
snmpit_cisco_stack uses to create a snmpit_cisco object for each switch. The
stack objects basically just know how to do things like get the VLAN lists from
all of their switches and collate them into one big list that they can return
to the caller.
The API for the stack objects is currently not documented - look at one of the
existing ones.