Setting up an InstaGeni Rack. First, we need the following info:

* We start here, after you have sent Utah the "checklist" and are
  waiting for the images to be baked. Once you hear back from us, you
  may continue with these instructions.

  Waiting, waiting, waiting ...

* Power on the control switch (Procurve 2620 in the top slot).

* Attach a console to the control node and power on the node. You will
  wait a while for the "HP ProLiant" screen. Watch CAREFULLY, looking
  for the moment it says "press any key for Option ROM messages."
  Press any key!

  Then, right after the screen switches, type F8 to get into the iLo
  configuration. Gotta be fast on this. If you miss it, power cycle.

* Once the iLo screen comes up, right arrow to Network and choose the
  DHCP option. You want to make sure DHCP is off. F10 to save and then
  esc to go back. Then choose the NIC option. Fill in the iLo
  IP/Mask/Router, then F10 to save and then esc to exit. iLo will
  reset.

* Is the RAID array set up? It wasn't on Utah's rack. Need to fill this
  part in.

* At this point, Utah can do the rest of the rack setup without
  further intervention from you. Well, unless something gets wedged
  and Utah needs something to be physically power cycled. Or you can
  go back to your desk and proceed to set up the rack using the
  following instructions. Which do you prefer? I know you will make
  the correct choice.

* Now that you have decided, send email to Utah asking us to complete
  the installation while you go back to working on other projects. 

======================================================================

* Verify that the DNS records are being served properly from the
  parent domain.  For instance, if the rack is instageni.foo.edu, then
  try:

        $ host -t NS instageni.foo.edu
        instageni.foo.edu name server ns.instageni.foo.edu
        $ host -t A ns.instageni.foo.edu
        ns.instageni.foo.edu has address 123.123.123.123

  If you don't get positive responses to either query (if things are
  broken, then NXDOMAIN errors are a likely symptom), then stop and
  ask the local admin to fix it.
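
  The two lookups above can be wrapped in a small check that stops at
  the first failure. This is only a sketch: check_rack_dns is a made-up
  name, and it assumes the name server is called ns.<rack domain> as in
  the example.

```shell
# Hypothetical helper (not part of the testbed tools): verify the NS and
# A records for a rack domain. Assumes the name server is ns.<domain>.
check_rack_dns() {
    domain="$1"
    # host exits non-zero on NXDOMAIN and other lookup failures.
    if ! host -t NS "$domain" >/dev/null 2>&1; then
        echo "NS lookup failed for $domain"
        return 1
    fi
    if ! host -t A "ns.$domain" >/dev/null 2>&1; then
        echo "A lookup failed for ns.$domain"
        return 1
    fi
    echo "DNS OK for $domain"
}
# Usage: check_rack_dns instageni.foo.edu
```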

* Using your web browser, go to the iLo IP you set, and log in using
  Administrator and the iLo password that is stamped on top of the
  control node, or in the data file you received. If using the
  datafile, look for the section that says:

	<u_location>U34</u_location>

  because the control node is in slot 34. Grab the lo_passwd from that
  section; that is your iLo password.

* In a shell window, you want to ssh over to the iLo:

	boss> ssh Administrator@iloIP

  This can be slow, so be patient. Once you get logged in, enter
  "textcons" at the command prompt. This will put you into a wacky
  text representation of the graphic console. To exit from the
  text console, use ESC-(

* Go to the Virtual Media tab, and then on the right hand side specify
  the url of the boot ISO image. This will be something like:
  
	http://155.98.32.70/downloads/genirack.iso
	
  which is Utah's web server. Check the box to boot from the CD on the
  next reset. Then click on "Insert Media", and then after you get the
  confirmation that it attached okay, click the reset button.
  
* Wait for the node to boot. It should boot from the virtual CD drive
  since there is no other boot media, but if not you can hit F11 on
  the next go around, which will give you a list of options. Type
  whatever number is to the left of the CD choice.  

* The ISO will load and give you a boot prompt. This takes a while.
  You will eventually get a shell prompt after a lot of noisy
  output. This can take several minutes since it is demand loading the
  "CD" from Utah's web server. Be patient.

* Fire up the network:

	ifconfig eth0 inet Control_IP netmask Control_Mask
	route add default gw Gateway_IP

  The Control_IP is *NOT* the iLo IP you used above. It is the IP you
  have assigned to the control node itself.

* Transfer the control node image from Utah:

	cd /tmp
	wget http://155.98.32.70/downloads/genirack-1.ndz

  This is about 1GB, so it will take a while.

* Write the image file to the disk using the Emulab decompression tool:

	/usr/bin/imageunzip -o genirack-1.ndz /dev/cciss/c0d0

  This will take a little while. Watch the dots. There is a pause at
  the end while buffers are flushed to disk. Be patient. 

* Set the boot order so that the control node does not try to boot
  from the network, unless all else fails.

	cd /TOOLKIT
	./setbootorder floppy cdrom usb hd pxe

* Type "reboot" at the shell prompt. With any luck, the node will boot
  first time and you can ssh into the control node as elabman. You
  will need to add the key from /root/.ssh/elabman_dsa to your ssh agent,
  and the pass phrase is in boss:/usr/testbed/etc/elabman_dsa.pswd


* Make sure all five of the experimental nodes are fully powered off;
  the iLo has to be off, and the easiest thing to do is just unplug
  them.

  Also make sure that the serial cable is connected to the 2620.

* Connect to the 2620 using this command:

	sudo screen /dev/ttyS0 19200

  It might not do anything when you carriage return; it is trying to sync
  up the speed (the switch does auto-sense). Wait 30 seconds, hit carriage
  return a few times again. If still not working, exit from screen ("^A \")
  and try again. Might take another iteration or two. When you have the
  prompt: 

  2620> config
  2620(config)> vlan 11
  2620(vlan-11)> name control-alternate
  2620(vlan-11)> untagged 24        XXXX Make sure about port number!
  2620(vlan-11)> ip address 10.2.1.253/24
  2620(vlan-11)> exit
  2620(config)> vlan 10
  2620(vlan-10)> name control-hardware
  2620(vlan-10)> untagged 23	   XXXX Make sure about port number!
  2620(vlan-10)> ip address 10.1.1.253/24
  2620(vlan-10)> exit
  2620(config)> vlan 1
  2620(vlan-1)> ip address 10.254.254.253/24  # IGMP querier requires this
  2620(vlan-1)> exit
  2620(config)> management-vlan 10
  2620(config)> ip default-gateway 10.1.1.254
  2620(config)> vlan 1 ip igmp
  2620(config)> vlan 1 ip igmp querier
  2620(config)> no web-management
  2620(config)> no snmp-server community public
  2620(config)> snmp-server community XXXXX manager unrestricted
  2620(config)> password all (type in same password for manager/operator)
  2620(config)> write memory
  2620(config)> reload

  UTAH: THE PASSWORD and COMMUNITY string come from variables.txt file
  in the rack subdir. Use the same for both switches.

  The switch will take a moment to reset, so you might lose your connection
  to the control node.

  Ping 10.1.1.253 and 10.2.1.253 to make sure things worked okay.
  Then telnet to 10.1.1.253 and make sure you can log in using the
  switch password.
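
  The ping check can be scripted so that a dead address is obvious at a
  glance. A minimal sketch; check_switch_ips is a hypothetical name, and
  the addresses passed in are the VLAN 10 and VLAN 11 IPs set above.

```shell
# Hypothetical helper: ping each address three times and report which
# ones answer. Returns non-zero if any address is unreachable.
check_switch_ips() {
    failed=0
    for ip in "$@"; do
        if ping -c 3 "$ip" >/dev/null 2>&1; then
            echo "$ip ok"
        else
            echo "$ip unreachable"
            failed=1
        fi
    done
    return "$failed"
}
# Usage: check_switch_ips 10.1.1.253 10.2.1.253
```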

* Have the local site admin move the console cable to the 5406.
  Wait, wait, wait. 

* Connect to the 5406 using this command:

	sudo screen /dev/ttyS0 115200

  It might not do anything when you carriage return; it is trying to sync
  up the speed (the switch does auto-sense). Wait 30 seconds, hit carriage
  return a few times again. If still not working, exit from screen ("^A \")
  and try again. Might take another iteration or two. When you have the
  prompt: 
  
  5400> config
  5400(config)> no vlan 1 ip address
  5400(config)> vlan 10
  5400(vlan-10)> name control-hardware
  5400(vlan-10)> untagged A20	  XXXX Make sure about port number! E20?
  5400(vlan-10)> ip address 10.3.1.253/24
  5400(vlan-10)> exit 
  5400(config)> management-vlan 10
  5400(config)> ip default-gateway 10.3.1.254
  5400(config)> no web-management
  5400(config)> no snmp-server community public
  5400(config)> snmp-server community XXXXX manager unrestricted
  5400(config)> password all (type in same password for manager/operator)
  5400(config)> write memory
  5400(config)> reload
  
  USE THE SAME PASSWORD and COMMUNITY XXXXX as above (2620)

  Wait for the switch to reboot and confirm you can telnet to 10.3.1.253
  and log in using the switch password. 
  
* Create the 4th partition in the partition table:

	sudo fdisk /dev/sda

  Use the "n" option, primary partition type, default start, +950G, "w"
  to write it out.

  Then inform the kernel:

	sudo partprobe -s

* Initialize the LVM partition. We use LVMs for the boss/ops filesystems.

	sudo pvcreate /dev/sda4
	sudo vgcreate xen-vg /dev/sda4
	sudo vgchange -a y xen-vg

* Create a filesystem to hold the boss/ops tarballs. These are pretty
  big but will be deleted after we copy the filesystems into their own
  lvms.

	sudo mkdir /scratch
	sudo /sbin/lvcreate -n scratch -L 75G xen-vg
	sudo mke2fs -j /dev/xen-vg/scratch
	sudo mount /dev/xen-vg/scratch /scratch
	sudo chmod 777 /scratch

* Copy the boss/ops tarfile to /scratch on the control node, and
  then unpack it. There will be two directories, ops and boss.

* Restore the VMs:

	mkdir ~elabman/boss ~elabman/ops
	sudo ~elabman/restorevm.pl -t ~elabman/boss boss /scratch/boss
	sudo ~elabman/restorevm.pl -t ~elabman/ops  ops /scratch/ops

  This creates a bunch of LVMs and rewrites the xm.conf in the
  boss/ops directories to reflect the new LVM paths, etc.

* Fire up the VMs. Ops has to be first, followed by boss.

	sudo xm create ~elabman/ops/xm.conf
	sleep 30
	sudo xm create ~elabman/boss/xm.conf

* It is possible that ops will hang on fixarp, because tmcd is not
  running on boss yet. Log into boss (as elabman) and do:

	sudo testbed-control boot

  which should get ops running.

* named setup does not handle reverse maps smaller than /24 because of
  the delegation stuff. The zone needs to be defined as a partial map,
  since that is what the upper subnet delegates, but we do not handle
  this in the named config scripts. So I had to edit
  /etc/namedb/named.conf and add this (edit the IP, of course) to both
  views. Be sure to delete the existing reverse zone (both views) since
  it is incorrect.

    zone "129/25.242.1.192.in-addr.arpa" in {
    	type master;
    	file "reverse/192.1.242.db";
    };

  Run named_setup. Tail /var/log/messages to look for errors. 
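
  The zone name in the stanza above is a label followed by the first
  three octets in reverse. A small sketch of how to build it for another
  site; partial_reverse_zone is a made-up helper, and the label (129/25
  here) is an assumption that must match whatever the parent zone
  actually delegates.

```shell
# Hypothetical helper: build a partial (classless) reverse zone name.
#   $1 = first three octets of the subnet, e.g. 192.1.242
#   $2 = label delegated by the parent zone, e.g. 129/25
partial_reverse_zone() {
    prefix="$1"
    label="$2"
    a="${prefix%%.*}"      # first octet
    rest="${prefix#*.}"    # second and third octets
    b="${rest%%.*}"        # second octet
    c="${rest#*.}"         # third octet
    echo "$label.$c.$b.$a.in-addr.arpa"
}
# Usage: partial_reverse_zone 192.1.242 129/25
```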

* Now it is time to power on the experimental nodes. If all goes well,
  they will boot up into FreeBSD MFS and be in the hwdown experiment.
  Before we release them, we want to change some settings on the ilo.
  The following will change the Admin password, create an elabman
  user, load its ssh key, change the boot order, etc, etc.

	sudo sh /usr/testbed/etc/initilo.sh

* NOTE: scripted ssh key upload broke in the new iLo, which the script
  above used to take care of. So now we need to upload the root ssh
  key to each node's ilo using the web interface instead. The above
  command happily changed all the passwords so they are now the same
  so that makes it a little easier. Use this key:

	/root/.ssh/id_dsa.pub

  Install the key into BOTH the ELABMAN and ADMINISTRATOR accounts!!!!!
  Do this for all five pcs *and* the control node. 

  After you have installed the keys, do this:

	sudo /usr/testbed/sbin/initilo.pl -b pc1
	sudo /usr/testbed/sbin/initilo.pl -b pc2
	sudo /usr/testbed/sbin/initilo.pl -b pc3
	sudo /usr/testbed/sbin/initilo.pl -b pc4
	sudo /usr/testbed/sbin/initilo.pl -b pc5

  DAMN YOU HP!
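
  The five initilo.pl commands can also be run as a loop that stops at
  the first failure, so a node whose key did not get installed is not
  skipped silently. A sketch only; init_all_ilos is a made-up wrapper,
  and it assumes initilo.pl exits non-zero on failure.

```shell
# Hypothetical wrapper: run initilo.pl -b on each node in turn.
# Assumes initilo.pl returns a non-zero exit status when it fails.
init_all_ilos() {
    for node in "$@"; do
        echo "initilo: $node"
        sudo /usr/testbed/sbin/initilo.pl -b "$node" || {
            echo "initilo failed on $node"
            return 1
        }
    done
}
# Usage: init_all_ilos pc1 pc2 pc3 pc4 pc5
```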

* The above command resets the ilo, so let's play the minute waltz, maybe
  twice.

	http://www.pianoparadise.com/downloadmp3/nocturne.wav

* Now we power on all of the nodes.

	sudo wap power on pc1 pc2 pc3 pc4 pc5

* If the nodes were actually off, it is going to take a couple of minutes
  before we can go on with the next step. Play the waltz a few more times.

* Free all the nodes up and let's hope they reload okay. Okay, let's
  just do one to start with.

	wap nfree emulab-ops hwdown pc1

  If that works and pc1 does indeed go into the free pool, then do the
  rest of the nodes:

	wap nfree emulab-ops hwdown pc2 pc3 pc4 pc5

* Enable this site in Utah (run this on Utah Emulab boss).

	sudo cacontrol -c boss.XXX.XXX.XXX

* On the new boss, need to reload the bundles:

	sudo /usr/testbed/sbin/protogeni/getcacerts

* Arrange for the VMs to auto start (on the control node):

	cd /etc/xen/auto/
	sudo ln -s ~elabman/ops/xm.conf 1.ops.conf
	sudo ln -s ~elabman/boss/xm.conf 2.boss.conf
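
  ln -s does not check that its target exists, so it is worth
  confirming that both links resolve. A minimal sketch; check_auto_links
  is a hypothetical name. The listing comes out in sorted (glob) order,
  which is presumably the order the numeric prefixes are meant to
  enforce (ops before boss).

```shell
# Hypothetical helper: report whether each entry in a directory resolves.
# A dangling symlink fails the -e test because -e follows the link.
check_auto_links() {
    dir="$1"
    status=0
    for link in "$dir"/*; do
        if [ -e "$link" ]; then
            echo "ok $(basename "$link")"
        else
            echo "dangling $(basename "$link")"
            status=1
        fi
    done
    return "$status"
}
# Usage: check_auto_links /etc/xen/auto
```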

* Next we want to update the firmware on the data plane switch to the
  one that supports openflow. First copy the firmware from Utah to the
  local tftp directory on boss.

	cd /tftpboot
	sudo wget http://www.emulab.net/downloads/K_15_06_5008.swi

* Log into procurve2 using the password in /usr/testbed/etc/switch.pswd.
  We then want to make a copy of the current config in case we have to
  revert.

	5406> show config files
	5406> copy config config1 config config-save
       
* Now load the openflow firmware into the primary flash. First make
  sure the secondary has a copy of the primary. 

	5406> show flash
	5406> copy tftp flash 10.3.1.1 K_15_06_5008.swi

* And if that works, reboot the switch.

	5406> reload

* Wait for the switch to reboot. Now let's see if snmpit works:

	boss> wap snmpit -l -l -O

  NOTE: You may see this warning:

	No such VLAN control-hardware in lans table

  No worries, you can ignore it. 

* Telnet to procurve2 (/usr/testbed/etc/switch.pswd):

	5400> config
	5400(config)# openflow
	5400(openFlow)# vlan 1750
	5400(openFlow vlan-1750)# enable
	5400(openFlow vlan-1750)# controller "tcp:10.3.1.7:6633" fail-secure on
	5400(openFlow vlan-1750)# exit
	5400(openFlow)# exit
	5400(config)> write memory

* Add the public IP space, if appropriate. See the site survey response.
  This needs to go into the image baking script.

	boss> wap addvpubaddr 192.1.242.150 192.1.242.179
	boss> wap addvpubaddr 192.1.242.190 192.1.242.250

* Most sites will have the openflow vlan plumbed through to port 24 on
  the data switch. If this is indeed true, then we can go ahead and set
  that up now, so Nick can get started on the FOAM stuff.

	boss> wap addspecialdevice -t interconnect -s 100 XXXX-XXXX
	boss> wap addspecialiface -b 1Gb -s procurve2,1,24 XXXX-XXXX eth0

  where XXXX-XXXX is a pithy name like interconnect-campus. Ask the
  local site admin. If the rack is wired properly 1,24 is correct.
  If the module is in another slot, then this might be different. 

  Now create and share the openflow vlan:

	boss> wap snmpit_test --vlan_tag=1750 -m mesoscale-openflow \
		emulab-ops openflow-vlans XXXX-XXXX:eth0
	boss> wap sharevlan -o emulab-ops,openflow-vlans \
		mesoscale-openflow mesoscale-openflow

* Create some test experiments. Test the ProtoGeni API.

* Create the shared-pool experiment. Use this NS file:

	testbed/install/genirack/shared-exp.ns

  You will need to toggle the lockdown on the Show Experiment page first,
  and then toggle it back after the experiment has swapped in. Be sure to
  test shared node experiments.

* Need to reset the mailing lists to the local admin.

  All of the mailing lists are stored in ops:/etc/mail/lists. Add the local
  admin to the testbed-*.list files. Do not change the defs file, since we
  want the email to go to the local admin *and* Utah.

* Run register_resources

	boss> wap /usr/testbed/sbin/protogeni/register_resources

* Run update_sitevars and addservers to get boss/ops into the DB
  and to turn on arp lockdown.

	boss> wap /usr/testbed/sbin/update_sitevars 
	boss> wap /usr/testbed/sbin/addservers 

---
TODO:

routable ip space when baking the images.
Enable necessary features.

-----
SSH enable on the switches does not work.

ip ssh public-key manager "ssh-rsa AAA ..."
# aaa authentication ssh enable public-key
HP-E2620-24(config)# aaa authentication ssh login public-key