ControlRebuild.txt 5.58 KB
Newer Older
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
Baking the control node image:

The control node image is currently baked from the Utah control node. We
have an extra disk on our control node that is a duplicate of the main
disk. Well, it was at one time, but we don't change it very often and when
I do, I try to remember to update the mirror as well.  Anyway, there are
just a few things that need to be changed on the control image for each
site.

The Utah control node is control.utah.geniracks.net. 

You will need to create a little text file that provides all of the
details. Log into the control node you want to rebuild, and gather the info
you need to create this file. Here is a sample. The root password is a bit
of a problem of course, but maybe a good idea to generate a new ones
anyway (it used for the ilo console).

	address=128.112.170.2
	netmask=255.255.255.0
	gateway=128.112.170.1
	domain="instageni.cs.princeton.edu"
	forwarders="128.112.136.10,128.112.136.12"
	hostname="princeton.control-nodes.geniracks.net"
	timezone='America/New_York'
	rootpswd="cleverpswd"
	adminuser="acb"

You also need the ssh pubkey for the admin user, from the existing control
node. 

Copy the text file and ssh pubkey file over to /tmp on Utah's control
node. Then log in and do this:

	sudo /usr/local/bin/bakectrl.pl /tmp/foo.txt /tmp/foo.pub

Now we need to grab imagezips of the two partitions. We cannot do a whole
disk image cause that would destroy the lvms in partition four.

	sudo umount /mnt/usr
	sudo umount /mnt
	sudo imagezip -o -s 1 /dev/sdb /scratch/sdb1.ndz
	sudo imagezip -o -s 2 /dev/sdb /scratch/sdb2.ndz

Copy the two ndz files to someplace on the public network that you can get
to with http or ftp (from the control node being rebuilt).

47
48
49
50
51
52
Also copy /var/tmp/boot.sdb from the Utah control node to the control
node you are rebuilding before you reboot it (see more below).

Might be a good idea to delete the data file from /tmp since it has a
password in it. But make sure you have stored a copy someplace safe and
encrypted to be safe.
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67

-----------

Save various things from the existing control node:

	/etc/xen/auto  (these might be symlinks, get the actual files).
	~elabman/boss/ (get the entire directory)
	~elabman/ops/  (get the entire directory)

These three might not exist. If not, skip the restore commands below.

	~elabman/openvpn/openvpn-server.pem
	~elabman/openvpn/openvpn-dh.pem
	~elabman/openvpn/emulab.pem

68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
Save the old boot blocks someplace safe (on another machine): 

	sudo dd if=/dev/sda of=mbr.old count=62

Grab the first part of new MBR (see above):

	sudo dd if=boot.sdb bs=1 count=444 > boot.new
	
Tack on the current partition table:

	sudo dd if=/dev/sda bs=1 skip=444 count=68 >> boot.new

Tack on the rest of the new boot code:

	sudo dd if=boot.sdb skip=1 count=61 >> boot.new

Copy boot.new to the same public server as the two ndz files above.

86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
Log into the boss VM and run

	boss> sudo testbed-control shutdown

Shut down all of the VMs.

	control> sudo xm shutdown -a

Wait for them to actually shutdown. Use "xm list" to make sure.

-----------

Installing the control node image:

Note that I assume you know how to log into the ilo web interface of the
control nodes.

* Go to the Virtual Media tab, and then on the right hand side specify
  the url of the boot ISO image:
  
	http://155.98.32.70/downloads/genirack.iso
	
  which is Utah's web server. Check the box to boot from the CD on the
  next reset. The click on "Insert Media", and then after you get the
  confirmation that it attached okay, click the reset button.
  
* Wait for the node to boot. It should boot from the virtual CD drive
  since there is no other boot media, but if not you can hit F11 on
  the next go around, which will give you a list of options. Type
  whatever number is to the left of the CD choice.  

* The ISO will load and give you a boot prompt. This takes a while.
  Just hit return. You will eventually get a shell prompt after a lot
  of noisy output. This can take several minutes since it is demand
  loading the "CD" from Utah's web server. Be patient.

* Fire up the network:

	ifconfig eth0 inet Control_IP netmask Control_Mask
	route add default gw Gateway_IP

  The Control_IP is *NOT* the iLo IP you used above. It is the IP you
  have assigned to the control node itself.

130
* Transfer the control node images from where ever you stashed them:
131
132
133
134

	cd /tmp
	wget http:/xxx.xxx.xxx.xxx/sdb1.ndz
	wget http:/xxx.xxx.xxx.xxx/sdb2.ndz
135
136
137
138
139
140
141
142
143
144
145
146
147
	wget http:/xxx.xxx.xxx.xxx/boot.new

* Run fdisk to see what the old partition table looks like. Save the
  output:

	fdisk -l /dev/cciss/c0d0

* Then write the new boot blocks to the disk:

	dd if=boot.new of=/dev/cciss/c0d0 count=62

* Then run fdisk again and confirm the partition tables are identical.  If
  so then proceed. If not, find the nearest bar and drown your sorrows.
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175

* Write the image files to the disk using the Emulab decompression tool:

	/usr/bin/imageunzip -o -s 1 sdb1.ndz /dev/cciss/c0d0
	/usr/bin/imageunzip -o -s 2 sbd2.ndz /dev/cciss/c0d0

  This will take a little while. Watch the dots. There is a pause at
  the end while buffers are flushed to disk. Be patient. 

* Type "reboot" at the shell prompt. With any luck, the node will boot
  first time and you can ssh back into the control node.

* Lets see if the lvms are still there:

	sudo lvs

  Still there? Yes, keep going. No, find the nearest bar and drown your
  sorrows.

* Restore the openvpn files to ~elabman/openvpn and start the server:

	control> cd /etc/openvpn
	control> sudo ln -s ~elabman/openvpn/openvpn.conf emulab.conf
	control> sudo /etc/rc3.d/S16openvpn start

* Restore the other files and directories saved up above.

* Start the VMs. You now use xl instead of xm.