Manage remote Windows/Linux servers using SystemRescueCd
About
The most popular way of using SystemRescueCd is from a CD-ROM drive on a desktop in interactive mode. Recent SystemRescueCd versions also support network boot via PXE. The network configuration boot options (such as ethx=ip, gateway=ip, dns=ip, dodhcp) let you configure network access automatically at boot time. SystemRescueCd starts an ssh server by default, and you can define a static root password on the boot command line. That way you can get an ssh console to the server just by booting a customized SystemRescueCd, with nothing left to configure by hand. This is very useful for disaster recovery, that is, restoring a backup of your operating system after a crash. You can also use it to perform any other administration task on your server.
In other words, you can manage a Windows or Linux server located in a datacenter remotely, from your office. There is no need to be in front of the machine to insert a disc, configure a network interface, or set a root password. All you have to do is prepare a network boot server (one or several servers running the following network services: dhcpd, tftpd, httpd). You can install these three services either on a dedicated physical/VMware server or on a production machine running other services.
There are two interesting ways of using it:
- You can prepare a pxe boot server so that you get an interactive ssh console to administer/repair your server by hand. The serial console is also supported if you prefer it. If you need to run graphical programs such as GParted remotely, you can use the vncserver boot option (requires SystemRescueCd-1.0.2 or newer) so that the VNC server is automatically started on the remote machine.
- You can also configure SystemRescueCd in order to run your own autorun scripts to perform automatic tasks (backup, recovery, ...)
Since the autorun feature of SystemRescueCd allows you to execute scripts located on an nfs/samba/http server, there is no need to spend time making a customized SystemRescueCd version. All you have to do is set up the pxe boot server so that SystemRescueCd automatically boots, configures the network and the root password, downloads the autorun scripts, and executes them.
To understand this chapter, you have to first read the two previous chapters: PXE network booting with SystemRescueCd and Run your own scripts at start-up with autorun.
Examples of interesting things you can do
- Disaster Recovery:
- restore a broken windows system with ntfsclone/partimage
- restore a tarball of your linux operating system and reinstall grub
- Hard disk partitioning and administration tasks
- format the hard disk and reinstall a copy of the operating system
- resize your partitions
- reinstall the grub boot loader
- Fix a critical problem
- fix a boot problem (fsck fails at boot time)
- reset the administrator password of your windows system with the ntpass floppy disk image
- reset the root password of your linux system by chrooting on it
Example of how to use an automatic disaster recovery on remote servers
Overview
This is a complete example of how you can organize an automatic disaster recovery on a network based on three machines located in a remote datacenter. This example shows you what kind of things you can do with SystemRescueCd. The technical details and configuration steps are given in the next sections.
Example of a network in a datacenter
In your datacenter you have three servers:
- srv1 192.168.10.100 A Windows web server running IIS and MS SQL-Server
- srv2 192.168.10.101 A Linux web server running Apache and MySql
- srv3 192.168.10.102 A backup/recovery server running linux
Installing the Disaster Recovery system
You want to be able to restore the operating system on srv1 and srv2 in case there is a software problem or a hard disk failure. For instance, if windows fails to boot on srv1 because of a virus, you want to be able to restore the operating system on the hard disk by rebooting the server into SystemRescueCd with a recovery script. Here is how to install this disaster recovery system:
- install windows on srv1 with at least two partitions: C: for Windows, D: for your Data
- in the bios of srv1, define the boot order as "network, hard-disk"
- install an apache or thttpd web server on srv3 so that we can download a file through http
- reboot on SystemRescueCd and make an image of the disk C using ntfsclone, save the image on disk D:
- copy the ntfsclone image to srv3 with ssh/sftp or ftp
- install the pxe boot services on srv3 (dhcp server, tftp server with pxelinux, http server, the SystemRescueCd files)
- write a shell script that restores the partition C: using ntfsclone and the backup you made
- upload the recovery script you wrote to the backup/recovery server, in the web server data files (so that we can access http://192.168.10.102/autorun1)
- configure pxelinux on srv3 so that it boots SystemRescueCd, configures the network and runs http://192.168.10.102/autorun1 automatically. Here is an example of a boot command line you can use with pxelinux:
  append initrd=initram.igz ethx=192.168.10.100 rootpass=12345 ar_source=http://192.168.10.102 autoruns=1
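The recovery script from the steps above could look roughly like the following. This is only a minimal sketch: the image URL (`srv1-c.img`), the target partition (`/dev/sda1`) and the `DO_RESTORE` switch are assumptions made for illustration, not names used by SystemRescueCd. By default the script only prints the restore pipeline so that it can be reviewed first.

```shell
#!/bin/sh
# Hypothetical autorun1 sketch: restore the C: partition of srv1 from an
# ntfsclone image served over http by the backup/recovery server.
# IMAGE_URL and TARGET_DEV are assumptions; adapt them to your own setup.
IMAGE_URL="${IMAGE_URL:-http://192.168.10.102/srv1-c.img}"
TARGET_DEV="${TARGET_DEV:-/dev/sda1}"

# Print the restore pipeline for a given image URL and target partition.
restore_command() {
    printf 'wget -q -O - %s | ntfsclone --restore-image --overwrite %s -\n' "$1" "$2"
}

# Stream the image over the network into ntfsclone, then reboot.
# Nothing is written to disk unless DO_RESTORE=yes is set.
run_restore() {
    if [ "${DO_RESTORE:-no}" = "yes" ]; then
        wget -q -O - "$IMAGE_URL" \
            | ntfsclone --restore-image --overwrite "$TARGET_DEV" - \
            && reboot
    else
        restore_command "$IMAGE_URL" "$TARGET_DEV"
    fi
}

run_restore
```

Streaming the image through a pipe means the image never has to fit in the RAM of the rescued machine. Set `DO_RESTORE=yes` in the script once the printed command matches your disk layout.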
Performing the automatic recovery
That way, in case of a critical problem on srv1, you can run the disaster recovery process in a few minutes. Only the first two steps require an action from the user:
- connect to the backup server and start the pxe boot services (dhcpd, tftp, thttpd, ...)
- use the management interface of the srv1 machine to reboot this server
- since there is a dhcp server running, srv1 will automatically boot from the network
- the pxelinux boot loader starts SystemRescueCd with the ethx=192.168.10.100 rootpass=12345 ar_source=http://192.168.10.102 autoruns=1 options
- SystemRescueCd boots and automatically configures the network and the system root password
- since the autorun options were used (ar_source and autoruns), SystemRescueCd downloads the autorun1 script from http://192.168.10.102
- the autorun1 script is executed on srv1; this script reads the ntfsclone image of the hard disk through the network and restores the hard drive
- You can also connect to SystemRescueCd using an ssh client and the root password that you defined previously, and you can run any command
After the Disaster Recovery is complete, you must stop the dhcp service on srv3 so that srv1 cannot boot from the network again. You want srv1 to boot from the hard disk once it has been recovered by the autorun1 script.
What you need
- Since we want to reboot from the network via pxe after a problem, you must be able to remotely reboot the servers you want to manage. Most server manufacturers provide management interfaces such as "HP ILO" (integrated LightsOut) or "IBM RSA" (Remote Supervisor Adapter). These management interfaces are often connected through ethernet, and they provide a web interface that lets you reboot your server. If you are using a server provided by a hosting company, you may also have a specific web management interface developed by them that gives you the ability to reboot the server. In other cases, you may ask an engineer working in the datacenter to do that by hand.
- You also need another server on the same network in the datacenter that can act as a pxe boot server. This is the backup/recovery server. It may be either a physical machine running linux or a virtual machine running in VMWare, for instance. It does not require a lot of storage/power. A single backup/recovery server can recover all the other servers of your network.
- SystemRescueCd-1.0.0 or newer.
How to configure SystemRescueCd on your network
This section describes how to set up the servers of your network so that they are ready to perform automatic tasks when you boot from the network via pxe. There are two kinds of servers in the network:
- The production servers that may be backed-up, recovered. They boot SystemRescueCd to perform administration tasks
- The single backup/recovery machine running Linux that provides the network-boot-services to the other servers of your network.
Device boot-order in the BIOS (production servers)
We want the production servers to boot the normal operating system from the hard-disk when there is no problem, and they must boot SystemRescueCd from the network when we want to perform an administration task. The best way of doing that is to configure the device boot-order in the BIOS of your production servers, so that they first attempt to boot from the network, and then boot from the hard-disk if they fail to boot from the network.
You only need to start the DHCP service (involved in the PXE boot process) on the backup/recovery machine when you want a server to boot on SystemRescueCd. When everything is ok, the DHCP server must be stopped, so that the server fails to get a dynamic IP address during the network boot and then boots from the hard disk.
Another way of doing that is to always boot from the network using the pxelinux boot loader. In the pxelinux configuration file you can write localboot 0x80 in the default entry in order to force the server to boot from the hard disk anyway.
Autorun scripts (backup/recovery machine)
You may want your production server to boot SystemRescueCd to get an interactive ssh console to execute commands yourself. In that case you don't need any autorun script and you can skip this section. Read this section if you want your servers to boot SystemRescueCd to perform automatic tasks.
The autorun feature of SystemRescueCd allows you to automatically execute your own script on the production servers when SystemRescueCd boots. There is no need to be in front of the machine to set up the network, for instance. The backup/recovery machine will deliver the autorun scripts to the production machine when it boots. You can use either NFS, Samba or HTTP to deliver this service to the production servers. Let's take HTTP as an example since it's easy to configure.
You have to set up an HTTP server on the backup/recovery machine. It can be apache httpd, thttpd or any other web server. It must host the autorun scripts that you want the other servers to execute automatically when they boot. Here is an example of how you can organise your web server so that it provides 3 autorun scripts for each machine of your network. You could also use the same script for all the production boxes if you prefer.
- http://192.168.10.102/srv1/autorun1: backup script used by the first production server
- http://192.168.10.102/srv1/autorun2: recovery script used by the first production server
- http://192.168.10.102/srv1/autorun3: runs fsck on all the partitions of the first production server
- http://192.168.10.102/srv2/autorun1: backup script used by the second production server
- http://192.168.10.102/srv2/autorun2: recovery script used by the second production server
- http://192.168.10.102/srv2/autorun3: runs fsck on all the partitions of the second production server
It's important to notice that your autorun script must be named either autorun (single script), or autorun[0-9] (multiple scripts). You can't give another name such as backup.
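The naming rule can be checked before you publish the scripts on the web server. The helper functions below are hypothetical (they are not part of SystemRescueCd); they only encode the rule stated above: `autorun`, or `autorun` followed by a single digit.

```shell
#!/bin/sh
# Return success only for names accepted by the autorun feature:
# "autorun" (single script) or "autorun0" .. "autorun9" (multiple scripts).
is_valid_autorun_name() {
    case "$1" in
        autorun|autorun[0-9]) return 0 ;;
        *) return 1 ;;
    esac
}

# List every file in a directory whose name does not follow the rule.
# $1 = directory served by the web server, e.g. the srv1/ folder.
check_autorun_dir() {
    for f in "$1"/*; do
        [ -e "$f" ] || continue
        name=${f##*/}
        is_valid_autorun_name "$name" || echo "not a valid autorun name: $name"
    done
}

# Example (the path is an assumption, use your own document root):
#   check_autorun_dir /var/www/srv1
```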
dhcp server (backup/recovery machine)
The DHCP server is the first server contacted by the production machine trying to boot from the network. The DHCP service has to give the server a dynamic IP address, and other settings such as the IP of the DNS server, and the IP of the TFTP server used in the next stage of the boot process. Please read the chapter about PXE network booting with SystemRescueCd for more details about it.
Here is an example of an /etc/dhcp/dhcpd.conf configuration file that can be either edited by hand or generated by the /etc/init.d/pxebootsrv service.
# DHCP Server Configuration file.
ddns-update-style interim;
ignore client-updates;
subnet 192.168.10.0 netmask 255.255.255.0
{
    option routers 192.168.10.1;
    option subnet-mask 255.255.255.0;
    option domain-name-servers 192.168.10.1;
    range dynamic-bootp 192.168.10.200 192.168.10.250;
    default-lease-time 21600;
    max-lease-time 43200;
    host srv1
    {
        hardware ethernet 00:0C:29:57:D0:64;
        fixed-address 192.168.10.100;
    }
    host srv2
    {
        hardware ethernet 00:0C:29:57:D0:74;
        fixed-address 192.168.10.101;
    }
}
allow booting;
allow bootp;
next-server 192.168.10.102; # IP addr of the TFTP server
class "pxeclients"
{
    match if substring(option vendor-class-identifier, 0, 9) = "PXEClient";
    filename "/pxelinux.0";
}
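If you manage more than a couple of production servers, the per-server host entries of this file can be generated instead of typed by hand. The following is a small sketch; `dhcp_host_entry` is a hypothetical helper, and its output still has to be pasted inside the subnet block of dhcpd.conf.

```shell
#!/bin/sh
# Print a dhcpd.conf host entry that ties a MAC address to a fixed IP.
# $1 = host name, $2 = MAC address, $3 = fixed IP address
dhcp_host_entry() {
    printf 'host %s\n{\n    hardware ethernet %s;\n    fixed-address %s;\n}\n' "$1" "$2" "$3"
}

# The two production servers of the example network:
dhcp_host_entry srv1 00:0C:29:57:D0:64 192.168.10.100
dhcp_host_entry srv2 00:0C:29:57:D0:74 192.168.10.101
```

After editing the file, ISC dhcpd can check its syntax with `dhcpd -t -cf /etc/dhcp/dhcpd.conf` before the service is started.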
tftp server (backup/recovery machine)
The TFTP server is the second server contacted by the production machine trying to boot from the network. The TFTP service has to send the production server the pxelinux.0 file to be executed first. This is just the binary of the pxelinux boot loader. The TFTP server will also be used to send other files to the production server, such as the pxelinux configuration file, the kernel to boot, the initram.igz file, and it may also send sysrcd.dat and sysrcd.md5 that are necessary to complete the boot process. Please read the chapter about PXE network booting with SystemRescueCd for more details about this stage.
The TFTP server has to send most of the SystemRescueCd files that are provided in the CD-ROM edition (pxelinux boot loader files, message files for pxelinux, kernel and initramfs images). Since SystemRescueCd-0.4.4-beta, the SystemRescueCd filesystem (sysrcd.dat + sysrcd.md5) can be transferred by either the tftp server or an http server. If you want to load these files through http instead of tftp (it's faster), you should replace netboot=tftp://path/to/sysrcd.dat with netboot=http://path/to/sysrcd.dat.
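Switching an existing pxelinux configuration file from tftp to http is a one-line substitution. The sketch below uses a hypothetical helper, `use_http_netboot`, and assumes that your http server serves sysrcd.dat and sysrcd.md5 under the same host and path that the tftp URL used.

```shell
#!/bin/sh
# Rewrite every netboot=tftp://... option of a pxelinux configuration
# file into netboot=http://..., keeping host and path unchanged.
# $1 = pxelinux configuration file to rewrite in place
use_http_netboot() {
    sed 's#netboot=tftp://#netboot=http://#g' "$1" > "$1.tmp" \
        && cat "$1.tmp" > "$1" \
        && rm -f "$1.tmp"
}

# Example:
#   use_http_netboot /tftpboot/pxelinux.cfg/default
```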
The main difference between pxelinux and isolinux is that pxelinux needs a config file that is inside the pxelinux.cfg directory instead of an isolinux.cfg file. The two kinds of configuration files are very similar, so you can use the contents of the isolinux.cfg file to make your customized pxelinux.cfg configuration. Here is an example of what files you may have on your hard disk:
| filepath | description |
|---|---|
| /tftpboot/pxelinux.0 | executable file of the pxelinux program |
| /tftpboot/sysrcd.dat | image of the filesystem |
| /tftpboot/sysrcd.md5 | check of the filesystem image |
| /tftpboot/f1boot.msg | message file displayed by pxelinux |
| /tftpboot/f2images.msg | message file displayed by pxelinux |
| /tftpboot/f3params.msg | message file displayed by pxelinux |
| /tftpboot/f4arun.msg | message file displayed by pxelinux |
| /tftpboot/f5troubl.msg | message file displayed by pxelinux |
| /tftpboot/f6pxe.msg | message file displayed by pxelinux |
| /tftpboot/f7net.msg | message file displayed by pxelinux |
| /tftpboot/memdisk | loads a floppy disk image into memory |
| /tftpboot/rescuecd | first kernel image file (default 32 bits kernel) |
| /tftpboot/rescue64 | second kernel image file (default 64 bits kernel) |
| /tftpboot/altker32 | third kernel image file (alternative 32 bits kernel) |
| /tftpboot/altker64 | fourth kernel image file (alternative 64 bits kernel) |
| /tftpboot/initram.igz | common initramfs image used by the kernels |
| /tftpboot/pxelinux.cfg/default | pxelinux.cfg default configuration file |
| /tftpboot/pxelinux.cfg/01-00-0c-29-57-d0-64 | config specific to server having mac=00:0C:29:57:D0:64 |
| /tftpboot/pxelinux.cfg/01-00-0c-29-57-d0-74 | config specific to server having mac=00:0C:29:57:D0:74 |
The most important file in the previous table is the pxelinux configuration file, since you will have to edit it to write the boot settings you want to use. There are two kinds of configuration files you can use:
- You can either use a single /tftpboot/pxelinux.cfg/default if all the servers have the same pxelinux configuration
- You can also use a filename based on the mac address of the client if you want each server to have a specific pxelinux configuration file. For instance, /tftpboot/pxelinux.cfg/01-00-0c-29-57-d0-64 will be loaded by the server having 00:0C:29:57:D0:64 as a mac address.
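The per-server file name is the mac address in lower case, with dashes instead of colons, prefixed with 01 (the hardware type code for ethernet). A small sketch of that conversion, where `pxe_cfg_name` is a hypothetical helper name:

```shell
#!/bin/sh
# Convert a mac address into the file name that pxelinux looks for
# inside the pxelinux.cfg directory.
# $1 = mac address written with colons, in any letter case
pxe_cfg_name() {
    printf '01-%s\n' "$(printf '%s' "$1" | tr 'A-F' 'a-f' | tr ':' '-')"
}

pxe_cfg_name 00:0C:29:57:D0:64   # prints 01-00-0c-29-57-d0-64
```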
Please read the next section to know what to write in the pxelinux configuration file.
pxelinux configuration (backup/recovery machine)
The pxelinux configuration file is similar to a grub or lilo configuration file since it's a configuration for a boot loader. It tells pxelinux which kernel and ramdisk file to load into memory, and what boot options to pass to the kernel (the parameters that we can read through /proc/cmdline once linux is loaded).
If you expect the server to boot automatically, it's important that you specify a default entry and a timeout so that pxelinux doesn't wait for keyboard input from the user. Here is an example of a pxelinux configuration file.
Please note that each entry has only two lines (kernel and append). The append lines are wrapped here because they are long, but each one must be written on a single line in the configuration file.
default recovery
timeout 10
prompt 1
display f1boot.msg
F1 f1boot.msg
F2 f2images.msg
F3 f3params.msg
F4 f4arun.msg
F5 f5troubl.msg
F6 f6pxe.msg
F7 f7net.msg
label backup
kernel rescuecd
append initrd=initram.igz ar_source=http://192.168.10.102/srv1/ autoruns=1
ethx=192.168.10.100 netboot=tftp://192.168.10.103/sysrcd.dat cdroot
dns=192.168.10.2 gateway=192.168.10.1 setkmap=us
label recovery
kernel rescuecd
append initrd=initram.igz ar_source=http://192.168.10.102/srv1/ autoruns=2
ethx=192.168.10.100 netboot=tftp://192.168.10.103/sysrcd.dat cdroot
dns=192.168.10.2 gateway=192.168.10.1 setkmap=us
label fsck
kernel rescuecd
append initrd=initram.igz ar_source=http://192.168.10.102/srv1/ autoruns=3
ethx=192.168.10.100 netboot=tftp://192.168.10.103/sysrcd.dat cdroot
dns=192.168.10.2 gateway=192.168.10.1 setkmap=us
label ssh
kernel rescue64
append initrd=initram.igz autoruns=no ethx=192.168.10.100 rootpass=12345
netboot=tftp://192.168.10.103/sysrcd.dat dns=192.168.10.2 cdroot
gateway=192.168.10.1 setkmap=us
label serial
kernel rescuecd
append initrd=initram.igz autoruns=no console=ttyS0,9600 cdroot
netboot=tftp://192.168.10.103/sysrcd.dat dns=192.168.10.2
gateway=192.168.10.1 setkmap=us
label bootfromdisk
localboot 0x80
In that example the server will boot the recovery entry, so it will boot on the rescuecd kernel, and it will execute the script autorun2 downloaded from http://192.168.10.102/srv1/autorun2. The autorun2 script contains the instructions to perform an automatic recovery of the server.
Here is what the entries do:
- backup
  - boots the rescuecd kernel downloaded through tftp and uses initram.igz as initramfs
  - configures the network with the 192.168.10.100 ip address
  - downloads the http://192.168.10.102/srv1/autorun1 script to a temporary file in ram
  - executes the autorun1 script that performs a backup and reboots
- recovery
  - boots the rescuecd kernel downloaded through tftp and uses initram.igz as initramfs
  - configures the network with the 192.168.10.100 ip address
  - downloads the http://192.168.10.102/srv1/autorun2 script to a temporary file in ram
  - executes the autorun2 script that performs a recovery of the system and reboots
- fsck
  - boots the rescuecd kernel downloaded through tftp and uses initram.igz as initramfs
  - configures the network with the 192.168.10.100 ip address
  - downloads the http://192.168.10.102/srv1/autorun3 script to a temporary file in ram
  - executes the autorun3 script that performs an fsck of the filesystems and reboots
- ssh
  - boots the rescue64 kernel downloaded through tftp and uses initram.igz as initramfs
  - configures the network with the 192.168.10.100 ip address
  - sets the root password of the SystemRescueCd system to 12345 so that we can connect remotely through ssh
  - disables autorun
- serial
  - boots the rescuecd kernel downloaded through tftp and uses initram.igz as initramfs with the option console=ttyS0,9600 so that we can work through the serial console
  - disables autorun
- bootfromdisk
  - boots from the first hard disk
Every time you want a server to execute a task, you just have to change the first line of the configuration file. For instance you can change default recovery to default bootfromdisk once the recovery is complete so that the server boots from the hard-disk the next time. You can also stop the dhcp service on the backup/recovery server so that any attempt to boot the production server from the network fails.
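Changing the default entry can be scripted on the backup/recovery server. The following sketch uses a hypothetical helper, `set_default_entry`, and assumes the configuration file has a line starting with `default` and one `label` line per entry, as in the example above. It refuses to switch to a label that does not exist in the file.

```shell
#!/bin/sh
# Make the given label the default entry of a pxelinux configuration file.
# $1 = pxelinux configuration file, $2 = label to boot by default
set_default_entry() {
    # Refuse unknown labels so a typo cannot leave the server unbootable.
    if ! grep -q "^label $2\$" "$1"; then
        echo "no such label in $1: $2" >&2
        return 1
    fi
    sed "s/^default .*/default $2/" "$1" > "$1.tmp" \
        && cat "$1.tmp" > "$1" \
        && rm -f "$1.tmp"
}

# Example for srv1:
#   set_default_entry /tftpboot/pxelinux.cfg/01-00-0c-29-57-d0-64 recovery
#   ... reboot srv1 and wait for it to boot SystemRescueCd ...
#   set_default_entry /tftpboot/pxelinux.cfg/01-00-0c-29-57-d0-64 bootfromdisk
```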
How to use SystemRescueCd once it's set up
Once your network is installed, using the SystemRescueCd to perform automatic or manual administration tasks remotely is very easy. Here is how to use these features.
Use SystemRescueCd to perform an automatic task
Let's take an example: the hard-disk of the srv1 machine (192.168.10.100) crashed and has just been replaced with a brand new empty disk. You want to perform the recovery job to restore the operating system on this machine.
- Connect to the backup/recovery server (srv3), edit the pxelinux configuration file used by srv1 (eg: /tftpboot/pxelinux.cfg/01-00-0c-29-57-d0-64) and write the name of the entry you want to boot in the default section: default recovery
- Ensure the pxe boot services (dhcpd, tftpd, ...) are started on the backup/recovery server
- Use the management interface to reboot the production server on which you want to perform an administration task (srv1)
- Wait 3 minutes, just to be sure that the boot process on SystemRescueCd is complete on srv1
- Connect to the backup/recovery server (srv3), edit the pxelinux configuration file used by srv1 (eg: /tftpboot/pxelinux.cfg/01-00-0c-29-57-d0-64) and write bootfromdisk in the default section so that the server will boot from the hard-disk the next time: default bootfromdisk
- If the recovery script has been well designed, it should reboot the server automatically after the recovery is complete, and srv1 then boots the production operating system.
Use SystemRescueCd to perform a task by hand
Let's take an example: You forgot the root password of the srv2 machine and you want to get an ssh connection on the SystemRescueCd to mount the root filesystem and edit the password file (usually /etc/shadow).
- Connect to the backup/recovery server (srv3), edit the pxelinux configuration file used by srv2 (eg: /tftpboot/pxelinux.cfg/01-00-0c-29-57-d0-74) and write the name of the entry you want to boot in the default section: default ssh
- Ensure the pxe boot services (dhcpd, tftpd, ...) are started on the backup/recovery server
- Use the management interface to reboot the production server on which you want to perform an administration task (srv2)
- Wait 3 minutes, just to be sure that the boot process on SystemRescueCd is complete on srv2
- Use ssh to connect to srv2 from your office. You must use the root password that you gave on the command line in the pxelinux configuration file (eg: 12345) to connect to SystemRescueCd. Don't confuse it with the root password of the system that you want to change, which is stored in the /etc/shadow file on your hard-disk. Mount the root partition, and edit the file, or perform any other administration task by hand.
- Connect to the backup/recovery server (srv3), edit the pxelinux configuration file used by srv2 (eg: /tftpboot/pxelinux.cfg/01-00-0c-29-57-d0-74) and write bootfromdisk in the default section so that the server will boot from the hard-disk the next time: default bootfromdisk
- In the ssh console to srv2, type reboot. The linux system should restart with the new root password on srv2.
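Instead of editing /etc/shadow by hand, the password can be changed by chrooting into the installed system from the ssh console. The sketch below is an illustration only: the root partition (`/dev/sda1`), the mount point (`/mnt/linux`) and the `DRY_RUN` switch are assumptions. By default it only prints the commands; run it with `DRY_RUN=no` to execute them.

```shell
#!/bin/sh
# Print each command, and execute it as well only when DRY_RUN=no.
run() {
    echo "+ $*"
    if [ "${DRY_RUN:-yes}" = "no" ]; then
        "$@"
    fi
}

# Mount the root filesystem of the installed system, change the root
# password from inside a chroot, then unmount it again.
# $1 = root partition of the installed system, $2 = mount point
reset_root_password() {
    run mkdir -p "$2" &&
    run mount "$1" "$2" &&
    run chroot "$2" passwd root &&
    run umount "$2"
}

# ROOT_DEV and MNT are assumptions; check your partition layout first.
reset_root_password "${ROOT_DEV:-/dev/sda1}" "${MNT:-/mnt/linux}"
```

The passwd command prompts for the new password interactively, so this is meant to be run from the ssh session rather than from an autorun script.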
