
Manage remote Windows and Linux servers using SystemRescueCd


About Network Booting SystemRescueCd

The most popular way of using SystemRescueCd is from a CD-ROM drive on a desktop in interactive mode.

This discussion details some of the ways to use network booting via PXE. The network configuration boot options (such as ethx=ip, gateway=ip, dns=ip, dodhcp) allow you to automatically configure the network access to SystemRescueCd at boot time. SystemRescueCd automatically starts an ssh server by default and you can define a static root password on the boot command line. That way you can get an ssh console to the server just by booting a customized SystemRescueCd. No need to configure anything. It can be very useful for Disaster Recovery, for example restoring a backup of your operating system after a crash. You can also use it to perform any other administration task on your server.

In other words, you can manage a Windows or Linux server located in a datacenter remotely, from your office. There is no need to be in front of the machine to insert a disc, configure a network interface, or set a root password. All you have to do is prepare a network boot server (one or several servers running the following network services: dhcpd, tftpd, httpd). You can install these three services either on a dedicated physical or VMware server, or on a production machine running other services.

There are two interesting ways of using network boot:

  • prepare a pxe boot server which starts an interactive ssh console so that you can administer or repair the server manually. You can also choose the serial console. To run graphical programs such as GParted remotely, use the vncserver boot option (requires SystemRescueCd-1.0.2 or newer), which starts the VNC server automatically.
  • configure SystemRescueCd to run autorun scripts to perform automatic tasks (backup, recovery, ...)

The autorun feature of SystemRescueCd allows you to execute scripts located on an nfs/samba/http server, so there is no need to build a customized SystemRescueCd. All you have to do is set up the pxe boot server so that SystemRescueCd boots automatically, configures the network and the root password, downloads the autorun scripts, and executes them.

To understand this chapter, first read: PXE network booting with SystemRescueCd and Run your own scripts at start-up with autorun.

Examples of interesting things you can do

  • Disaster Recovery:
    • restore a broken Windows system using ntfsclone/partimage
    • restore a tarball of your Linux operating system and reinstall grub
  • Hard disk partitioning and administration tasks
    • format the hard disk and reinstall a copy of the operating system
    • resize your partitions
    • reinstall the grub boot loader
  • Fix a critical problem
    • fix a boot problem (fsck fails at boot time)
    • reset the administrator password of your Windows system with the ntpass floppy disk image
    • reset the root password of your Linux system by chrooting into it

Example of how to implement an automatic disaster recovery on remote servers

Overview

This is a complete example of how you can organize an automatic disaster recovery on a network based on three machines located in a remote datacenter. This example shows you what kind of things you can do with SystemRescueCd.

Example of a network datacenter

In your datacenter you have three servers:

  1. WINDB (192.168.10.100): a Windows web server running IIS and MS SQL Server
  2. WEB (192.168.10.101): a Linux web server running Apache and MySQL
  3. BKUP (192.168.10.102): a backup/recovery server running Linux

Installing the Disaster Recovery system

You want to be able to restore the operating system on WINDB or WEB in case of a software problem or a hard disk failure. For instance, if Windows fails to boot on WINDB because of a virus, you want to be able to restore the operating system on the hard disk by rebooting the server into SystemRescueCd, which then runs a recovery script. Here is how to install this disaster recovery system:

  1. install windows on WINDB with at least two partitions: C: for Windows, D: for Data
  2. in the BIOS of WINDB, define the boot order as "network, hard-disk"
  3. install the apache or thttpd web server on BKUP so that files can be downloaded from it via http
  4. boot SystemRescueCd on WINDB and use ntfsclone to make an image of C: on the D: volume
  5. copy the ntfsclone image to BKUP with ssh/sftp or ftp
  6. install the pxe boot services on BKUP (dhcp server, tftp server with pxelinux, http server, the SystemRescueCd files)
  7. write a shell script that restores the partition C: using ntfsclone and the image
  8. upload the recovery script to the backup/recovery server in the web server data files (so that we can access http://192.168.10.102/autorun1)
  9. configure pxelinux on BKUP to boot SystemRescueCd, configure the network and run http://192.168.10.102/autorun1 automatically. Here is an example of a boot command line for use with pxelinux: append initrd=initram.igz ethx=192.168.10.100 rootpass=SecRet ar_source=http://192.168.10.102 autoruns=1
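
As a rough illustration of steps 4 and 7, the image could be saved and restored with shell functions along these lines. This is only a sketch, not the script used by the authors: the device name /dev/sda1, the image name windb-c.img.gz, the /images/ path on BKUP and the DRY_RUN switch are all assumptions made for the example. With DRY_RUN=1 (the default here) the functions only print the pipeline they would run.

```shell
#!/bin/sh
# Sketch of the commands behind an autorun1 script for WINDB.
# All names below are assumptions, adapt them to your own servers.
TARGET_DEV="${TARGET_DEV:-/dev/sda1}"   # assumed partition holding C:
IMAGE_URL="${IMAGE_URL:-http://192.168.10.102/images/windb-c.img.gz}"  # hypothetical path on BKUP
DRY_RUN="${DRY_RUN:-1}"                 # 1 = only print the pipeline

# Step 4: save a compressed ntfsclone image of C: to a file (for example on D:).
save_image() {
    out_file=$1
    if [ "$DRY_RUN" = 1 ]; then
        echo "ntfsclone --save-image --output - $TARGET_DEV | gzip -c > $out_file"
        return 0
    fi
    ntfsclone --save-image --output - "$TARGET_DEV" | gzip -c > "$out_file"
}

# Step 7: stream the image from the web server and write it back to C:.
restore_image() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "wget -q -O - $IMAGE_URL | gunzip -c | ntfsclone --restore-image --overwrite $TARGET_DEV -"
        return 0
    fi
    wget -q -O - "$IMAGE_URL" | gunzip -c | ntfsclone --restore-image --overwrite "$TARGET_DEV" -
}
```

An autorun1 script based on this sketch would set DRY_RUN=0, call restore_image, and reboot only if the restore succeeded. Streaming the image avoids having to store it in RAM on the machine being restored.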

Performing the automatic recovery

In case of a critical problem on WINDB, you can run the disaster recovery process. Only the first two steps require manual action:

1. connect to BKUP and start the pxe boot services (dhcpd, tftp, thttpd, ...)

2. use the management interface of WINDB to reboot this server

3. WINDB will find the DHCP server and boot from the network

4. the pxelinux boot loader starts SystemRescueCd with ethx=192.168.10.100 rootpass=SecRet ar_source=http://192.168.10.102 autoruns=1

5. SystemRescueCd boots, configures the network and sets root's password

6. because the autorun options (ar_source and autoruns) were used, SystemRescueCd downloads the autorun1 script from http://192.168.10.102

7. the autorun1 script is executed on WINDB; it reads the ntfsclone image through the network and restores the hard drive

If necessary, log in to SystemRescueCd as root with an ssh client and run any command.

After the recovery is complete, stop the DHCP service on BKUP so that WINDB can no longer boot from the network and boots normally from the hard disk.

What you need

  • Most server manufacturers provide management interfaces such as "HP iLO" (Integrated Lights-Out) or "IBM RSA" (Remote Supervisor Adapter). These are often connected through ethernet and provide an interface that lets you reboot the server. If you are using a server provided by a hosting company, you may also have a specific web management interface developed by them that gives you the ability to reboot the server. Otherwise, ask someone in the datacenter to reboot the machine manually for you.
  • A system on the same network to act as the pxe boot server. This can be the backup/recovery server; it may be a physical machine running Linux or a virtual machine running in VMware. A single backup/recovery server can recover all the other servers of your network.
  • SystemRescueCd-1.0.0 or newer.

How to configure SystemRescueCd on your network

This section describes how to set-up the servers of your network in order to have them ready to perform automatic tasks when you boot from the network via pxe. There are two kinds of servers on the network:

  • The production servers, which may be backed up or recovered, and which boot SystemRescueCd to perform administration tasks
  • The backup/recovery machine running Linux, which provides the network boot services to the other servers of your network.

Device boot-order in the BIOS (production servers)

Production servers boot the normal operating system from the hard-disk when there is no problem, and they must boot SystemRescueCd from the network when we want to perform an administration task. Configure the device boot-order in the BIOS so they first attempt to boot from the network. If that fails then they boot from the hard-disk.

It will be necessary to start the DHCP service (involved in the PXE boot process) on the backup/recovery machine only when you want a server to boot on SystemRescueCd. If everything is ok, the DHCP server must be stopped, so that the server will fail to get a dynamic IP address during the network boot, and then boot from the hard disk.

Another way of doing that is to always boot from the network using the pxelinux boot loader. In the pxelinux configuration file you can write localboot 0x80 in the default entry in order to force the server to boot from the hard disk anyway.

Autorun scripts (backup/recovery machine)

You may want your production server to boot SystemRescueCd to get an interactive ssh console to execute commands yourself. In that case you don't need any autorun scripts and you can skip this section. Read this section if you want your servers to boot SystemRescueCd to perform automatic tasks.

The autorun feature of SystemRescueCd allows you to automatically execute your own script on the production servers when SystemRescueCd boots. There is no need to be in front of the machine to setup the network for instance. The backup/recovery machine will deliver the autorun scripts to the production machine when it boots. You can use NFS, Samba or HTTP to deliver this service to the production servers. Let's take HTTP as an example since it's easy to configure.

You have to set up an HTTP server on the backup/recovery machine (called srv3 in the rest of this chapter, srv1 and srv2 being production servers). It can be apache httpd, thttpd or any other web server. It must host the autorun scripts that you want the other servers to execute automatically when they boot. For example, you can organise your web server so that it provides three autorun scripts for each machine of your network, with one directory per machine (as in http://192.168.10.102/srv1/autorun1). You could also use the same script for all the production boxes if you prefer.

It's important to notice that your autorun script must be named either autorun (single script), or autorun[0-9] (multiple scripts). You can't use another name such as backup.
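
A layout of this kind could be created as follows. This is only a sketch: the function names, the placeholder script contents and the choice of one directory per production server are assumptions; the directory scheme simply mirrors the ar_source=http://192.168.10.102/srv1/ URLs used in this chapter. The document root is passed as an argument because its location depends on the web server.

```shell
#!/bin/sh
# Check that a file name is accepted by the autorun feature:
# either "autorun" or "autorun" followed by a single digit.
is_autorun_name() {
    case "$1" in
        autorun|autorun[0-9]) return 0 ;;
        *) return 1 ;;
    esac
}

# Create <docroot>/<host>/autorun1..3 for each host given as argument.
# Existing scripts are left untouched; new ones are placeholders.
create_autorun_layout() {
    docroot=$1
    shift
    for host in "$@"; do
        mkdir -p "$docroot/$host" || return 1
        for n in 1 2 3; do
            script="$docroot/$host/autorun$n"
            [ -e "$script" ] || printf '#!/bin/sh\necho "autorun%s for %s: nothing to do yet"\n' "$n" "$host" > "$script"
        done
    done
}
```

For instance, create_autorun_layout /path/to/docroot srv1 srv2 would produce srv1/autorun1 to srv1/autorun3 and the same three files for srv2, where /path/to/docroot stands for the document root of your web server.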

DHCP server (backup/recovery machine)

The DHCP server is the first server contacted by the production machine trying to boot from the network. The DHCP service has to give the server a dynamic IP address, and other settings such as the IP of the DNS server, and the IP of the TFTP server used in the next stage of the boot process. Read the chapter about PXE network booting for more details about this.

Here is an example of /etc/dhcp/dhcpd.conf that can be edited by hand or generated by the /etc/init.d/pxebootsrv service.

# DHCP Server Configuration file.

ddns-update-style interim;
ignore client-updates;

subnet 192.168.10.0 netmask 255.255.255.0
{
        option routers 192.168.10.1;
        option subnet-mask 255.255.255.0;
        option domain-name-servers 192.168.10.1;

        range dynamic-bootp 192.168.10.200 192.168.10.250;
        default-lease-time 21600;
        max-lease-time 43200;

        host WINDB
        {
                hardware ethernet      00:0C:29:57:D0:64;
                fixed-address          192.168.10.100;
        }

        host WEB
        {
                hardware ethernet      00:0C:29:57:D0:74;
                fixed-address          192.168.10.101;
        }
}

allow booting;
allow bootp;

next-server 192.168.10.102; # IP addr of the TFTP server

class "pxeclients"
{
   match if substring(option vendor-class-identifier, 0, 9) = "PXEClient";
   filename "/pxelinux.0";
}
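
When several production servers have to be declared, the host entries can be generated rather than typed. The helper below is a hypothetical convenience (its name and arguments are not part of SystemRescueCd or dhcpd); it prints one host block in the same form as the example above.

```shell
#!/bin/sh
# Print a dhcpd.conf host block for one production server.
# Arguments: host name, MAC address, fixed IP address.
dhcp_host_entry() {
    name=$1
    mac=$2
    ip=$3
    printf 'host %s\n{\n' "$name"
    printf '        hardware ethernet      %s;\n' "$mac"
    printf '        fixed-address          %s;\n' "$ip"
    printf '}\n'
}
```

Calling dhcp_host_entry WINDB 00:0C:29:57:D0:64 192.168.10.100 prints the block for WINDB, which can then be pasted inside the subnet declaration. With ISC dhcpd, the edited file can usually be checked for syntax errors with dhcpd -t -cf /etc/dhcp/dhcpd.conf before the service is restarted.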

tftp server (backup/recovery machine)

The TFTP server is the second server contacted by the production machine trying to boot from the network. The TFTP service has to send the production server the pxelinux.0 file to be executed first. This is just the binary of the pxelinux boot loader. The TFTP server will also be used to send other files to the production server, such as the pxelinux configuration file, the kernel to boot and the initram.igz file, and it may also send sysrcd.dat and sysrcd.md5, which are necessary to complete the boot process. Please read the chapter about PXE network booting for more details about this stage.

The TFTP server has to send most of the SystemRescueCd files that are provided in the CD-ROM edition (pxelinux boot loader files, messages files for pxelinux, kernel and initramfs images). Since SystemRescueCd-0.4.4-beta, the SystemRescueCd filesystem (sysrcd.dat + sysrcd.md5) can be transferred by either the tftp server or an http server. If you want to load these files through http instead of tftp (it's faster), you should replace netboot=tftp://path/to/sysrcd.dat with netboot=http://path/to/sysrcd.dat.
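
If an existing pxelinux configuration already uses tftp for the filesystem image, the switch to http can be made with a small filter such as the one below. This is a sketch under the assumption that the web server publishes sysrcd.dat and sysrcd.md5 under the same path as the tftp server; the function name is made up for the example.

```shell
#!/bin/sh
# Read a pxelinux configuration on standard input and write it to
# standard output with every netboot=tftp:// replaced by netboot=http://.
netboot_to_http() {
    sed 's#netboot=tftp://#netboot=http://#g'
}
```

For example, netboot_to_http < default > default.new writes a modified copy that can be reviewed before it replaces the original file.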

The main difference between pxelinux and isolinux is that pxelinux needs a config file that is inside the pxelinux.cfg directory instead of an isolinux.cfg file. Anyway, the two kinds of configuration files are very similar, so you can use the contents of isolinux.cfg file to make your customized pxelinux.cfg configuration. Here is an example of what files you may have on your hard disk:

filepath                                       description
/tftpboot/pxelinux.0                           executable file of the pxelinux program
/tftpboot/sysrcd.dat                           image of the filesystem
/tftpboot/sysrcd.md5                           check of the filesystem image
/tftpboot/f1boot.msg                           message file displayed by pxelinux
/tftpboot/f2images.msg                         message file displayed by pxelinux
/tftpboot/f3params.msg                         message file displayed by pxelinux
/tftpboot/f4arun.msg                           message file displayed by pxelinux
/tftpboot/f5troubl.msg                         message file displayed by pxelinux
/tftpboot/f6pxe.msg                            message file displayed by pxelinux
/tftpboot/f7net.msg                            message file displayed by pxelinux
/tftpboot/memdisk                              loads a floppy disk image into memory
/tftpboot/rescuecd                             first kernel image file (default 32 bits kernel)
/tftpboot/rescue64                             second kernel image file (default 64 bits kernel)
/tftpboot/altker32                             third kernel image file (alternative 32 bits kernel)
/tftpboot/altker64                             fourth kernel image file (alternative 64 bits kernel)
/tftpboot/initram.igz                          common initramfs image used by the kernels
/tftpboot/pxelinux.cfg/default                 pxelinux.cfg default configuration file
/tftpboot/pxelinux.cfg/01-00-0c-29-57-d0-64    config specific to server having mac=00:0C:29:57:D0:64
/tftpboot/pxelinux.cfg/01-00-0c-29-57-d0-74    config specific to server having mac=00:0C:29:57:D0:74

The most important file in the previous table is the pxelinux configuration file since you will have to edit it to write the boot settings you want to use. There are two kinds of configuration files you can use:

  • You can either use a single /tftpboot/pxelinux.cfg/default if all the servers have the same pxelinux configuration
  • You can also use a filename based on the mac address of the client if you want each server to have a specific pxelinux configuration file. For instance, /tftpboot/pxelinux.cfg/01-00-0c-29-57-d0-64 will be loaded by the server having 00:0C:29:57:D0:64 as a mac address.
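
The per-server file name can be derived from the MAC address mechanically. The function below is a small sketch (its name is an assumption): it lowercases the hexadecimal digits, replaces the colons with dashes and adds the 01- prefix that pxelinux uses for ethernet.

```shell
#!/bin/sh
# Convert a MAC address such as 00:0C:29:57:D0:64 into the name of the
# matching file inside the pxelinux.cfg directory.
pxe_cfg_name() {
    mac=$1
    # "01" is the ARP hardware type for ethernet, prepended by pxelinux.
    printf '01-%s\n' "$(printf '%s' "$mac" | tr 'A-F' 'a-f' | tr ':' '-')"
}
```

For example, pxe_cfg_name 00:0C:29:57:D0:64 prints 01-00-0c-29-57-d0-64, the file name used for that server in the table above.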

Please read the following section to find out what to write in the pxelinux configuration file.

pxelinux configuration (backup/recovery machine)

The pxelinux configuration file is similar to a grub or lilo configuration file since it's a configuration for a boot loader. It tells pxelinux which kernel and ramdisk file to load into memory, and which boot options to pass to the kernel (the parameters that we can read through /proc/cmdline once linux is loaded).

If you expect the server to boot automatically, it's important that you specify a default entry and a timeout so that pxelinux won't wait for a keyboard input from the user. Here is an example of a pxelinux configuration file.

Each entry has only two lines (kernel and append). Line breaks have been inserted in the append lines here because they are long; in the real configuration file each append line must be written on a single line.

default recovery
timeout 10
prompt 1
display f1boot.msg
F1 f1boot.msg
F2 f2images.msg
F3 f3params.msg
F4 f4arun.msg
F5 f5troubl.msg
F6 f6pxe.msg
F7 f7net.msg
label backup
    kernel rescuecd
    append initrd=initram.igz ar_source=http://192.168.10.102/srv1/ autoruns=1 
           ethx=192.168.10.100 netboot=tftp://192.168.10.103/sysrcd.dat cdroot
           dns=192.168.10.2 gateway=192.168.10.1 setkmap=us
label recovery
    kernel rescuecd
    append initrd=initram.igz ar_source=http://192.168.10.102/srv1/ autoruns=2 
           ethx=192.168.10.100 netboot=tftp://192.168.10.103/sysrcd.dat cdroot
           dns=192.168.10.2 gateway=192.168.10.1 setkmap=us
label fsck
    kernel rescuecd
    append initrd=initram.igz ar_source=http://192.168.10.102/srv1/ autoruns=3
           ethx=192.168.10.100 netboot=tftp://192.168.10.103/sysrcd.dat cdroot
           dns=192.168.10.2 gateway=192.168.10.1 setkmap=us
label ssh
    kernel rescue64
    append initrd=initram.igz autoruns=no ethx=192.168.10.100 rootpass=12345
           netboot=tftp://192.168.10.103/sysrcd.dat dns=192.168.10.2 cdroot
           gateway=192.168.10.1 setkmap=us
label serial
    kernel rescuecd
    append initrd=initram.igz autoruns=no console=ttyS0,9600 cdroot
           netboot=tftp://192.168.10.103/sysrcd.dat dns=192.168.10.2 
           gateway=192.168.10.1 setkmap=us
label bootfromdisk
    localboot 0x80

In this example the server will boot the recovery entry, so it will boot the rescuecd kernel, and it will execute the script autorun2 downloaded from http://192.168.10.102/srv1/autorun2. The autorun2 script contains the instructions to perform an automatic recovery of the server.

Here is what the entries do:

  • backup
    • boots the rescuecd kernel downloaded through tftp and uses initram.igz as initramfs
    • configures the network with the 192.168.10.100 ip address
    • downloads the http://192.168.10.102/srv1/autorun1 script to a temporary file into the ram
    • executes the autorun1 script that performs a backup and reboots
  • recovery
    • boots the rescuecd kernel downloaded through tftp and uses initram.igz as initramfs
    • configures the network with the 192.168.10.100 ip address
    • downloads the http://192.168.10.102/srv1/autorun2 script to a temporary file into the ram
    • executes the autorun2 script that performs a recovery of the system and reboots
  • fsck
    • boots the rescuecd kernel downloaded through tftp and uses initram.igz as initramfs
    • configures the network with the 192.168.10.100 ip address
    • downloads the http://192.168.10.102/srv1/autorun3 script to a temporary file into the ram
    • executes the autorun3 script that performs an fsck of the filesystems and reboots
  • ssh
    • boots the rescue64 kernel downloaded through tftp and uses initram.igz as initramfs
    • configures the network with the 192.168.10.100 ip address
    • sets the root password of the SystemRescueCd system to 12345 so that we can connect remotely through ssh
    • disables autorun
  • serial
    • boots the rescuecd kernel downloaded through tftp and uses initram.igz as initramfs with options console=ttyS0,9600 so that we can work through the serial console
    • disables autorun
  • bootfromdisk
    • boots from the first hard disk

Every time you want a server to execute a task, you just have to change the first line of the configuration file. For instance, you can change default recovery to default bootfromdisk once the recovery is complete so that the server boots from the hard disk the next time. You can also stop the dhcp service on the backup/recovery server so that the production server's attempt to boot from the network fails and it falls back to the hard disk.
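
Changing the default entry can be scripted so that it is less error-prone than editing the file by hand. The function below is a minimal sketch, assuming the file has a single line starting with "default" and that entry names contain only letters and digits, as in the example above; the function name is hypothetical. It refuses to select an entry that has no matching label line.

```shell
#!/bin/sh
# Set the default entry of a pxelinux configuration file.
# Arguments: path of the configuration file, name of the entry.
set_default_entry() {
    cfg=$1
    entry=$2
    # Refuse names that are not declared with a "label" line.
    if ! grep -q "^label $entry\$" "$cfg"; then
        echo "no such label in $cfg: $entry" >&2
        return 1
    fi
    tmp=$(mktemp) || return 1
    if sed "s/^default .*/default $entry/" "$cfg" > "$tmp"; then
        # Overwrite in place so that the file keeps its permissions.
        cat "$tmp" > "$cfg"
        status=$?
    else
        status=1
    fi
    rm -f "$tmp"
    return $status
}
```

For example, set_default_entry /tftpboot/pxelinux.cfg/01-00-0c-29-57-d0-64 bootfromdisk selects the bootfromdisk entry for the server with that MAC address.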

How to use SystemRescueCd once it's set up

Once your network is installed, using the SystemRescueCd to perform automatic or manual administration tasks remotely is very easy. Here is how to use these features.

Use SystemRescueCd to perform an automatic task

Let's take as an example: the hard-disk of the srv1 machine (192.168.10.100) crashed and has just been replaced with a brand new empty disk. You now want to perform the recovery job to restore the operating system on this machine.

  1. Connect to the backup/recovery server (srv3) and edit the pxelinux configuration file used by srv1 (eg: /tftpboot/pxelinux.cfg/01-00-0c-29-57-d0-64) and write the name of the entry you want to boot in the default section: default recovery
  2. Ensure the pxe boot services (dhcpd, tftpd, ...) are started on the backup/recovery server
  3. Use the management interface to reboot the production server on which you want to perform an administration task (srv1)
  4. Wait 3 minutes, just to be sure that the boot process on SystemRescueCd is complete on srv1
  5. Connect to the backup/recovery server (srv3) and edit the pxelinux configuration file used by srv1 (eg: /tftpboot/pxelinux.cfg/01-00-0c-29-57-d0-64) and write bootfromdisk in the default section so that the server will boot from the hard-disk the next time: default bootfromdisk
  6. If the recovery script has been well designed, it restarts the machine automatically once the recovery is complete, and srv1 then boots the production operating system from the hard disk.

Use SystemRescueCd to perform a task by hand

Let's take as an example: You forgot the root password of the srv2 machine and you want to get an ssh connection to the SystemRescueCd to mount the root filesystem and edit the password file (usually /etc/shadow).

  1. Connect to the backup/recovery server (srv3) and edit the pxelinux configuration file used by srv2 (eg: /tftpboot/pxelinux.cfg/01-00-0c-29-57-d0-74) and write the name of the entry you want to boot in the default section: default ssh
  2. Ensure the pxe boot services (dhcpd, tftpd, ...) are started on the backup/recovery server
  3. Use the management interface to reboot the production server on which you want to perform an administration task (srv2)
  4. Wait 3 minutes, just to be sure that the boot process on SystemRescueCd is complete on srv2
  5. Use ssh to connect to srv2 from your office. To connect to SystemRescueCd, you must use the root password that you gave on the boot command line in the pxelinux configuration file (eg: 12345). Don't confuse this password with the root password of the system that you want to change, which is stored in the /etc/shadow file on your hard disk. Mount the root partition and edit the file, or perform any other administration task by hand.
  6. Connect to the backup/recovery server (srv3) and edit the pxelinux configuration file used by srv2 (eg: /tftpboot/pxelinux.cfg/01-00-0c-29-57-d0-74) and write bootfromdisk in the default section so that the server will boot from the hard-disk the next time: default bootfromdisk
  7. In the ssh console to srv2, type reboot. The Linux system on srv2 should restart with the new root password.
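
The commands typed in step 5 could look like the sketch below, which resets the password by chrooting into the installed system instead of editing /etc/shadow directly. The device name /dev/sda1, the mount point /mnt/target, the function names and the DRY_RUN switch are assumptions for the example; with DRY_RUN=1 (the default here) the commands are only printed.

```shell
#!/bin/sh
DRY_RUN="${DRY_RUN:-1}"   # 1 = print the commands instead of running them

# Run a command, or only print it when DRY_RUN is 1.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Mount the root filesystem of the installed system and change its
# root password from inside a chroot.
# Arguments: root partition (assumed name), optional mount point.
reset_root_password() {
    root_dev=$1
    mnt=${2:-/mnt/target}
    run mkdir -p "$mnt" &&
    run mount "$root_dev" "$mnt" &&
    run chroot "$mnt" passwd root &&
    run umount "$mnt"
}
```

After checking the printed commands, set DRY_RUN=0 and call reset_root_password with the real root partition of srv2; passwd then asks for the new password interactively.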