Manage remote Windows and Linux servers using SystemRescueCd

About

The most common way to use SystemRescueCd is interactively, from a CD-ROM drive on a desktop. Recent SystemRescueCd versions also support network boot via PXE. The network boot options (such as ethx=ip, gateway=ip, dns=ip, dodhcp) let you configure network access automatically at boot time. SystemRescueCd starts an ssh server by default, and you can define a static root password on the boot command line. That way you can get an ssh console on the server just by booting SystemRescueCd with the right boot options, with nothing to configure by hand on the machine itself. This is very useful for disaster recovery, that is, for restoring a backup of your operating system after a crash. You can also use it to perform any other administration task on your server.

In other words, you can manage Windows or Linux servers located in a datacenter remotely, from your office. There is no need to be in front of the machine to insert a disc, configure a network interface, or set a root password. All you have to do is prepare a network boot server (one or several servers running the following network services: dhcpd, tftpd, httpd). You can install these three services either on a dedicated physical or VMware server, or on a production machine running other services.

There are two interesting ways of using it:

  • You can prepare a pxe boot server so that you get an interactive ssh console to administer or repair your server by hand. If you prefer, the serial console is also supported. If you need to run graphical programs such as GParted remotely, you can use the vncserver boot option (requires SystemRescueCd-1.0.2 or newer) so that a VNC server is automatically started on the remote machine.
  • You can also configure SystemRescueCd to run your own autorun scripts to perform automatic tasks (backup, recovery, ...)

Since the autorun feature of SystemRescueCd allows you to execute scripts located on an nfs/samba/http server, there is no need to spend time making a customized SystemRescueCd version. All you have to do is set up the pxe boot server so that SystemRescueCd automatically boots, configures the network and the root password, downloads the autorun scripts, and executes them.

To understand this chapter, you have to first read the two previous chapters: PXE network booting with SystemRescueCd and Run your own scripts at start-up with autorun.

Examples of interesting things you can do

  • Disaster Recovery:
    • restore a broken Windows system with ntfsclone/partimage
    • restore a tarball of your Linux operating system and reinstall grub
  • Hard disk partitioning and administration tasks
    • format the hard disk and reinstall a copy of the operating system
    • resize your partitions
    • reinstall the grub boot loader
  • Fix a critical problem
    • fix a boot problem (fsck fails at boot time)
    • reset the administrator password of your Windows system with the ntpass floppy disk image
    • reset the root password of your Linux system by chrooting into it

Example of how to use an automatic disaster recovery on remote servers

Overview

This is a complete example of how you can organize an automatic disaster recovery on a network based on three machines located in a remote datacenter. This example shows you what kind of things you can do with SystemRescueCd. The technical details and configuration steps are given in the next sections.

Example of a network in a datacenter

In your datacenter you have three servers:

  1. srv1 192.168.10.100 A Windows web server running IIS and MS SQL Server
  2. srv2 192.168.10.101 A Linux web server running Apache and MySQL
  3. srv3 192.168.10.102 A backup/recovery server running Linux

Installing the Disaster Recovery system

You want to be able to restore the operating system on srv1 and srv2 in case there is a software problem or a hard disk failure. For instance, if Windows fails to boot on srv1 because of a virus, you want to be able to restore the operating system on the hard disk by rebooting the server into SystemRescueCd, which then runs a recovery script. Here is how to install this disaster recovery system:

  1. install Windows on srv1 with at least two partitions: C: for Windows, D: for your data
  2. in the BIOS of srv1, define the boot order as "network, hard-disk"
  3. install an apache or thttpd web server on srv3 so that files can be downloaded through http
  4. boot srv1 on SystemRescueCd, make an image of the C: partition using ntfsclone, and save the image on the D: partition
  5. copy the ntfsclone image to srv3 with ssh/sftp or ftp
  6. install the pxe boot services on srv3 (dhcp server, tftp server with pxelinux, http server, the SystemRescueCd files)
  7. write a shell script that restores the C: partition using ntfsclone and the backup you made
  8. upload the recovery script you wrote to the web server data files on the backup/recovery server (so that it can be accessed as http://192.168.10.102/autorun1)
  9. configure pxelinux on srv3 so that it boots SystemRescueCd, configures the network and runs http://192.168.10.102/autorun1 automatically. Here is an example of a boot command line you can use with pxelinux: append initrd=initram.igz ethx=192.168.10.100 rootpass=12345 ar_source=http://192.168.10.102 autoruns=1
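
Steps 4 and 7 above can be sketched roughly as follows. This is only an illustration under several assumptions that are not in the original text: the C: partition is taken to be /dev/sda1, the image is assumed to be gzip-compressed and published under the hypothetical name c-drive.img.gz on srv3, and the CONFIRM_RESTORE guard is an invented safety switch so that the sketch does nothing unless explicitly enabled.

```shell
#!/bin/sh
# Assumed values (adapt to your own setup): partition holding C: and the
# URL under which the compressed ntfsclone image is published on srv3.
TARGET=${TARGET:-/dev/sda1}
IMAGE_URL=${IMAGE_URL:-http://192.168.10.102/c-drive.img.gz}

# Step 4: save partition $1 as a compressed ntfsclone image in file $2.
backup_partition() {
    ntfsclone --save-image --output - "$1" | gzip -c > "$2"
}

# Step 7 (body of autorun1): stream the image from URL $1 over HTTP and
# write it back to partition $2 without storing it locally.
restore_partition() {
    wget -q -O - "$1" | gunzip -c | ntfsclone --restore-image --overwrite "$2" -
}

# Hypothetical guard: nothing is overwritten unless CONFIRM_RESTORE=yes.
# The server is rebooted only if ntfsclone reports success.
if [ "${CONFIRM_RESTORE:-no}" = "yes" ]; then
    restore_partition "$IMAGE_URL" "$TARGET" && reboot
fi
```

Streaming the image through a pipe avoids needing free space on the machine being restored, which matters when its only disk is the one being overwritten.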

Performing the automatic recovery

That way, in case of a critical problem on srv1, you can run the disaster recovery process in a few minutes. Only the first two steps require an action from the user:

  1. connect to the backup server and start the pxe boot services (dhcpd, tftp, thttpd, ...)
  2. use the management interface of the srv1 machine to reboot this server
  3. since there is a dhcp server running, srv1 will automatically boot from the network
  4. the pxelinux boot loader starts SystemRescueCd with the ethx=192.168.10.100 rootpass=12345 ar_source=http://192.168.10.102 autoruns=1 options
  5. SystemRescueCd boots and automatically configures the network and the system root password
  6. since the autorun options were used (ar_source and autoruns), SystemRescueCd downloads the autorun1 script from http://192.168.10.102
  7. the autorun1 script is executed on srv1; this script reads the ntfsclone image of the hard disk through the network and restores the hard drive
  8. You can also connect to SystemRescueCd using an ssh client and the root password that you defined previously, and you can run any command

After the disaster recovery is complete, you must stop the dhcp service on srv3 so that srv1 cannot boot from the network again. You want srv1 to boot from the hard disk once it has been recovered by the autorun1 script.

What you need

  • Since we want to reboot from the network via pxe after a problem, you must be able to remotely reboot the servers you want to manage. Most server manufacturers provide management interfaces such as "HP iLO" (Integrated Lights-Out) or "IBM RSA" (Remote Supervisor Adapter). These management interfaces are often connected through ethernet, and they provide a web interface that lets you reboot your server. If you are using a server provided by a hosting company, you may also have a specific web management interface developed by them that gives you the ability to reboot the server. In other cases, you may ask an engineer working in the datacenter to do that by hand.
  • You also need another server on the same network in the datacenter that can act as a pxe boot server. This is the backup/recovery server. It may either be a physical machine running Linux, or a virtual machine running in VMware for instance. It does not require a lot of storage or processing power. A single backup/recovery server can recover all the other servers of your network.
  • SystemRescueCd-1.0.0 or newer.

How to configure SystemRescueCd on your network

This section describes how to set up the servers of your network so that they are ready to perform automatic tasks when you boot them from the network via pxe. There are two kinds of servers in the network:

  • The production servers, which may be backed up and recovered. They boot SystemRescueCd to perform administration tasks.
  • The single backup/recovery machine running Linux, which provides the network boot services to the other servers of your network.

Device boot-order in the BIOS (production servers)

We want the production servers to boot the normal operating system from the hard disk when there is no problem, and to boot SystemRescueCd from the network when we want to perform an administration task. The best way of doing that is to configure the device boot order in the BIOS of your production servers so that they first attempt to boot from the network, and then boot from the hard disk if they fail to boot from the network.

You then start the DHCP service (involved in the PXE boot process) on the backup/recovery machine only when you want a server to boot on SystemRescueCd. When everything is fine, the DHCP server must be stopped, so that the server fails to get a dynamic IP address during the network boot and then boots from the hard disk.

Another way of doing that is to always boot from the network using the pxelinux boot loader. In the pxelinux configuration file you can write localboot 0x80 in the default entry in order to force the server to boot from the hard disk anyway.

Autorun scripts (backup/recovery machine)

You may want your production server to boot SystemRescueCd to get an interactive ssh console to execute commands yourself. In that case you don't need any autorun script and you can skip this section. Read this section if you want your servers to boot SystemRescueCd to perform automatic tasks.

The autorun feature of SystemRescueCd allows you to automatically execute your own script on the production servers when SystemRescueCd boots. There is no need to be in front of the machine to set up the network, for instance. The backup/recovery machine delivers the autorun scripts to the production machine when it boots. You can use either NFS, Samba or HTTP to deliver this service to the production servers. Let's take HTTP as an example since it's easy to configure.

You have to set up an HTTP server on the backup/recovery machine. It can be apache httpd, thttpd or any other web server. It must host the autorun scripts that you want the other servers to execute automatically when they boot. For example, you can organise your web server so that it provides three autorun scripts for each machine of your network, such as http://192.168.10.102/srv1/autorun1, autorun2 and autorun3 for srv1. You could also use the same script for all the production boxes if you prefer.

It's important to notice that your autorun script must be named either autorun (single script), or autorun[0-9] (multiple scripts). You can't give another name such as backup.
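
As a rough sketch of how such a per-machine layout could be populated, the helper below copies a script into a per-host directory under the web server's document root and enforces the naming rule. The function name install_autorun and the document root used in the usage note are assumptions made for illustration; they are not part of SystemRescueCd.

```shell
#!/bin/sh
# install_autorun DOCROOT HOST NUMBER SCRIPT
# Copy SCRIPT to DOCROOT/HOST/autorunNUMBER, so that a server booted with
# ar_source=http://<web server>/HOST/ and autoruns=NUMBER can fetch it.
install_autorun() (
    docroot=$1
    host=$2
    number=$3
    script=$4
    # Only autorun0 .. autorun9 are valid names for numbered scripts.
    case $number in
        [0-9]) ;;
        *) echo "autorun number must be a single digit 0-9" >&2; exit 1 ;;
    esac
    mkdir -p "$docroot/$host" || exit 1
    cp "$script" "$docroot/$host/autorun$number"
)
```

For instance, `install_autorun /var/www/htdocs srv1 2 recovery.sh` (with a hypothetical document root and script name) would publish the script as srv1/autorun2.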

DHCP server (backup/recovery machine)

The DHCP server is the first server contacted by the production machine trying to boot from the network. The DHCP service has to give the server a dynamic IP address, and other settings such as the IP of the DNS server, and the IP of the TFTP server used in the next stage of the boot process. Please read the chapter about PXE network booting for more details about it.

Here is an example of /etc/dhcp/dhcpd.conf configuration file that can be either edited by hand or generated by the /etc/init.d/pxebootsrv service.

# DHCP Server Configuration file.

ddns-update-style interim;
ignore client-updates;

subnet 192.168.10.0 netmask 255.255.255.0
{
        option routers 192.168.10.1;
        option subnet-mask 255.255.255.0;
        option domain-name-servers 192.168.10.1;

        range dynamic-bootp 192.168.10.200 192.168.10.250;
        default-lease-time 21600;
        max-lease-time 43200;

        host srv1
        {
                hardware ethernet      00:0C:29:57:D0:64;
                fixed-address          192.168.10.100;
        }

        host srv2
        {
                hardware ethernet      00:0C:29:57:D0:74;
                fixed-address          192.168.10.101;
        }
}

allow booting;
allow bootp;

next-server 192.168.10.102; # IP addr of the TFTP server

class "pxeclients"
{
   match if substring(option vendor-class-identifier, 0, 9) = "PXEClient";
   filename "/pxelinux.0";
}
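
If more production servers are added later, each one needs its own host block inside the subnet declaration. A small helper such as the following could print a block in the same layout as above. The function name and the srv4 values in the example call are hypothetical.

```shell
#!/bin/sh
# dhcp_host_entry NAME MAC IP
# Print a dhcpd.conf host block that pins IP to the network card MAC.
dhcp_host_entry() {
    cat <<EOF
        host $1
        {
                hardware ethernet      $2;
                fixed-address          $3;
        }
EOF
}

# Example with made-up values for a fourth server.
dhcp_host_entry srv4 00:0C:29:57:D0:84 192.168.10.105
```

After pasting the output into the subnet section, `dhcpd -t -cf /etc/dhcp/dhcpd.conf` can be used to check the syntax before restarting the service.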

TFTP server (backup/recovery machine)

The TFTP server is the second server contacted by the production machine trying to boot from the network. The TFTP service has to send the production server the pxelinux.0 file to be executed first. This is just the binary of the pxelinux boot loader. The TFTP server will also be used to send other files to the production server, such as the pxelinux configuration file, the kernel to boot, the initram.igz file, and it may also send sysrcd.dat and sysrcd.md5, which are necessary to complete the boot process. Please read the chapter about PXE network booting for more details about this stage.

The TFTP server has to send most of the SystemRescueCd files that are provided in the CD-ROM edition (pxelinux boot loader files, message files for pxelinux, kernel and initramfs images). Since SystemRescueCd-0.4.4-beta, the SystemRescueCd filesystem (sysrcd.dat + sysrcd.md5) can be transferred by either the tftp server or an http server. If you want to load these files through http instead of tftp (it's faster), you should replace netboot=tftp://path/to/sysrcd.dat with netboot=http://path/to/sysrcd.dat.
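
The replacement described above is a plain text substitution on the append line, so it can be sketched as a small filter. The function name use_http_netboot is made up for this example, and the sample line only illustrates the shape of the input.

```shell
#!/bin/sh
# use_http_netboot
# Filter: rewrite every netboot=tftp:// option read on stdin to
# netboot=http://, leaving the host and path untouched.
use_http_netboot() {
    sed 's|netboot=tftp://|netboot=http://|g'
}

# Example on a single append line.
printf '%s\n' 'append initrd=initram.igz netboot=tftp://192.168.10.102/sysrcd.dat cdroot' \
    | use_http_netboot
```

This only changes the boot option. The web server must also serve sysrcd.dat and sysrcd.md5 at the resulting URL.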

The main difference between pxelinux and isolinux is that pxelinux needs a config file located inside the pxelinux.cfg directory instead of an isolinux.cfg file. The two kinds of configuration files are very similar, so you can use the contents of the isolinux.cfg file to make your customized pxelinux.cfg configuration. Here is an example of the files you may have on your hard disk:

File path                                    Description
/tftpboot/pxelinux.0                         executable file of the pxelinux program
/tftpboot/sysrcd.dat                         image of the filesystem
/tftpboot/sysrcd.md5                         MD5 checksum of the filesystem image
/tftpboot/f1boot.msg                         message file displayed by pxelinux
/tftpboot/f2images.msg                       message file displayed by pxelinux
/tftpboot/f3params.msg                       message file displayed by pxelinux
/tftpboot/f4arun.msg                         message file displayed by pxelinux
/tftpboot/f5troubl.msg                       message file displayed by pxelinux
/tftpboot/f6pxe.msg                          message file displayed by pxelinux
/tftpboot/f7net.msg                          message file displayed by pxelinux
/tftpboot/memdisk                            loads a floppy disk image into memory
/tftpboot/rescuecd                           first kernel image file (default 32-bit kernel)
/tftpboot/rescue64                           second kernel image file (default 64-bit kernel)
/tftpboot/altker32                           third kernel image file (alternative 32-bit kernel)
/tftpboot/altker64                           fourth kernel image file (alternative 64-bit kernel)
/tftpboot/initram.igz                        common initramfs image used by the kernels
/tftpboot/pxelinux.cfg/default               pxelinux.cfg default configuration file
/tftpboot/pxelinux.cfg/01-00-0c-29-57-d0-64  config specific to server having mac=00:0C:29:57:D0:64
/tftpboot/pxelinux.cfg/01-00-0c-29-57-d0-74  config specific to server having mac=00:0C:29:57:D0:74
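
Before rebooting a production server, it can be worth checking that the TFTP directory really contains the files needed for a boot. The sketch below checks a minimal subset of the table above. Which files count as required is an assumption here: it supposes sysrcd.dat is served over tftp and that a default configuration file is used.

```shell
#!/bin/sh
# check_tftp_tree DIR [KERNEL]
# Print one line per missing file; the exit status is 0 only when all
# files are present. KERNEL defaults to rescuecd.
check_tftp_tree() (
    dir=$1
    kernel=${2:-rescuecd}
    missing=0
    for f in pxelinux.0 "$kernel" initram.igz sysrcd.dat sysrcd.md5 pxelinux.cfg/default; do
        if [ ! -f "$dir/$f" ]; then
            echo "missing: $dir/$f"
            missing=1
        fi
    done
    exit "$missing"
)
```

A typical call would be `check_tftp_tree /tftpboot rescue64` when the 64-bit kernel is the one referenced in the pxelinux configuration.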

The most important file in the previous table is the pxelinux configuration file, since you will have to edit it to set the boot options you want to use. There are two kinds of configuration files you can use:

  • You can either use a single /tftpboot/pxelinux.cfg/default if all the servers have the same pxelinux configuration
  • You can also use a filename based on the mac address of the client if you want each server to have a specific pxelinux configuration file. For instance, /tftpboot/pxelinux.cfg/01-00-0c-29-57-d0-64 will be loaded by the server having 00:0C:29:57:D0:64 as a mac address.
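
The per-machine filename follows a simple rule: the prefix 01- (the ARP hardware type for Ethernet), then the mac address in lowercase with dashes instead of colons. A minimal sketch of that conversion, with a function name chosen for this example:

```shell
#!/bin/sh
# pxe_config_name MAC
# Print the file name that pxelinux looks for in pxelinux.cfg/ for the
# Ethernet card with the given MAC address.
pxe_config_name() {
    printf '01-%s\n' "$(printf '%s' "$1" | tr 'A-F:' 'a-f-')"
}

pxe_config_name 00:0C:29:57:D0:64   # prints 01-00-0c-29-57-d0-64
```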

Please read the next section to know what to write in the pxelinux configuration file.

pxelinux configuration (backup/recovery machine)

The pxelinux configuration file is similar to a grub or lilo configuration file since it's a configuration for a boot loader. It tells pxelinux which kernel and ramdisk file to load into memory, and which boot options to pass to the kernel (the parameters that can be read through /proc/cmdline once Linux is loaded).

If you expect the server to boot automatically, it's important to specify a default entry and a timeout so that pxelinux doesn't wait for keyboard input from the user. Here is an example of a pxelinux configuration file.

Please note that each SystemRescueCd entry has only two lines (kernel and append). The append line is wrapped on the web site because it is long, but it must not be broken in the configuration file.

default recovery
timeout 10
prompt 1
display f1boot.msg
F1 f1boot.msg
F2 f2images.msg
F3 f3params.msg
F4 f4arun.msg
F5 f5troubl.msg
F6 f6pxe.msg
F7 f7net.msg
label backup
    kernel rescuecd
    append initrd=initram.igz ar_source=http://192.168.10.102/srv1/ autoruns=1
           ethx=192.168.10.100 netboot=tftp://192.168.10.102/sysrcd.dat cdroot
           dns=192.168.10.2 gateway=192.168.10.1 setkmap=us
label recovery
    kernel rescuecd
    append initrd=initram.igz ar_source=http://192.168.10.102/srv1/ autoruns=2
           ethx=192.168.10.100 netboot=tftp://192.168.10.102/sysrcd.dat cdroot
           dns=192.168.10.2 gateway=192.168.10.1 setkmap=us
label fsck
    kernel rescuecd
    append initrd=initram.igz ar_source=http://192.168.10.102/srv1/ autoruns=3
           ethx=192.168.10.100 netboot=tftp://192.168.10.102/sysrcd.dat cdroot
           dns=192.168.10.2 gateway=192.168.10.1 setkmap=us
label ssh
    kernel rescue64
    append initrd=initram.igz autoruns=no ethx=192.168.10.100 rootpass=12345
           netboot=tftp://192.168.10.102/sysrcd.dat dns=192.168.10.2 cdroot
           gateway=192.168.10.1 setkmap=us
label serial
    kernel rescuecd
    append initrd=initram.igz autoruns=no console=ttyS0,9600 cdroot
           netboot=tftp://192.168.10.102/sysrcd.dat dns=192.168.10.2
           gateway=192.168.10.1 setkmap=us
label bootfromdisk
    localboot 0x80

In that example the server will boot the recovery entry, so it will boot on the rescuecd kernel, and it will execute the script autorun2 downloaded from http://192.168.10.102/srv1/autorun2. The autorun2 script contains the instructions to perform an automatic recovery of the server.

Here is what the entries do:

  • backup
    • boots the rescuecd kernel downloaded through tftp and uses initram.igz as initramfs
    • configures the network with the 192.168.10.100 ip address
    • downloads the http://192.168.10.102/srv1/autorun1 script to a temporary file in RAM
    • executes the autorun1 script, which performs a backup and reboots
  • recovery
    • boots the rescuecd kernel downloaded through tftp and uses initram.igz as initramfs
    • configures the network with the 192.168.10.100 ip address
    • downloads the http://192.168.10.102/srv1/autorun2 script to a temporary file in RAM
    • executes the autorun2 script, which performs a recovery of the system and reboots
  • fsck
    • boots the rescuecd kernel downloaded through tftp and uses initram.igz as initramfs
    • configures the network with the 192.168.10.100 ip address
    • downloads the http://192.168.10.102/srv1/autorun3 script to a temporary file in RAM
    • executes the autorun3 script, which performs an fsck of the filesystems and reboots
  • ssh
    • boots the rescue64 kernel downloaded through tftp and uses initram.igz as initramfs
    • configures the network with the 192.168.10.100 ip address
    • sets the root password of the SystemRescueCd system to 12345 so that we can connect remotely through ssh
    • disables autorun
  • serial
    • boots the rescuecd kernel downloaded through tftp and uses initram.igz as initramfs, with the option console=ttyS0,9600 so that we can work through the serial console
    • disables autorun
  • bootfromdisk
    • boots from the first hard disk

Every time you want a server to execute a task, you just have to change the first line of the configuration file. For instance, you can change default recovery to default bootfromdisk once the recovery is complete so that the server boots from the hard disk the next time. Alternatively, you can stop the dhcp service on the backup/recovery server, so that the production server's attempt to boot from the network fails and it falls back to the hard disk.
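
Changing the first line by hand is easy to get wrong when done in a hurry, so it may be convenient to wrap it in a small helper. The following is one possible sketch. The function name set_default_entry is an assumption, and it expects label lines written exactly as "label name" in lowercase, as in the example above.

```shell
#!/bin/sh
# set_default_entry FILE LABEL
# Rewrite the "default" line of the pxelinux configuration FILE so that
# the next network boot selects LABEL. Fails if LABEL is not defined.
set_default_entry() (
    file=$1
    label=$2
    # Refuse labels that do not exist, to avoid an unbootable default.
    grep -q "^label $label\$" "$file" || {
        echo "no such label: $label" >&2
        exit 1
    }
    tmp=$(mktemp) || exit 1
    # Write the result back with cat so the file keeps its owner and mode.
    sed "s/^default .*/default $label/" "$file" > "$tmp" && cat "$tmp" > "$file"
    status=$?
    rm -f "$tmp"
    exit "$status"
)
```

For example, `set_default_entry /tftpboot/pxelinux.cfg/01-00-0c-29-57-d0-64 bootfromdisk` would make srv1 boot from its hard disk at the next reboot.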

How to use SystemRescueCd once it's setup

Once your network is set up, using SystemRescueCd to perform automatic or manual administration tasks remotely is very easy. Here is how to use these features.

Use SystemRescueCd to perform an automatic task

Let's take an example: the hard disk of the srv1 machine (192.168.10.100) crashed and has just been replaced with a brand new empty disk. You want to run the recovery job to restore the operating system on this machine.

  1. Connect to the backup/recovery server (srv3), edit the pxelinux configuration file used by srv1 (e.g. /tftpboot/pxelinux.cfg/01-00-0c-29-57-d0-64) and write the name of the entry you want to boot on the default line: default recovery
  2. Ensure the pxe boot services (dhcpd, tftpd, ...) are started on the backup/recovery server
  3. Use the management interface to reboot the production server on which you want to perform an administration task (srv1)
  4. Wait about 3 minutes, to be sure that SystemRescueCd has finished booting on srv1
  5. Connect to the backup/recovery server (srv3), edit the pxelinux configuration file used by srv1 (e.g. /tftpboot/pxelinux.cfg/01-00-0c-29-57-d0-64) and write bootfromdisk on the default line so that the server boots from the hard disk the next time: default bootfromdisk
  6. If the recovery script has been well designed, it reboots the server automatically once the recovery is complete, and srv1 then boots the production operating system.

Use SystemRescueCd to perform a task by hand

Let's take an example: you forgot the root password of the srv2 machine and you want to get an ssh connection to SystemRescueCd to mount the root filesystem and edit the password file (usually /etc/shadow).

  1. Connect to the backup/recovery server (srv3), edit the pxelinux configuration file used by srv2 (e.g. /tftpboot/pxelinux.cfg/01-00-0c-29-57-d0-74) and write the name of the entry you want to boot on the default line: default ssh
  2. Ensure the pxe boot services (dhcpd, tftpd, ...) are started on the backup/recovery server
  3. Use the management interface to reboot the production server on which you want to perform an administration task (srv2)
  4. Wait about 3 minutes, to be sure that SystemRescueCd has finished booting on srv2
  5. Use ssh to connect to srv2 from your office. You must use the root password that you gave on the boot command line in the pxelinux configuration file (e.g. 12345) to connect to SystemRescueCd. Don't confuse it with the root password of the system that you want to change, which is stored in the /etc/shadow file on your hard disk. Mount the root partition and edit the file, or perform any other administration task by hand.
  6. Connect to the backup/recovery server (srv3), edit the pxelinux configuration file used by srv2 (e.g. /tftpboot/pxelinux.cfg/01-00-0c-29-57-d0-74) and write bootfromdisk on the default line so that the server boots from the hard disk the next time: default bootfromdisk
  7. In the ssh console on srv2, type reboot. The Linux system on srv2 should restart with the new root password.
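
As an alternative to editing /etc/shadow by hand in step 5, the password can be changed by chrooting into the installed system, as mentioned in the list of examples at the top of this page. The sketch below is only an illustration: the function name is made up, and the device holding the root filesystem of srv2 (for example /dev/sda1) is an assumption you must check first, for instance with fdisk -l.

```shell
#!/bin/sh
# reset_root_password ROOTDEV
# Mount the root filesystem found on ROOTDEV, run passwd for root inside
# it through chroot, then unmount it again.
reset_root_password() (
    rootdev=$1
    mnt=$(mktemp -d) || exit 1
    mount "$rootdev" "$mnt" || exit 1
    # passwd runs inside the installed system and prompts for the new password.
    chroot "$mnt" passwd root
    status=$?
    umount "$mnt"
    rmdir "$mnt"
    exit "$status"
)
```

It would be called from the ssh console as `reset_root_password /dev/sda1`, with the device name adapted to the actual partition layout.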