Punnarao Kommineni, Linux & VMware Administrator: The kdump Crash Recovery Service

When the kdump crash dumping mechanism is enabled, a small amount of memory is reserved at boot time for a second kernel. If the system crashes, this second kernel is booted from the context of the crashed one, and its only purpose is to capture the core dump image.

Being able to analyze the core dump significantly helps to determine the exact cause of the system failure, and it is therefore strongly recommended to have this feature enabled.

Installing the kdump Service:

To use the kdump service, make sure the kexec-tools package is installed on your system. If it is not, install it as root with the system's package manager.
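On a yum-based system such as RHEL or CentOS 6 (an assumption, since the text does not name the distribution), the installation step could be sketched as follows. The helper name install_kexec_tools is made up for illustration, and because the commands sit inside a function, nothing runs until you call it as root.

```shell
# Hypothetical helper; assumes an RPM-based system that uses yum.
install_kexec_tools() {
    # rpm -q succeeds only if the package is already installed,
    # so yum is invoked only when kexec-tools is missing.
    rpm -q kexec-tools >/dev/null 2>&1 || yum install kexec-tools
}

# Run as root:
#   install_kexec_tools
```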

Configuring the memory usage:

Memory for the kdump kernel is always reserved during system boot, which means that the amount of memory is specified in the system's boot loader configuration. To configure the amount of memory reserved for the kdump kernel, edit the /boot/grub/grub.conf file and add crashkernel=<size>M or crashkernel=auto to the list of kernel options. The crashkernel=auto option reserves the memory only if the physical memory of the system is equal to or greater than:

2 GB on 32-bit and 64-bit x86 architectures;
2 GB on PowerPC if the page size is 4 KB, or 8 GB otherwise;
4 GB on IBM S/390.
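As a rough illustration of the boot loader change described above, a GRUB legacy entry that reserves 128 MB might look like the following. The kernel version, root device, and initrd name are placeholders and do not come from the text; only the crashkernel=128M option on the kernel line matters here.

```text
# /boot/grub/grub.conf (excerpt; kernel version and root device are placeholders)
title Red Hat Enterprise Linux Server (2.6.32-220.el6.x86_64)
        root (hd0,0)
        kernel /vmlinuz-2.6.32-220.el6.x86_64 ro root=/dev/sda3 crashkernel=128M
        initrd /initramfs-2.6.32-220.el6.x86_64.img
```

Because the memory is reserved at boot, the change takes effect only after the system is restarted.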

When kdump crash recovery is enabled, the minimum memory requirements of the system increase by the amount of memory reserved for it. This value is determined by the user and defaults to 128 MB plus 64 MB for each TB of physical memory (that is, a total of 192 MB for a system with 1 TB of physical memory). If required, you can attempt to reserve up to a maximum of 896 MB. Doing so is recommended especially in large environments, for example in systems with a large number of Logical Unit Numbers (LUNs).
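The default sizing rule quoted above is simple arithmetic. A minimal sketch, assuming the physical memory is given in whole terabytes; default_crashkernel_mb is a made-up helper name, not part of kexec-tools.

```shell
# Default kdump reservation in MB, per the rule above:
# 128 MB plus 64 MB for each TB of physical memory.
# default_crashkernel_mb is a hypothetical helper for illustration only.
default_crashkernel_mb() {
    echo $(( 128 + 64 * $1 ))
}

default_crashkernel_mb 1   # prints 192
```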

Configuring the Target Type:

When a kernel crash is captured, the core dump can be stored as a file in a local file system, written directly to a device, or sent over a network using the NFS (Network File System) or SSH (Secure Shell) protocol. Only one of these options can be set at a time, and the default is to store the vmcore file in the /var/crash directory of the local file system. To change this, as root, open the /etc/kdump.conf configuration file in a text editor and edit the options as described below.

To change the local directory in which the core dump is to be saved, remove the hash sign (#) from the beginning of the #path /var/crash line, and replace the value with a desired directory path.
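The same edit can be scripted instead of made by hand. The sketch below works on a scratch copy with two sample lines rather than on the real /etc/kdump.conf, and /usr/local/cores is only an example directory.

```shell
# Work on a scratch copy so the real /etc/kdump.conf is left untouched.
conf=$(mktemp)
printf '%s\n' '#path /var/crash' '#ext4 /dev/sda3' > "$conf"

# Drop the leading hash sign from the path line and set a new directory.
# /usr/local/cores is only an example path.
sed 's|^#path .*|path /usr/local/cores|' "$conf" > "$conf.new" && mv "$conf.new" "$conf"

# Show the result: the path line is now active, the ext4 line is still commented out.
cat "$conf"
```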

Optionally, if you want to write the file to a different partition, follow the same procedure with the #ext4 /dev/sda3 line as well, and change both the file system type and the device accordingly (a device name, a file system label, and a UUID are all supported).

To write the dump directly to a device, remove the hash sign (#) from the beginning of the #raw /dev/sda5 line, and replace the value with a desired device name.

To write the dump to a remote machine using the NFS protocol, remove the hash sign (#) from the beginning of the #net my.server.com:/export/tmp line, and replace the value with a valid host name and directory path.

To store the dump on a remote machine using the SSH protocol, remove the hash sign (#) from the beginning of the #net user@my.server.com line, and replace the value with a valid user name and host name.
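Put together, the target section of /etc/kdump.conf could look like the sketch below, here with the NFS target active and the alternatives left commented out. The devices, host names, and paths are the placeholder values used in the text, not recommendations.

```text
# /etc/kdump.conf (excerpt): enable exactly one target

# Local file system on a different partition
#ext4 /dev/sda3
#path /var/crash

# Raw device
#raw /dev/sda5

# NFS export on a remote machine
net my.server.com:/export/tmp

# SSH to a remote machine
#net user@my.server.com
```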

When a core file is transferred to a remote target over SSH, it has to be serialized for the transfer. This creates a vmcore.flat file in the /var/crash directory on the target system, which the crash utility cannot read. To convert vmcore.flat to a dump file that crash can read, rearrange it with makedumpfile as root on the target system.
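The conversion relies on the -R option of makedumpfile, which reads flattened data on standard input and writes a rearranged dump file. A minimal sketch, with a made-up wrapper name and placeholder paths:

```shell
# rearrange_flat FLAT OUT: turn a flattened dump into one that crash can open.
# Hypothetical wrapper; the real work is done by makedumpfile -R, which reads
# the flattened data on standard input and writes the rearranged dump to OUT.
rearrange_flat() {
    makedumpfile -R "$2" < "$1"
}

# Example (run as root on the target system; both paths are placeholders):
#   rearrange_flat /var/crash/vmcore.flat /tmp/vmcore-rearranged
```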

Configuring the core collector:

To reduce the size of the vmcore dump file, kdump allows you to specify an external application (that is, a core collector) to compress the data, and optionally leave out all irrelevant information. Currently, the only fully supported core collector is makedumpfile.

To enable the core collector, as root, open the /etc/kdump.conf configuration file in a text editor, remove the hash sign (#) from the beginning of the #core_collector makedumpfile -c --message-level 1 -d 31 line, and edit the command-line options as described below.

To enable the dump file compression, add the -c parameter.

To remove certain pages from the dump, add the -d value parameter, where value is the sum of the values assigned to the page types you want to omit (see the supported filtering levels). For example, to remove both zero and free pages, add the two corresponding values together.

Supported filtering levels:
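For reference, the makedumpfile manual page assigns 1 to zero pages, 2 to cache pages, 4 to private cache pages, 8 to user data pages, and 16 to free pages, so -d 31 omits all of them. Under that assumption, the zero-plus-free example works out as follows; the variable names are only for illustration.

```shell
# Dump level bits as listed in the makedumpfile manual page
# (variable names are illustrative).
ZERO_PAGES=1
CACHE_PAGES=2
CACHE_PRIVATE=4
USER_PAGES=8
FREE_PAGES=16

# Omit zero and free pages: 1 + 16 = 17.
level=$(( ZERO_PAGES + FREE_PAGES ))

# Print the line that would go into /etc/kdump.conf, with compression (-c) enabled.
echo "core_collector makedumpfile -d $level -c"
```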

Changing the Default Action:

By default, when kdump fails to create a core dump, the root file system is mounted and /sbin/init is run. To change this behavior, as root, open the /etc/kdump.conf configuration file in a text editor, remove the hash sign (#) from the beginning of the #default shell line, and replace the value with one of the supported actions.

Supported Actions:
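The actions usually documented for this directive in kexec-tools of this generation are reboot, halt, poweroff, and shell. Treat that list as an assumption and check the comments in your own /etc/kdump.conf. For instance, to halt the system when the dump cannot be written, the line might read:

```text
# /etc/kdump.conf (excerpt)
default halt
```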

Enabling the services:

To start the kdump daemon at boot time, enable the kdump service as root.

To start the service in the current session, start it as root as well.
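On a SysV init system such as RHEL or CentOS 6 (an assumption based on the grub.conf path used earlier), both steps could be sketched as below. enable_kdump is a made-up helper name, and nothing runs until it is called as root.

```shell
# Hypothetical helper; assumes the SysV-style chkconfig and service tools.
enable_kdump() {
    chkconfig kdump on     # start the kdump service at every boot
    service kdump start    # start it in the current session
}

# Run as root:
#   enable_kdump
```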

Testing the configuration:

To test the configuration, reboot the system with kdump enabled, and make sure that the service is running.

Then trigger a test crash from a shell prompt as root.
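A crash is typically triggered through the kernel's magic SysRq interface. The sketch below wraps the status check and the trigger in a function with a made-up name, so that pasting the snippet cannot crash the machine by accident. It assumes the SysV-style service command.

```shell
# Hypothetical helper: nothing happens until test_kdump_crash is called as root.
test_kdump_crash() {
    # Stop here unless the kdump service reports that it is running.
    service kdump status || return 1
    echo 1 > /proc/sys/kernel/sysrq    # enable the magic SysRq interface
    echo c > /proc/sysrq-trigger       # 'c' forces an immediate kernel crash
}

# Run as root, on a machine you can afford to crash:
#   test_kdump_crash
```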

This forces the Linux kernel to crash, and the address-YYYY-MM-DD-HH:MM:SS/vmcore file is copied to the location you selected in the configuration.