LinuxConf Europe 2007 Conference and Tutorials
Sunday 2nd - Wednesday 5th September, University Arms Hotel, Cambridge, England

Fernando Luis Vázquez Cao - NTT Open Source Software Center

Generating a White List for Hardware which Works with Kexec/Kdump

The mainstream Linux kernel lacked a crash dumping mechanism from its inception until the recent adoption of Kdump. This was despite the fact that several out-of-tree solutions were available, some of which were even included in major distributions. However, concerns about their intrusiveness and reliability prevented them from making it into the mainstream (vanilla) kernel, the main argument being that relying on the resources of a crashing kernel to capture a dump, as they did, is inherently dangerous.

The appearance of Eric Biederman's Kexec patches and their subsequent inclusion in the kernel as a new system call paved the way for the implementation of an idea that had been floating around for some time: the use of a memory-preserving soft-booted kernel to capture the crash dump. This was the approach adopted by Kdump, which made it possible to achieve high reliability by isolating the crash dumping process from the crashed kernel.

In theory, Kdump's approach constitutes the most reliable way of capturing a dump. Even though testing proved the theory right (i.e. Kdump is much more robust and reliable than in-kernel crash dumping solutions), it also revealed some deficiencies in Kdump. Kernel crash dumping is a multi-stage process which involves three basic operations: detecting the crash, a minimal shutdown of the previously running system (i.e. the crashed kernel), and, finally, the capture of the crash dump. Kdump is very good at the first two, but there are still some issues when the dump capture kernel takes control of the system. In particular, the new kernel may fail to initialize the underlying devices, which, in turn, is likely to lead to a kernel panic or an oops.

The underlying problem is that the state of the devices during a kdump boot is not predictable, because no device shutdown is performed in the crashed kernel (it cannot be trusted) and the firmware stage of the standard boot process is skipped (the dump capture kernel is a soft-booted kernel after all). In other words, the inherent assumption that the firmware (known as the BIOS on some systems) is always there to do the dirty work is no longer valid.

The Linux kernel in general, and device drivers in particular, need to be improved so that they are able to boot in potentially unreliable environments, which, with the advent of soft-reboot mechanisms such as kexec, are likely to become a common scenario. But this is bound to be a painstaking and never-ending task, which requires the creation of a white list that is updated as bugs are fixed and new hardware appears.

This paper discusses possible ways of fixing the aforementioned reliability problems and an automated testing method that can be used to create a white list for hardware that works with kdump.

Submitted paper: Paper (PDF) and Paper (tgz).
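
As a rough illustration of what a white list built from automated test results might look like, the following minimal Python sketch aggregates per-run outcomes and keeps only the hardware combinations that passed every run. It is not the method described in the paper: the record fields, the outcome labels, the `min_runs` threshold, and the sample identifiers are all assumptions made up for this example.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical outcome labels for one kdump test run on one machine.
PASS = "pass"            # capture kernel booted and the dump was saved
BOOT_FAIL = "boot_fail"  # capture kernel panicked or hung during device init
DUMP_FAIL = "dump_fail"  # capture kernel booted but the dump was not written


@dataclass(frozen=True)
class KdumpRun:
    """One test run; all fields are illustrative, not a real tool's format."""
    device: str   # hardware identifier (made up here)
    driver: str   # driver bound to the device in the capture kernel
    kernel: str   # kernel version under test
    outcome: str  # one of PASS, BOOT_FAIL, DUMP_FAIL


def build_white_list(runs, min_runs=3):
    """Return (device, driver, kernel) combinations that passed every run.

    A combination needs at least `min_runs` recorded runs, so that a single
    lucky boot is not enough to get it onto the list.
    """
    grouped = defaultdict(list)
    for run in runs:
        grouped[(run.device, run.driver, run.kernel)].append(run.outcome)
    return sorted(
        key
        for key, outcomes in grouped.items()
        if len(outcomes) >= min_runs and all(o == PASS for o in outcomes)
    )


if __name__ == "__main__":
    sample = [
        KdumpRun("nic-a", "drv-a", "2.6.22", PASS),
        KdumpRun("nic-a", "drv-a", "2.6.22", PASS),
        KdumpRun("nic-a", "drv-a", "2.6.22", PASS),
        KdumpRun("hba-b", "drv-b", "2.6.22", PASS),
        KdumpRun("hba-b", "drv-b", "2.6.22", BOOT_FAIL),
        KdumpRun("hba-b", "drv-b", "2.6.22", PASS),
    ]
    for entry in build_white_list(sample):
        print(entry)
```

Keying the list on the combination of device, driver, and kernel version, rather than on the device alone, reflects the point made above that the list has to be updated as bugs are fixed and new hardware appears: the same device may fail under one kernel and pass under a later one.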
© Copyright 2007 UKUUG Ltd