Paul Venezia
Senior Contributing Editor

Linux v2.6 scales the enterprise

Analysis
Jan 30, 2004

Bigger, stronger kernel sizzles in our performance tests

If commercial Unix vendors weren’t already worried about Linux, they should be now. Linux has seen wide deployment in datacenters, generally as a Web server, file server, or host for network tasks such as DNS and DHCP, but not as a platform for running mission-critical enterprise applications. Solaris, AIX, or HP-UX typically gets the nod when an application demands the highest levels of performance and scalability. The recent release of a new Linux kernel, v2.6, promises to change that.

The v2.6 kernel ushers in a new era of support for big iron with big workloads, opening the door for Linux to take on the most demanding tasks currently handled by Solaris, AIX, or HP-UX. The new kernel not only supports more RAM and a higher processor count, but the core of device management has also changed. Prior to this kernel, limits within the kernel could constrain large systems, such as a 65,536-process limit before rollover and 256 devices per chain. The v2.6 kernel moves well beyond these limitations, and it includes support for some of the largest server architectures around.

Will the new Linux really perform in the same league as the big boys? To find out, I put the v2.6.0 kernel through several real-world performance tests, comparing its file server, database server, and Web server performance with a recent v2.4 series kernel, v2.4.23.

Linux Meets Big Iron

A primary focus of the v2.6 kernel is large server architectures. Support for file systems larger than 2TB, support for 64 CPUs in x86-based SMP systems, and PAE (Physical Address Extension) support for up to 64GB of RAM on 32-bit systems bring this kernel, and Linux, into the more rarefied air of truly mission-critical systems. Also new is support for NUMA (Non-Uniform Memory Access) systems, a next-generation SMP architecture.

There is much more to v2.6 than bigger processor and RAM counts, however. This kernel breaks apart some of the artificial limitations that have been present in Linux from the beginning, such as the number of addressable devices and the total available PIDs (process identifiers). The v2.4 kernel supported 255 major devices with 255 minor numbers each. (For example, a volume on a SCSI disk located at /dev/sda3 has a major number of 8, the number assigned to SCSI disks, and a minor number of 3.) On servers with a large number of real or virtual devices, device allocation can become problematic. The v2.6 kernel addresses these issues in a big way, moving to 4,096 major devices with more than one million minor numbers per major device. For most users, these numbers are well beyond practical limits, but for enterprise systems that need to address many devices, it’s a major step.
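The device-number arithmetic above can be checked from user space. A minimal Python sketch, using the standard library’s dev_t helpers (the limits shown are the kernel’s documented ones; this is illustration, not kernel code):

```python
import os

# /dev/sda3: SCSI disks use major number 8; the partition index is the minor.
dev = os.makedev(8, 3)
assert os.major(dev) == 8 and os.minor(dev) == 3

# v2.4 packed dev_t into 16 bits: an 8-bit major and an 8-bit minor.
# v2.6 widens dev_t to 12 major bits and 20 minor bits; the widened
# limits round-trip cleanly through the same encode/decode calls:
big = os.makedev(4095, (1 << 20) - 1)   # 4,096 majors x 1,048,576 minors
assert os.major(big) == 4095
assert os.minor(big) == (1 << 20) - 1
print("v2.6 major/minor space:", 1 << 12, "x", 1 << 20)
```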

Also new in v2.6 is NPTL (Native POSIX Threading Library) in lieu of v2.4’s LinuxThreads. NPTL brings enterprise-class threading support to Linux, far surpassing the performance offered by LinuxThreads. As of October 2003, NPTL support was merged into the GNU C library, glibc, and Red Hat first implemented NPTL within Red Hat Linux 9 using a customized v2.4 kernel.

Also introduced in the v2.6 kernel is a new approach to devices. The v2.4 kernel’s devfs-based device handler has a companion in the v2.6 kernel: the newcomer, udev, provides devfs-like dynamic device naming, but implemented in user space. Using udev, the system can follow devices as they move around on connected buses while the device identifier remains static. For instance, the first-seen SCSI device can remain sda, identified by the device’s serial number regardless of the order in which it’s found during a later boot. The move to udev is a significant change at the core of the kernel and the cause of some consternation among Linux kernel developers, with solid arguments on both sides. It looks as though udev and sysfs will be the standard in the future, deprecating devfs, but both are present in the v2.6 kernel and are likely to remain for some time.

Yet another significant change in the v2.6 kernel is the merging of the uClinux project into the core kernel. The uClinux project has focused on Linux kernel development for embedded devices, chiefly support for processors lacking MMUs (Memory Management Units), commonly found in microcontrollers for embedded systems such as fire alarm controllers and PDAs. The list of embedded controllers that v2.6 supports is quite long, including common processors manufactured by Hitachi, NEC, and Motorola. This marks a clear departure from the Linux kernel’s roots, as all prior kernels required a processor with an MMU.

Built for Speed

Prior to the release of the v2.6 kernel, Linux performed tasks on a first-come, first-served basis; interrupting the kernel midtask to handle another process or function was not in the cards. The v2.6 kernel, however, can be pre-empted when needed: it can allocate resources for a process that requires immediate attention, then resume the interrupted task. These interruptions are measured in fractions of a second and are not generally noticeable, but they lend an overall feeling of smoothness to system performance. The v2.6 kernel does not make Linux a true real-time operating system, but it goes a long way toward ensuring that tasks are addressed and completed when required.

At the core of these enhancements is a new process scheduler. The process scheduler in the kernel divides CPU time among system processes, and its performance directly affects system responsiveness and process latency. The v2.6 kernel’s new O(1) scheduler incorporates algorithms that can substantially increase system performance, especially for interactive tasks. The O(1) scheduler can penalize CPU-hogging processes, improves process prioritization, and provides consistent performance across all processes. Also new in v2.6 are two I/O schedulers. The default, the anticipatory scheduler, brings much improved handling of I/O scheduling, ensuring that processes get I/O time when necessary, without unnecessary queuing; it attempts to anticipate process I/O requests before they are actually issued. Also present is the deadline scheduler, which assigns an expiration to each request and services requests from three queues.
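To make the deadline scheduler’s behavior concrete, here is a toy model in Python. This is my own illustrative sketch, not kernel code, and it simplifies the real design to a single FIFO rather than separate read and write deadline queues: requests are served in sector order until the oldest pending request exceeds its expiry, at which point it jumps the queue.

```python
import heapq

def deadline_schedule(requests, expire=5):
    """requests: list of (arrival_time, sector); returns service order (indices).

    Toy model: serve in sector order (elevator sweep), but if the oldest
    pending request has waited longer than `expire` ticks, serve it first.
    """
    sector_q = [(sec, i) for i, (_, sec) in enumerate(requests)]
    fifo_q = [(arr, i) for i, (arr, _) in enumerate(requests)]
    heapq.heapify(sector_q)
    heapq.heapify(fifo_q)
    done, order, clock = set(), [], 0
    while len(order) < len(requests):
        # Discard entries already served via the other queue.
        while fifo_q and fifo_q[0][1] in done:
            heapq.heappop(fifo_q)
        while sector_q and sector_q[0][1] in done:
            heapq.heappop(sector_q)
        if fifo_q and clock - fifo_q[0][0] > expire:
            i = heapq.heappop(fifo_q)[1]    # deadline expired: serve oldest
        else:
            i = heapq.heappop(sector_q)[1]  # otherwise nearest sector wins
        done.add(i)
        order.append(i)
        clock += 1                          # one tick per request served
    return order

# The request for far-off sector 900 jumps the queue once it expires:
print(deadline_schedule([(0, 900), (0, 10), (0, 20), (0, 30), (0, 40)], expire=2))
# -> [1, 2, 3, 0, 4]
```

With a generous expiry, the same request set is served in pure sector order and sector 900 goes last; shrinking the expiry is what forces the starvation-avoiding jump.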

There has been much debate over the scheduler choices in this kernel, and both I/O schedulers are supported, selected at boot time with options passed to the kernel (the elevator= parameter). The importance of scheduler performance cannot be overstressed. My tests show that the anticipatory scheduler in v2.6 surpasses the v2.4 scheduler handily; some of my tests show a tenfold performance increase. For instance, a simple read of a 500MB file during a streaming write with a 1MB block size on my Xeon-based test system took 37 seconds with v2.4.23 and 3.9 seconds with v2.6. The deadline scheduler also performs quite well, but may not be as fluid as the anticipatory scheduler for certain workloads. Either way, the new process and I/O schedulers blow v2.4’s schedulers out of the water.
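The read-during-write timing above is easy to reproduce in miniature. Here is a scaled-down Python sketch of the methodology; file sizes and names are illustrative rather than the article’s actual harness, and a background thread only approximates the concurrent streaming write:

```python
import os
import tempfile
import threading
import time

BLOCK = 1 << 20  # 1MB block size, matching the article's test parameters

def stream_write(path, blocks):
    """Background streaming write: append 1MB zero blocks."""
    with open(path, "wb") as f:
        for _ in range(blocks):
            f.write(b"\0" * BLOCK)

def timed_read(path):
    """Sequentially read a file in 1MB blocks, returning (seconds, bytes)."""
    start = time.perf_counter()
    total = 0
    with open(path, "rb") as f:
        while chunk := f.read(BLOCK):
            total += len(chunk)
    return time.perf_counter() - start, total

with tempfile.TemporaryDirectory() as d:
    target = os.path.join(d, "target")
    with open(target, "wb") as f:
        f.write(b"\0" * (8 * BLOCK))      # 8MB stand-in for the 500MB file
    writer = threading.Thread(target=stream_write,
                              args=(os.path.join(d, "noise"), 16))
    writer.start()
    elapsed, nbytes = timed_read(target)  # time the read under write load
    writer.join()
    print(f"read {nbytes >> 20}MB in {elapsed:.3f}s")
```

On a v2.6 kernel, swapping schedulers at boot and rerunning a full-size version of this loop is what exposes the gap reported above.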

In addition to the new schedulers, v2.6 includes plenty of other major architectural changes. The module-handling code has been completely rewritten, requiring a new set of userspace module utilities (module-init-tools, which replaces modutils) and an updated mkinitrd package to function. These can be found as updates to most major Linux distributions or via download. The new module utilities and module kernel code are much smoother than those found in v2.4, and the kernel can now be compiled without support for module unloading to ensure the integrity of a production kernel.

Clocking the New Kernel

To test the new kernel, I opted for scenarios most appropriate for real-world use. Testing individual portions of the kernel, such as disk I/O and memory management, could be interesting, but what would it mean for overall system performance? To get the big picture, I selected a few tests representative of expected server workloads and used them to compare the performance of the v2.6 and v2.4 kernels.

Tests were run on three separate hardware platforms: Intel Xeon (x86), Intel Itanium (IA-64), and AMD Opteron (x86_64). The x86 tests were conducted on an IBM eServer x335 1U rack-mount server with dual 3.06GHz P4 Xeon processors and 2GB of RAM. The Itanium tests were run on an IBM eServer x450 3U rack-mount server with dual 1.5GHz Itanium2 processors and 2GB of RAM. And the Opteron tests were run on a Newisys 4300 3U rack-mount server with dual 2.2GHz Opteron 848 processors and 2GB of RAM.

The first test measured file server throughput using Samba. On the Xeon system, the v2.4 kernel pushed 38.85MBps on average, and the v2.6 kernel pushed 67.30MBps, a 73 percent improvement. The Itanium tests show a similar difference between the kernels, giving v2.6 a 52 percent gain, albeit with smaller overall figures. And on the Opteron system, which really showed its muscle in this test, the results were a respectable 49.37MBps on the v2.4 kernel and an impressive 72.92MBps under v2.6, an increase of roughly 48 percent.
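The percentages follow directly from the throughput figures; as a quick check of the arithmetic:

```python
def gain(old_mbps, new_mbps):
    """Percentage improvement of the new throughput figure over the old."""
    return round((new_mbps / old_mbps - 1) * 100)

print(gain(38.85, 67.30))   # Xeon:    73 percent
print(gain(49.37, 72.92))   # Opteron: 48 percent
```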

The performance gains seen in the Samba tests are likely related to the vastly improved scheduler and I/O subsystem in the v2.6 kernel. Disk I/O and network I/O form the core of this test, and the performance improvements in the v2.6 kernel are very visible here.

Across the board, the v2.6 kernel outperformed the v2.4 kernel in the database tests, especially on the Itanium box, where it posted a speed increase of 23 percent (a 519-second lead) over the v2.4 kernel. On the Xeon platform, v2.6 showed almost a 13 percent gain (a 200-second lead) over v2.4. And on Opteron, it registered a 29 percent speed increase (a 415-second lead) over v2.4. The most impressive individual test was table inserts, in which the v2.6 kernel provided a 10 percent performance increase (a 100-second lead) over v2.4 on Xeon, with even better results on the Opteron and Itanium platforms.

The Web server tests also showed significant improvement. The static page test used a 21.5KB HTML page with two 25KB images served by Apache 2.0.48, measured in requests per second using Apache’s ab benchmarking tool. The Xeon tests show the v2.6 kernel outperforming v2.4 by just under 1,000 requests per second, a 40 percent increase. The Itanium tests showed v2.6 providing a 47 percent performance increase, while the Opteron tests showed a 7 percent increase. It should be noted that the Opteron system outperformed the other two servers by more than 1,000 requests per second with the v2.4 kernel, and its smaller increase may be due to network bandwidth constraints on the server. In retrospect, I believe that if I upped the network connectivity of the Newisys box with bonded Gigabit Ethernet NICs, I could push it even faster.

My Web application tests were conducted using a custom CGI script written in Perl, referencing a MySQL database running on the same system. The script ran a single select on a column in the database, returning 97 rows of eight columns, including one image. Again, Apache’s ab was used to measure performance. The overall numbers showed smaller performance increases than the static tests, with the exception of the Opteron tests, but the 14 percent to 22 percent performance increases across all platforms are stellar.

My tests were geared to show the performance differences between the two kernels on each hardware platform, not to compare the platforms. That said, the Opteron’s performance was outstanding; both the v2.4 and v2.6 kernels posted impressive results across all tests but most dramatically in the MySQL tests, showcasing the 64-bit support in v2.6. Overall, the v2.6 kernel shows very impressive performance gains over v2.4, itself a well-performing kernel.

While I didn’t run into many problems with the v2.6 kernel, there are a few notable issues with the initial release. For example, the drivers for LSI Logic’s Fusion-MPT RAID controllers have some serious I/O problems in a RAID1 configuration. When drives are addressed individually, there are no issues, but this is a significant hindrance to v2.6 adopters running with Fusion-MPT RAID controllers. These RAID modules are also problematic in the v2.6 kernel for Opteron, causing a panic unless iommu=merge is passed to the kernel at boot.

Further, on the Xeon platform, the v2.6 kernel compiles straight from the official source without a hitch, but not so on Itanium and Opteron. Although support for these platforms is present in the kernel, patches from the platform-specific development efforts are required to compile v2.6. Once built, the kernel boots normally, but it requires the updated mkinitrd and module-init-tools packages to fully function. Other than the driver-related problems, the v2.6 kernel compiled, booted, and ran without problems on all three platforms, handling with aplomb every test I threw its way.

Where From Here?

Today, the vast majority of production Linux systems run a version of the v2.4 kernel. Those satisfied with the performance and functionality of this kernel are not likely to make any sudden changes. If it ain’t broke, don’t fix it. IT shops running big databases and other mission-critical applications on v2.4 shouldn’t necessarily jump on the bandwagon immediately but should definitely begin testing v2.6. The v2.6 kernel is the new boss, and it behooves any IT department to become familiar with its capabilities and plan for adoption.

And what of the v2.4 kernel? Marcelo Tosatti, the Brazil-based maintainer of the v2.4 kernel, has announced on the LKML (Linux Kernel Mailing List) that with v2.6 officially released, v2.4 will enter maintenance mode, with no further major modification following the imminent release of v2.4.25. This stance has been met with some derision within the kernel development community and among major corporate Linux sponsors. At the crux of the issue are the major changes in the v2.6 kernel and the fact that many manufacturers that continue to release binary-only hardware drivers have been extremely slow to produce drivers even for current v2.4 kernels, to say nothing of the nascent v2.6 branch.

Also at issue are the fundamental changes in the core of the v2.6 kernel. Most applications that function on v2.4 kernels will continue to do so on v2.6, but a few of the major changes could affect currently deployed applications. For this reason, Red Hat, the dominant Linux distribution in the United States, has decided to forgo official v2.6 kernel support in its recent Advanced Server and Enterprise Server products, opting to stay with its highly customized v2.4.21-derived kernel. However, Red Hat has back-ported several key elements of the v2.6 kernel into its v2.4.21 Enterprise Linux kernel, such as support for up to 64GB of RAM, 16 CPUs, IPSec, and NPTL. In this fashion, Red Hat maintains application compatibility while providing what it considers the most desired features of the v2.6 kernel.

When building server architectures that could make use of the enhancements of the v2.6 kernel, admins will need to configure and build custom kernels tuned to their specific workloads. The problem with distribution-specific kernels is that they tend to differ greatly from the official kernel releases, both in the default option selections and the patches they include.

On the upside, these kernels are generally very broad in their hardware support, as they are configured and built with nearly every module that could possibly be needed to ensure hardware compatibility on target systems. They also tend to include patches that can either increase or decrease performance, depending on the server workload. Admins who run these servers are generally best served by patching, configuring, and building a custom kernel, both to ensure hardware compatibility and to squeeze out performance gains where possible. The base distribution may require some modifications to accept a v2.6 kernel, such as the addition of the new module-init-tools and mkinitrd packages, but should otherwise function normally with a new kernel.

As with any major development effort, bugs remain in the v2.6 kernel and are being actively pursued by the kernel developers. As of this writing, kernel v2.6.2-rc1 is available for download from kernel.org, including various bug fixes and enhancements over the v2.6 kernel released just a few weeks ago. The process continues; those considering a move to v2.6 would be well-advised to test the new kernel thoroughly before any production implementation.

The Linux kernel has come a long way since Linus Torvalds’ announcement of v0.01 in 1991. The v2.6 kernel boasts many new features as well as major performance improvements over the v2.4 kernel and is poised to take Linux into the next stage of the game: true enterprise adoption. To continue making inroads into the datacenter, Linux must grow with the needs of its established user base while navigating previously uncharted waters to appeal to those still looking in from outside. The v2.6 kernel appears to be up to the task.