A while back I was tasked with prototyping a system for transferring large amounts of data across the Internet to a wide array of nodes without making any assumptions about how they were connected. This took some research on my part, as I hadn't previously designed any networked system that had to hold up under extreme load or service a huge number of simultaneous connections. During the investigation I came across a couple of particularly well-written articles that I would now like to share with you.
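The first one is “High-Performance Server Architecture” by Jeff Darcy. This is a good introduction to the subject that mainly covers how to manage resources, with a focus on what Darcy calls the four horsemen of poor performance: data copies, context switches, memory allocation and lock contention. It will help you avoid the most common mistakes.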
After that we have “The C10K problem” by Dan Kegel. This article digs a little deeper and offers many recommendations on how to handle ten thousand or more simultaneous clients by leveraging facilities that already exist in many of the major *NIX operating systems. This is a typical don't-reinvent-the-wheel scenario, where the OS already has several ready-made solutions for you, as long as you know where to look.
Finally, I consulted CiteSeer and found a couple of really good, more academic articles that gave me the last pieces of the puzzle. As I can't divulge too much about our system in particular, I'm going to leave the more specific articles out of this blog post.
To top it all off, I want to share this excellent but unrelated link to “Capturing that Special Moment”.
Encouraged by the response to my previous post, I installed OpenBSD on the 250 again, and this time I compiled a multiprocessor-enabled kernel from -current and it worked! So now I'm back on OpenBSD, and it feels great.
I also found that the AR5212 WiFi chipset is supported by OpenBSD, and as it happens I bought a D-Link DWL-G520 a couple of years ago that has never done me any good, so I decided to install it in the 250. A huge Sun machine with a small WiFi antenna on the back looks kind of cool, in my opinion. Sadly, the ath driver is not as stable as I had hoped, so the card will have to stay disabled for the time being. So no replacing the Linksys just yet.
There is of course also the possibility that the problem lies in -current; I'll just have to wait and see.
MySQL seems to require a significant amount of processing power, as it is a constant bottleneck when serving pages from WordPress. There is a very noticeable latency whenever I load anything dynamic that requires data from the database, whereas other pages come up instantly. I guess I'll have to dig through the MySQL documentation on how to optimize it, especially considering that at the moment it is very conservative with memory, much more than it needs to be.
I installed OpenBSD 4.2 on the Sun Enterprise 250 two days ago, but after fiddling around with it a bit I realized that it didn't come with SMP support for SPARC64. That is a huge shame, because I really like OpenBSD and it felt like the perfect fit for this machine, but I can't have one CPU sitting there unused.
So then I went on to install Solaris 10, which turned out not to work at all, probably because the Permedia Raptor (GFX-8P) isn't supported. I downloaded the Solaris 9 distribution instead; thank god it hasn't reached end of life yet. Solaris 9 worked better, but after installing a bare system I realized it pretty much expects you to do a full install in order to get the management console and whatnot. Why do I need X11, CDE and a whole bunch of other crap just to run a web server?
After this slight disappointment I decided to give Linux a quick spin. I really don't want to run Linux on this machine, since I already have plenty of Linux boxes around, but at least it comes with SMP support. It was the same story as with Solaris 10: Linux did not agree with the Raptor, and all I got was a black screen with little green men running around on it.
As a final course of action I tried both FreeBSD and NetBSD. It turns out NetBSD doesn't have SMP support either, and apparently it doesn't support my keyboard, since it didn't respond to it at all. FreeBSD suffered from the evil Permedia curse. Now I'm back to installing Solaris 9 and longing for the day when OpenBSD supports SMP on SPARC64. Maybe that would be an interesting future project to take on... *evil grin*
Everyone who knows me knows about the troubles I've had with my HP Pavilion zv6148EA laptop and its built-in ATI XPRESS 200M GPU. The chipset has 128MB of dedicated RAM, but you can configure it to use up to 128MB of system RAM for a total of 256MB. The first issue with this scheme is that there is a problem with either the GPU, the BIOS or the video BIOS, because the device always reports that it has 256MB of RAM. The HP-branded ATI drivers that came with the Windows install seem to handle this just fine, but when you use ATI's drivers on Linux (you can't use the official ATI drivers on Windows, as they refuse to install), the machine will deadlock unless you assign the GPU 128MB of system RAM so that it actually totals 256MB. My assumption is that HP modified ATI's drivers to detect the actual amount of RAM rather than fixing this damn bug properly. I have no use whatsoever for 256MB of video RAM, so throwing away 128MB of valuable system RAM sucks big time.
I recently found this blog by someone with exactly the same problem, which not only confirmed some of my fears but also asserts that this is not an ATI driver problem but rather a video BIOS bug.
If you want to know more about the Linux O(1) scheduler, I highly recommend reading “Inside the Linux scheduler” by Tim Jones. It is an excellent introduction to the world of scheduling in the Linux kernel without unnecessary filler clogging your brain.
If you are like me and think that Apple's Exposé is the most important desktop feature since multiple desktops, then you have got to install Reveal in your favorite Mozilla-based browser today! Trust me, it's worth the effort.

I needed a simple benchmark for testing transfer speeds when reading multiple files in parallel from a single block device, but none of the available ones quite did what I wanted, so of course I had to write one myself.
You will need a half-decent C++ compiler and the invaluable Boost C++ Libraries in order to compile the benchmark. Pass the files to be read as arguments to the application; each file will be assigned its own thread of execution.
Download the benchmark source code here. Constructive criticism, ranging from the validity of the benchmark to coding style, is always welcome!
Real-time capabilities in the Linux kernel are not in themselves a new concept. Projects such as RTAI and FSMLabs' RTLinux have existed for quite some time, and they perform very well. The problem with previous methods has been that the Linux kernel could not offer the deterministic latencies that most real-time systems need, so most implementations have been based on a concept of abstraction. By letting interrupts be trapped by a nanokernel instead of by Linux itself, it has been possible to add deterministic real-time threads without having to perform major surgery on Linux. In the world of RTAI, ADEOS is used for this very purpose. The problem with these approaches, at least in the free real-time implementations, was that real-time threads had to be implemented in kernel space. This made them difficult to work with, since existing code written for user space could not always be reused.
Things started changing when MontaVista released the preempt patch for the 2.4-series kernel. The preemption patches for 2.4 allowed kernel threads to voluntarily yield execution or force preemption on sleep or interrupts. This was of course not the perfect implementation. True preemption should be allowed at any time except while executing sensitive regions of code, which must be allowed to complete so as not to leave the system in an invalid or partially invalid state, but the 2.4 kernel was not quite ready for that yet.
Two other important changes were Ingo Molnár's O(1) scheduler and the low-latency patches. The O(1) scheduler is a very fast scheduler built on per-priority run queues and a bitmap of non-empty priority levels, so picking the next task takes constant time regardless of how many tasks are runnable. It is important to have a fast scheduling algorithm when using preemption, especially when running at a high timer frequency like 1000Hz, since the scheduler is going to be called a lot and must not become a bottleneck. The low-latency patch modifies long-running loops that hold locks, and similar bottlenecks in the kernel, so that the scheduler can intervene at a safe point if a loop runs for a long time: a sort of voluntary yield of execution where necessary.
All these concepts were merged in the 2.5 development series of Linux, and preemption was developed further in order to implement true preemption instead of a semi-voluntary one. The next big hurdle to overcome was lock breaking. Critical sections in the kernel are protected by locks so that, in a multiprocessor system, two or more processors cannot access, and possibly corrupt, the same resource at once. Similar rules apply to preemption: it is not always appropriate to preempt the current kernel thread, and therefore locks are also used to prevent preemption. The problem with this scheme is that if a lock is held for too long, it delays the scheduler and consequently increases latency. The lock breaking done by the low-latency patch was therefore very valuable, since that work had already tracked down many of the bottlenecks caused by holding locks for too long and found points where those locks could safely be dropped to allow preemption.
All these changes meant that Linux was suddenly capable of worst-case response times of under 10ms for real-time threads, which is good, but not good enough.
When the rest of us had given up on deterministic real-time scheduling in user space, Ingo Molnár came to the rescue once again with his realtime-preempt patch. Ingo's patch uses mutexes instead of spinlocks to protect most critical sections. Because of that, critical sections can now be preempted in much the same way that a user-space thread can be preempted at any time. He has also introduced priority inheritance to avoid the priority inversion problems that could otherwise occur when a critical section is preempted. Another really nice feature is that interrupt handlers run as kernel threads, which makes them schedulable entities, so you can modify their priority using chrt from Robert Love's schedutils.
These measurements were made on two similar 2.4GHz Intel Pentium 4 systems using Mark Hounschell's rt-exec (1.0.3) benchmark for finding the “deterministic real-time capabilities of a computer”. The realtime-preempt tests were run at both 250Hz and 1000Hz.
2.6.16 stock kernel with Gentoo patch set at 250Hz
Run 00:10:36:237  NonHR-clk  14:40:22  Work:200  CPU:04  Avg:04  Max:18  Pg:0
DataPool:SHM  Exec Heart Beat Rate:250Hz  Exec Revision:1.0.3-1
Task      Sched        Cpu       Intr      Late         -Interrupt Latencies (usec)-
Taskname  Type  Pr  P  Mask       Cnt       Cnt  Spare  Current  Best  Worst  Determ
exec      hrt    5  F     1    159237         0   3767        3     3     15      12
task1     dth   29  F     1     39810         0   3989       15    14     82      68
task2     dth   28  F     1     39809         0   3988       15    14     82      68
task3     sth   25  F     1     79619         0   1987       13    13     82      69
task4     sth   24  F     1     79618         0   1987       13    13     78      65
task5     sem   23  F     1    159237         0    988        6     6     61      55
task6     sem   22  F     1     79619         0   1988        5     5     34      29
task7     sig   19  F     1     79619         0   1988        6     5     35      30
task8     sig   18  F     1     79618         0   1987        6     6     38      32
task9     hrt   17  R     1    159308         0   1988     1977  1566   2151     585
task10    hrt   17  R     1    159301         0   1987     1976  1586   2150     564
task11    hrn   14  R     1    159295         0    988     2981  2450   3167     717
task12    hrn   14  R     1    159287         0    988     2981  2560   3157     597
task13    hru   11  R     1    159280         0    988     2981  2338   3186     848
task14    hru   11  R     1    159272         0    987     2981  2227   3187     960
task15    bth    7  R     1    159237         0    988       12    11     75      64
task16    bth    7  R     1    159237         0    988       36    35    203     168
2.6.17 kernel with Gentoo patch set and realtime-preempt at 250Hz
Run 00:12:56:243  Posix-hrt  15:34:51  Work:200  CPU:05  Avg:06  Max:07  Pg:0
DataPool:SHM  Exec Heart Beat Rate:250Hz  Exec Revision:1.0.3-1
Task      Sched        Cpu       Intr      Late         -Interrupt Latencies (usec)-
Taskname  Type  Pr  P  Mask       Cnt       Cnt  Spare  Current  Best  Worst  Determ
exec      hrt    5  F     1    194243         0   3837        1     0    174     174
task1     dth   29  F     1     48561         0  15993       40    18    450     432
task2     dth   28  F     1     48561         0  15991       35    17    522     505
task3     sth   25  F     1     97122         0   7993       24    14    300     286
task4     sth   24  F     1     97121         0   7992       19    14    311     297
task5     sem   23  F     1    194243         0   3994        9     3    294     291
task6     sem   22  F     1     97122         0   7993        7     3    193     190
task7     sig   19  F     1     97122         0   7992       14     7    227     220
task8     sig   18  F     1     97121         0   7993       14     7    227     220
task9     hrt   17  R     1     96035         0   7982       29    14    261     247
task10    hrt   17  R     1     96032         0   7992       43    16    330     314
task11    hrn   14  R     1    191591         0   3994       31     9    340     331
task12    hrn   14  R     1    191582         0   3984       21    10    350     340
task13    hru   11  R     1    191570         0   3993       29     9    308     299
task14    hru   11  R     1    191562         0   3983       20     9    343     334
task15    bth    7  R     1    194243         0   3992       31    15    386     371
task16    bth    7  R     1    194243         0   3991       18    16    355     339
2.6.17 kernel with Gentoo patch set and realtime-preempt at 1000Hz
Run 00:07:43:731  Posix-hrt  14:37:05  Work:200  CPU:24  Avg:24  Max:26  Pg:0
DataPool:SHM  Exec Heart Beat Rate:1000Hz  Exec Revision:1.0.3-1
Task      Sched        Cpu       Intr      Late         -Interrupt Latencies (usec)-
Taskname  Type  Pr  P  Mask       Cnt       Cnt  Spare  Current  Best  Worst  Determ
exec      hrt    5  F     1    463732         0    831        2     0    263     263
task1     dth   29  F     1    115933         0   3992       62    19    381     362
task2     dth   28  F     1    115933         0   3993       50    18    417     399
task3     sth   25  F     1    231866         0   1992       28    14    330     316
task4     sth   24  F     1    231866         0   1992       34    14    307     293
task5     sem   23  F     1    463732         0    993       15     3    311     308
task6     sem   22  F     1    231866         0   1992       10     3    243     240
task7     sig   19  F     1    231866         0   1990       15     7    268     261
task8     sig   18  F     1    231866         0   1990       17     7    271     264
task9     hrt   17  R     1    224848         0   1989       52    13    222     209
task10    hrt   17  R     1    224581         0   1991       34    15    208     193
task11    hrn   14  R     1    445313         0    992       20    10    217     207
task12    hrn   14  R     1    444804         0    991       23    10    306     296
task13    hru   11  R     1    443863         0    993       39    11    329     318
task14    hru   11  R     1    443400         0    980       23    10    252     242
task15    bth    7  R     1    463732         0    992       36    16    357     341
task16    bth    7  R     1    463732         0    991       22    15    309     294
Notice how scheduling latencies for the FIFO tasks (marked F in the P column) have been sacrificed in order to improve the latencies of the real-time tasks (marked R) and make them deterministic. Without realtime-preempt, the ratio between the deterministic spread (Determ, which is Worst minus Best) and the worst-case latency of the hrt, hrn and hru tasks is roughly 1:3 to 1:5, since even their best-case latencies are in the millisecond range; with realtime-preempt it is close to 1:1, and their worst-case latencies drop from between 2150 and 3187 usec to between 208 and 350 usec. Even though FIFO latencies are worse, deterministic scheduling latencies overall are what make this patch so important for a real-time system, since it is now possible to predict its behavior precisely. This in turn means that you can determine whether Linux is an appropriate tool for your real-time application without having to use unnecessarily overpowered hardware in order to guarantee deadlines.
The future is starting to look bright. The timing couldn't be better, considering that several mobile phone manufacturers are evaluating Linux as a next-gen platform for their devices. Another field where Linux is growing is multimedia: everything from portable media players to set-top boxes seems to be Linux-powered these days, and deterministic latencies are really important in these applications.
You can get Ingo Molnár's realtime-preempt patch from his website at Red Hat, http://people.redhat.com/mingo/realtime-preempt/. He updates it quite often, so check back regularly.
Paul E. McKenney has written an excellent summary of most of the different approaches to real-time adaptations of the Linux kernel. You can find it at KernelTrap under the title “Linux: Realtime Approaches”.
genkernel is a nice little Gentoo tool designed to make your everyday life with Gentoo as pleasurable as possible, especially if compiling the kernel is something that sends shivers down your spine. It works by taking the kernel source in /usr/src/linux, combining it with a specified kernel configuration file and compiling a fully working kernel, without the user having to go through the tedious process of configuring Linux manually. Kernel configurations known to work are supplied with genkernel, but you can also specify your own configuration file if you wish to build a custom kernel.
So why am I blogging about genkernel when I'm not even a genkernel developer? For the simple reason that the imminent release of version 3.2.0 adds a long-awaited feature: support for the Pegasos PowerPC platform!
If you are an eager beaver and want to try it out right away, I suggest you edit /etc/portage/package.unmask (create it if it doesn't already exist) and add the line >sys-kernel/genkernel-3.1.9 to unmask the 3.2.0 pre-releases. Then run emerge --ask --verbose '>=sys-kernel/genkernel-3.2.0_pre18' to install the latest Pegasos-compatible genkernel. To build a kernel, simply execute:
genkernel --genzimage --kernel-config=/usr/share/genkernel/ppc/Pegasos all
Whatever kernel /usr/src/linux points to will be compiled and installed into /boot as kernelz-<kernel version>; for instance, gentoo-sources-2.6.12-r4 will be called kernelz-2.6.12-gentoo-r4. In order to boot this kernel from SmartFirmware on the Pegasos, assuming /boot is on /dev/hda1 and your Gentoo root is on /dev/hda2, issue the following command:
boot hd:0 kernelz-2.6.12-gentoo-r4 root=/dev/ram0 init=/linuxrc real_root=/dev/hda2
If you have a Radeon, you might want to append something like video=radeonfb:800x600-16 as well. Please note that genkernel will also install a yaboot-compatible kernel called kernel-genkernel-<kernel version> and an initramfs file called initramfs-genkernel-ppc-<kernel version>; you can safely remove these on the Pegasos, since it does not support yaboot at this time.
I have tried genkernel with pegasos-sources-2.6.11-r5, gentoo-sources-2.6.12 and gentoo-sources-2.6.12-r4, and it produced working kernels for all of them. If, however, you run into any problems, please report them to me.
A special thank you goes out to Tim Yamin (aka plasmaroo) for taking my genkernel-3.1.x patch and updating it to support 3.2.0 without me even asking him.
Considering what I do for a living, it's high time that I blogged about something appropriate for my “programming” category. Today I came upon a fitting subject for just that purpose: a set of power tools no C++ programmer should live without.
Today I read this article, linked from OSnews, which explains how to use the serialization feature of the Boost C++ Libraries. For those of you who didn't already know, serialization is the process of converting in-memory data into a sequence of bytes for storage on a hard drive or transmission over a network, and converting it back again afterwards. In the world of distributed systems, serialization is frequently referred to as “marshalling”. This is one of the things CORBA will do for you in order to make your life easier, if you ever choose to use it.
I have not yet had a chance to use Boost's serialization in one of my own projects. In fact, I didn't know about it until I read the article mentioned above, so I recently wrote my own serialization routines, which were far from as clean as these. I have, however, used Spirit, which is an “object-oriented recursive-descent parser generator framework”. Spirit integrates beautifully with C++ and lets you write grammars in a notation close to Extended Backus-Naur Form (EBNF) using nothing but standard ANSI C++ code. You have to see it to believe it.
So what are the advantages of using Boost's serialization over writing your own routines? There are several: for instance, it has seamless support for serializing STL containers. It is also very easy to use, especially if you are a C++ novice. But the biggest advantage of using Boost, at least in my opinion, is that it really improves the readability of your code. The Boost classes that I have used take extremely good advantage of the power of C++ in order to blend into your code in ways you didn't think possible.
My advice: if you haven't tried Boost already, give it a go. It contains powerful tools that will make your life much easier, it is peer-reviewed and comes with plenty of unit tests to maintain high quality, and it integrates with your C++ code in ways you didn't think possible (unless you are really skilled with templates, in which case I salute you).