In my October 29, 2001, Byte.com column
I described the new VM written by kernel hacker Andrea Arcangeli. In
that article, I promised I would come back to my FreeBSD versus Linux
comparison that I ran in my February 2001 column. Many people say FreeBSD
has a very good virtual memory manager. As it turns out, I pretty much
proved them right in that article.
Now, with the new VM engine in Linux as of 2.4.10, things might look
different, so I prepared a new test environment for the benchmark.
I would like to point out that my benchmarks are neither scientific nor
formal. Nor do I claim to be an expert in benchmarking. I just run
programs we all use in our daily routine repeatedly against both
operating systems, then calculate the simple average. Therefore, these
benchmarks are not to be taken as a full-out measure of things, but
rather as an indication of how far we have progressed in the last eight to
nine months, and of whether the new Linux VM performs better against
FreeBSD than the one in Linux 2.4.0 did.
The Benchmarking Environment
This time, I organized things a bit better. After all, I have been
working on this benchmarking project ever since 2.4.10 was released, and
therefore I had enough time to get the necessary components.
The server was a ProLiant ML570 that I received from Compaq for this
benchmark. The machine came with 3 GB of RAM and two Xeon 900-MHz
processors, but for this benchmark, I reduced the memory to 512 MB of
RAM. This was necessary because I was not benchmarking the server but
the VMs of the two operating systems. Storage went through an Integrated
Ultra-2 SCSI controller onto four 36-GB platters. This machine is nicely
supported by both FreeBSD and Linux.
On the client side, I used a Mosix
cluster of four nodes, one a dual Xeon 500, and three IBM Netfinity 3500
with 733-MHz processors. All nodes have 1 GB of RAM.
Here is the output from the Mosix cluster monitor during the stress
testing (as you can see, Mosix correctly pushes more work to the SMP
machine in the cluster):
      40789 |
            |
            |
            |
            |
      30592 |
   S        |
            |
   P        |
            |
   E  20394 |
            |
   E        |
            |   2
   D        |   2
      10197 |   2       1
            |   2       1
            |   2       1       1
            |   2       1       1
            |   2       1       1       1
            |   2       1       1       1
          0 ------------------------------------------------
     Node #     1       2       3       4
The local Mosix client cluster is on a 1-Gbit switched network and connected from
there to the server.
The networking gear was all by Linksys, which is quickly turning out
to be a very serious competitor to all other network gear producers. The
company has top-quality products and down-to-earth
prices. The switch I used for this test was the 10/100/1000 24-Port
Managed Gigaswitch. The Gigabit functions require external modules that
are very easy to install. If you want good and solid high-performance
networking from a serious contender without having to pay the high price
for Intel or Cisco gear, go with Linksys.
The NICs were a mix of Alteon and Intel Gigabit for the clients. The
server was running on an Alteon NIC, which is supported under both Linux
and FreeBSD. The Alteon card is interrupt savvy, which means it
does not unnecessarily raise interrupts when processing packets.
System Tuning
The FreeBSD OS can be made significantly faster by turning on
softupdates in the filesystem. Softupdates, implemented by Marshall
McKusick (a core FreeBSD developer), is an extension to the internal
filesystem code that keeps track of metadata operations. The softupdate
function reorders the I/O calls to the filesystem and always writes
metadata out in an order that guarantees that the filesystem is never
left in an undefined state. This is just about as fast as the async file
operation of ext2 (the Linux standard filesystem).
I also increased the maxusers value. This kernel configuration parameter
sets the size of a number of important system tables. Setting maxusers to
4 lets you have up to 84 simultaneous processes, which is hardly enough
for today's busy servers. I increased this value to 20.
Finally, I also disabled access-time updates for inodes. These are the
last-accessed times for each file, which are really not needed for most
operations. The same can be done with Linux, and I did so in this
benchmark and for most of my servers, too.
On the Linux side, I attached all interrupts coming from the network
adaptor to one CPU. With the new TCP/IP stack in the 2.4 kernels this
really becomes necessary. Otherwise, you might find the incoming packets
arranged out of order, because later interrupts are serviced (on another
CPU) before earlier ones, thus requiring a reordering further down the
handling layers.
Let's Stress the Systems!
Because this benchmark is about VM and the other main subsystems,
I built a series of tests that stress them. Therefore, I ran tests
against networking (Sendmail and MySQL tests), process build-up and
tear-down (the CGI tests), and against the VM (all tests combined, under
memory shortage).
I wrote a simple HTML page just displaying "hello, world" for the
static HTML benchmark. For the dynamic pages, I wrote two CGI handlers,
one in C and one in Perl. Here is the sample C CGI handler:
#include <stdio.h>

int main(void)
{
    /* The CGI header block ends with a blank line. */
    printf("Content-Type: text/html\n\n");
    printf("Hello, world\n");
    return 0;
}
And this is the Perl handler:
package Apache::Bench;
use strict;

sub handler {
    my ($r) = shift;
    $r->content_type('text/html');
    $r->send_http_header();
    $r->print('Hello, world');
    200;    # HTTP status returned to Apache
}

1;    # a Perl module must end with a true value
For the MySQL part, I set up a MySQL database with 27 million addresses
generated by a simple filler Perl script before the benchmark. Then, I
repeatedly let the clients run a series of transactions against it. I
downloaded MySQL 3.23.36 (yes, I know that 4.0 is out) from
http://www.mysql.com/ and recompiled it locally under both OSs. I
configured it with the following parameters:
[mysqld]
big-tables
skip-locking
skip-name-resolve
skip-networking
set-variable = max_allowed_packet=1M
set-variable = thread_stack=128K
set-variable = back_log=256
set-variable = key_buffer=30M
set-variable = table_cache=64
set-variable = sort_buffer=5M
set-variable = record_buffer=5M
set-variable = max_connections=4000
set-variable = join_buffer=5M
skip-thread-priority
For the benchmarking, I largely used the MySQL-bundled benchmarking
suite, partially modified to take into account things like SMP.
Finally, for the mail handler, all involved clients in the LAN were
sending 1.90-KB messages with MIME-encoded attachments. This time I chose
a smaller size (17 KB instead of last time's 32 KB) to stress the MTA
more than the network. The Sendmail used was the standard 8.12.1 (last time
we had 8.11.1) available from http://www.sendmail.org/, rebuilt on each
platform. No special tuning was done and no antispamming measures were
enabled. There was just one mail queue under both OSs, and the
Sendmail-typical load-adaptive throttles were disabled to make use of
the full bandwidth and system power.
And Here Are the Results
These are the results for the web benchmarking, with both systems
running the exact same source code of Apache 2.0.18, obtained from
http://www.apache.org/ and recompiled locally on each OS. Apache was
configured with
# disable DNS lookups: PHP scripts only get the IP address
HostnameLookups off

# disable htaccess checks
<Directory />
AllowOverride none
</Directory>
and then I turned on FollowSymLinks and turned off SymLinksIfOwnerMatch
to prevent additional lstat() system calls. I also increased
SendBufferSize to the size of the static web page. There is a whole lot
to the subject of tuning for performance, especially in recent Apache
versions: Go find out about tuning options at
http://httpd.apache.org/docs/misc/perf-tuning.html.
The static page in this test had a size of 1587 bytes (just to make sure that it was bigger
than a single packet). I let the test run for 4 minutes, counted the
served pages, and then divided by 240 seconds.
These results show that Linux is better at handling I/O cache than
FreeBSD, and that FreeBSD is more efficient at building up and tearing
down processes.
Under Linux, the repeated (10 runs each) and averaged results for the
MySQL benchmark were:
Totals per operation:
Operation seconds
alter_table_add 236
alter_table_drop 146
connect 3
count 47
count_on_key 762
create+drop 4
create_index 36
insert 13
order_by 165
order_by_key 156
select_distinct 33
update_with_key 101
TOTALS 1702
And under FreeBSD:
Operation seconds
alter_table_add 263
alter_table_drop 158
connect 4
count 54
count_on_key 801
create+drop 5
create_index 29
insert 18
order_by 198
order_by_key 122
select_distinct 33
update_with_key 147
TOTALS 1826
It seems that particularly in the I/O area FreeBSD is now at a
disadvantage. This might be either in the physical I/O
handling itself or in the buffer cache area of the OS.
The Sendmail benchmarking showed results slightly more favorable to
Linux. It is important to state that no Procmail was used for the
further handling of e-mails. In order to let Sendmail wait less for I/Os,
I also deleted the fsync() system call, which forces the full writing of
each message on the filesystem. By deleting that system call from the
sources, I let Sendmail defer the actual writing of the inode of each
message to a later point in time. This is, obviously, against the RFC
and should not be done in production-grade MTAs. Once you eliminate the
fsync() call, more RAM will nicely scale up the number of e-mails being
handled, which in turn better reflects the performance of I/O caching in
the OS.
All tests were repeated 10 times, as with the MySQL benchmarks, and then
averaged:
|                                      | Linux         | FreeBSD       |
| Incoming e-mails:                    | 362 mails/sec | 318 mails/sec |
| Mail relaying:                       | 167 mails/sec | 192 mails/sec |
| With Procmail 3.22 (just a .forward) | 172 mails/sec | 171 mails/sec |
To go that extra mile, I then ran all these tests combined. Obviously,
all values were much lower, and measuring them is not the point here.
Far more interesting were values like load level, interrupts handled
per second, and context switches per second.
For this final benchmark, I ran the Apache/MySQL/Sendmail tests at the
same time, waited about 20 minutes after starting, recorded the
results over a two-hour period, and finally calculated the average:
|                                     | Linux  | FreeBSD |
| Average User-Land Runnable Processes | 236    | 221     |
| Average Idle Percentage              | 5.3%   | 1.3%    |
| Average Context Switches per sec     | 8021   | 9871    |
| Average Free Pages                   | 201    | 278     |
| Average Interrupts per sec           | 6051   | 8210    |
| Average Blocks Out per sec           | 2021   | 1996    |
| Average Load Level                   | 17.86  | 17.41   |
| Average Swapped Set Size             | 312 MB | 302 MB  |
Qualitative Results
In the benchmark in February, after going through hours of stress
testing, there were several instances where the Linux box with 2.4.0 would not let me log in
with a simple rlogin or rsh anymore. In those cases, I had to go through
a standard telnet session to access the server. This odd behavior could
not be reproduced on demand, but it always occurred after several hours
of load testing. The FreeBSD OS never showed any odd behavior. This time
around, that problem has vanished. Neither system ever crashed, and both
always remained responsive. Swapping, this time, was much more fluid than in
the previous test and even when running all tests combined, the
interactive behavior of both operating systems was satisfactory (though
obviously the load could be felt).
Overall, I believe we have now reached a very stable and well-performing
Linux. Needless to say, FreeBSD is as stable and fast (or faster) as
ever. The 2.4 Linux kernel has grown out of its early problems and there
is no reason anymore to hold on to the old 2.2 kernels, especially on
new servers. The TCP/IP subsystem (with zero-copy and other very
important features) gets the most out of your Linux server.
Before you fire up your e-mail program to contest the results or
suggest some neat trick to get even more out of either the Linux
benchmark server or the FreeBSD server, remember what I said at the
beginning of this review: This was not a scientific benchmark in a
professional benchmarking lab. All results are only valid within my own
environment and you are certainly bound to see a different result on
your machines. The benchmark was only about finding out how well Linux
handles stress loads compared to FreeBSD, and I do not claim that one OS
is better than the other one.
In fact, I love both.
Moshe Bar is a systems administrator and OS researcher who started learning UNIX on a PDP-11 with AT&T UNIX Release 6, back in 1981. Moshe has an M.Sc. and a Ph.D. in computer science and writes UNIX-related books.
For more of Moshe's columns, visit the Serving With Linux Index Page.