
Performance of FreeS/WAN

The performance of FreeS/WAN is adequate for most applications.

In normal operation, the main concern is the overhead for encryption, decryption and authentication of the actual IPsec (ESP and/or AH) data packets. Tunnel setup and rekeying occur so much less frequently than packet processing that, in general, their overheads are not worth worrying about.

At startup, however, tunnel setup overheads may be significant. If you reboot a gateway and it needs to establish many tunnels, expect some delay. This and other issues for large gateways are discussed below.

Published material

The University of Wales at Aberystwyth has done quite detailed speed tests and put their results on the web.

Davide Cerri's thesis (in Italian) includes performance results for FreeS/WAN and for TLS. He posted an English summary on the mailing list.

Steve Bellovin used one of AT&T Research's FreeS/WAN gateways as his data source for an analysis of the cache sizes required for key swapping in IPsec. The analysis is available as text, or as PDF slides for a talk on the topic.

See also the NAI work mentioned in the next section.

Estimating CPU overheads

We can come up with a formula that roughly relates CPU speed to the rate of IPsec processing possible. It is far from exact, but should be usable as a first approximation.

An analysis of authentication overheads for high-speed networks, including some tests using FreeS/WAN, is on the NAI Labs site. In particular, see figure 3 in this PDF document. Their estimates of overheads, measured in Pentium II cycles per byte processed, are:

Configuration                IPsec  Authentication  Encryption  Cycles/byte
Linux IP stack alone         no     no              no          5
IPsec without crypto         yes    no              no          11
IPsec, authentication only   yes    SHA-1           no          24
IPsec with encryption        yes    yes             yes         not tested

Overheads for IPsec with encryption were not tested in the NAI work, but Antoon Bosselaers' web page gives the cost of his optimised Triple DES implementation as 928 Pentium cycles per 8-byte block, or 116 cycles per byte. Adding that to the 24 above gives 140 cycles per byte for IPsec with encryption.

At 140 cycles per byte, a 140 MHz machine can handle a megabyte -- 8 megabits -- per second. Speeds for other machines will be proportional to this. To saturate a link with capacity C megabits per second, you need a machine running at C * 140/8 = C * 17.5 MHz.
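The arithmetic in the two paragraphs above can be reproduced in a few lines of Python. This is only a sketch: the constants are the NAI and Bosselaers figures quoted above (which mix Pentium and Pentium II cycles, as the text does), and the helper names are made up for illustration.

```python
# Cost figures quoted above (NAI: Pentium II cycles; Bosselaers: Pentium cycles).
IPSEC_AUTH_ONLY_CPB = 24      # IPsec with SHA-1 authentication, cycles/byte
TDES_CYCLES_PER_BLOCK = 928   # Bosselaers' optimised Triple DES, cycles/block
DES_BLOCK_BYTES = 8           # DES and 3DES use 64-bit blocks


def ipsec_3des_cycles_per_byte() -> float:
    """Authentication-only IPsec cost plus the per-byte 3DES cost."""
    return IPSEC_AUTH_ONLY_CPB + TDES_CYCLES_PER_BLOCK / DES_BLOCK_BYTES


def mhz_per_mbit(cycles_per_byte: float) -> float:
    """MHz needed per Mbit/s of traffic.

    1 Mbit/s is 10**6 / 8 bytes/s, so the required clock in MHz is
    cycles_per_byte / 8.
    """
    return cycles_per_byte / 8


cpb = ipsec_3des_cycles_per_byte()
print(cpb)                # 140.0
print(mhz_per_mbit(cpb))  # 17.5
```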

However, that estimate is not precise. It ignores the differences between:

and does not account for some overheads you will almost certainly have:

so we suggest using C * 25 to get an estimate with a bit of a built-in safety factor.

This covers only IP and IPsec processing. If you have other loads on your gateway -- for example if it is also working as a firewall -- then you will need to add your own safety factor atop that.

This estimate matches empirical data reasonably well. For example, Metheringham's tests, described below, show a 733 MHz machine topping out between 32 and 36 Mbit/second while pushing data as fast as it can down a 100 Mbit link. Our formula suggests you need at least an 800 MHz machine to handle a fully loaded 32 Mbit link, so the two results are consistent.
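A planning helper built on the C * 25 rule might look like the sketch below. The `other_load_factor` parameter is a hypothetical stand-in for the extra safety factor you choose yourself when the gateway does more than IPsec; it is not a FreeS/WAN setting.

```python
MHZ_PER_MBIT = 25  # the C * 25 rule of thumb, safety margin included


def min_gateway_mhz(link_mbit: float, other_load_factor: float = 1.0) -> float:
    """Rough minimum CPU clock (MHz) to saturate a link with 3DES IPsec.

    other_load_factor is a hypothetical multiplier for non-IPsec work
    (firewalling and so on); 1.0 means IP and IPsec processing only.
    """
    if link_mbit < 0:
        raise ValueError("link speed must be non-negative")
    if other_load_factor < 1.0:
        raise ValueError("other_load_factor must be at least 1.0")
    return link_mbit * MHZ_PER_MBIT * other_load_factor


# Metheringham's 733 MHz box topped out at roughly 32 Mbit/s;
# the rule asks for 800 MHz at that rate, which is consistent.
print(min_gateway_mhz(32))  # 800.0
```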

Some examples using this estimation method:

Interface type            Mbit/s  Estimate    Minimum IPsec       Minimum with other load
                                  (Mbit*25)   gateway (MHz)       (e.g. firewall) (MHz)
DSL                       1       25 MHz      whatever you have   133, or better if you have it
cable modem               3       75 MHz      (as for DSL)        (as for DSL)
any link, light load      5       125 MHz     133                 200+, almost any surplus machine
Ethernet                  10      250 MHz     surplus 266 or 300  500+
fast link, moderate load  20      500 MHz     500                 800+, any current off-the-shelf PC
T3 or E3                  45      1125 MHz    1200                1500+
fast Ethernet             100     2500 MHz    not feasible with 3DES in software on current machines
OC3                       155     3875 MHz    (as for fast Ethernet)
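Read in the other direction, the same rule tells you which rows of the table a given machine should cope with. The sketch below copies the link speeds from the table; `links_within` is a hypothetical helper name, and its answer inherits all the imprecision of the estimate.

```python
# (interface type, link speed in Mbit/s), copied from the table above
LINKS = [
    ("DSL", 1),
    ("cable modem", 3),
    ("any link, light load", 5),
    ("Ethernet", 10),
    ("fast link, moderate load", 20),
    ("T3 or E3", 45),
    ("fast Ethernet", 100),
    ("OC3", 155),
]
MHZ_PER_MBIT = 25  # the C * 25 planning rule


def links_within(cpu_mhz: float, links=LINKS, mhz_per_mbit: float = MHZ_PER_MBIT):
    """Names of the rows whose C * 25 estimate does not exceed cpu_mhz."""
    return [name for name, mbit in links if mbit * mhz_per_mbit <= cpu_mhz]


print(links_within(733)[-1])   # fast link, moderate load
print(links_within(1200)[-1])  # T3 or E3
```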

Such an estimate is far from exact, but should be usable as a minimum requirement for planning. The key observations are:

Higher performance alternatives

AES is a new US government block cipher standard, designed to replace the obsolete DES. If FreeS/WAN using 3DES is not fast enough for your application, the AES patch may help.

To date (March 2002) we have had only one mailing list report of measurements with the patch applied. It indicates that, at least for the tested load on that user's network, AES roughly doubles IPsec throughput. If further testing confirms this, it may prove possible to saturate an OC3 link in software on a high-end box.
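If the reported doubling held across the board (a big if, given a single report), the planning figure for an OC3 would roughly halve. The sketch below simply divides the 3DES estimate by an assumed speedup; both the uniform-speedup model and the `speedup` parameter are assumptions, not measurements.

```python
TDES_MHZ_PER_MBIT = 25  # 3DES planning rule from the estimate above


def aes_mhz_estimate(link_mbit: float, speedup: float = 2.0) -> float:
    """Assumed model: the AES patch multiplies IPsec throughput by `speedup`,
    so the MHz needed per Mbit/s is divided by the same factor."""
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return link_mbit * TDES_MHZ_PER_MBIT / speedup


print(aes_mhz_estimate(155))  # 1937.5, versus 3875 MHz with 3DES
```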

Also, some work is being done toward support of hardware IPsec acceleration which might extend the range of requirements FreeS/WAN could meet.

Other considerations

CPU speed may be the main issue for IPsec performance, but of course it isn't the only one.

You need good ethernet cards or other network interface hardware to get the best performance. See this ethernet information page and this Linux network driver page.

The current FreeS/WAN kernel code is largely single-threaded. It is SMP safe, and will run just fine on a multiprocessor machine (discussion), but the load within the kernel is not shared effectively. This means that, for example, to saturate a T3 -- which needs about a 1200 MHz machine -- you cannot expect something like a dual 800 to do the job.
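In a simple model of this, only the fastest single CPU counts towards IPsec capacity, not the sum of all the clocks. The function below is a hypothetical illustration of that point using the C * 25 rule, not a description of how KLIPS actually schedules its work.

```python
from typing import Sequence

MHZ_PER_MBIT = 25  # the C * 25 planning rule


def can_saturate(cpu_mhz_each: Sequence[float], link_mbit: float) -> bool:
    """Simplified model: KLIPS does not spread its load across CPUs,
    so compare the link's requirement with the fastest CPU only."""
    return max(cpu_mhz_each) >= link_mbit * MHZ_PER_MBIT


T3_MBIT = 45
print(can_saturate([800, 800], T3_MBIT))  # False, although 800 + 800 > 1125
print(can_saturate([1200], T3_MBIT))      # True
```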

On the other hand, SMP machines do tend to share loads well so -- provided one CPU is fast enough for the IPsec work -- a multiprocessor machine may be ideal for a gateway with a mixed load.

Many tunnels from a single gateway

FreeS/WAN allows a single gateway machine to build tunnels to many others. There may, however, be some problems with large numbers of tunnels, as indicated in this message from the mailing list:

Subject: Re: Maximum number of ipsec tunnels?
   Date: Tue, 18 Apr 2000
   From: "John S. Denker" <jsd@research.att.com>

Christopher Ferris wrote:

>> What are the maximum number ipsec tunnels FreeS/WAN can handle??

Henry Spencer wrote:

>There is no particular limit.  Some of the setup procedures currently
>scale poorly to large numbers of connections, but there are (clumsy)
>workarounds for that now, and proper fixes are coming.

1) "Large" numbers means anything over 50 or so.  I routinely run boxes
with about 200 tunnels.  Once you get more than 50 or so, you need to worry
about several scalability issues:

a) You need to put a "-" sign in syslogd.conf, and rotate the logs daily
not weekly.

b) Processor load per tunnel is small unless the tunnel is not up, in which
case a new half-key gets generated every 90 seconds, which can add up if
you've got a lot of down tunnels.

c) There's other bits of lore you need when running a large number of
tunnels.  For instance, systematically keeping the .conf file free of
conflicts requires tools that aren't shipped with the standard freeswan
package.

d) The pluto startup behavior is quadratic.  With 200 tunnels, this eats up
several minutes at every restart.   I'm told fixes are coming soon.

2) Other than item (1b), the CPU load depends mainly on the size of the
pipe attached, not on the number of tunnels.
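Two of Denker's points are easy to put into numbers. The sketch below counts half-key generations caused by tunnels that are down (item 1b, assuming one generation per down tunnel every 90 seconds) and shows the relative growth of a startup time that scales quadratically (item 1d). It gives no absolute times, since the message only says "several minutes" for 200 tunnels; the function names are hypothetical.

```python
HALF_KEY_INTERVAL_S = 90  # item (1b): one new half-key per down tunnel per 90 s


def half_keys_per_hour(down_tunnels: int) -> int:
    """Half-key generations per hour caused by tunnels that stay down."""
    return down_tunnels * 3600 // HALF_KEY_INTERVAL_S


def quadratic_startup_ratio(n_tunnels: int, baseline_tunnels: int) -> float:
    """How much longer startup takes than at baseline_tunnels, if it is
    quadratic in the number of tunnels (item 1d)."""
    return (n_tunnels / baseline_tunnels) ** 2


print(half_keys_per_hour(50))             # 2000
print(quadratic_startup_ratio(400, 200))  # 4.0
```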

It is worth noting that item (1b) applies only to repeated attempts to re-key a data connection (IPsec SA, Phase 2) over an established keying connection (ISAKMP SA, Phase 1). There are two ways to reduce this overhead using settings in ipsec.conf(5):

The overheads for establishing keying connections (ISAKMP SAs, Phase 1) are lower because, for these, Pluto does not perform expensive operations before receiving a reply from the peer.

A gateway that does a lot of rekeying -- many tunnels and/or low settings for tunnel lifetimes -- will also need a lot of random numbers from the random(4) driver.

Low-end systems

Even a 486 can handle a T1 line, according to this mailing list message:

Subject: Re: linux-ipsec: IPSec Masquerade
   Date: Fri, 15 Jan 1999 11:13:22 -0500
   From: Michael Richardson 

. . . A 486/66 has been clocked by Phil Karn to do
10Mb/s encryption.. that uses all the CPU, so half that to get some CPU,
and you have 5Mb/s. 1/3 that for 3DES and you get 1.6Mb/s....

and a piece of mail from project technical lead Henry Spencer:

Oh yes, and a new timing point for Sandy's docs...  A P60 -- yes, a 60MHz
Pentium, talk about antiques -- running a host-to-host tunnel to another
machine shows an FTP throughput (that is, end-to-end results with a real
protocol) of slightly over 5Mbit/s either way.  (The other machine is much
faster, the network is 100Mbps, and the ether cards are good ones... so
the P60 is pretty definitely the bottleneck.)
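Richardson's back-of-the-envelope figure in the first message above can be checked against the T1 line rate of 1.544 Mbit/s. The sketch assumes, as his message implies, that the 10 Mbit/s figure is for single DES and that 3DES costs three times as much; the parameter names are made up for illustration.

```python
T1_MBIT = 1.544  # T1 line rate in Mbit/s


def richardson_estimate(raw_des_mbit: float = 10.0,
                        cpu_share: float = 0.5,
                        tdes_cost_factor: float = 3.0) -> float:
    """Full-CPU single-DES rate, scaled to the CPU share left for crypto,
    then divided by the extra cost of 3DES."""
    return raw_des_mbit * cpu_share / tdes_cost_factor


rate = richardson_estimate()
print(f"{rate:.2f} Mbit/s; enough for a T1: {rate >= T1_MBIT}")
# 1.67 Mbit/s; enough for a T1: True
```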

From the above, and from general user experience as reported on the list, it seems clear that a cheap surplus machine -- a reasonable 486, a minimal Pentium box, a Sparc 5, ... -- can easily handle a home office or a small company connection using any of:

If available, we suggest using a Pentium 133 or better. This should ensure that, even under maximum load, IPsec will use less than half the CPU cycles. You then have enough left for other things you may want on your gateway -- firewalling, web caching, DNS and such.

Measuring KLIPS

Here is some additional data from the mailing list.

Subject: FreeSWAN (specically KLIPS) performance measurements
   Date: Thu, 01 Feb 2001
   From: Nigel Metheringham <Nigel.Metheringham@intechnology.co.uk>

I've spent a happy morning attempting performance tests against KLIPS 
(this is due to me not being able to work out the CPU usage of KLIPS so 
resorting to the crude measurements of maximum throughput to give a 
baseline to work out loading of a box).

Measurements were done using a set of 4 boxes arranged in a line, each 
connected to the next by 100Mbit duplex ethernet.  The inner 2 had an 
ipsec tunnel between them (shared secret, but I was doing measurements 
when the tunnel was up and running - keying should not be an issue 
here).  The outer pair of boxes were traffic generators or traffic sink.

The crypt boxes are Compaq DL380s - Uniprocessor PIII/733 with 256K 
cache.  They have 128M main memory.  Nothing significant was running on 
the boxes other than freeswan.  The kernel was a 2.2.19pre7 patched 
with freeswan and ext3.

Without an ipsec tunnel in the chain (ie the 2 inner boxes just being 
100BaseT routers), throughput (measured with ttcp) was between 10644 
and 11320 KB/sec

With an ipsec tunnel in place, throughput was between 3268 and 3402 
KB/sec

These measurements are for data pushed across a TCP link, so the 
traffic on the wire between the 2 ipsec boxes would have been higher 
than this....

vmstat (run during some other tests, so not affecting those figures) on 
the encrypting box shows approx 50% system & 50% idle CPU - which I 
don't believe at all.  Interactive feel of the box was significantly 
sluggish.

I also tried running the kernel profiler (see man readprofile) during 
test runs.

A box doing primarily decrypt work showed basically nothing happening - 
I assume interrupts were off.
A box doing encrypt work showed the following:-
 Ticks Function                                   Load
 ~~~~~ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~    ~~~~~~
   956 total                                      0.0010
   532 des_encrypt2                               0.1330
   110 MD5Transform                               0.0443
    97 kmalloc                                    0.1880
    39 des_encrypt3                               0.1336
    23 speedo_interrupt                           0.0298
    14 skb_copy_expand                            0.0250
    13 ipsec_tunnel_start_xmit                    0.0009
    13 Decode                                     0.1625
    11 handle_IRQ_event                           0.1019
    11 .des_ncbc_encrypt_end                      0.0229
    10 speedo_start_xmit                          0.0188
     9 satoa                                      0.0225
     8 kfree                                      0.0118
     8 ip_fragment                                0.0121
     7 ultoa                                      0.0365
     5 speedo_rx                                  0.0071
     5 .des_encrypt2_end                          5.0000
     4 _stext                                     0.0140
     4 ip_fw_check                                0.0035
     2 rj_match                                   0.0034
     2 ipfw_output_check                          0.0200
     2 inet_addr_type                             0.0156
     2 eth_copy_and_sum                           0.0139
     2 dev_get                                    0.0294
     2 addrtoa                                    0.0143
     1 speedo_tx_buffer_gc                        0.0024
     1 speedo_refill_rx_buf                       0.0022
     1 restore_all                                0.0667
     1 number                                     0.0020
     1 net_bh                                     0.0021
     1 neigh_connected_output                     0.0076
     1 MD5Final                                   0.0083
     1 kmem_cache_free                            0.0016
     1 kmem_cache_alloc                           0.0022
     1 __kfree_skb                                0.0060
     1 ipsec_rcv                                  0.0001
     1 ip_rcv                                     0.0014
     1 ip_options_fragment                        0.0071
     1 ip_local_deliver                           0.0023
     1 ipfw_forward_check                         0.0139
     1 ip_forward                                 0.0011
     1 eth_header                                 0.0040
     1 .des_encrypt3_end                          0.0833
     1 des_decrypt3                               0.0034
     1 csum_partial_copy_gener
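Totalling the DES and MD5 lines of the (truncated) profile above gives a rough idea of how much of the encrypting box's sampled time went on cryptography. The sketch copies those lines by hand and divides by the 956-tick total reported in the listing; because the listing is cut off, a few more ticks may belong to crypto routines that are not shown. `tick_share` is a hypothetical helper.

```python
# (ticks, function) pairs copied from the readprofile listing above.
PROFILE = [
    (532, "des_encrypt2"),
    (110, "MD5Transform"),
    (97, "kmalloc"),
    (39, "des_encrypt3"),
    (23, "speedo_interrupt"),
    (11, ".des_ncbc_encrypt_end"),
    (5, ".des_encrypt2_end"),
    (1, "MD5Final"),
    (1, ".des_encrypt3_end"),
    (1, "des_decrypt3"),
]
TOTAL_TICKS = 956  # the "total" line of the listing


def tick_share(prefixes: tuple) -> float:
    """Fraction of all ticks spent in functions whose names, ignoring a
    leading '.', start with any of the given prefixes."""
    ticks = sum(t for t, fn in PROFILE if fn.lstrip(".").startswith(prefixes))
    return ticks / TOTAL_TICKS


print(f"DES: {tick_share(('des_',)):.0%}")  # DES: 62%
print(f"MD5: {tick_share(('MD5',)):.0%}")   # MD5: 12%
```

On these figures DES routines alone account for well over half of the sampled ticks, which fits the earlier point that per-packet cryptography is the main cost.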