FreeBSD performance tuning. Sysctls, loader.conf, kernel

Posted by SaveTheRbtz on Server Fault See other posts from Server Fault or by SaveTheRbtz
Published on 2009-09-10T23:52:13Z Indexed on 2011/11/16 10:00 UTC
Read the original article Hit count: 4342

I wanted to share knowledge of tuning FreeBSD via sysctl.conf/loader.conf/KENCONF. It was initially based on Igor Sysoev's (author of nginx) presentation about FreeBSD tuning up to 100,000-200,000 active connections.

Tunings are for FreeBSD-CURRENT. Since 7.2 amd64 some of them are tuned well by default. Prior 7.0 some of them are boot only (set via /boot/loader.conf) or does not exist at all.

sysctl.conf:

# No zero mapping feature
# May break wine
# (There are also reports about broken samba3)
#security.bsd.map_at_zero=0

# If you have really busy webserver with apache13 you may run out of processes
#kern.maxproc=10000
# Same for servers with apache2 / Pound
#kern.threads.max_threads_per_proc=4096

# Max. backlog size
kern.ipc.somaxconn=4096

# Shared memory // 7.2+ can use shared memory > 2Gb
kern.ipc.shmmax=2147483648

# Sockets
kern.ipc.maxsockets=204800

# Can cause this on older kernels:
#    http://old.nabble.com/Significant-performance-regression-for-increased-maxsockbuf-on-8.0-RELEASE-tt26745981.html#a26745981 )
kern.ipc.maxsockbuf=10485760

# Mbuf 2k clusters (on amd64 7.2+ 25600 is default)
# For such high value vm.kmem_size must be increased to 3G
kern.ipc.nmbclusters=262144

# Jumbo pagesize(_SC_PAGESIZE) clusters
# Used as general packet storage for jumbo frames
# can be monitored via `netstat -m`
#kern.ipc.nmbjumbop=262144

# Jumbo 9k/16k clusters
# If you are using them
#kern.ipc.nmbjumbo9=65536
#kern.ipc.nmbjumbo16=32768

# For lower latency you can decrease scheduler's maximum time slice
# default: stathz/10 (~ 13)
#kern.sched.slice=1

# Increase max command-line length showed in `ps` (e.g for Tomcat/Java)
# Default is PAGE_SIZE / 16 or 256 on x86
# This avoids commands to be presented as [executable] in `ps`
# For more info see: http://www.freebsd.org/cgi/query-pr.cgi?pr=120749
kern.ps_arg_cache_limit=4096

# Every socket is a file, so increase them
kern.maxfiles=204800
kern.maxfilesperproc=200000
kern.maxvnodes=200000

# On some systems HPET is almost 2 times faster than default ACPI-fast
# Useful on systems with lots of clock_gettime / gettimeofday calls
# See http://old.nabble.com/ACPI-fast-default-timecounter,-but-HPET-83--faster-td23248172.html
# After revision 222222 HPET became default: http://svnweb.freebsd.org/base?view=revision&revision=222222
kern.timecounter.hardware=HPET


# Small receive space, only usable on http-server, on file server this 
# should be increased to 65535 or even more
#net.inet.tcp.recvspace=8192

# This is useful on Fat-Long-Pipes
#net.inet.tcp.recvbuf_max=10485760
#net.inet.tcp.recvbuf_inc=65535

# Small send space is useful for http servers that serve small files 
# Autotuned since 7.x
net.inet.tcp.sendspace=16384

# This is useful on Fat-Long-Pipes
#net.inet.tcp.sendbuf_max=10485760
#net.inet.tcp.sendbuf_inc=65535

# Turn off receive autotuning
# You can play with it.
#net.inet.tcp.recvbuf_auto=0
#net.inet.tcp.sendbuf_auto=0

# This should be enabled if you going to use big spaces (>64k)
# Also timestamp field is useful when using syncookies
net.inet.tcp.rfc1323=1
# Turn this off on high-speed, lossless connections (LAN 1Gbit+)
# If you set it there is no need in TCP_NODELAY sockopt (see man tcp)
net.inet.tcp.delayed_ack=0

# This feature is useful if you are serving data over modems, Gigabit Ethernet, 
# or even high speed WAN links (or any other link with a high bandwidth delay product), 
# especially if you are also using window scaling or have configured a large send window.
# Automatically disables on small RTT ( http://www.freebsd.org/cgi/cvsweb.cgi/src/sys/netinet/tcp_subr.c?#rev1.237 )
# This sysctl was removed in 10-CURRENT:
# See: http://www.mail-archive.com/[email protected]/msg06178.html
#net.inet.tcp.inflight.enable=0

# TCP slowstart algorithm tunings
# We assuming we have very fast clients
#net.inet.tcp.slowstart_flightsize=100
#net.inet.tcp.local_slowstart_flightsize=100


# Disable randomizing of ports to avoid false RST
# Before usage check SA here www.bsdcan.org/2006/papers/ImprovingTCPIP.pdf
# (it's also says that port randomization auto-disables at some conn.rates, but I didn't checked it thou)
#net.inet.ip.portrange.randomized=0

# Increase portrange
# For outgoing connections only. Good for seed-boxes and ftp servers.
net.inet.ip.portrange.first=1024
net.inet.ip.portrange.last=65535

#
# stops route cache degregation during a high-bandwidth flood
# http://www.freebsd.org/doc/en/books/handbook/securing-freebsd.html
#net.inet.ip.rtexpire=2
net.inet.ip.rtminexpire=2
net.inet.ip.rtmaxcache=1024

# Security
net.inet.ip.redirect=0
net.inet.ip.sourceroute=0
net.inet.ip.accept_sourceroute=0
net.inet.icmp.maskrepl=0
net.inet.icmp.log_redirect=0
net.inet.icmp.drop_redirect=1
net.inet.tcp.drop_synfin=1
# 
# There is also good example of sysctl.conf with comments:
# http://www.thern.org/projects/sysctl.conf
#
# icmp may NOT rst, helpful for those pesky spoofed 
# icmp/udp floods that end up taking up your outgoing
# bandwidth/ifqueue due to all that outgoing RST traffic.
#
#net.inet.tcp.icmp_may_rst=0

# Security
net.inet.udp.blackhole=1
net.inet.tcp.blackhole=2

# IPv6 Security
# For more info see http://www.fosslc.org/drupal/content/security-implications-ipv6
# Disable Node info replies
# To see this vulnerability in action run `ping6 -a sglAac ::1` or `ping6 -w ::1` on unprotected node
net.inet6.icmp6.nodeinfo=0
# Turn on IPv6 privacy extensions
# For more info see proposal http://unix.derkeiler.com/Mailing-Lists/FreeBSD/net/2008-06/msg00103.html
net.inet6.ip6.use_tempaddr=1
net.inet6.ip6.prefer_tempaddr=1
# Disable ICMP redirect
net.inet6.icmp6.rediraccept=0
# Disable acceptation of RA and auto linklocal generation if you don't use them
#net.inet6.ip6.accept_rtadv=0
#net.inet6.ip6.auto_linklocal=0

# Increases default TTL, sometimes useful
# Default is 64
net.inet.ip.ttl=128

# Lessen max segment life to conserve resources
# ACK waiting time in miliseconds
# (default: 30000. RFC from 1979 recommends 120000)
net.inet.tcp.msl=5000

# Max bumber of timewait sockets
net.inet.tcp.maxtcptw=200000
# Don't use tw on local connections
# As of 15 Apr 2009. Igor Sysoev says that nolocaltimewait has some buggy realization.
# So disable it or now till get fixed
#net.inet.tcp.nolocaltimewait=1

# FIN_WAIT_2 state fast recycle
net.inet.tcp.fast_finwait2_recycle=1

# Time before tcp keepalive probe is sent
# default is 2 hours (7200000)
#net.inet.tcp.keepidle=60000

# Should be increased until net.inet.ip.intr_queue_drops is zero
net.inet.ip.intr_queue_maxlen=4096

# Interrupt handling via multiple CPU, but with context switch.
# You can play with it. Default is 1;
#net.isr.direct=0

# This is for routers only
#net.inet.ip.forwarding=1
#net.inet.ip.fastforwarding=1

# This speed ups dummynet when channel isn't saturated
net.inet.ip.dummynet.io_fast=1
# Increase dummynet(4) hash
#net.inet.ip.dummynet.hash_size=2048
#net.inet.ip.dummynet.max_chain_len

# Should be increased when you have A LOT of files on server 
# (Increase until vfs.ufs.dirhash_mem becomes lower)
vfs.ufs.dirhash_maxmem=67108864

# Note from commit http://svn.freebsd.org/base/head@211031 :
# For systems with RAID volumes and/or virtualization envirnments, where
# read performance is very important, increasing this sysctl tunable to 32
# or even more will demonstratively yield additional performance benefits.
vfs.read_max=32


# Explicit Congestion Notification (see http://en.wikipedia.org/wiki/Explicit_Congestion_Notification)
net.inet.tcp.ecn.enable=1

# Flowtable - flow caching mechanism
# Useful for routers
#net.inet.flowtable.enable=1
#net.inet.flowtable.nmbflows=65535

# Extreme polling tuning
#kern.polling.burst_max=1000
#kern.polling.each_burst=1000
#kern.polling.reg_frac=100
#kern.polling.user_frac=1
#kern.polling.idle_poll=0

# IPFW dynamic rules and timeouts tuning
# Increase dyn_buckets till net.inet.ip.fw.curr_dyn_buckets is lower
net.inet.ip.fw.dyn_buckets=65536
net.inet.ip.fw.dyn_max=65536
net.inet.ip.fw.dyn_ack_lifetime=120
net.inet.ip.fw.dyn_syn_lifetime=10
net.inet.ip.fw.dyn_fin_lifetime=2
net.inet.ip.fw.dyn_short_lifetime=10
# Make packets pass firewall only once when using dummynet
# i.e. packets going thru pipe are passing out from firewall with accept
#net.inet.ip.fw.one_pass=1

# shm_use_phys Wires all shared pages, making them unswappable
# Use this to lessen Virtual Memory Manager's work when using Shared Mem.
# Useful for databases
#kern.ipc.shm_use_phys=1

# ZFS
# Enable prefetch. Useful for sequential load type i.e fileserver.
# FreeBSD sets vfs.zfs.prefetch_disable to 1 on any i386 systems and 
# on any amd64 systems with less than 4GB of avaiable memory
# For additional info check this nabble thread http://old.nabble.com/Samba-read-speed-performance-tuning-td27964534.html
#vfs.zfs.prefetch_disable=0

# On highload servers you may notice following message in dmesg:
# "Approaching the limit on PV entries, consider increasing either the
# vm.pmap.shpgperproc or the vm.pmap.pv_entry_max tunable"   
vm.pmap.shpgperproc=2048

loader.conf:

# Accept filters for data, http and DNS requests
# Useful when your software uses select() instead of kevent/kqueue or when you under DDoS
# DNS accf available on 8.0+
accf_data_load="YES" 
accf_http_load="YES"
accf_dns_load="YES"

# Async IO system calls
aio_load="YES"

# Linux specific devices in /dev
# As for 8.1 it only /dev/full 
#lindev_load="YES"

# Adds NCQ support in FreeBSD
# WARNING! all ad[0-9]+ devices will be renamed to ada[0-9]+
# 8.0+ only
#ahci_load="YES"
#siis_load="YES"

# FreeBSD 8.2+
# New Congestion Control for FreeBSD
# http://caia.swin.edu.au/urp/newtcp/tools/cc_chd-readme-0.1.txt
# http://www.ietf.org/proceedings/78/slides/iccrg-5.pdf
# Initial merge commit message http://www.mail-archive.com/[email protected]/msg31410.html
#cc_chd_load="YES"


# Increase kernel memory size to 3G. 
#
# Use ONLY if you have KVA_PAGES in kernel configuration, and you have more than 3G RAM 
# Otherwise panic will happen on next reboot!
#
# It's required for high buffer sizes: kern.ipc.nmbjumbop, kern.ipc.nmbclusters, etc
# Useful on highload stateful firewalls, proxies or ZFS fileservers
# (FreeBSD 7.2+ amd64 users: Check that current value is lower!)
#vm.kmem_size="3G"

# If your server has lots of swap (>4Gb) you should increase following value
# according to http://lists.freebsd.org/pipermail/freebsd-hackers/2009-October/029616.html
# Otherwise you'll be getting errors
# "kernel: swap zone exhausted, increase kern.maxswzone"
# kern.maxswzone="256M" 

# Older versions of FreeBSD can't tune maxfiles on the fly
#kern.maxfiles="200000"

# Useful for databases 
# Sets maximum data size to 1G
# (FreeBSD 7.2+ amd64 users: Check that current value is lower!)
#kern.maxdsiz="1G"

# Maximum buffer size(vfs.maxbufspace)
# You can check current one via vfs.bufspace
# Should be lowered/upped depending on server's load-type
# Usually decreased to preserve kmem
# (default is 10% of mem)
#kern.maxbcache="512M"

# Sendfile buffers
# For i386 only
#kern.ipc.nsfbufs=10240

# FreeBSD 9+
# HPET "legacy route" support. It should allow HPET to work per-CPU
# See http://www.mail-archive.com/[email protected]/msg03603.html
#hint.atrtc.0.clock=0
#hint.attimer.0.clock=0
#hint.hpet.0.legacy_route=1 

# syncache Hash table tuning
net.inet.tcp.syncache.hashsize=1024
net.inet.tcp.syncache.bucketlimit=512
net.inet.tcp.syncache.cachelimit=65536

# Increased hostcache
# Later host cache can be viewed via net.inet.tcp.hostcache.list hidden sysctl
# Very useful for it's RTT RTTVAR
# Must be power of two
net.inet.tcp.hostcache.hashsize=65536
# hashsize * bucketlimit (which is 30 by default)
# It allocates 255Mb (1966080*136) of RAM
net.inet.tcp.hostcache.cachelimit=1966080

# TCP control-block Hash table tuning
net.inet.tcp.tcbhashsize=4096

# Disable ipfw deny all
# Should be uncommented when there is a chance that
# kernel and ipfw binary may be out-of sync on next reboot
#net.inet.ip.fw.default_to_accept=1

#
# SIFTR (Statistical Information For TCP Research) is a kernel module that
# logs a range of statistics on active TCP connections to a log file.
# See prerelease notes http://groups.google.com/group/mailing.freebsd.current/browse_thread/thread/b4c18be6cdce76e4
# and man 4 sitfr
#siftr_load="YES"

# Enable superpages, for 7.2+ only
# Also read http://lists.freebsd.org/pipermail/freebsd-hackers/2009-November/030094.html
vm.pmap.pg_ps_enabled=1

# Usefull if you are using Intel-Gigabit NIC
#hw.em.rxd=4096
#hw.em.txd=4096
#hw.em.rx_process_limit="-1"
# Also if you have ALOT interrupts on NIC - play with following parameters
# NOTE: You should set them for every NIC
#dev.em.0.rx_int_delay: 250
#dev.em.0.tx_int_delay: 250
#dev.em.0.rx_abs_int_delay: 250
#dev.em.0.tx_abs_int_delay: 250
# There is also multithreaded version of em/igb drivers can be found here:
# http://people.yandex-team.ru/~wawa/
#
# for additional em monitoring and statistics use 
# sysctl dev.em.0.stats=1 ; dmesg
# sysctl dev.em.0.debug=1 ; dmesg
# Also after r209242 (-CURRENT) there is a separate sysctl for each stat variable;   
# Same tunings for igb
#hw.igb.rxd=4096
#hw.igb.txd=4096
#hw.igb.rx_process_limit=100

# Some useful netisr tunables. See sysctl net.isr
#net.isr.maxthreads=4
#net.isr.defaultqlimit=4096
#net.isr.maxqlimit: 10240
# Bind netisr threads to CPUs
#net.isr.bindthreads=1

#
# FreeBSD 9.x+
# Increase interface send queue length
# See commit message http://svn.freebsd.org/viewvc/base?view=revision&revision=207554
#net.link.ifqmaxlen=1024

# Nicer boot logo =)
loader_logo="beastie"

And finally here is KERNCONF:

# Just some of them, see also
# cat /sys/{i386,amd64,}/conf/NOTES

# This one useful only on i386
#options         KVA_PAGES=512

# You can play with HZ in environments with high interrupt rate (default is 1000) 
# 100 is for my notebook to prolong it's battery life
#options         HZ=100
# Polling is goot on network loads with high packet rates and low-end NICs
# NB! Do not enable it if you want more than one netisr thread
#options         DEVICE_POLLING

# Eliminate datacopy on socket read-write
# To take advantage with zero copy sockets you should have an MTU >= 4k
# This req. is only for receiving data.
# Read more in man zero_copy_sockets
# Also this epic thread on kernel trap:
#    http://kerneltrap.org/node/6506
# Here Linus says that "anybody that does it that way (FreeBSD) is totally incompetent"
#options         ZERO_COPY_SOCKETS

# Support TCP sign. Used for IPSec
options         TCP_SIGNATURE
# There was stackoverflow found in KAME IPSec stack:
# See http://secunia.com/advisories/43995/
# For quick workaround you can use `ipfw add deny proto ipcomp`
options         IPSEC

# This ones can be loaded as modules. They described in loader.conf section     
#options         ACCEPT_FILTER_DATA
#options         ACCEPT_FILTER_HTTP

# Adding ipfw, also can be loaded as modules
options         IPFIREWALL
# On 8.1+ you can disable verbose to see blocked packets on ipfw0 interface.
# Also there is no point in compiling verbose into the kernel, because
# now there is net.inet.ip.fw.verbose tunable.
#options         IPFIREWALL_VERBOSE
#options         IPFIREWALL_VERBOSE_LIMIT=10
options         IPFIREWALL_FORWARD
# Adding kernel NAT
options         IPFIREWALL_NAT
options         LIBALIAS
# Traffic shaping
options         DUMMYNET          
# Divert, i.e. for userspace NAT
options         IPDIVERT

# This is for OpenBSD's pf firewall
device          pf
device          pflog
# pf's QoS - ALTQ
options         ALTQ
options         ALTQ_CBQ        # Class Bases Queuing (CBQ)
options         ALTQ_RED        # Random Early Detection (RED)
options         ALTQ_RIO        # RED In/Out
options         ALTQ_HFSC       # Hierarchical Packet Scheduler (HFSC)
options         ALTQ_PRIQ       # Priority Queuing (PRIQ)
options         ALTQ_NOPCC      # Required for SMP build

# Pretty console 
# Manual can be found here http://forums.freebsd.org/showthread.php?t=6134
#options         VESA
#options         SC_PIXEL_MODE

# Disable reboot on Ctrl Alt Del
#options         SC_DISABLE_REBOOT
# Change normal|kernel messages color
options         SC_NORM_ATTR=(FG_GREEN|BG_BLACK)
options         SC_KERNEL_CONS_ATTR=(FG_YELLOW|BG_BLACK)
# More scroll space
options         SC_HISTORY_SIZE=8192

# Adding hardware crypto device
device          crypto
device          cryptodev

# Useful network interfaces
device          vlan
device          tap                     #Virtual Ethernet driver
device          gre                     #IP over IP tunneling
device          if_bridge               #Bridge interface
device          pfsync                  #synchronization interface for PF
device          carp                    #Common Address Redundancy Protocol
device          enc                     #IPsec interface
device          lagg                    #Link aggregation interface
device          stf                     #IPv4-IPv6 port

# Also for my notebook, but may be used with Opteron
device         amdtemp
# Same for Intel processors
device         coretemp

# man 4 cpuctl
device         cpuctl                   # CPU control pseudo-device

# Support for ECMP. More than one route for destination
# Works even with default route so one can use it as LB for two ISP
# For now code is unstable and panics (panic: rtfree 2) on route deletions.
#options         RADIX_MPATH

# Multicast routing
#options         MROUTING
#options         PIM

# Debug & DTrace
options        KDB                     # Kernel debugger related code
options        KDB_TRACE               # Print a stack trace for a panic
options        KDTRACE_FRAME           # amd64-only(?)
options        KDTRACE_HOOKS           # all architectures - enable general DTrace hooks
#options        DDB
#options        DDB_CTF                 # all architectures - kernel ELF linker loads CTF data

# Adaptive spining in lockmgr (8.x+)
# See http://www.mail-archive.com/[email protected]/msg10782.html
options         ADAPTIVE_LOCKMGRS

# UTF-8 in console (8.x+) 
#options         TEKEN_UTF8

# FreeBSD 8.1+
# Deadlock resolver thread 
# For additional information see http://www.mail-archive.com/[email protected]/msg18124.html 
# (FYI: "resolution" is panic so use with caution)
#options         DEADLKRES

# Increase maximum size of Raw I/O and sendfile(2) readahead
#options MAXPHYS=(1024*1024)
#options MAXBSIZE=(1024*1024)

# For scheduler debug enable following option.
# Debug will be available via `kern.sched.stats` sysctl
# For more information see http://svnweb.freebsd.org/base/head/sys/conf/NOTES?view=markup
#options SCHED_STATS

If you are tuning network for maximum performance you may wish to play with ifconfig options like:

# You can list all capabilities via `ifconfig -m`
ifconfig [-]rxcsum [-]txcsum [-]tso [-]lro mtu

In case you've enabled DDB in kernel config, you should edit your /etc/ddb.conf and add something like this to enable automatic reboot (and textdump as bonus):

script kdb.enter.panic=textdump set; capture on; show pcpu; bt; ps; alltrace; capture off; call doadump; reset
script kdb.enter.default=textdump set; capture on; bt; ps; capture off; call doadump; reset

And do not forget to add ddb_enable="YES" to /etc/rc.conf

Since FreeBSD 9 you can select to enable/disable flowcontrol on your NIC:

# See http://en.wikipedia.org/wiki/Ethernet_flow_control and
# http://www.mail-archive.com/[email protected]/msg07927.html for additional info
ifconfig bge0 media auto mediaopt flowcontrol

PS. Also most of FreeBSD's limits can be monitored by

# vmstat -z

and

# limits  

PPS. variety of network counters can be monitored via

# netstat -s

In FreeBSD-9 netstat's -Q option appeared, try following command to display netisr stats

# netstat -Q

PPPS. also see

# man 7 tuning

PPPPS. I wanted to thank FreeBSD community, especially author of nginx - Igor Sysoev, nginx-ru@ and FreeBSD-performance@ mailing lists for providing useful information about FreeBSD tuning.

FreeBSD WIP
* Whats cooking for FreeBSD 7?
* Whats cooking for FreeBSD 8?
* Whats cooking for FreeBSD 9?

So here is the question:
What tunings are you using on yours FreeBSD servers?

You can also post your /etc/sysctl.conf, /boot/loader.conf, kernel options, etc with description of its' meaning (do not copy-paste from sysctl -d). Don't forget to specify server type (web, smb, gateway, etc)

Let's share experience!

© Server Fault or respective owner

Related posts about Performance

Related posts about freebsd