I wanted to share knowledge of tuning FreeBSD via sysctl.conf/loader.conf/KENCONF. It was initially based on Igor Sysoev's (author of nginx) presentation about FreeBSD tuning up to 100,000-200,000 active connections.
Tunings are for FreeBSD-CURRENT. Since 7.2 amd64 some of them are tuned well by default.
Prior 7.0 some of them are boot only (set via /boot/loader.conf) or does not exist at all.
sysctl.conf:
# No zero mapping feature
# May break wine
# (There are also reports about broken samba3)
#security.bsd.map_at_zero=0
# If you have really busy webserver with apache13 you may run out of processes
#kern.maxproc=10000
# Same for servers with apache2 / Pound
#kern.threads.max_threads_per_proc=4096
# Max. backlog size
kern.ipc.somaxconn=4096
# Shared memory // 7.2+ can use shared memory > 2Gb
kern.ipc.shmmax=2147483648
# Sockets
kern.ipc.maxsockets=204800
# Can cause this on older kernels:
# http://old.nabble.com/Significant-performance-regression-for-increased-maxsockbuf-on-8.0-RELEASE-tt26745981.html#a26745981 )
kern.ipc.maxsockbuf=10485760
# Mbuf 2k clusters (on amd64 7.2+ 25600 is default)
# For such high value vm.kmem_size must be increased to 3G
kern.ipc.nmbclusters=262144
# Jumbo pagesize(_SC_PAGESIZE) clusters
# Used as general packet storage for jumbo frames
# can be monitored via `netstat -m`
#kern.ipc.nmbjumbop=262144
# Jumbo 9k/16k clusters
# If you are using them
#kern.ipc.nmbjumbo9=65536
#kern.ipc.nmbjumbo16=32768
# For lower latency you can decrease scheduler's maximum time slice
# default: stathz/10 (~ 13)
#kern.sched.slice=1
# Increase max command-line length showed in `ps` (e.g for Tomcat/Java)
# Default is PAGE_SIZE / 16 or 256 on x86
# This avoids commands to be presented as [executable] in `ps`
# For more info see: http://www.freebsd.org/cgi/query-pr.cgi?pr=120749
kern.ps_arg_cache_limit=4096
# Every socket is a file, so increase them
kern.maxfiles=204800
kern.maxfilesperproc=200000
kern.maxvnodes=200000
# On some systems HPET is almost 2 times faster than default ACPI-fast
# Useful on systems with lots of clock_gettime / gettimeofday calls
# See http://old.nabble.com/ACPI-fast-default-timecounter,-but-HPET-83--faster-td23248172.html
# After revision 222222 HPET became default: http://svnweb.freebsd.org/base?view=revision&revision=222222
kern.timecounter.hardware=HPET
# Small receive space, only usable on http-server, on file server this
# should be increased to 65535 or even more
#net.inet.tcp.recvspace=8192
# This is useful on Fat-Long-Pipes
#net.inet.tcp.recvbuf_max=10485760
#net.inet.tcp.recvbuf_inc=65535
# Small send space is useful for http servers that serve small files
# Autotuned since 7.x
net.inet.tcp.sendspace=16384
# This is useful on Fat-Long-Pipes
#net.inet.tcp.sendbuf_max=10485760
#net.inet.tcp.sendbuf_inc=65535
# Turn off receive autotuning
# You can play with it.
#net.inet.tcp.recvbuf_auto=0
#net.inet.tcp.sendbuf_auto=0
# This should be enabled if you going to use big spaces (>64k)
# Also timestamp field is useful when using syncookies
net.inet.tcp.rfc1323=1
# Turn this off on high-speed, lossless connections (LAN 1Gbit+)
# If you set it there is no need in TCP_NODELAY sockopt (see man tcp)
net.inet.tcp.delayed_ack=0
# This feature is useful if you are serving data over modems, Gigabit Ethernet,
# or even high speed WAN links (or any other link with a high bandwidth delay product),
# especially if you are also using window scaling or have configured a large send window.
# Automatically disables on small RTT ( http://www.freebsd.org/cgi/cvsweb.cgi/src/sys/netinet/tcp_subr.c?#rev1.237 )
# This sysctl was removed in 10-CURRENT:
# See: http://www.mail-archive.com/
[email protected]/msg06178.html
#net.inet.tcp.inflight.enable=0
# TCP slowstart algorithm tunings
# We assuming we have very fast clients
#net.inet.tcp.slowstart_flightsize=100
#net.inet.tcp.local_slowstart_flightsize=100
# Disable randomizing of ports to avoid false RST
# Before usage check SA here www.bsdcan.org/2006/papers/ImprovingTCPIP.pdf
# (it's also says that port randomization auto-disables at some conn.rates, but I didn't checked it thou)
#net.inet.ip.portrange.randomized=0
# Increase portrange
# For outgoing connections only. Good for seed-boxes and ftp servers.
net.inet.ip.portrange.first=1024
net.inet.ip.portrange.last=65535
#
# stops route cache degregation during a high-bandwidth flood
# http://www.freebsd.org/doc/en/books/handbook/securing-freebsd.html
#net.inet.ip.rtexpire=2
net.inet.ip.rtminexpire=2
net.inet.ip.rtmaxcache=1024
# Security
net.inet.ip.redirect=0
net.inet.ip.sourceroute=0
net.inet.ip.accept_sourceroute=0
net.inet.icmp.maskrepl=0
net.inet.icmp.log_redirect=0
net.inet.icmp.drop_redirect=1
net.inet.tcp.drop_synfin=1
#
# There is also good example of sysctl.conf with comments:
# http://www.thern.org/projects/sysctl.conf
#
# icmp may NOT rst, helpful for those pesky spoofed
# icmp/udp floods that end up taking up your outgoing
# bandwidth/ifqueue due to all that outgoing RST traffic.
#
#net.inet.tcp.icmp_may_rst=0
# Security
net.inet.udp.blackhole=1
net.inet.tcp.blackhole=2
# IPv6 Security
# For more info see http://www.fosslc.org/drupal/content/security-implications-ipv6
# Disable Node info replies
# To see this vulnerability in action run `ping6 -a sglAac ::1` or `ping6 -w ::1` on unprotected node
net.inet6.icmp6.nodeinfo=0
# Turn on IPv6 privacy extensions
# For more info see proposal http://unix.derkeiler.com/Mailing-Lists/FreeBSD/net/2008-06/msg00103.html
net.inet6.ip6.use_tempaddr=1
net.inet6.ip6.prefer_tempaddr=1
# Disable ICMP redirect
net.inet6.icmp6.rediraccept=0
# Disable acceptation of RA and auto linklocal generation if you don't use them
#net.inet6.ip6.accept_rtadv=0
#net.inet6.ip6.auto_linklocal=0
# Increases default TTL, sometimes useful
# Default is 64
net.inet.ip.ttl=128
# Lessen max segment life to conserve resources
# ACK waiting time in miliseconds
# (default: 30000. RFC from 1979 recommends 120000)
net.inet.tcp.msl=5000
# Max bumber of timewait sockets
net.inet.tcp.maxtcptw=200000
# Don't use tw on local connections
# As of 15 Apr 2009. Igor Sysoev says that nolocaltimewait has some buggy realization.
# So disable it or now till get fixed
#net.inet.tcp.nolocaltimewait=1
# FIN_WAIT_2 state fast recycle
net.inet.tcp.fast_finwait2_recycle=1
# Time before tcp keepalive probe is sent
# default is 2 hours (7200000)
#net.inet.tcp.keepidle=60000
# Should be increased until net.inet.ip.intr_queue_drops is zero
net.inet.ip.intr_queue_maxlen=4096
# Interrupt handling via multiple CPU, but with context switch.
# You can play with it. Default is 1;
#net.isr.direct=0
# This is for routers only
#net.inet.ip.forwarding=1
#net.inet.ip.fastforwarding=1
# This speed ups dummynet when channel isn't saturated
net.inet.ip.dummynet.io_fast=1
# Increase dummynet(4) hash
#net.inet.ip.dummynet.hash_size=2048
#net.inet.ip.dummynet.max_chain_len
# Should be increased when you have A LOT of files on server
# (Increase until vfs.ufs.dirhash_mem becomes lower)
vfs.ufs.dirhash_maxmem=67108864
# Note from commit http://svn.freebsd.org/base/head@211031 :
# For systems with RAID volumes and/or virtualization envirnments, where
# read performance is very important, increasing this sysctl tunable to 32
# or even more will demonstratively yield additional performance benefits.
vfs.read_max=32
# Explicit Congestion Notification (see http://en.wikipedia.org/wiki/Explicit_Congestion_Notification)
net.inet.tcp.ecn.enable=1
# Flowtable - flow caching mechanism
# Useful for routers
#net.inet.flowtable.enable=1
#net.inet.flowtable.nmbflows=65535
# Extreme polling tuning
#kern.polling.burst_max=1000
#kern.polling.each_burst=1000
#kern.polling.reg_frac=100
#kern.polling.user_frac=1
#kern.polling.idle_poll=0
# IPFW dynamic rules and timeouts tuning
# Increase dyn_buckets till net.inet.ip.fw.curr_dyn_buckets is lower
net.inet.ip.fw.dyn_buckets=65536
net.inet.ip.fw.dyn_max=65536
net.inet.ip.fw.dyn_ack_lifetime=120
net.inet.ip.fw.dyn_syn_lifetime=10
net.inet.ip.fw.dyn_fin_lifetime=2
net.inet.ip.fw.dyn_short_lifetime=10
# Make packets pass firewall only once when using dummynet
# i.e. packets going thru pipe are passing out from firewall with accept
#net.inet.ip.fw.one_pass=1
# shm_use_phys Wires all shared pages, making them unswappable
# Use this to lessen Virtual Memory Manager's work when using Shared Mem.
# Useful for databases
#kern.ipc.shm_use_phys=1
# ZFS
# Enable prefetch. Useful for sequential load type i.e fileserver.
# FreeBSD sets vfs.zfs.prefetch_disable to 1 on any i386 systems and
# on any amd64 systems with less than 4GB of avaiable memory
# For additional info check this nabble thread http://old.nabble.com/Samba-read-speed-performance-tuning-td27964534.html
#vfs.zfs.prefetch_disable=0
# On highload servers you may notice following message in dmesg:
# "Approaching the limit on PV entries, consider increasing either the
# vm.pmap.shpgperproc or the vm.pmap.pv_entry_max tunable"
vm.pmap.shpgperproc=2048
loader.conf:
# Accept filters for data, http and DNS requests
# Useful when your software uses select() instead of kevent/kqueue or when you under DDoS
# DNS accf available on 8.0+
accf_data_load="YES"
accf_http_load="YES"
accf_dns_load="YES"
# Async IO system calls
aio_load="YES"
# Linux specific devices in /dev
# As for 8.1 it only /dev/full
#lindev_load="YES"
# Adds NCQ support in FreeBSD
# WARNING! all ad[0-9]+ devices will be renamed to ada[0-9]+
# 8.0+ only
#ahci_load="YES"
#siis_load="YES"
# FreeBSD 8.2+
# New Congestion Control for FreeBSD
# http://caia.swin.edu.au/urp/newtcp/tools/cc_chd-readme-0.1.txt
# http://www.ietf.org/proceedings/78/slides/iccrg-5.pdf
# Initial merge commit message http://www.mail-archive.com/
[email protected]/msg31410.html
#cc_chd_load="YES"
# Increase kernel memory size to 3G.
#
# Use ONLY if you have KVA_PAGES in kernel configuration, and you have more than 3G RAM
# Otherwise panic will happen on next reboot!
#
# It's required for high buffer sizes: kern.ipc.nmbjumbop, kern.ipc.nmbclusters, etc
# Useful on highload stateful firewalls, proxies or ZFS fileservers
# (FreeBSD 7.2+ amd64 users: Check that current value is lower!)
#vm.kmem_size="3G"
# If your server has lots of swap (>4Gb) you should increase following value
# according to http://lists.freebsd.org/pipermail/freebsd-hackers/2009-October/029616.html
# Otherwise you'll be getting errors
# "kernel: swap zone exhausted, increase kern.maxswzone"
# kern.maxswzone="256M"
# Older versions of FreeBSD can't tune maxfiles on the fly
#kern.maxfiles="200000"
# Useful for databases
# Sets maximum data size to 1G
# (FreeBSD 7.2+ amd64 users: Check that current value is lower!)
#kern.maxdsiz="1G"
# Maximum buffer size(vfs.maxbufspace)
# You can check current one via vfs.bufspace
# Should be lowered/upped depending on server's load-type
# Usually decreased to preserve kmem
# (default is 10% of mem)
#kern.maxbcache="512M"
# Sendfile buffers
# For i386 only
#kern.ipc.nsfbufs=10240
# FreeBSD 9+
# HPET "legacy route" support. It should allow HPET to work per-CPU
# See http://www.mail-archive.com/
[email protected]/msg03603.html
#hint.atrtc.0.clock=0
#hint.attimer.0.clock=0
#hint.hpet.0.legacy_route=1
# syncache Hash table tuning
net.inet.tcp.syncache.hashsize=1024
net.inet.tcp.syncache.bucketlimit=512
net.inet.tcp.syncache.cachelimit=65536
# Increased hostcache
# Later host cache can be viewed via net.inet.tcp.hostcache.list hidden sysctl
# Very useful for it's RTT RTTVAR
# Must be power of two
net.inet.tcp.hostcache.hashsize=65536
# hashsize * bucketlimit (which is 30 by default)
# It allocates 255Mb (1966080*136) of RAM
net.inet.tcp.hostcache.cachelimit=1966080
# TCP control-block Hash table tuning
net.inet.tcp.tcbhashsize=4096
# Disable ipfw deny all
# Should be uncommented when there is a chance that
# kernel and ipfw binary may be out-of sync on next reboot
#net.inet.ip.fw.default_to_accept=1
#
# SIFTR (Statistical Information For TCP Research) is a kernel module that
# logs a range of statistics on active TCP connections to a log file.
# See prerelease notes http://groups.google.com/group/mailing.freebsd.current/browse_thread/thread/b4c18be6cdce76e4
# and man 4 sitfr
#siftr_load="YES"
# Enable superpages, for 7.2+ only
# Also read http://lists.freebsd.org/pipermail/freebsd-hackers/2009-November/030094.html
vm.pmap.pg_ps_enabled=1
# Usefull if you are using Intel-Gigabit NIC
#hw.em.rxd=4096
#hw.em.txd=4096
#hw.em.rx_process_limit="-1"
# Also if you have ALOT interrupts on NIC - play with following parameters
# NOTE: You should set them for every NIC
#dev.em.0.rx_int_delay: 250
#dev.em.0.tx_int_delay: 250
#dev.em.0.rx_abs_int_delay: 250
#dev.em.0.tx_abs_int_delay: 250
# There is also multithreaded version of em/igb drivers can be found here:
# http://people.yandex-team.ru/~wawa/
#
# for additional em monitoring and statistics use
# sysctl dev.em.0.stats=1 ; dmesg
# sysctl dev.em.0.debug=1 ; dmesg
# Also after r209242 (-CURRENT) there is a separate sysctl for each stat variable;
# Same tunings for igb
#hw.igb.rxd=4096
#hw.igb.txd=4096
#hw.igb.rx_process_limit=100
# Some useful netisr tunables. See sysctl net.isr
#net.isr.maxthreads=4
#net.isr.defaultqlimit=4096
#net.isr.maxqlimit: 10240
# Bind netisr threads to CPUs
#net.isr.bindthreads=1
#
# FreeBSD 9.x+
# Increase interface send queue length
# See commit message http://svn.freebsd.org/viewvc/base?view=revision&revision=207554
#net.link.ifqmaxlen=1024
# Nicer boot logo =)
loader_logo="beastie"
And finally here is KERNCONF:
# Just some of them, see also
# cat /sys/{i386,amd64,}/conf/NOTES
# This one useful only on i386
#options KVA_PAGES=512
# You can play with HZ in environments with high interrupt rate (default is 1000)
# 100 is for my notebook to prolong it's battery life
#options HZ=100
# Polling is goot on network loads with high packet rates and low-end NICs
# NB! Do not enable it if you want more than one netisr thread
#options DEVICE_POLLING
# Eliminate datacopy on socket read-write
# To take advantage with zero copy sockets you should have an MTU >= 4k
# This req. is only for receiving data.
# Read more in man zero_copy_sockets
# Also this epic thread on kernel trap:
# http://kerneltrap.org/node/6506
# Here Linus says that "anybody that does it that way (FreeBSD) is totally incompetent"
#options ZERO_COPY_SOCKETS
# Support TCP sign. Used for IPSec
options TCP_SIGNATURE
# There was stackoverflow found in KAME IPSec stack:
# See http://secunia.com/advisories/43995/
# For quick workaround you can use `ipfw add deny proto ipcomp`
options IPSEC
# This ones can be loaded as modules. They described in loader.conf section
#options ACCEPT_FILTER_DATA
#options ACCEPT_FILTER_HTTP
# Adding ipfw, also can be loaded as modules
options IPFIREWALL
# On 8.1+ you can disable verbose to see blocked packets on ipfw0 interface.
# Also there is no point in compiling verbose into the kernel, because
# now there is net.inet.ip.fw.verbose tunable.
#options IPFIREWALL_VERBOSE
#options IPFIREWALL_VERBOSE_LIMIT=10
options IPFIREWALL_FORWARD
# Adding kernel NAT
options IPFIREWALL_NAT
options LIBALIAS
# Traffic shaping
options DUMMYNET
# Divert, i.e. for userspace NAT
options IPDIVERT
# This is for OpenBSD's pf firewall
device pf
device pflog
# pf's QoS - ALTQ
options ALTQ
options ALTQ_CBQ # Class Bases Queuing (CBQ)
options ALTQ_RED # Random Early Detection (RED)
options ALTQ_RIO # RED In/Out
options ALTQ_HFSC # Hierarchical Packet Scheduler (HFSC)
options ALTQ_PRIQ # Priority Queuing (PRIQ)
options ALTQ_NOPCC # Required for SMP build
# Pretty console
# Manual can be found here http://forums.freebsd.org/showthread.php?t=6134
#options VESA
#options SC_PIXEL_MODE
# Disable reboot on Ctrl Alt Del
#options SC_DISABLE_REBOOT
# Change normal|kernel messages color
options SC_NORM_ATTR=(FG_GREEN|BG_BLACK)
options SC_KERNEL_CONS_ATTR=(FG_YELLOW|BG_BLACK)
# More scroll space
options SC_HISTORY_SIZE=8192
# Adding hardware crypto device
device crypto
device cryptodev
# Useful network interfaces
device vlan
device tap #Virtual Ethernet driver
device gre #IP over IP tunneling
device if_bridge #Bridge interface
device pfsync #synchronization interface for PF
device carp #Common Address Redundancy Protocol
device enc #IPsec interface
device lagg #Link aggregation interface
device stf #IPv4-IPv6 port
# Also for my notebook, but may be used with Opteron
device amdtemp
# Same for Intel processors
device coretemp
# man 4 cpuctl
device cpuctl # CPU control pseudo-device
# Support for ECMP. More than one route for destination
# Works even with default route so one can use it as LB for two ISP
# For now code is unstable and panics (panic: rtfree 2) on route deletions.
#options RADIX_MPATH
# Multicast routing
#options MROUTING
#options PIM
# Debug & DTrace
options KDB # Kernel debugger related code
options KDB_TRACE # Print a stack trace for a panic
options KDTRACE_FRAME # amd64-only(?)
options KDTRACE_HOOKS # all architectures - enable general DTrace hooks
#options DDB
#options DDB_CTF # all architectures - kernel ELF linker loads CTF data
# Adaptive spining in lockmgr (8.x+)
# See http://www.mail-archive.com/
[email protected]/msg10782.html
options ADAPTIVE_LOCKMGRS
# UTF-8 in console (8.x+)
#options TEKEN_UTF8
# FreeBSD 8.1+
# Deadlock resolver thread
# For additional information see http://www.mail-archive.com/
[email protected]/msg18124.html
# (FYI: "resolution" is panic so use with caution)
#options DEADLKRES
# Increase maximum size of Raw I/O and sendfile(2) readahead
#options MAXPHYS=(1024*1024)
#options MAXBSIZE=(1024*1024)
# For scheduler debug enable following option.
# Debug will be available via `kern.sched.stats` sysctl
# For more information see http://svnweb.freebsd.org/base/head/sys/conf/NOTES?view=markup
#options SCHED_STATS
If you are tuning network for maximum performance you may wish to play with ifconfig options like:
# You can list all capabilities via `ifconfig -m`
ifconfig [-]rxcsum [-]txcsum [-]tso [-]lro mtu
In case you've enabled DDB in kernel config, you should edit your /etc/ddb.conf and add something like this to enable automatic reboot (and textdump as bonus):
script kdb.enter.panic=textdump set; capture on; show pcpu; bt; ps; alltrace; capture off; call doadump; reset
script kdb.enter.default=textdump set; capture on; bt; ps; capture off; call doadump; reset
And do not forget to add ddb_enable="YES" to /etc/rc.conf
Since FreeBSD 9 you can select to enable/disable flowcontrol on your NIC:
# See http://en.wikipedia.org/wiki/Ethernet_flow_control and
# http://www.mail-archive.com/
[email protected]/msg07927.html for additional info
ifconfig bge0 media auto mediaopt flowcontrol
PS. Also most of FreeBSD's limits can be monitored by
# vmstat -z
and
# limits
PPS. variety of network counters can be monitored via
# netstat -s
In FreeBSD-9 netstat's -Q option appeared, try following command to display netisr
stats
# netstat -Q
PPPS. also see
# man 7 tuning
PPPPS. I wanted to thank FreeBSD community, especially author of nginx - Igor Sysoev, nginx-ru@ and FreeBSD-performance@ mailing lists for providing useful information about FreeBSD tuning.
FreeBSD WIP
* Whats cooking for FreeBSD 7?
* Whats cooking for FreeBSD 8?
* Whats cooking for FreeBSD 9?
So here is the question:
What tunings are you using on yours FreeBSD servers?
You can also post your /etc/sysctl.conf, /boot/loader.conf, kernel options, etc with description of its' meaning (do not copy-paste from sysctl -d). Don't forget to specify server type (web, smb, gateway, etc)
Let's share experience!