My challenge
I need to capture a large volume of traffic with tcpdump, from two interfaces in promiscuous mode that see a lot of traffic.
To sum it up
Log all traffic in promiscuous mode from 2 interfaces
Those interfaces are not assigned an IP address
pcap files must be rotated every ~1 GB
When 10 TB of files are stored, start deleting the oldest ones
What I currently do
Right now I use tcpdump like this:
```
tcpdump -n -C 1000 -z /data/compress.sh -i any -w /data/livedump/capture.pcap $FILTER
```
$FILTER contains src/dst filters so that I can use -i any. The reason is that I have two interfaces and would like to run a single dump rather than two.
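The post does not show what $FILTER contains. As a sketch, assuming it is simply "traffic to or from a set of networks", a hypothetical helper such as build_filter below could generate it; the networks used are placeholders. In pcap-filter syntax, `net X` matches packets whose source or destination address is in X, which is the same as writing `src net X or dst net X`.

```python
import ipaddress
import sys


def build_filter(networks):
    """Build a pcap-filter expression matching traffic to/from any given network.

    Hypothetical helper: the network list is an assumption, not taken from the post.
    """
    if not networks:
        # An empty filter would capture everything, so refuse it explicitly.
        raise ValueError("at least one network is required")
    parts = []
    for net in networks:
        # strict=False normalises host bits, e.g. 192.168.0.5/16 -> 192.168.0.0/16
        parsed = ipaddress.ip_network(net, strict=False)
        parts.append(f"net {parsed.with_prefixlen}")
    return " or ".join(parts)


if __name__ == "__main__" and len(sys.argv) > 1:
    print(build_filter(sys.argv[1:]))
```

It could then feed the existing command, e.g. `FILTER="$(python3 build_filter.py 10.1.2.0/24 192.168.0.0/16)"`. If the result is combined with other primitives, wrap it in parentheses so `or` binds as intended.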
compress.sh pins tar to another CPU core, compresses the data, gives the file a reasonable name and moves it to an archive location.
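tcpdump runs the -z command as `command file` each time a rotated savefile is closed, so compress.sh receives the finished pcap as its only argument. The script itself is not shown; below is a rough Python equivalent, assuming a Linux host. It swaps tar for Python's gzip module, pins itself to a core with os.sched_setaffinity (Linux-only), and ARCHIVE_DIR and COMPRESS_CPU are hypothetical values.

```python
#!/usr/bin/env python3
import gzip
import os
import shutil
import sys
import time

ARCHIVE_DIR = "/data/archive"  # hypothetical archive location
COMPRESS_CPU = 1  # hypothetical core to keep compression off tcpdump's core


def compress_and_archive(path, archive_dir, cpu=None):
    """Gzip a closed pcap into archive_dir with a timestamped name, then delete it."""
    if cpu is not None and hasattr(os, "sched_setaffinity"):
        os.sched_setaffinity(0, {cpu})  # pin this process (pid 0 = self)
    # Name the archive after the file's last modification time.
    stamp = time.strftime("%Y%m%d-%H%M%S", time.localtime(os.path.getmtime(path)))
    target = os.path.join(archive_dir, f"{stamp}_{os.path.basename(path)}.gz")
    tmp = target + ".part"  # write under a temporary name first
    with open(path, "rb") as src, gzip.open(tmp, "wb") as dst:
        shutil.copyfileobj(src, dst, 1 << 20)  # stream in 1 MiB chunks
    os.replace(tmp, target)  # atomic rename within the same directory
    os.remove(path)
    return target


if __name__ == "__main__" and len(sys.argv) == 2:
    compress_and_archive(sys.argv[1], ARCHIVE_DIR, COMPRESS_CPU)
```

Writing to a `.part` name and renaming at the end means a housekeeping job never sees a half-written archive.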
I cannot pass two interfaces to tcpdump, so I have chosen to use filters and capture on the any interface.
Right now I do not do any housekeeping, but I plan to monitor disk usage and start deleting the oldest files once only 100 GB of free space remains. That should be fine.
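The housekeeping plan, together with the 10 TB cap from the requirements, could look roughly like the sketch below. It assumes the archived captures end in `.gz` and live in a single directory, and it uses modification time as "age"; prune_oldest and prune_for_free_space are hypothetical helpers, not part of any tool.

```python
import fnmatch
import os
import shutil


def _oldest_first(directory, pattern):
    """Return (mtime, path, size) for matching regular files, oldest first."""
    entries = []
    with os.scandir(directory) as it:
        for entry in it:
            if entry.is_file(follow_symlinks=False) and fnmatch.fnmatch(entry.name, pattern):
                st = entry.stat(follow_symlinks=False)
                entries.append((st.st_mtime, entry.path, st.st_size))
    entries.sort()
    return entries


def prune_oldest(directory, max_total_bytes, pattern="*.gz"):
    """Delete the oldest matching files until their total size fits the cap."""
    entries = _oldest_first(directory, pattern)
    total = sum(size for _, _, size in entries)
    removed = []
    for _, path, size in entries:
        if total <= max_total_bytes:
            break
        os.remove(path)
        total -= size
        removed.append(path)
    return removed


def prune_for_free_space(directory, min_free_bytes, pattern="*.gz"):
    """Delete the oldest matching files until the filesystem has enough free space."""
    removed = []
    for _, path, _ in _oldest_first(directory, pattern):
        if shutil.disk_usage(directory).free >= min_free_bytes:
            break
        os.remove(path)
        removed.append(path)
    return removed
```

Run periodically (for example from cron), `prune_oldest(archive_dir, 10 * 10**12)` enforces the 10 TB cap and `prune_for_free_space(archive_dir, 100 * 10**9)` the 100 GB free-space floor; the archive path is whatever compress.sh writes to.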
And now, my problem
I see dropped packets. This is from a dump that has been running for a few hours and has collected roughly 250 GB of pcap files:
```
430083369 packets captured
430115470 packets received by filter
32057 packets dropped by kernel <-- This is my concern
```
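To put those counters in proportion, the small calculation below relates the kernel drops to the "received by filter" count. The tcpdump man page notes that the meaning of "received by filter" depends on the OS and its configuration, so treat the ratio as approximate; drop_summary is a hypothetical helper.

```python
def drop_summary(captured, received, dropped):
    """Relate tcpdump's end-of-run counters to each other."""
    return {
        "dropped_ratio": dropped / received,  # fraction of received packets dropped
        "received_not_captured": received - captured,
    }


stats = drop_summary(430083369, 430115470, 32057)
print(f"{stats['dropped_ratio']:.4%}")
```

So the drops amount to well under one packet in ten thousand, which matters only if the capture has to be complete.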
How can I avoid so many packets being dropped?
What I have already tried or looked at
Changing the values of /proc/sys/net/core/rmem_max and /proc/sys/net/core/rmem_default did help: it eliminated roughly half of the dropped packets.
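For recording the socket buffer settings before and after a change, a sketch like the one below reads and writes the two /proc entries. The helpers read_rmem and write_rmem are hypothetical, writing requires root, and any values passed in are up to you. Writes to /proc/sys take effect immediately (the same as `sysctl -w net.core.rmem_max=...`) but do not survive a reboot unless also placed in /etc/sysctl.conf or a file under /etc/sysctl.d/.

```python
from pathlib import Path

RMEM_KEYS = ("rmem_default", "rmem_max")


def read_rmem(proc_root="/proc/sys/net/core"):
    """Return the current rmem_default and rmem_max values in bytes."""
    root = Path(proc_root)
    return {key: int((root / key).read_text().strip()) for key in RMEM_KEYS}


def write_rmem(values, proc_root="/proc/sys/net/core"):
    """Write new byte values for rmem_default and/or rmem_max (needs root)."""
    root = Path(proc_root)
    for key, value in values.items():
        if key not in RMEM_KEYS:
            raise KeyError(f"unsupported key: {key}")
        (root / key).write_text(f"{int(value)}\n")
```

The proc_root parameter exists only so the helpers can be pointed at a scratch directory for testing.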
I have also looked at gulp. The problem is that it does not support multiple interfaces in one process, and it refuses to work if the interface has no IP address. Unfortunately, that is a deal breaker in my case.
The next problem is that when the traffic flows through a pipe, I cannot get automatic rotation working. A single huge 10 TB file is impractical, and I don't have a machine with 10 TB+ of RAM to run Wireshark on, so that's out.
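One way around the rotation problem with a pipe is to rotate on the reading end. The sketch below reads a classic pcap stream (not pcapng) from stdin and starts a new file, beginning with a copy of the 24-byte global header, whenever the next record would push the current file past a size limit. The script name, output naming scheme and limit are assumptions, and at very high packet rates a per-packet Python loop may itself fail to keep up, so treat it as an illustration of the logic rather than a drop-in tool.

```python
import struct
import sys

GLOBAL_HEADER_LEN = 24  # classic pcap file header
RECORD_HEADER_LEN = 16  # ts_sec, ts_frac, incl_len, orig_len
MAGICS = {0xA1B2C3D4, 0xA1B23C4D}  # microsecond and nanosecond pcap


def _read_exact(stream, n):
    """Read n bytes, looping over short pipe reads; fewer only at EOF."""
    buf = bytearray()
    while len(buf) < n:
        chunk = stream.read(n - len(buf))
        if not chunk:
            break
        buf += chunk
    return bytes(buf)


def split_pcap_stream(stream, open_output, max_bytes):
    """Split a pcap stream into files of at most max_bytes; return the file count."""
    header = _read_exact(stream, GLOBAL_HEADER_LEN)
    if len(header) < GLOBAL_HEADER_LEN:
        raise ValueError("stream ended before the pcap global header")
    if struct.unpack("<I", header[:4])[0] in MAGICS:
        endian = "<"
    elif struct.unpack(">I", header[:4])[0] in MAGICS:
        endian = ">"
    else:
        raise ValueError("not a classic pcap stream")
    index = 0
    out = open_output(index)
    out.write(header)
    written = GLOBAL_HEADER_LEN
    while True:
        rec_hdr = _read_exact(stream, RECORD_HEADER_LEN)
        if len(rec_hdr) < RECORD_HEADER_LEN:
            break  # end of stream (a truncated trailing header is dropped)
        incl_len = struct.unpack(endian + "IIII", rec_hdr)[2]
        data = _read_exact(stream, incl_len)
        if len(data) < incl_len:
            break  # truncated final packet is dropped
        record_len = RECORD_HEADER_LEN + incl_len
        # Rotate, but always keep at least one record per file.
        if written + record_len > max_bytes and written > GLOBAL_HEADER_LEN:
            out.close()
            index += 1
            out = open_output(index)
            out.write(header)
            written = GLOBAL_HEADER_LEN
        out.write(rec_hdr)
        out.write(data)
        written += record_len
    out.close()
    return index + 1


if __name__ == "__main__" and len(sys.argv) == 2:
    prefix = sys.argv[1]  # hypothetical output prefix, e.g. /data/livedump/capture
    limit = 1000 * 1000 * 1000  # ~1 GB, like tcpdump -C 1000 (millions of bytes)
    split_pcap_stream(
        sys.stdin.buffer,
        lambda i: open(f"{prefix}.{i:06d}.pcap", "wb"),
        limit,
    )
```

For example, `tcpdump -n -i any -w - $FILTER | python3 split_pcap.py /data/livedump/capture` would produce capture.000000.pcap, capture.000001.pcap and so on, each a valid pcap file on its own.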
Do you have any suggestions? Maybe even a better way of doing my traffic dump altogether.