extract payload from tcpflow output
- by Felipe Alvarez
tcpflow outputs a bunch of files, many of which are HTTP responses from a web server. Each one starts with the HTTP headers, including Content-Type and other important ones. I'm trying to write a script that extracts just the payload data (e.g. image/jpeg, text/html, etc.) and saves it to a file [optional: with an appropriate name and file extension].
The EOL characters are \r\n (CRLF), which makes the files difficult to handle with GNU tools (in my experience).
I've been trying something along the lines of:
sed /HTTP/,/^$/d
The aim is to delete all text from the beginning of HTTP (inclusive) to the end of \r\n\r\n (inclusive), but I have had no luck. I'm looking for help from anyone with good experience in sed and/or awk. I have zero experience with Perl, so I'd prefer to use common GNU command-line utilities for this.
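For reference, here is a rough sketch of the behaviour I'm after. It rests on two assumptions: that each file begins with the HTTP status line, and that the header block ends at the first line containing only a carriage return. Because the lines end in CRLF, the "empty" line is really \r, so /^$/ never matches it. The function names strip_http_headers and content_type_of are made up for illustration, and I have not checked this against chunked or compressed responses.

```shell
# A literal carriage return, built with printf so the sed expression
# does not depend on sed understanding the \r escape.
cr=$(printf '\r')

# Print the payload: delete from line 1 through the first line that
# holds only a CR (the blank line that closes the headers).
# Anchoring the range at line 1 instead of /HTTP/ keeps a later "HTTP"
# inside the payload from starting a second deletion.
strip_http_headers() {
    LC_ALL=C sed "1,/^${cr}\$/d" "$1"
}

# Print the Content-Type value from the header block, with CRs removed.
content_type_of() {
    LC_ALL=C sed -n "1,/^${cr}\$/p" "$1" \
        | tr -d '\r' \
        | awk -F': *' 'tolower($1) == "content-type" { print $2; exit }'
}
```

Usage would be something like `strip_http_headers flowfile > payload.bin`, with the output of `content_type_of flowfile` then used to choose a file extension.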
Find a sample tcpflow output file here.
Thanks,
Felipe