I am testing the regex needed to create a Splunk field extraction for nmap output, and I think I might be close...
Example full line:
Host: 10.0.0.1 (host)   Ports: 21/open|filtered/tcp//ftp///, 22/open/tcp//ssh//OpenSSH 5.9p1 Debian 5ubuntu1 (protocol 2.0)/, 23/closed/tcp//telnet///, 80/open/tcp//http//Apache httpd 2.2.22 ((Ubuntu))/,  10000/closed/tcp//snet-sensor-mgmt///  OS: Linux 2.6.32 - 3.2  Seq Index: 257  IP ID Seq: All zeros
I've used an underscore "_" as the sed delimiter because it makes the expression a little easier to read.
root@host:/# sed -n -e 's_\([0-9]\{1,5\}\/[^/]*\/[^/]*\/\/[^/]*\/\/[^/]*\/.\)_\n\1_pg' filename
The same regex with the backslashes before the forward slashes removed (they are not needed when "_" is the delimiter):
root@host:/# sed -n -e 's_\([0-9]\{1,5\}/[^/]*/[^/]*//[^/]*//[^/]*/.\)_\n\1_pg' filename
Output:
... ... ...
Host: 10.0.0.1 (host)   Ports: 
21/open|filtered/tcp//ftp///, 
22/open/tcp//ssh//OpenSSH 2.0p1 Debian 2ubuntu1 (protocol 2.0)/, 
23/closed/tcp//telnet///, 
80/open/tcp//http//Apache httpd 5.4.32 ((Ubuntu))/, 
10000/closed/tcp//snet-sensor-mgmt///   OS: Linux 9.8.76 - 7.3  Seq Index: 257 IPID Seq: All zeros
... ... ...
As you can see, the pattern matching appears to be working, although I am unable to:
1 - match both possible endings of a port entry (a comma, or whitespace/tab after the last entry). The last line still contains unwanted text (in this case, the OS and sequence info)
and
2 - remove any of the unnecessary data, i.e. print only the matching pattern. It is actually printing the whole line.
If I remove the sed -n flag, the remaining file contents are also printed. I can't seem to find a way to print only the matched regex.
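To make the desired output concrete, here is a minimal sketch that uses grep rather than sed. It assumes a grep that supports -o (GNU and BSD grep do; the option is not required by POSIX). The variable names line and ports are placeholders for illustration only, and the pattern is the same one as above, written as an extended regex without the trailing "." so that neither the comma nor the following whitespace is captured.

```shell
# Sample line copied from the example above; "line" is a hypothetical variable name.
line='Host: 10.0.0.1 (host)   Ports: 21/open|filtered/tcp//ftp///, 22/open/tcp//ssh//OpenSSH 5.9p1 Debian 5ubuntu1 (protocol 2.0)/, 23/closed/tcp//telnet///, 80/open/tcp//http//Apache httpd 2.2.22 ((Ubuntu))/,  10000/closed/tcp//snet-sensor-mgmt///  OS: Linux 2.6.32 - 3.2  Seq Index: 257  IP ID Seq: All zeros'

# -o prints only the matched text, one match per output line.
# -E switches to extended regex, so {1,5} needs no backslashes.
ports=$(printf '%s\n' "$line" | grep -oE '[0-9]{1,5}/[^/]*/[^/]*//[^/]*//[^/]*/')

printf '%s\n' "$ports"
```

Against the example line this prints the five port entries, one per line, without the "Host: ... Ports:" prefix, the separating commas, or the trailing OS text.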
Being fairly new to sed and regex, any help or pointers are greatly appreciated!