Input field separator in awk

Posted by Matthijs on Super User
Published on 2012-10-05T20:43:04Z

Filed under: unix | regex | awk

I have many large data files. The delimiter between the fields is a semicolon. However, I have found that there are semicolons in some of the fields, so I cannot simply use the semicolon as a field separator.

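The following example has 4 fields, but awk sees only 3. The '1' that makes up field 3 is consumed as part of the separator match, so the semicolon after it no longer has a preceding character to match against (the regex includes a '-' because some of the numerical data are negative):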

echo '"This";"is";1;"line of; data"' | awk -F'[0-9"-];[0-9"-]' '{print "No. of fields:\t"NF; print "Field 3:\t" $3}'
No. of fields:  3
Field 3:        ;"line of; data"
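
To see where the separator regex actually matches, the line can be walked with awk's match() function. This is only an illustrative sketch: the helper name show_separators is hypothetical and not part of the original command, and the loop mimics left-to-right, non-overlapping matching rather than calling awk's own field splitter.

```shell
# show_separators: hypothetical helper, for illustration only.
# Prints every non-overlapping match of the separator regex, left to right,
# then whatever text remains after the last match.
show_separators() {
    awk '{
        rest = $0
        while (match(rest, /[0-9"-];[0-9"-]/)) {
            printf "separator: %s\n", substr(rest, RSTART, RLENGTH)
            # Continue after the matched text; its characters are used up.
            rest = substr(rest, RSTART + RLENGTH)
        }
        printf "left over: %s\n", rest
    }'
}

echo '"This";"is";1;"line of; data"' | show_separators
```

The second match is `";1`, which swallows the whole of field 3, and the remaining text begins with a semicolon that has nothing in front of it to satisfy the first bracket expression.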

Of course,

echo '"This";"is";1;"line of; data"' | awk -F';' '{print "No. of fields:\t"NF}'
No. of fields:  5

solves that problem, but counts the last field as two separate fields.

Does anyone know a solution to this?

Thanks!

Matthijs
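
One possible approach, sketched here rather than offered as a definitive answer, is to stop relying on -F and scan each line character by character, treating a semicolon as a separator only when it is outside double quotes. The sketch assumes that semicolons inside a field only ever occur between double quotes and that quotes are never escaped or doubled within a field; neither is confirmed by the question. The function name split_quoted and the array f are made up for this example. GNU awk also has an FPAT variable that defines fields by their content, but the version below sticks to plain awk features.

```shell
# split_quoted: hypothetical helper, not from the original post.
# Assumption: a semicolon belongs to a field only when it sits between
# double quotes, and quotes are not escaped or doubled inside a field.
split_quoted() {
    awk '{
        n = 0; field = ""; inquote = 0
        len = length($0)
        for (i = 1; i <= len; i++) {
            c = substr($0, i, 1)
            if (c == "\"") inquote = !inquote   # toggle on every double quote
            if (c == ";" && !inquote) {
                f[++n] = field                  # unquoted semicolon ends the field
                field = ""
            } else {
                field = field c                 # anything else is field content
            }
        }
        f[++n] = field                          # the last field has no trailing semicolon
        print "No. of fields:\t" n
        for (j = 1; j <= n; j++) print "Field " j ":\t" f[j]
    }'
}

echo '"This";"is";1;"line of; data"' | split_quoted
```

On the example line this reports 4 fields, with the last one kept whole as "line of; data". The quotes are left in place; removing them would be a separate step. Scanning character by character is slower than built-in field splitting, which may matter for large files.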

© Super User or respective owner
