Use awk to identify multi-line records and filter them
- by nanshi
I need to process a large data file that contains multi-line records. Example input:
1 Name Dan
1 Title Professor
1 Address aaa street
1 City xxx city
1 State yyy
1 Phone 123-456-7890
2 Name Luke
2 Title Professor
2 Address bbb street
2 City xxx city
3 Name Tom
3 Title Associate Professor
3 Like Golf
4 Name
4 Title Trainer
4 Likes Running
Note that the first integer field is unique to each record and identifies it as a whole. So the input above really contains 4 records, although I don't know in advance how many attribute lines each record may have. I need to:
- identify valid records (a valid record must have both the "Name" and "Title" fields)
- output the needed attributes that are available for each valid record; say "Name", "Title", and "Address" are the needed fields.
Example output:
1 Name Dan
1 Title Professor
1 Address aaa street
2 Name Luke
2 Title Professor
2 Address bbb street
3 Name Tom
3 Title Associate Professor
So in the output file, record 4 is removed since it doesn't have a value for the "Name" field. Record 3 doesn't have an "Address" field but is still printed to the output, since it is a valid record that has both "Name" and "Title".
Can I do this with awk? If so, how do I identify a whole record using the first "id" field on each line?
Thanks a lot to the Unix shell script experts for helping me out! :)
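
For reference, here is a minimal sketch of one way this could be done in awk. It is not the only approach, and it rests on a few assumptions: all lines of a record are adjacent in the file, fields are separated by whitespace, and an attribute only counts as present if something follows its name (so the bare "4 Name" line does not count). The function name filter_records and the variables buf, seen_name, and seen_title are made up for illustration. Because it buffers only one record at a time, it should also cope with a big file.

```shell
#!/bin/sh
# filter_records: hypothetical helper; reads the named files, or stdin if none.
# Assumption: lines belonging to one record id are contiguous in the input.
filter_records() {
    awk '
    # Print the buffered lines of the record just finished, but only if
    # it had both a Name and a Title value; then reset the per-record state.
    function emit() {
        if (seen_name && seen_title)
            printf "%s", buf
        buf = ""
        seen_name = 0
        seen_title = 0
    }

    # The first field changed, so the previous record is complete.
    NR > 1 && $1 != id { emit() }

    # Remember which record the current line belongs to.
    { id = $1 }

    # NF >= 3 means the attribute name is followed by at least one value word.
    NF >= 3 && $2 == "Name"  { seen_name = 1 }
    NF >= 3 && $2 == "Title" { seen_title = 1 }

    # Keep only the needed attributes, in the order they appear in the input.
    NF >= 3 && ($2 == "Name" || $2 == "Title" || $2 == "Address") {
        buf = buf $0 "\n"
    }

    # Handle the last record in the file.
    END { emit() }
    ' "$@"
}
```

It could be called as `filter_records input.txt > output.txt`, where both file names are placeholders. If the lines of a record can be scattered through the file, this sketch would not be enough; the input would need to be sorted by the id field first, or the lines collected in arrays keyed by id.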