How to print lines from a file that have repeated more than six times
- by Mike
I have a file containing the data shown below. The first comma-delimited field may be repeated any number of times, and I want to print only the lines after the sixth repetition of any value of this field
For example, there are eight fields with 1111111 as the first field, and I want to print only the seventh and eighth of these records
Input file:
1111111,aaaaaaaa,14
1111111,bbbbbbbb,14
1111111,cccccccc,14
1111111,dddddddd,14
1111111,eeeeeeee,14
1111111,ffffffff,14
1111111,gggggggg,14
1111111,hhhhhhhh,14
2222222,aaaaaaaa,14
2222222,bbbbbbbb,14
2222222,cccccccc,14
2222222,dddddddd,14
2222222,eeeeeeee,14
2222222,ffffffff,14
2222222,gggggggg,14
3333333,aaaaaaaa,14
3333333,bbbbbbbb,14
3333333,cccccccc,14
3333333,dddddddd,14
3333333,eeeeeeee,14
3333333,ffffffff,14
3333333,gggggggg,14
3333333,hhhhhhhh,14
Output:
1111111,gggggggg,14
1111111,hhhhhhhh,14
2222222,gggggggg,14
3333333,gggggggg,14
3333333,hhhhhhhh,14
What I have tried is to transponse the 2nd and 3rd fields with respect to 1st, so that I can use nawk on the field of $7 or $8
#!/usr/bin/ksh awk -F"," '{ a[$1]; b[$1]=b[$1]","$2 c[$1]=c[$1]","$3} END{ for(i in a){ print i","b[i]","c[i]} } ' file > output.txt