Awk filtering values between two files when regions intersect (any solutions welcome)
Posted
by
user964689
on Stack Overflow
Published on 2012-10-04T15:36:30Z
This builds on an earlier question, "Awk conditional filter one file based on another (or other solutions)".
I have an awk program that outputs a column from rows in a text file 'refGene.txt' if values in that row match 2 out of 3 values in another text file.
I need to include an additional criterion for finding a match between the two files. A row should be included if the range given by the 2 numerical values in a row of file 1 overlaps with the range given by the two values in a row of refGene.txt. Example lines from file 1:
chr1 10 20
chr2 10 20
and an example line from file 2 (refGene.txt), showing only the matching columns ($3, $5, $6):
chr1 5 30
Currently the awk program does not treat this as a match because, although the first column matches, neither the 2nd nor the 3rd column does. I would like it to be treated as a match, because the region 10-20 in file 1 is WITHIN the range 5-30 in refGene.txt. The second line in file 1 should NOT match, however, because its first column does not match, and a matching first column is required. If there is a way to also include cases where any part of the range in file 1 overlaps any part of the range in refGene.txt, that would be really helpful. This should replace the conditional statements below, since it would also find all the cases they currently cover.
Please let me know if my question is unclear. Any help is really appreciated, thanks in advance! (Solutions do not have to be in awk.)
Rubal
FILES=/files/*txt
for f in $FILES
do
    awk '
        BEGIN {
            FS = "\t";
        }
        # First file: remember each (chromosome, start, end) triple.
        FILENAME == ARGV[1] {
            pair[ $1, $2, $3 ] = 1;
            next;
        }
        # refGene.txt: print column 13 when columns 3, 5 and 6 match exactly.
        {
            if ( pair[ $3, $5, $6 ] == 1 ) {
                print $13;
            }
        }
    ' "$f" /files/refGene.txt > "/files/results/$(basename "$f")"
done
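One possible way to express the overlap test described above, as a minimal sketch rather than a tested solution: store every region from file 1 per chromosome, then for each refGene.txt row print $13 if any stored region on the same chromosome overlaps the row's $5-$6 range. Two ranges overlap when each one starts no later than the other ends. The sample data, the temporary directory, the file name regions.txt, and the array names n, lo and hi are all made up for illustration, and the comparison assumes inclusive endpoints (use < and > instead if the coordinates are half-open).

```shell
#!/bin/sh
# Hypothetical sample data standing in for file 1 and refGene.txt.
tmp=$(mktemp -d)
printf 'chr1\t10\t20\nchr2\t10\t20\n' > "$tmp/regions.txt"
# 13 tab-separated columns; only $3 (chromosome), $5, $6 (range) and $13 matter here.
{
    printf '0\tNM_A\tchr1\t+\t5\t30\tx\tx\tx\tx\tx\tx\tgeneA\n'   # contains 10-20
    printf '0\tNM_B\tchr1\t+\t40\t50\tx\tx\tx\tx\tx\tx\tgeneB\n'  # no overlap
    printf '0\tNM_C\tchr2\t+\t15\t25\tx\tx\tx\tx\tx\tx\tgeneC\n'  # partial overlap
    printf '0\tNM_D\tchr3\t+\t10\t20\tx\tx\tx\tx\tx\tx\tgeneD\n'  # chromosome not in file 1
} > "$tmp/refGene.txt"

result=$(awk '
    BEGIN { FS = "\t" }
    # First file: keep every region, indexed per chromosome.
    FILENAME == ARGV[1] {
        n[$1]++
        lo[$1, n[$1]] = $2 + 0    # + 0 forces a numeric value
        hi[$1, n[$1]] = $3 + 0
        next
    }
    # Second file: print $13 once if any region on the same chromosome overlaps.
    {
        for (i = 1; i <= n[$3]; i++) {
            if (lo[$3, i] <= $6 + 0 && hi[$3, i] >= $5 + 0) {
                print $13
                break
            }
        }
    }
' "$tmp/regions.txt" "$tmp/refGene.txt")

echo "$result"
rm -rf "$tmp"
```

Because exact matches and fully contained regions are special cases of overlap, this single test would also cover the cases handled by the exact-match condition in the program above. The inner loop scans every file 1 region on the chromosome, which should be fine for modest inputs but may be slow for very large region lists.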