Awk filtering values between two files when regions intersect (any solutions welcome)

Posted by user964689 on Stack Overflow
Published on 2012-10-04T15:36:30Z Indexed on 2012/10/04 15:37 UTC

This builds on an earlier question, "Awk conditional filter one file based on another (or other solutions)".

I have an awk program that prints a column from rows of a text file, refGene.txt, if values in that row match 2 out of 3 values in another text file.

I need to add a further criterion for a match between the two files: a row should be included if the range given by the two numerical values in a row of file 1 overlaps the range given by the two values in a row of refGene.txt. Example lines from file 1:

chr1 10 20
chr2 10 20

and an example line from file 2 (refGene.txt), showing only the matching columns ($3, $5, $6):

chr1 5 30

Currently the awk program does not treat this as a match: the first column matches, but neither the 2nd nor the 3rd column does. I would like it to count as a match, because the region 10-20 in file 1 lies WITHIN the range 5-30 in refGene.txt. The second line in file 1 should NOT match, because its first column differs, and a matching first column is required. Ideally the test would also catch every case where any part of the range in file 1 overlaps any part of the range in refGene.txt. It should replace the conditional statements below, since an overlap test would also find all the cases they currently cover.

Please let me know if my question is unclear. Any help is really appreciated, thanks in advance! (Solutions do not have to be in awk.)

Rubal

FILES=/files/*txt
for f in $FILES
do
    awk '
        BEGIN {
            FS = "\t";
        }
        # First file: record each (chromosome, start, end) triple.
        FILENAME == ARGV[1] {
            pair[ $1, $2, $3 ] = 1;
            next;
        }
        # refGene.txt: print column 13 only when all three values match exactly.
        {
            if ( ($3, $5, $6) in pair ) {
                print $13;
            }
        }
    ' "$f" /files/refGene.txt > "/files/results/$(basename "$f")"
done
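
The overlap test described in the question could be sketched roughly as follows. This is only an illustration, not a tested replacement for the loop: the temporary files, the gene names GENE_A to GENE_C, and the "x" filler fields are made up, the ranges are assumed to be closed intervals (so touching endpoints count as overlapping), and both files are assumed to be tab-separated with start <= end on every row. Two ranges on the same chromosome overlap when each one starts no later than the other ends.

```shell
#!/bin/sh
# Sketch only: the sample data is invented to mirror the example in the question.
tmp=$(mktemp -d)

# File 1: chromosome, start, end (tab-separated).
printf 'chr1\t10\t20\nchr2\t10\t20\n' > "$tmp/regions.txt"

# Stand-in for refGene.txt: only $3, $5, $6 and $13 matter, other fields are "x".
printf 'x\tx\tchr1\tx\t5\t30\tx\tx\tx\tx\tx\tx\tGENE_A\n'  >  "$tmp/refGene.txt"
printf 'x\tx\tchr1\tx\t40\t50\tx\tx\tx\tx\tx\tx\tGENE_B\n' >> "$tmp/refGene.txt"
printf 'x\tx\tchr3\tx\t5\t30\tx\tx\tx\tx\tx\tx\tGENE_C\n'  >> "$tmp/refGene.txt"

result=$(awk '
    BEGIN { FS = "\t" }
    # First file: store every region, grouped by chromosome.
    FILENAME == ARGV[1] {
        n[$1]++
        lo[$1, n[$1]] = $2
        hi[$1, n[$1]] = $3
        next
    }
    # Second file: print $13 once if any stored region on the same
    # chromosome overlaps the closed range [$5, $6].
    {
        for (i = 1; i <= n[$3]; i++) {
            if (lo[$3, i] + 0 <= $6 + 0 && $5 + 0 <= hi[$3, i] + 0) {
                print $13
                break
            }
        }
    }
' "$tmp/regions.txt" "$tmp/refGene.txt")

echo "$result"
rm -rf "$tmp"
```

With this sample data only GENE_A is printed: chr1 10-20 falls inside chr1 5-30, chr1 40-50 does not overlap 10-20, and the chr2 and chr3 rows have no chromosome in common. An exact match of all three columns is a special case of overlap, so the same condition covers it. The inner loop scans every stored region for a chromosome, which is simple but may be slow for very large region files.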

© Stack Overflow or respective owner
