Imputing missing data in aligned sequences
Posted by Kwame Oduro on Stack Overflow
Published on 2012-09-06T15:31:07Z
Indexed on 2012/09/06 15:38 UTC
perl
I am looking for a simple Perl script that imputes missing nucleotides (written as N) in aligned sequences. For example, my old_file contains the following aligned sequences:
seq1
ATGTC
seq2
ATGTC
seq3
ATNNC
seq4
NNGTN
seq5
CTCTN
I want to infer every N in the file and write a new file in which each N is replaced by the majority nucleotide at that position across all sequences. Bases that are already called stay as they are, even where they differ from the majority. My new_file should look like this:
seq1
ATGTC
seq2
ATGTC
seq3
ATGTC
seq4
ATGTC
seq5
CTCTC
A script with the usage "impute_missing_data.pl old_file new_file", or any other approach, would be helpful. Thank you.
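One possible approach, as a minimal sketch rather than a definitive implementation: count the non-N bases in each alignment column, then replace every N with the most frequent base for its column. The sub names (column_majority, impute, main) are made up for illustration. The sketch assumes the file alternates a name line and a sequence line exactly as in the example, that all sequences have the same length, that ties are broken alphabetically, and that a column containing only Ns is left as N.

```perl
#!/usr/bin/perl
# impute_missing_data.pl - sketch: replace each N with the majority base of its column.
use strict;
use warnings;

# Return the most frequent non-N base for each column.
# Assumptions: ties are broken alphabetically; an all-N column stays 'N'.
sub column_majority {
    my @seqs = @_;
    my @counts;    # $counts[$col]{$base} = number of occurrences
    for my $seq (@seqs) {
        my @bases = split //, uc $seq;
        for my $col (0 .. $#bases) {
            next if $bases[$col] eq 'N';
            $counts[$col]{ $bases[$col] }++;
        }
    }
    my $len = @seqs ? length $seqs[0] : 0;
    my @majority;
    for my $col (0 .. $len - 1) {
        my $c = $counts[$col] || {};
        my ($best) = sort { $c->{$b} <=> $c->{$a} || $a cmp $b } keys %$c;
        $majority[$col] = defined $best ? $best : 'N';
    }
    return @majority;
}

# Replace every N in each sequence with the majority base for that column.
# Bases that are already called are never changed.
sub impute {
    my @seqs = @_;
    die "Sequences are not all the same length\n"
        if grep { length($_) != length($seqs[0]) } @seqs;
    my @majority = column_majority(@seqs);
    my @out;
    for my $seq (@seqs) {
        my @bases = split //, $seq;
        for my $col (0 .. $#bases) {
            $bases[$col] = $majority[$col] if uc($bases[$col]) eq 'N';
        }
        push @out, join '', @bases;
    }
    return @out;
}

# Read name/sequence line pairs from old_file, write imputed pairs to new_file.
sub main {
    die "Usage: $0 old_file new_file\n" unless @_ == 2;
    my ($old_file, $new_file) = @_;

    open my $in, '<', $old_file or die "Cannot read $old_file: $!\n";
    my @lines = grep { /\S/ } <$in>;    # skip blank lines
    close $in;
    s/\s+\z// for @lines;               # strip newlines and trailing spaces
    die "Expected name/sequence line pairs in $old_file\n" if @lines % 2;

    # Even-indexed lines are names, odd-indexed lines are sequences.
    my @names = @lines[ grep { $_ % 2 == 0 } 0 .. $#lines ];
    my @seqs  = @lines[ grep { $_ % 2 == 1 } 0 .. $#lines ];
    my @fixed = impute(@seqs);

    open my $out, '>', $new_file or die "Cannot write $new_file: $!\n";
    print {$out} "$names[$_]\n$fixed[$_]\n" for 0 .. $#names;
    close $out or die "Cannot close $new_file: $!\n";
}

# Run only when arguments are given, so the subs can also be loaded on their own.
main(@ARGV) if @ARGV;
```

It would be run as "perl impute_missing_data.pl old_file new_file". If the real data is in FASTA format with sequences wrapped over several lines, the reading step in main would need to be adapted; the counting and replacement logic should stay the same.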
© Stack Overflow or respective owner