Splitting Nucleotide Sequences in JS with Regexp
Posted by TEmerson on Stack Overflow
Published on 2010-06-15T20:01:13Z
Indexed on 2010/06/15 20:12 UTC
JavaScript | regex
I'm trying to split up a nucleotide sequence into amino acid strings using a regular expression. I have to start a new string at each occurrence of the string "ATG", but I don't want the earlier match to stop when it reaches the next "ATG"; each result should run to the end of the input. Valid input is any string made up of the characters A, C, G, and T, in any order.
For example, given the input string ATGAACATAGGACATGAGGAGTCA, I should get two strings: ATGAACATAGGACATGAGGAGTCA (the whole thing) and ATGAGGAGTCA (from the second "ATG" onward). A string that contains "ATG" n times should produce n results.
I thought the expression /(?:[ACGT]*)(ATG)[ACGT]*/g would work, but it doesn't. If this can't be done with a regexp, it's easy enough to write the code out by hand, but I always prefer an elegant solution if one is available.
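One likely reason the expression above fails is that a global match consumes the characters it matches, so two results can never overlap, and the leading [ACGT]* is greedy. A possible workaround, sketched below, is to put the capture inside a lookahead, which matches zero characters and so leaves the rest of the input available for later matches. The function name splitAtStartCodons is hypothetical and not from the original post, and the sketch assumes the input contains only A, C, G, and T.

```javascript
// Minimal sketch, not a definitive implementation.
// splitAtStartCodons is a hypothetical helper name chosen for illustration.
function splitAtStartCodons(sequence) {
  // The lookahead (?=...) consumes nothing; the inner group captures
  // "ATG" plus everything after it, so overlapping suffixes are allowed.
  var pattern = /(?=(ATG[ACGT]*))/g;
  var results = [];
  var match;
  while ((match = pattern.exec(sequence)) !== null) {
    results.push(match[1]);
    // A zero-length match leaves lastIndex where it was, so advance it
    // manually to avoid looping on the same position forever.
    pattern.lastIndex = match.index + 1;
  }
  return results;
}

var parts = splitAtStartCodons("ATGAACATAGGACATGAGGAGTCA");
console.log(parts);
// prints [ 'ATGAACATAGGACATGAGGAGTCA', 'ATGAGGAGTCA' ]
```

With this approach a string containing "ATG" n times yields n results, and a string with no "ATG" yields an empty array. If a regexp turns out to be more trouble than it is worth, repeatedly calling indexOf("ATG", from) and taking substring from each hit should give the same output.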
© Stack Overflow or respective owner