Splitting Nucleotide Sequences in JS with Regexp

Posted by TEmerson on Stack Overflow
Published on 2010-06-15T20:01:13Z

I'm trying to split a nucleotide sequence into amino acid strings using a regular expression. A new string has to start at each occurrence of "ATG", but a match must not stop when it reaches the next "ATG". Every match should run to the end of the sequence, so the results overlap. Valid input is any string made up only of the characters A, C, G and T.
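The input rule above can be checked with one anchored character class. This is a minimal sketch that assumes uppercase input only; `isValidSequence` is a hypothetical helper name, not something from the original question.

```javascript
// Hypothetical helper: true when the string contains only A, C, G and T.
// Assumes uppercase input; the empty string is accepted.
function isValidSequence(seq) {
  return /^[ACGT]*$/.test(seq);
}

console.log(isValidSequence("ATGAACATAGGACATGAGGAGTCA")); // true
console.log(isValidSequence("ATGXACU")); // false
```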

For example, given the input string ATGAACATAGGACATGAGGAGTCA, I should get two strings: ATGAACATAGGACATGAGGAGTCA (the whole thing) and ATGAGGAGTCA (from the second "ATG" onward). A string that contains "ATG" n times should produce n results.
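The expected output can be pinned down with the plain-code approach the question treats as a fallback: find each "ATG" with `indexOf` and take the suffix that starts there. This is a sketch only; `suffixesFromATG` is a hypothetical name.

```javascript
// Hypothetical helper: returns every suffix of `seq` that begins with "ATG",
// in order of position. Plain string search, no regular expression.
function suffixesFromATG(seq) {
  const results = [];
  let i = seq.indexOf("ATG");
  while (i !== -1) {
    results.push(seq.slice(i));
    // Resume one character later so no later occurrence is skipped.
    i = seq.indexOf("ATG", i + 1);
  }
  return results;
}

const example = suffixesFromATG("ATGAACATAGGACATGAGGAGTCA");
// example → ["ATGAACATAGGACATGAGGAGTCA", "ATGAGGAGTCA"]
```

A string with n occurrences of "ATG" yields n suffixes, because each `indexOf` hit pushes exactly one result.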

I thought the expression /(?:[ACGT]*)(ATG)[ACGT]*/g would work, but it doesn't: it returns only one match. If this can't be done with a regexp, it's easy enough to write the code by hand, but I always prefer an elegant solution if one is available.
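The attempted pattern fails for two reasons: the greedy leading `[ACGT]*` runs to the end of the string and backtracks only as far as the last "ATG", and global matches never overlap, so a single match swallows the whole input. One regex-based alternative (not from the original post) is a zero-width lookahead with a capturing group: it matches at each position where "ATG" starts without consuming anything, so the captured suffixes may overlap. A sketch, assuming an engine with `String.prototype.matchAll` (ES2020); the `exec` loop shows the same idea for older engines.

```javascript
const seq = "ATGAACATAGGACATGAGGAGTCA";

// The original attempt: one match covering the whole string.
const attempt = seq.match(/(?:[ACGT]*)(ATG)[ACGT]*/g);
// attempt → ["ATGAACATAGGACATGAGGAGTCA"]

// Lookahead version: each empty match sits just before an "ATG",
// and group 1 captures from that "ATG" to the end of the ACGT run.
const suffixes = Array.from(seq.matchAll(/(?=(ATG[ACGT]*))/g), (m) => m[1]);
// suffixes → ["ATGAACATAGGACATGAGGAGTCA", "ATGAGGAGTCA"]

// Same pattern with exec for engines without matchAll.
const re = /(?=(ATG[ACGT]*))/g;
const viaExec = [];
let m;
while ((m = re.exec(seq)) !== null) {
  viaExec.push(m[1]);
  // The match is zero-width, so advance manually to avoid an infinite loop.
  re.lastIndex++;
}
```

Because `[ACGT]*` stops at the first character outside A, C, G and T, each capture runs to the end of the sequence only when the input is valid.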

© Stack Overflow or respective owner
