String manipulation appears to be inefficient
Posted
by
user2964780
on Stack Overflow
See other posts from Stack Overflow
or by user2964780
Published on 2013-11-07T12:34:18Z
Indexed on
2013/11/08
3:54 UTC
Read the original article
Hit count: 99
python
I think my code is too inefficient. I'm guessing it has something to do with using strings, though I'm unsure. Here is the code:
genome = FASTAdata[1]
genomeLength = len(genome);
# Hash table holding all the k-mers we will come across
kmers = dict()
# We go through all the possible k-mers by index
for outer in range (0, genomeLength-1):
for inner in range (outer+2, outer+22):
substring = genome[outer:inner]
if substring in kmers: # if we already have this substring on record, increase its value (count of num of appearances) by 1
kmers[substring] += 1
else:
kmers[substring] = 1 # otherwise record that it's here once
This is to search through all substrings of length at most 20. Now this code seems to take pretty forever and never terminate, so something has to be wrong here. Is using [:] on strings causing the huge overhead? And if so, what can I replace it with?
And for clarity the file in question is nearly 200mb, so pretty big.
© Stack Overflow or respective owner