Optimizing Haskell code

I'm trying to learn Haskell, and after reading an article on reddit about Markov text chains I decided to implement Markov text generation, first in Python and now in Haskell. However, I noticed that my Python implementation is way faster than the Haskell version, even though the Haskell version is compiled to native code. I'm wondering what I should do to make the Haskell code run faster. For now I believe it's so much slower because of using Data.Map instead of hashmaps, but I'm not sure.
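Concretely, by "hashmaps" I mean something like the following (an untested sketch; it assumes the unordered-containers package, which provides Data.HashMap.Strict, and the Hashable instance for ByteString from the hashable package):

import qualified Data.ByteString.Char8 as B
import qualified Data.HashMap.Strict as H

-- Same shape as the Data.Map database further down, but keyed by hashing instead of Ord.
type HashDatabase = H.HashMap (B.ByteString, B.ByteString) (H.HashMap B.ByteString Int)

-- Record one observation: the word z followed the pair (x, y).
addTriple :: HashDatabase -> (B.ByteString, B.ByteString, B.ByteString) -> HashDatabase
addTriple db (x, y, z) = H.insertWith (H.unionWith (+)) (x, y) (H.singleton z 1) db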

I'll post both the Python and the Haskell code. With the same data, Python takes around 3 seconds and Haskell is closer to 16 seconds.
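I'm currently timing the whole program. To make sure the difference really is in the training step and not in IO, I could time just that part; a rough sketch of what I mean (using System.CPUTime and Control.Exception.evaluate, together with the train function from the Haskell module below; I haven't wired this in yet):

import Control.Exception (evaluate)
import System.CPUTime (getCPUTime)
import qualified Data.ByteString.Char8 as B
import qualified Data.Map as M

-- Time only the training step, excluding the file read and the final print.
timedMain :: IO ()
timedMain = do
  contents <- B.readFile "76.txt"
  let ws = B.words contents
  _     <- evaluate (length ws)   -- force the word list before starting the clock
  start <- getCPUTime
  db    <- evaluate (train ws)    -- building the map happens here
  end   <- getCPUTime
  putStrLn $ "train took " ++ show (fromIntegral (end - start) * 1e-12 :: Double) ++ " seconds"
  print (M.size db)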

It goes without saying that I'll take any constructive criticism :).

import random
import re
import cPickle
class Markov:
    def __init__(self, filenames):
        self.filenames = filenames
        self.cache = self.train(self.readfiles())
        picklefd = open("dump", "w")
        cPickle.dump(self.cache, picklefd)
        picklefd.close()

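    # Build a dict mapping each consecutive word pair to a dict of
    # follow-up words and how often each one occurred.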
    def train(self, text):
        splitted = re.findall(r"(\w+|[.!?',])", text)
        print "Total of %d splitted words" % (len(splitted))
        cache = {}
        for i in xrange(len(splitted)-2):
            pair = (splitted[i], splitted[i+1])
            followup = splitted[i+2]
            if pair in cache:
                if followup not in cache[pair]:
                    cache[pair][followup] = 1
                else:
                    cache[pair][followup] += 1
            else:
                cache[pair] = {followup: 1}
        return cache

    def readfiles(self):
        data = ""
        for filename in self.filenames:
            fd = open(filename)
            data += fd.read()
            fd.close()
        return data

    def concat(self, words):
        sentence = ""
        for word in words:
            if word in "'\",?!:;.":
                sentence = sentence[0:-1] + word + " "
            else:
                sentence += word + " "
        return sentence

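    # Pick a follow-up word at random, weighted by its count: each
    # candidate is repeated 'count' times before random.choice.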
    def pickword(self, words):
        temp = [(k, words[k]) for k in words]
        results = []
        for (word, n) in temp:
            results.append(word)
            if n > 1:
                for i in xrange(n-1):
                    results.append(word)
        return random.choice(results)

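    # Start from a random pair whose first word is capitalized, then keep
    # appending words from the cache until the sentence is long enough and
    # ends with a period (or the current pair is unknown).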
    def gentext(self, words):
        allwords = [k for k in self.cache]
        (first, second) = random.choice(filter(lambda (a,b): a.istitle(), [k for k in self.cache]))
        sentence = [first, second]
        while len(sentence) < words or sentence[-1] != ".":
            current = (sentence[-2], sentence[-1])
            if current in self.cache:
                followup = self.pickword(self.cache[current])
                sentence.append(followup)
            else:
                print "Wasn't able to. Breaking"
                break
        print self.concat(sentence)

Markov(["76.txt"])

--

module Markov
( train
, fox
) where

import Debug.Trace
import qualified Data.Map as M
import qualified System.Random as R
import qualified Data.ByteString.Char8 as B


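-- Maps each consecutive word pair to the counts of every word that followed it.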
type Database = M.Map (B.ByteString, B.ByteString) (M.Map B.ByteString Int)

train :: [B.ByteString] -> Database
-- Walk the word list three at a time: (x, y) is the key, z is the follow-up.
train (x:y:z:xs) =
     let l = train (y:z:xs)
     in M.insertWith' (\_new old -> M.insertWith' (+) z 1 old) (x, y) (M.singleton z 1) l
train _ = M.empty

main = do
  contents <- B.readFile "76.txt"
  print $ train $ B.words contents

fox="The quick brown fox jumps over the brown fox who is slow jumps over the brown fox who is dead."
