bytestring - Developer IT

How to convert ByteString to [Word8] in Haskell?

- by bodacydo

I am dealing with ByteStrings and at this one place I need to use [Word8] but I have a ByteString. How do I convert a ByteString to [Word8] list? I tried unpack function from ByteString library but it returned a [Char] list rather than [Word8] list. Do I need to take this step and convert it first to [Char] list and only to [Word8] list? If so, how do I do that. Please advise the most efficient method! Thank you!

Read the article

Haskell optimization of a function looking for a bytestring terminator

- by me2

Profiling of some code showed that about 65% of the time I was inside the following code. What it does is use the Data.Binary.Get monad to walk through a bytestring looking for the terminator. If it detects 0xff, it checks if the next byte is 0x00. If it is, it drops the 0x00 and continues. If it is not 0x00, then it drops both bytes and the resulting list of bytes is converted to a bytestring and returned. Any obvious ways to optimize this? I can't see it. parseECS = f [] False where f acc ff = do b <- getWord8 if ff then if b == 0x00 then f (0xff:acc) False else return $ L.pack (reverse acc) else if b == 0xff then f acc True else f (b:acc) False

Read the article

In Haskell, will calling length on a Lazy ByteString force the entire string into memory?

- by me2

I am reading a large data stream using lazy bytestrings, and want to know if at least X more bytes is available while parsing it. That is, I want to know if the bytestring is at least X bytes long. Will calling length on it result in the entire stream getting loaded, hence defeating the purpose of using the lazy bytestring? If yes, then the followup would be: How to tell if it has at least X bytes without loading the entire stream? EDIT: Originally I asked in the context of reading files but understand that there are better ways to determine filesize. Te ultimate solution I need however should not depend on the lazy bytestring source.

Read the article

Serializing Python bytestrings to JSON, preserving ordinal character values

- by Doctor J

I have some binary data produced as base-256 bytestrings in Python (2.x). I need to read these into JavaScript, preserving the ordinal value of each byte (char) in the string. If you'll allow me to mix languages, I want to encode a string s in Python such that ord(s[i]) == s.charCodeAt(i) after I've read it back into JavaScript. The cleanest way to do this seems to be to serialize my Python strings to JSON. However, json.dump doesn't like my bytestrings, despite fiddling with the ensure_ascii and encoding parameters. Is there a way to encode bytestrings to Unicode strings that preserves ordinal character values? Otherwise I think I need to encode the characters above the ASCII range into JSON-style \u1234 escapes; but a codec like this does not seem to be among Python's codecs. Is there an easy way to serialize Python bytestrings to JSON, preserving char values, or do I need to write my own encoder?

Read the article

Python and Unicode: How everything should be Unicode

- by A A

Forgive if this a long a question: I have been programming in Python for around six months. Self taught, starting with the Python tutorial and then SO and then just using Google for stuff. Here is the sad part: No one told me all strings should be Unicode. No, I am not lying or making this up, but where does the tutorial mention it? And most examples also I see just make use of byte strings, instead of Unicode strings. I was just browsing and came across this question on SO, which says how every string in Python should be a Unicode string. This pretty much made me cry! I read that every string in Python 3.0 is Unicode by default, so my questions are for 2.x: Should I do a: print u'Some text' or just print 'Text' ? Everything should be Unicode, does this mean, like say I have a tuple: t = ('First', 'Second'), it should be t = (u'First', u'Second')? I read that I can do a from __future__ import unicode_literals and then every string will be a Unicode string, but should I do this inside a container also? When reading/ writing to a file, I should use the codecs module. Right? Or should I just use the standard way or reading/ writing and encode or decode where required? If I get the string from say raw_input(), should I convert that to Unicode also? What is the common approach to handling all of the above issues in 2.x? The from __future__ import unicode_literals statement? Sorry for being a such a noob, but this changes what I have been doing for a long time and so clearly I am confused.

Read the article

How do I compare between ByteString and ByteSymbol in squeak?

- by Guy

Hi, I want to execute the following code: methodName := thisContext sender method selector. aClass selectors do: [:current | current == methodName ifTrue: aBlock]. Although the strings are equal, it never steps into the "ifTrue", I've tried converting both of them to ByteArray\String and it steel didn't work. Any ideas of how to compare them so I will get to the "ifTrue"? Thanks in advance

Read the article

What is a good method to solve cabal install problems?

- by sp3ctum

I've used the cabal package manager for Haskell programs to install libraries and new projects that I've cloned from some repositories. More often than not, I keep running into problems. Most projects make installing them seem super easy, but in my case that's not always true - sometimes they are very hard to get running. Some are so hard, in fact, that I've lost interest in the project solely because of not being able to install it. So instead of complaining, I'd like to ask what I should do to better this situation. I'd like to use my most recent problem as an example. I'm interested in trying out the Gitit project. It's a promising looking personal wiki that runs on various version control systems. So here's what I've done: Clone from Github run cabal install in the project directory like I'm told on the project install page: mika@eka:~/git/gitit$ ls BLUETRIP-LICENSE CHANGES HCAR-gitit.tex LICENSE Network README.markdown RELANN-0.6.1 Setup.lhs TANGOICONS YUI-LICENSE data expireGititCache.hs gitit.cabal gitit.hs plugins mika@eka:~/git/gitit$ cabal install Resolving dependencies... cabal: cannot configure happstack-server-7.0.7. It requires base64-bytestring ==1.0.* For the dependency on base64-bytestring ==1.0.* there are these packages: base64-bytestring-1.0.0.0. However none of them are available. base64-bytestring-1.0.0.0 was excluded because gitit-0.10 requires base64-bytestring ==0.1.* mika@eka:~/git/gitit$ So now I'm thinking: well, I'll install happstack-server on its own, maybe that will work: mika@eka:~/git/gitit$ cabal install happstack-server Resolving dependencies... Warning: happstack-server.cabal: Ignoring unknown section type: test-suite Configuring happstack-server-7.0.7... cabal: At least the following dependencies are missing: blaze-html ==0.5.*, hslogger >=1.0.2, monad-control ==0.3.*, network >=2.2.3, sendfile >=0.7.1 && <0.8, system-filepath >=0.3.1, text >=0.10 && <0.12, threads >=0.5, transformers-base ==0.4.* cabal: Error: some packages failed to install: happstack-server-7.0.7 failed during the configure step. The exception was: ExitFailure 1 So looks like there are some dependencies missing. But isn't installing these dependencies the whole point of using cabal in the first place? What should I do? File bug reports (to which project?), install the dependencies manually or something else? Bonus points for explaining what causes these kinds of problems.

Read the article

Optimizing Haskell code

- by Masse

I'm trying to learn Haskell and after an article in reddit about Markov text chains, I decided to implement Markov text generation first in Python and now in Haskell. However I noticed that my python implementation is way faster than the Haskell version, even Haskell is compiled to native code. I am wondering what I should do to make the Haskell code run faster and for now I believe it's so much slower because of using Data.Map instead of hashmaps, but I'm not sure I'll post the Python code and Haskell as well. With the same data, Python takes around 3 seconds and Haskell is closer to 16 seconds. It comes without saying that I'll take any constructive criticism :). import random import re import cPickle class Markov: def __init__(self, filenames): self.filenames = filenames self.cache = self.train(self.readfiles()) picklefd = open("dump", "w") cPickle.dump(self.cache, picklefd) picklefd.close() def train(self, text): splitted = re.findall(r"(\w+|[.!?',])", text) print "Total of %d splitted words" % (len(splitted)) cache = {} for i in xrange(len(splitted)-2): pair = (splitted[i], splitted[i+1]) followup = splitted[i+2] if pair in cache: if followup not in cache[pair]: cache[pair][followup] = 1 else: cache[pair][followup] += 1 else: cache[pair] = {followup: 1} return cache def readfiles(self): data = "" for filename in self.filenames: fd = open(filename) data += fd.read() fd.close() return data def concat(self, words): sentence = "" for word in words: if word in "'\",?!:;.": sentence = sentence[0:-1] + word + " " else: sentence += word + " " return sentence def pickword(self, words): temp = [(k, words[k]) for k in words] results = [] for (word, n) in temp: results.append(word) if n > 1: for i in xrange(n-1): results.append(word) return random.choice(results) def gentext(self, words): allwords = [k for k in self.cache] (first, second) = random.choice(filter(lambda (a,b): a.istitle(), [k for k in self.cache])) sentence = [first, second] while len(sentence) < words or sentence[-1] is not ".": current = (sentence[-2], sentence[-1]) if current in self.cache: followup = self.pickword(self.cache[current]) sentence.append(followup) else: print "Wasn't able to. Breaking" break print self.concat(sentence) Markov(["76.txt"]) -- module Markov ( train , fox ) where import Debug.Trace import qualified Data.Map as M import qualified System.Random as R import qualified Data.ByteString.Char8 as B type Database = M.Map (B.ByteString, B.ByteString) (M.Map B.ByteString Int) train :: [B.ByteString] -> Database train (x:y:[]) = M.empty train (x:y:z:xs) = let l = train (y:z:xs) in M.insertWith' (\new old -> M.insertWith' (+) z 1 old) (x, y) (M.singleton z 1) `seq` l main = do contents <- B.readFile "76.txt" print $ train $ B.words contents fox="The quick brown fox jumps over the brown fox who is slow jumps over the brown fox who is dead."

Read the article

haskell network io hgetline

- by Alex

I want to read all the data on a handle, and then block waiting for more data. listen1 stops when there is a '\n' character in the stream. listen2 works and could be made completely general by imitating the code for hGetNonBlocking. What is the best way to do this? import qualified Data.ByteString as B loop = sequence_ . repeat listen1 :: Handle -> TChan B.ByteString -> IO() listen1 sock chan = do loop ( do s <- B.hGetLine sock atomically (writeTChan chan s) ) listen2 :: Handle -> TChan B.ByteString -> IO() listen2 sock chan = do loop ( do s <- B.hGet sock 1 s1 <- B.hGetNonBlocking sock 65000 atomically (writeTChan chan (B.append s s1)) )

Read the article

Haskell Type error

- by Jon

I am getting a Couldn't match expected type error on this code and am not sure why. Would appreciate it if someone could point me in the right direction as to fixing it. import qualified Data.ByteString.Lazy as S import Data.Binary.Get import Data.Word getBinary :: Get Word16 getBinary = do a <- getWord16be "Test.class" return (a) main :: IO () main = do contents <- S.getContents print getBinary contents Specifically it cannot match expected type 'S.ByteString - IO ()' to inferred type 'IO ()'

Read the article

Haskell optimization of the following function

- by me2

Profiling of some code of mine showed that about 65% of the time I was running the following code. What it does is use the Data.Binary.Get monad to walk through a bytestring looking for the terminator. If it detects 0xff, it checks if the next byte is 0x00. If it is, it drops the 0x00 and continues. If it is not 0x00, then it drops both bytes and the resulting list of bytes is converted to a bytestring and returned. Any obvious ways to optimize this code? I can't see it. parseECS = f [] False where f acc ff = do b <- getWord8 if ff then if b == 0x00 then f (0xff:acc) False else return $ L.pack (reverse acc) else if b == 0xff then f acc True else f (b:acc) False

Read the article

Best way to convert between [Char] and [Word8]?

- by cmars232

I'm new to Haskell and I'm trying to use a pure SHA1 implementation in my app (Data.Digest.Pure.SHA) with a JSON library (AttoJSON). AttoJSON uses Data.ByteString.Char8 bytestrings, SHA uses Data.ByteString.Lazy bytestrings, and some of my string literals in my app are [Char]. This article seems to indicate this is something still being worked out in the Haskell language/Prelude: http // hackage.haskell.org/trac/haskell-prime/wiki/CharAsUnicode And this one lists a few libraries but its a couple years old: http //blog.kfish.org/2007/10/survey-haskell-unicode-support.html [Links broken because SO doesn't trust me -- whatever...] What is the current best way to convert between these types, and what are some of the tradeoffs? I don't want to pick something that is obsolete... Thanks!

Read the article

Haskell Convert Byte String To UTC Time

- by Steve

I have been trying to make a function in Haskell to take a ByteString which is a datetime and convert it to UTC time taking into account the time zone from the original encoding. I am very new to Haskell so I may be making a really basic mistake. convertStringToUtc s = do estTimeZone <- hoursToTimeZone -5 time <- read $ B.unpack(s) localTimeToUTC estTimeZone time The error I get is: Couldn't match expected type `Int -> b' against inferred type `UTCTime' In the expression: localTimeToUTC estTimeZone time In the expression: do { estTimeZone <- hoursToTimeZone - 5; time <- read $ B.unpack (s); localTimeToUTC estTimeZone time }

Read the article

ByteStrings in Haskell

- by Jon

So i am trying to write a program that can read in a java class file as bytecode. For this i am using Data.Binary and Data.ByteStream. The problem i am having is because im pretty new to Haskell i am having trouble actually using these tools. module Main where import Data.Binary.Get import Data.Word import qualified Data.ByteString.Lazy as S getBinary :: Get Word8 getBinary = do a <- getWord8 return (a) main :: IO () main = do contents <- S.getContents print (getBinary contents) This is what i have come up with so far and i fear that its not really even on the right track. Although i know this question is very general i would appreciate some help with what i should be doing with the reading.

Read the article

python appengine form-posted utf8 file issue

- by khany

hi, i am trying to form-post a sql file that consists on many INSERTS, eg. INSERT INTO `TABLE` VALUES ('abcdé', 2759); then i use re.search to parse it and extract the fields to put into my own datastore. The problem is that, although the file contains accented characters (see the e is a é), once uploaded it loses it and either errors or stores a bytestring representation of it. Heres what i am currently using (and I have tried loads of alternatives): form = cgi.FieldStorage() uFile = form['sql'] uSql = uFile.file.read() lineX = uSql.split("\n") # to get each line and so on. has anyone got a robust way of making this work? remember i am on appengine so access to some libraries is restricted/forbidden

Read the article

Haskell lazy I/O and closing files

- by Jesse

I've written a small Haskell program to print the MD5 checksums of all files in the current directory (searched recursively). Basically a Haskell version of md5deep. All is fine and dandy except if the current directory has a very large number of files, in which case I get an error like: <program>: <currentFile>: openBinaryFile: resource exhausted (Too many open files) It seems Haskell's laziness is causing it not to close files, even after its corresponding line of output has been completed. The relevant code is below. The function of interest is getList. import qualified Data.ByteString.Lazy as BS main :: IO () main = putStr . unlines =<< getList "." getList :: FilePath -> IO [String] getList p = let getFileLine path = liftM (\c -> (hex $ hash $ BS.unpack c) ++ " " ++ path) (BS.readFile path) in mapM getFileLine =<< getRecursiveContents p hex :: [Word8] -> String hex = concatMap (\x -> printf "%0.2x" (toInteger x)) getRecursiveContents :: FilePath -> IO [FilePath] -- ^ Just gets the paths to all the files in the given directory. Are there any ideas on how I could solve this problem? The entire program is available here: http://haskell.pastebin.com/PAZm0Dcb

Read the article

Should I strip the XML declaration from suds output before parsing with lxml?

- by mikl

I’m trying to implement a SOAP webservice in Python 2.6 using the suds library. That is working well, but I’ve run into a problem when trying to parse the output with lxml. Suds returns a suds.sax.text.Text object with the reply from the SOAP service. The suds.sax.text.Text class is a subclass of the Python built-in Unicode class. In essence, it would be comparable with this Python statement: u'<?xml version="1.0" encoding="utf-8" ?><root><lotsofelements \></root>' Which is incongrous, since if the XML declaration is correct, the contents are UTF-8 encoded, and thus not a Python Unicode object (because those are stored in some internal encoding like UCS4). lxml will refuse to parse this, as documented, since there is no clear answer to what encoding it should be interpreted as. As I see it, there are two ways out of this bind: Strip the <?xml> declaration, including the encoding. Convert the output from Suds into a bytestring, using the specified encoding. Currently, the data I’m receiving from the webservice is within the ASCII-range, so either way will work, but both feels very much like ugly hacks to me, and I’m not quite sure what would happen, if I start to receive data that would need a wider range of Unicode characters. Any good ideas? I can’t imagine I’m the first one in this position…

Read the article

Assigning address to array from heap

- by Schaltfehler

I want to save the state of my structs as a binary file and load them again. My structs look like this: typedef struct { uint8_t pointerLength; uint8_t *pointer; uint8_t NumBla; uinT16 Bla[MAX_NUM_Bla]; ... } BAR_STRUCT, *BAR; typedef struct { int numBar; BAR bars[MAX_NUM_BAR]; } FOO_STRUCT, *FOO; Saving is no problem, but restoring the state. Iam at the point where the bytestring from the file is on the heap and a pointer is pointing to the first adress of this string. And I do as follows: const void* dataPointer //points to adress in heap unsigned char* bytePointer = (unsigned char*)dataPointer; FOO foo = (FOO_STRUCT*)bytePointer; bytePointer += sizeof(FOO_STRUCT); for (int i=0; i < MAX_NUM_BAR; i++) { foo->bars[i] = (BAR_STRUCT*)bytePointer; } The last assignment doesn't work and I get an EXC_BAD_ACCESS. Because bars is an array of pointers i need to correct the adresses of each element is pointing to. Because they are not valid anymore. So I try to assign the adress of the object I saved in the bytesteam to foo-bars[i]; But I can not change foo-bars[i] at all. Accessing works but but assigning a new adress doesn't. I wonder why.

Search Results

Search found 18 results on 1 pages for 'bytestring'.

Page 1/1 | 1

- by bodacydo

- by me2

- by me2

- by Doctor J

- by A A

- by Guy

- by sp3ctum

- by Masse

- by Alex

- by Jon

- by me2

- by cmars232

- by Steve

- by Jon

- by khany

- by Jesse

- by mikl

- by Schaltfehler