Is there any python module to convert PDF files into text? I tried one piece of code found in Activestate which uses pypdf but the text generated had no space between and was of no use.
In Python I have a list of n lists, each with a variable number of elements. How can I create a single list containing all the possible permutations:
For example
[ [ a, b, c], [d], [e, f] ]
I want
[ [a, d, e] , [a, d, f], [b, d, e], [b, d, f], [c, d, e], [c, d, f] ]
Note I don't know n in advance. I thought itertools.product would be the right approach but it requires me to know the number of arguments in advance
Hi,
I need a memory efficient int-int dict in Python that would support the following operations in O(log n) time:
d[k] = v # replace if present
v = d[k] # None or a negative number if not present
I need to hold ~250M pairs, so it really has to be tight.
Do you happen to know a suitable implementation (Python 2.7)?
EDIT Removed impossible requirement and other nonsense. Thanks, Craig and Kylotan!
To rephrase. Here's a trivial int-int dictionary with 1M pairs:
>>> import random, sys
>>> from guppy import hpy
>>> h = hpy()
>>> h.setrelheap()
>>> d = {}
>>> for _ in xrange(1000000):
... d[random.randint(0, sys.maxint)] = random.randint(0, sys.maxint)
...
>>> h.heap()
Partition of a set of 1999530 objects. Total size = 49161112 bytes.
Index Count % Size % Cumulative % Kind (class / dict of class)
0 1 0 25165960 51 25165960 51 dict (no owner)
1 1999521 100 23994252 49 49160212 100 int
On average, a pair of integers uses 49 bytes.
Here's an array of 2M integers:
>>> import array, random, sys
>>> from guppy import hpy
>>> h = hpy()
>>> h.setrelheap()
>>> a = array.array('i')
>>> for _ in xrange(2000000):
... a.append(random.randint(0, sys.maxint))
...
>>> h.heap()
Partition of a set of 14 objects. Total size = 8001108 bytes.
Index Count % Size % Cumulative % Kind (class / dict of class)
0 1 7 8000028 100 8000028 100 array.array
On average, a pair of integers uses 8 bytes.
I accept that 8 bytes/pair in a dictionary is rather hard to achieve in general. Rephrased question: is there a memory-efficient implementation of int-int dictionary that uses considerably less than 49 bytes/pair?
I am attempting to write some tests using webtest to test out my python GAE application. The problem I am running into is that the application is listening on port 8080 but I cannot configure webtest to hit that port.
For example, I want to use app.get('/getreport') to hit http://localhost:8080/getreport. Obviously, it hits just thits http:// localhost/getreport.
Is there a way to set up webtest to hit a particular port?
There is function in python called eval that takes string input and evaluates it.
>>> x = 1
>>> print eval('x+1')
2
>>> print eval('12 + 32')
44
>>>
What is Haskell equivalent of eval function?
I'm looking for a way to prevent multiple hosts from issuing simultaneous commands to a Python XMLRPC listener. The listener is responsible for running scripts to perform tasks on that system that would fail if multiple users tried to issue these commands at the same time. Is there a way I can block all incoming requests until the single instance has completed?
Is it possible to perform multi-linear regression in Python using NumPy?
The documentation here suggests that it is, but I cannot find any more details on the topic.
hi, i am siva this is frist time taken the python programming language i have a small problem please help me the question is **Write two functions, called countSubStringMatch and countSubStringMatchRecursive that take two arguments, a key string and a target string. These functions iteratively and recursively count the number of instances of the key in the target string. You should complete definitions for
def countSubStringMatch(target,key):
and
def countSubStringMatchRecursive (target, key):
**
I have 4 reasonably complex r scripts that are used to manipulate csv and xml files. These were created by another department where they work exclusively in r.
My understanding is that while r is very fast when dealing with data, it's not really optimised for file manipulation. Can I expect to get significant speed increases by converting these scripts to python? Or is this something of a waste of time?
I am using python lxml library to parse html pages:
import lxml.html
# this might run indefinitely
page = lxml.html.parse('http://stackoverflow.com/')
Is there any way to set timeout for parsing?
I have a list in python ('A','B','C','D','E'), how do I get which item is under a particular index number?
Example:
Say it was given 0, it would return A.
Given 2, it would return C.
Given 4, it would return E.
I use the following method to break the double loop in Python.
for word1 in buf1:
find = False
for word2 in buf2:
...
if res == res1:
print "BINGO " + word1 + ":" + word2
find = True
if find:
break
Is there better way to break the double loop?
Suppose I have a python object x and a string s, how do I set the attribute s on x? So:
>>> x = SomeObject()
>>> attr = 'myAttr'
>>> # magic goes here
>>> x.myAttr
'magic'
What's the magic? The goal of this, incidentally, is to cache calls to x.__getattr__().
I've a python script that gives me 2 lists and another who is the reference(the time).
How can I create a graphic with the representation of my first list by the time. And same question for the second list. I need them on the same graphic.
list1 [12, 15, 17, 19]
list2 [34, 78, 54, 67]
list3 [10, 20, 30, 40] (time in minutes)
How can I create a graphic in png format with these lists?
Thanks
I have a solution which I developed in VS2008 and which I am trying to add to Source Control (TFS 2010, though the issue happened in TFS 2008 as well). I have several TFS workspaces on my computer and I have access to several Team Projects.
When I right click the solution in my Solution Explorer and choose the "Add Solution to Source Control" option I am never given an option of choosing which Workspace or which Team Project to add the existing solution too. VS2008 then proceeds to add it to the same team project every time. I have tried selecting an alternate workspace/team project in every window where I can see an option for it but it always adds it back to the same one. I even tried changing the name of my new workspace so that alphabetically it was the first thinking that it might be somehow related to that... no luck.
I then tried goign to the Change Source Control window where you can add/remove bindings on a solution/project but that window also defaults to the same Team Project as trying to add the solution directly does...
Any help would be greatly appreciated with this, maybe I'm just missing something?
Is there any free Python to C translator? for example capable to translate such lib as lib for Fast content-aware image resizing (which already depends on some C libs) to C classes and files?
Hi,
I have a script where I launch with popen a shell command.
The problem is that the script don't wait that popen command is finished and go forward.
om_points = os.popen(command, "w")
.....
How can I tell to my python script to wait until the shell command has finished?
Thanks.
What are my best options for creating a financial open-high-low-close (OHLC) chart in a high level language like Ruby or Python? While there seem to be a lot of options for graphing, I haven't seen any gems or eggs with this kind of chart.
http://en.wikipedia.org/wiki/Open-high-low-close_chart (but I don't need the moving average or Bollinger bands)
JFreeChart can do this in Java, but I'd like to make my codebase as small and simple as possible.
Thanks!
First of all, I get the name of the current window
win32gui.GetWindowText(win32gui.GetForegroundWindow())
k, no problem with that...
But now, how can I make an if with the result for having an specific string on it...
For example, the result gave me
C:/Python26/
How can I make an True of False for the result containing the word, 'python' ?
I'm trying with re.search, but I'm not being able to make it do it
Is there a way to get functionality similar to mkdir -p on the shell... from within python. I am looking for a solution other than a system call. I am sure the code is less than 20 lines... really I am wondering if someone has already written it?
pt=[2]
pt[0]=raw_input()
when i do this , and give an input suppose 1011 , it says list indexing error- " list assignment index out of range" . may i know why? i think i am not able to assign a list properly . how to assign an array of 2 elements in python then?
I'm looking for up-to-date documentation and tutorials on creating Python bindings for gobjects. Everything I can find on the web is either incomplete or out of date.