Search Results

Search found 196 results on 8 pages for 'findall'.

Page 2/8 | < Previous Page | 1 2 3 4 5 6 7 8  | Next Page >

  • Beautifulsoup recursive attribute

    - by Marcos Placona
    Hi, trying to parse an XML with BeautifulSoup, but I hit a brick wall when trying to use the "recursive" attribute with findAll(). I have a pretty odd XML format, shown below: <?xml version="1.0"?> <catalog> <book id="bk101"> <author>Gambardella, Matthew</author> <title>XML Developer's Guide</title> <genre>Computer</genre> <price>44.95</price> <publish_date>2000-10-01</publish_date> <description>An in-depth look at creating applications with XML.</description> <catalog>true</catalog> </book> <book id="bk102"> <author>Ralls, Kim</author> <title>Midnight Rain</title> <genre>Fantasy</genre> <price>5.95</price> <publish_date>2000-12-16</publish_date> <description>A former architect battles corporate zombies, an evil sorceress, and her own childhood to become queen of the world.</description> <catalog>false</catalog> </book> </catalog> As you can see, the catalog tag repeats inside the book tag, which causes an error when I try to do something like: from BeautifulSoup import BeautifulStoneSoup as BSS catalog = "catalog.xml" def open_rss(): f = open(catalog, 'r') return f.read() def rss_parser(): rss_contents = open_rss() soup = BSS(rss_contents) items = soup.findAll('catalog', recursive=False) for item in items: print item.title.string rss_parser() As you will see, on my soup.findAll I've added recursive=False, which in theory should stop it recursing into the item found and skip to the next one. This doesn't seem to work, as I always get the following error: File "catalog.py", line 17, in rss_parser print item.title.string AttributeError: 'NoneType' object has no attribute 'string' I'm sure I'm doing something stupid here, and would appreciate it if someone could give me some help on how to solve this problem. Changing the XML structure is not an option, and this code needs to perform well as it will potentially parse a large XML file. Thanks in advance, Marcos
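    A minimal sketch of one possible workaround, assuming BeautifulSoup 3 (BeautifulStoneSoup) and the XML shown above: iterate over the <book> children of the outer <catalog> instead, so the nested <catalog> flag never becomes a top-level item (the rss_parser below is illustrative only, not the asker's final code).

        from BeautifulSoup import BeautifulStoneSoup as BSS

        def rss_parser(xml_text):
            soup = BSS(xml_text)
            # The outer <catalog> is the document root; recursive=False keeps
            # findAll from descending into the nested <catalog> flags.
            for book in soup.catalog.findAll('book', recursive=False):
                print book.title.string    # the <title> text of each book

        rss_parser(open('catalog.xml').read())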

    Read the article

  • Python regex on list

    - by Peter Nielsen
    Hi there, I am trying to build a parser and save the results as an XML file, but I have problems. For instance, I get a TypeError: expected string or buffer when I try to run the code. Would you experts please have a look at my code? import urllib2, re from xml.dom.minidom import Document from BeautifulSoup import BeautifulSoup as bs osc = open('OSCTEST.html','r') oscread = osc.read() soup=bs(oscread) doc = Document() root = doc.createElement('root') doc.appendChild(root) countries = doc.createElement('countries') root.appendChild(countries) findtags1 = re.compile ('<h1 class="title metadata_title content_perceived_text(.*?)</h1>', re.DOTALL | re.IGNORECASE).findall(soup) findtags2 = re.compile ('<span class="content_text">(.*?)</span>', re.DOTALL | re.IGNORECASE).findall(soup) for header in findtags1: title_elem = doc.createElement('title') countries.appendChild(title_elem) header_elem = doc.createTextNode(header) title_elem.appendChild(header_elem) for item in findtags2: art_elem = doc.createElement('artikel') countries.appendChild(art_elem) s = item.replace('<P>','') t = s.replace('</P>','') text_elem = doc.createTextNode(t) art_elem.appendChild(text_elem) print doc.toprettyxml()
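    A hedged aside: re.findall expects a string or buffer, and soup here is a BeautifulSoup object, which is the likely cause of that exact TypeError. A minimal sketch of the regex part run over the raw text instead, assuming the same OSCTEST.html input:

        import re

        osc = open('OSCTEST.html', 'r')
        oscread = osc.read()

        # run the patterns over the string, not over the BeautifulSoup object
        findtags1 = re.compile('<h1 class="title metadata_title content_perceived_text(.*?)</h1>',
                               re.DOTALL | re.IGNORECASE).findall(oscread)
        findtags2 = re.compile('<span class="content_text">(.*?)</span>',
                               re.DOTALL | re.IGNORECASE).findall(oscread)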

    Read the article

  • Is there any Prototype JavaScript function similar to jQuery Live to trace dynamic DOM elements?

    - by Wbdvlpr
    Hi Event.observe(window,"load",function() { $$(".elem_classs").findAll(function(node){ return node.getAttribute('title'); }).each(function(node){ new Tooltip(node,node.title); node.removeAttribute("title"); }); }); Using the above method, I can retrieve all elements having ".elem_class" and apply some JavaScript functions to them. But I have a modal/popup box which has some elements also having ".elem_class", and these don't fall within the scope of findAll/each as they are loaded into the DOM through Ajax. How do I apply the same to dynamically loaded elements as well? I am using the Prototype library. (I have used jQuery's Live function, which keeps track of all future elements, but I need to achieve something similar using Prototype.) Thanks.

    Read the article

  • Implement dictionary using Java

    - by ahmad
    Task: Dictionary ADT. The dictionary ADT models a searchable collection of key-element entries; multiple items with the same key are allowed. Applications: word-definition pairs. Dictionary ADT methods: find(k): if the dictionary has an entry with key k, returns it, else returns null; findAll(k): returns an iterator of all entries with key k; insert(k, o): inserts and returns the entry (k, o); remove(e): removes the entry e from the dictionary; size(), isEmpty(). Example trace (Operation / Output / Dictionary contents): insert(5,A) / (5,A) / (5,A); insert(7,B) / (7,B) / (5,A),(7,B); insert(2,C) / (2,C) / (5,A),(7,B),(2,C); insert(8,D) / (8,D) / (5,A),(7,B),(2,C),(8,D); insert(2,E) / (2,E) / (5,A),(7,B),(2,C),(8,D),(2,E); find(7) / (7,B) / (5,A),(7,B),(2,C),(8,D),(2,E); find(4) / null / (5,A),(7,B),(2,C),(8,D),(2,E); find(2) / (2,C) / (5,A),(7,B),(2,C),(8,D),(2,E); findAll(2) / (2,C),(2,E) / (5,A),(7,B),(2,C),(8,D),(2,E); size() / 5 / (5,A),(7,B),(2,C),(8,D),(2,E); remove(find(5)) / (5,A) / (7,B),(2,C),(8,D),(2,E); find(5) / null / (7,B),(2,C),(8,D),(2,E). Detailed explanation: no. Specific requirements: please get it done, I need HELP!
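    Not the requested Java, but a short Python sketch of the ADT's semantics (insert/find/findAll/remove/size/isEmpty over key-element pairs, duplicate keys allowed) may make the required behaviour clearer:

        class Dictionary(object):
            """Illustrative sketch of the dictionary ADT; duplicate keys are allowed."""

            def __init__(self):
                self._entries = []                 # list of (key, element) pairs

            def insert(self, k, o):
                entry = (k, o)
                self._entries.append(entry)
                return entry

            def find(self, k):
                # first entry with key k, or None if there is none
                for entry in self._entries:
                    if entry[0] == k:
                        return entry
                return None

            def findAll(self, k):
                # iterator over every entry with key k
                return (entry for entry in self._entries if entry[0] == k)

            def remove(self, e):
                self._entries.remove(e)
                return e

            def size(self):
                return len(self._entries)

            def isEmpty(self):
                return len(self._entries) == 0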

    Read the article

  • Optimizing BeautifulSoup (Python) code

    - by user283405
    I have code that uses the BeautifulSoup library for parsing, but it is very slow. The code is written in such a way that threads cannot be used. Can anyone help me with this? I am using BeautifulSoup for parsing and then saving the results into a DB. If I comment out the save statement, it still takes a long time, so there is no problem with the database. def parse(self,text): soup = BeautifulSoup(text) arr = soup.findAll('tbody') for i in range(0,len(arr)-1): data=Data() soup2 = BeautifulSoup(str(arr[i])) arr2 = soup2.findAll('td') c=0 for j in arr2: if str(j).find("<a href=") > 0: data.sourceURL = self.getAttributeValue(str(j),'<a href="') else: if c == 2: data.Hits=j.renderContents() #and few others... c = c+1 data.save() Any suggestions? Note: I already asked this question here, but it was closed due to incomplete information.
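    One possible direction, offered as a sketch rather than a measured fix: the inner BeautifulSoup(str(arr[i])) re-parses markup that has already been parsed, once per <tbody>, which is expensive; the existing subtree can be walked directly. Data, the skipped last <tbody>, and the remaining fields are kept from the code above, and reading the href straight off the <a> tag is an assumption about what getAttributeValue extracts.

        def parse(self, text):
            soup = BeautifulSoup(text)
            arr = soup.findAll('tbody')
            for tbody in arr[:-1]:              # same range as before: skip the last tbody
                data = Data()
                c = 0
                # reuse the already-parsed subtree instead of BeautifulSoup(str(...))
                for td in tbody.findAll('td'):
                    link = td.find('a', href=True)
                    if link is not None:
                        data.sourceURL = link['href']   # assumes getAttributeValue pulled the href
                    elif c == 2:
                        data.Hits = td.renderContents()
                        # ...and the few other fields from the original
                    c += 1
                data.save()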

    Read the article

  • python: multiline regular expression

    - by facha
    Hi everyone, I have a piece of text and I've got to parse usernames and hashes out of it. Right now I'm doing it with two regular expressions. Could I do it with just one multiline regular expression? #!/usr/bin/env python import re test_str = """ Hello, UserName. Please read this looooooooooooooooong text. hash Now, write down this hash: fdaf9399jef9qw0j. Then keep reading this loooooooooong text. Hello, UserName2. Please read this looooooooooooooooong text. hash Now, write down this hash: gtwnhton340gjr2g. Then keep reading this loooooooooong text. """ logins = re.findall('Hello, (?P<login>.+).',test_str) hashes = re.findall('hash: (?P<hash>.+).',test_str)
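    For what it's worth, one single-pattern sketch (assuming the intent is to pair each login with the hash that follows it): two named groups in one pattern, with re.DOTALL so the non-greedy gap can span lines, and findall then returns (login, hash) tuples.

        pairs = re.findall(r'Hello, (?P<login>\w+)\..*?hash: (?P<hash>\w+)\.',
                           test_str, re.DOTALL)
        # -> [('UserName', 'fdaf9399jef9qw0j'), ('UserName2', 'gtwnhton340gjr2g')]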

    Read the article

  • Anonymous functions in C#

    - by Maxim Gershkovich
    The following syntax is valid VB.NET code Dim myCollection As New List(Of Stock) myCollection.Add(New Stock(Guid.NewGuid, "Item1")) myCollection.Add(New Stock(Guid.NewGuid, "Item2")) Dim res As List(Of Stock) = myCollection.FindAll(Function(stock As Stock) As Boolean If stock.Description = "Item2" Then Return True End If Return False End Function) How can I accomplish the same thing in C#? I have tried... myCollection.FindAll(bool delegate(Stock stock) { if (blah blah) { } }); But it appears I have somehow structured it incorrectly as I get the following error. "Error 1 Invalid expression term 'bool'"

    Read the article

  • optimize python code

    - by user283405
    I have code that uses the BeautifulSoup library for parsing, but it is very slow. The code is written in such a way that threads cannot be used. Can anyone help me with this? I am using the BeautifulSoup library for parsing and then saving into a DB. If I comment out the save statement, it still takes time, so there is no problem with the database. def parse(self,text): soup = BeautifulSoup(text) arr = soup.findAll('tbody') for i in range(0,len(arr)-1): data=Data() soup2 = BeautifulSoup(str(arr[i])) arr2 = soup2.findAll('td') c=0 for j in arr2: if str(j).find("<a href=") > 0: data.sourceURL = self.getAttributeValue(str(j),'<a href="') else: if c == 2: data.Hits=j.renderContents() #and few others... #... c = c+1 data.save() Any suggestions? Note: I already asked this question here, but it was closed due to incomplete information.

    Read the article

  • What are the pros and cons of using manual list iteration vs recursion through fail

    - by magus
    I come up against this all the time, and I'm never sure which way to attack it. Below are two methods for processing some season facts. What I'm trying to work out is whether to use method 1 or 2, and what are the pros and cons of each, especially large amounts of facts. methodone seems wasteful since the facts are available, why bother building a list of them (especially a large list). This must have memory implications too if the list is large enough ? And it doesn't take advantage of Prolog's natural backtracking feature. methodtwo takes advantage of backtracking to do the recursion for me, and I would guess would be much more memory efficient, but is it good programming practice generally to do this? It's arguably uglier to follow, and might there be any other side effects? One problem I can see is that each time fail is called, we lose the ability to pass anything back to the calling predicate, eg. if it was methodtwo(SeasonResults), since we continually fail the predicate on purpose. So methodtwo would need to assert facts to store state. Presumably(?) method 2 would be faster as it has no (large) list processing to do? I could imagine that if I had a list, then methodone would be the way to go.. or is that always true? Might it make sense in any conditions to assert the list to facts using methodone then process them using method two? Complete madness? But then again, I read that asserting facts is a very 'expensive' business, so list handling might be the way to go, even for large lists? Any thoughts? Or is it sometimes better to use one and not the other, depending on (what) situation? eg. for memory optimisation, use method 2, including asserting facts and, for speed use method 1? season(spring). season(summer). season(autumn). season(winter). % Season handling showseason(Season) :- atom_length(Season, LenSeason), write('Season Length is '), write(LenSeason), nl. % ------------------------------------------------------------- % Method 1 - Findall facts/iterate through the list and process each %-------------------------------------------------------------- % Iterate manually through a season list lenseason([]). lenseason([Season|MoreSeasons]) :- showseason(Season), lenseason(MoreSeasons). % Findall to build a list then iterate until all done methodone :- findall(Season, season(Season), AllSeasons), lenseason(AllSeasons), write('Done'). % ------------------------------------------------------------- % Method 2 - Use fail to force recursion %-------------------------------------------------------------- methodtwo :- % Get one season and show it season(Season), showseason(Season), % Force prolog to backtrack to find another season fail. % No more seasons, we have finished methodtwo :- write('Done').

    Read the article

  • How To Configure Query Caching in EclipseLink

    - by rustyshelf
    I have a collection of states that I want to cache for the life of the application, preferably after it is queried for the first time. I'm using EclipseLink as my persistence provider. In my EJB3 entity I have the following code: @Cache @NamedQueries({ @NamedQuery( name = "State.findAll", query = "SELECT s FROM State s", hints = { @QueryHint(name=QueryHints.CACHE_USAGE, value=CacheUsage.CheckCacheThenDatabase), @QueryHint(name=QueryHints.READ_ONLY, value=HintValues.TRUE) } ) }) This doesn't seem to do anything, though: if I monitor the SQL queries going to MySQL, it still does a select each time my session bean uses this NamedQuery. What is the correct way to configure this query so that it is only ever read once from the database, preferably across all sessions? Edit: I am calling the query like this: Query query = em.createNamedQuery("State.findAll"); List<State> states = query.getResultList();

    Read the article

  • Set Hibernate session's flush mode in Spring

    - by glaz666
    I am writing integration tests, and in one test method I'd like to write some data to the DB and then read it back. @RunWith(SpringJUnit4ClassRunner.class) @ContextConfiguration(locations = {"classpath:applicationContext.xml"}) @TransactionConfiguration() @Transactional public class SimpleIntegrationTest { @Resource private DummyDAO dummyDAO; /** * Tries to store {@link com.example.server.entity.DummyEntity}. */ @Test public void testPersistTestEntity() { int countBefore = dummyDAO.findAll().size(); DummyEntity dummyEntity = new DummyEntity(); dummyDAO.makePersistent(dummyEntity); //HERE SHOULD COME SESSION.FLUSH() int countAfter = dummyDAO.findAll().size(); assertEquals(countBefore + 1, countAfter); } } As you can see, between storing and reading the data the session should be flushed, because with the default FlushMode (AUTO) no data has actually been stored in the DB yet. Question: can I somehow set FlushMode to ALWAYS in the session factory, or somewhere else, to avoid repeating the session.flush() call? All DB calls in the DAO go through a HibernateTemplate instance. Thanks in advance.

    Read the article

  • Save File Contents to Variable in Python3.3 [migrated]

    - by Neo_Programmer
    I have a Python 3.3 script that does not seem to work. The script will search for an XML pattern and then print the results to the screen. I am using Ubuntu 12.10 (AMD64) and Python 3.3. I prefer to use regex with XML, so please disregard this unconventional form of programming. #!/usr/bin/python3.3 import io, re openfile = open('./temp/xaiml/temp_db1.xaiml', 'r') TEMPDB = openfile.read() OUTPUT = print(''.join(re.findall('<cgy><prn>.*_.*<\/prn>.*<\/cgy>', TEMPDB, flags=re.I)))
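    A small aside, offered as a sketch: assigning the result of print() only stores None, and the greedy .* can glue several <cgy> elements that sit on one line into a single match. Keeping the findall result in its own variable (and using non-greedy quantifiers, if single-element matches are wanted) makes the behaviour easier to inspect; the path is the one from the script above.

        #!/usr/bin/python3.3
        import re

        with open('./temp/xaiml/temp_db1.xaiml', 'r') as openfile:
            TEMPDB = openfile.read()

        # findall returns a list of matching substrings; keep it before printing
        matches = re.findall(r'<cgy><prn>.*?_.*?</prn>.*?</cgy>', TEMPDB, flags=re.I)
        print('\n'.join(matches))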

    Read the article

  • C# Active Directory - Check username / password

    - by Michael G
    I'm using the following code on Windows Vista Ultimate SP1 to query our Active Directory server to check the user name and password of a user on a domain. public Object IsAuthenticated() { String domainAndUsername = strDomain + @"\" + strUser; DirectoryEntry entry = new DirectoryEntry(_path, domainAndUsername, strPass); SearchResult result; try { //Bind to the native AdsObject to force authentication. DirectorySearcher search = new DirectorySearcher(entry) { Filter = ("(SAMAccountName=" + strUser + ")") }; search.PropertiesToLoad.Add("givenName"); // First Name search.PropertiesToLoad.Add("sn"); // Last Name search.PropertiesToLoad.Add("cn"); // Last Name result = search.FindOne(); if (null == result) { return null; } //Update the new path to the user in the directory. _path = result.Path; _filterAttribute = (String)result.Properties["cn"][0]; } catch (Exception ex) { return new Exception("Error authenticating user. " + ex.Message); } return user; } The target is .NET 3.5, compiled with VS 2008 Standard. I'm logged in under a domain account that is a domain admin on the machine where the application is running. The code works perfectly on Windows XP, but I get the following exception when running it on Vista: System.DirectoryServices.DirectoryServicesCOMException (0x8007052E): Logon failure: unknown user name or bad password. at System.DirectoryServices.DirectoryEntry.Bind(Boolean throwIfFail) at System.DirectoryServices.DirectoryEntry.Bind() at System.DirectoryServices.DirectoryEntry.get_AdsObject() at System.DirectoryServices.DirectorySearcher.FindAll(Boolean findMoreThanOne) at System.DirectoryServices.DirectorySearcher.FindOne() at Chain_Of_Custody.Classes.Authentication.LdapAuthentication.IsAuthenticated() at System.DirectoryServices.DirectoryEntry.Bind(Boolean throwIfFail) at System.DirectoryServices.DirectoryEntry.Bind() at System.DirectoryServices.DirectoryEntry.get_AdsObject() at System.DirectoryServices.DirectorySearcher.FindAll(Boolean findMoreThanOne) at System.DirectoryServices.DirectorySearcher.FindOne() at Chain_Of_Custody.Classes.Authentication.LdapAuthentication.IsAuthenticated() I've tried changing the authentication types, but I'm not sure what's going on. See also: http://stackoverflow.com/questions/290548/c-validate-a-username-and-password-against-active-directory

    Read the article

  • scraping text from multiple html files into a single csv file

    - by Lulu
    I have just over 1500 HTML pages (1.html to 1500.html). I have written code using Beautiful Soup that extracts most of the data I need, but "misses out" some of the data within the table. My input: e.g. file 1500.html. My code: #!/usr/bin/env python import glob import codecs from BeautifulSoup import BeautifulSoup with codecs.open('dump2.csv', "w", encoding="utf-8") as csvfile: for file in glob.glob('*html*'): print 'Processing', file soup = BeautifulSoup(open(file).read()) rows = soup.findAll('tr') for tr in rows: cols = tr.findAll('td') #print >> csvfile,"#".join(col.string for col in cols) #print >> csvfile,"#".join(td.find(text=True)) for col in cols: print >> csvfile, col.string print >> csvfile, "===" print >> csvfile, "***" Output: one CSV file, with 1500 lines of text and columns of data. For some reason my code does not pull out all the required data but "misses" some, e.g. the Address1 and Address2 data at the start of the table do not come out. I modified the code to put in the *** and === separators, and I then use Perl to produce a clean CSV file; unfortunately I'm not sure how to rework my code to get all the data I'm looking for!
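    One guess, offered as a sketch: col.string is None whenever a <td> contains nested markup (a <br/>, a <b>, a link), which would explain cells such as Address1/Address2 coming out empty; gathering every text node in the cell avoids that.

        for tr in rows:
            cols = tr.findAll('td')
            for col in cols:
                # col.string is None for cells with nested tags;
                # joining all text nodes recovers their content
                print >> csvfile, u''.join(col.findAll(text=True)).strip()
            print >> csvfile, "==="
        print >> csvfile, "***"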

    Read the article

  • Which of these is pythonic? and Pythonic vs. Speed

    - by Kashyap Nadig
    Hi! I'm new to Python and just wrote this module-level function: def _interval(patt): """ Converts a string pattern of the form '1y 42d 14h56m' to a timedelta object. y - years (365 days), M - months (30 days), w - weeks, d - days, h - hours, m - minutes, s - seconds""" m = _re.findall(r'([+-]?\d*(?:\.\d+)?)([yMwdhms])', patt) args = {'weeks': 0.0, 'days': 0.0, 'hours': 0.0, 'minutes': 0.0, 'seconds': 0.0} for (n,q) in m: if q=='y': args['days'] += float(n)*365 elif q=='M': args['days'] += float(n)*30 elif q=='w': args['weeks'] += float(n) elif q=='d': args['days'] += float(n) elif q=='h': args['hours'] += float(n) elif q=='m': args['minutes'] += float(n) elif q=='s': args['seconds'] += float(n) return _dt.timedelta(**args) My issue is with the for loop here, i.e. the long if/elif block, and I was wondering if there is a more pythonic way of doing it. So I re-wrote the function as: def _interval2(patt): m = _re.findall(r'([+-]?\d*(?:\.\d+)?)([yMwdhms])', patt) args = {'weeks': 0.0, 'days': 0.0, 'hours': 0.0, 'minutes': 0.0, 'seconds': 0.0} argsmap = {'y': ('days', lambda x: float(x)*365), 'M': ('days', lambda x: float(x)*30), 'w': ('weeks', lambda x: float(x)), 'd': ('days', lambda x: float(x)), 'h': ('hours', lambda x: float(x)), 'm': ('minutes', lambda x: float(x)), 's': ('seconds', lambda x: float(x))} for (n,q) in m: args[argsmap[q][0]] += argsmap[q][1](n) return _dt.timedelta(**args) I tested the execution times of both versions using the timeit module and found that the second one took about 5-6 seconds longer (for the default number of repeats). So my questions are: 1. Which code is considered more pythonic? 2. Is there still a more pythonic way of writing this function? 3. What about the trade-offs between pythonicity and other aspects (like speed in this case) of programming? p.s. I kinda have an OCD for elegant code. EDITED _interval2 after seeing this answer: argsmap = {'y': ('days', 365), 'M': ('days', 30), 'w': ('weeks', 1), 'd': ('days', 1), 'h': ('hours', 1), 'm': ('minutes', 1), 's': ('seconds', 1)} for (n,q) in m: args[argsmap[q][0]] += float(n)*argsmap[q][1]
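    One more variant in the same table-driven spirit, offered as a sketch (it keeps the module's _re/_dt aliases and uses a hypothetical _interval3 name): map every unit to a number of seconds and let timedelta normalise, which removes the per-unit lambdas entirely.

        _UNIT_SECONDS = {'y': 365 * 86400, 'M': 30 * 86400, 'w': 7 * 86400,
                         'd': 86400, 'h': 3600, 'm': 60, 's': 1}

        def _interval3(patt):
            m = _re.findall(r'([+-]?\d*(?:\.\d+)?)([yMwdhms])', patt)
            # one multiplication per token; timedelta handles the carry-over
            return _dt.timedelta(seconds=sum(float(n) * _UNIT_SECONDS[q] for n, q in m))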

    Read the article

  • In Python BeautifulSoup How to move tags

    - by JJ
    I have a partially converted XML document in soup, coming from HTML. After some replacement and editing in the soup, the body is essentially - <Text...></Text> # This replaces <a href..> tags but automatically creates the </Text> <p class=norm ...</p> <p class=norm ...</p> <Text...></Text> <p class=norm ...</p> and so forth. I need to "move" the <p> tags to be children of <Text>, or know how to suppress the </Text>. I want - <Text...> <p class=norm ...</p> <p class=norm ...</p> </Text> <Text...> <p class=norm ...</p> </Text> I've tried using item.insert and item.append, but I'm thinking there must be a more elegant solution. for item in soup.findAll(['p','span']): if item.name == 'span' and item.has_key('class') and item['class'] == 'section': xBCV = short_2_long(item._getAttrMap().get('value','')) if currentnode: pass currentnode = Tag(soup,'Text', attrs=[('TypeOf', 'Section'),... ]) item.replaceWith(currentnode) # works but creates end tag elif item.name == 'p' and item.has_key('class') and item['class'] == 'norm': childcdatanode = None for ahref in item.findAll('a'): if childcdatanode: pass newlink = filter_hrefs(str(ahref)) childcdatanode = Tag(soup, newlink) ahref.replaceWith(childcdatanode) Thanks
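    A hedged sketch of one approach, assuming the <Text> nodes can be found by that name once they are in the tree and that the <p> tags to move directly follow them: BeautifulSoup's extract() detaches a node and append() re-attaches it, so each run of <p> siblings can be moved under the preceding <Text> in a second pass.

        for text_node in soup.findAll('Text'):
            sibling = text_node.nextSibling
            while sibling is not None and getattr(sibling, 'name', None) == 'p':
                nxt = sibling.nextSibling
                # detach the <p> and re-attach it as a child of the <Text>
                text_node.append(sibling.extract())
                sibling = nxt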

    Read the article

  • Python Regular Expressions: Capture lookahead value (capturing text without consuming it)

    - by Lattyware
    I wish to use regular expressions to split words into groups of (vowels, not_vowels, more_vowels), using a marker to ensure every word begins and ends with a vowel. import re MARKER = "~" VOWELS = {"a", "e", "i", "o", "u", MARKER} word = "dog" if word[0] not in VOWELS: word = MARKER+word if word[-1] not in VOWELS: word += MARKER re.findall("([%]+)([^%]+)([%]+)".replace("%", "".join(VOWELS)), word) In this example we get: [('~', 'd', 'o')] The issue is that I wish the matches to overlap - the last set of vowels should become the first set of the next match. This appears possible with lookaheads, if we replace the regex as follows: re.findall("([%]+)([^%]+)(?=[%]+)".replace("%", "".join(VOWELS)), word) We get: [('~', 'd'), ('o', 'g')] Which means we are matching what I want. However, it now doesn't return the last set of vowels. The output I want is: [('~', 'd', 'o'), ('o', 'g', '~')] I feel this should be possible (if the regex can check for the second set of vowels, I see no reason it can't return them), but I can't find any way of doing it beyond the brute force method, looping through the results after I have them and appending the first character of the next match to the last match, and the last character of the string to the last match. Is there a better way in which I can do this? The two things that would work would be capturing the lookahead value, or not consuming the text on a match, while capturing the value - I can't find any way of doing either.
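    One way that appears to do exactly this, as a sketch using the same MARKER/VOWELS setup from the question: a capturing group nested inside the lookahead is still captured even though no text is consumed, so the trailing vowels show up in the result while remaining available to start the next match.

        pattern = "([%]+)([^%]+)(?=([%]+))".replace("%", "".join(VOWELS))
        re.findall(pattern, word)
        # -> [('~', 'd', 'o'), ('o', 'g', '~')]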

    Read the article

  • grabbing a substring while scraping with Python2.6

    - by Diego
    Hey, can someone help with the following? I'm trying to scrape a site that has the following information. I need to pull just the number after the </strong> tag. [<li><strong>ISBN-13:</strong> 9780375853401</li>, <li><strong>Pub. Date: </strong> 05/11/2010</li>] [<li><strong>UPC:</strong> 490355000372</li>, <li><strong>Catalog No:</strong> 15024/25</li>, <li><strong>Label:</strong> CAMERATA</li>] Here's a piece of the code I've been using to grab the above data using mechanize and BeautifulSoup. I'm stuck here, as it won't let me use the find() function on a list. br_results = mechanize.urlopen(br_results) html = br_results.read() soup = BeautifulSoup(html) local_links = soup.findAll("a", {"class" : "down-arrow csa"}) upc_code = soup.findAll("ul", {"class" : "bc-meta3"}) for upc in upc_code: upc_text = upc.contents.contents print upc_text
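    A hedged sketch of one way to read the value after each </strong>: upc.contents is a plain list (hence the error on .contents.contents), but in the markup shown each <strong> tag's nextSibling is exactly the text node that follows it.

        for upc in upc_code:
            for li in upc.findAll('li'):
                strong = li.strong                      # e.g. <strong>ISBN-13:</strong>
                if strong is not None and strong.nextSibling is not None:
                    print strong.nextSibling.strip()    # e.g. 9780375853401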

    Read the article

  • Prettify working using Python IDLE but not when using a Python script

    - by Loclip
    So when I use print soup.prettify() in IDLE it prettifies my HTML, but when I call the script from the server, prettify() does not work. #!/usr/bin/python import cgi, cgitb, urllib2, sys from bs4 import BeautifulSoup styles = [] i = 1 site = "www.example.com" page = urllib2.urlopen(site) soup = BeautifulSoup(page) table = soup.find('table', {'class' :'style5'}) alltd = table.findAll('td') allspans = table.findAll('span') while i<55: styles.append("style" + str(i)) i+=1 for td in alltd: if any(x in td["class"] for x in styles): td["class"] = "align" for span in allspans: span.replaceWithChildren() print "Content-type: text/html" print print "<!DOCTYPE html>" print "<html>" print "<head>" print '<meta http-equiv="content-type" content="text/html; charset=utf-8">' print '<link rel="stylesheet" href="lightbox/css/lightbox.css" type="text/css" media="screen">' print '<style type="text/css">' print "body {background-color:#b0c4de;}" print '.align {text-align: center;}' print "</style>" print "</head>" print "<body>" print table.pretiffy() print "</body>" print "</html>"

    Read the article

  • Optimization of Function with Dictionary and Zip()

    - by eWizardII
    Hello, I have the following function: def filetxt(): word_freq = {} lvl1 = [] lvl2 = [] total_t = 0 users = 0 text = [] for l in range(0,500): # Open File if os.path.exists("C:/Twitter/json/user_" + str(l) + ".json") == True: with open("C:/Twitter/json/user_" + str(l) + ".json", "r") as f: text_f = json.load(f) users = users + 1 for i in range(len(text_f)): text.append(text_f[str(i)]['text']) total_t = total_t + 1 else: pass # Filter occ = 0 import string for i in range(len(text)): s = text[i] # Sample string a = re.findall(r'(RT)',s) b = re.findall(r'(@)',s) occ = len(a) + len(b) + occ s = s.encode('utf-8') out = s.translate(string.maketrans("",""), string.punctuation) # Create Wordlist/Dictionary word_list = text[i].lower().split(None) for word in word_list: word_freq[word] = word_freq.get(word, 0) + 1 keys = word_freq.keys() numbo = range(1,len(keys)+1) WList = ', '.join(keys) NList = str(numbo).strip('[]') WList = WList.split(", ") NList = NList.split(", ") W2N = dict(zip(WList, NList)) for k in range (0,len(word_list)): word_list[k] = W2N[word_list[k]] for i in range (0,len(word_list)-1): lvl1.append(word_list[i]) lvl2.append(word_list[i+1]) I have used the profiler and found that the greatest CPU time seems to be spent on the zip() function and the join and split parts of the code. I'm looking to see whether there is anything I have overlooked that would let me clean up the code and make it more optimized, since the greatest lag seems to be in how I am working with the dictionaries and the zip() function. Any help would be appreciated, thanks!
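    One possible simplification, offered as a sketch (same word_list, lvl1 and lvl2 as above, but using int codes where the original produced strings): the keys/zip/join/split round trip can be replaced by a plain dict that hands out a number the first time each word is seen.

        word_to_num = {}
        for word in word_list:
            if word not in word_to_num:
                word_to_num[word] = len(word_to_num) + 1   # 1-based, like the original

        numbered = [word_to_num[w] for w in word_list]
        lvl1.extend(numbered[:-1])    # each word's code...
        lvl2.extend(numbered[1:])     # ...paired with the code of the word that follows it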

    Read the article

  • RegEx: difference between "(?:...)" and normal parentheses

    - by N0thing
    >>> re.findall(r"(?:do|re|mi)+", "mimi") ['mimi'] >>> re.findall(r"(do|re|mi)+", "mimi") ['mi'] According to my understanding of the definitions, it should produce the same answer. The only difference between (...) and (?:...) should be whether or not we can use back-references later. Am I missing something? (...) Matches whatever regular expression is inside the parentheses, and indicates the start and end of a group; the contents of a group can be retrieved after a match has been performed, and can be matched later in the string with the \number special sequence, described below. To match the literals '(' or ')', use \( or \), or enclose them inside a character class: [(] [)]. (?:...) A non-capturing version of regular parentheses. Matches whatever regular expression is inside the parentheses, but the substring matched by the group cannot be retrieved after performing a match or referenced later in the pattern.
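    A small illustrative sketch: both patterns match the same span of "mimi"; the difference is only in what findall reports, because a repeated capturing group keeps just its last repetition, and findall returns group contents whenever groups exist.

        import re

        m = re.search(r"(do|re|mi)+", "mimi")
        print m.group(0)   # 'mimi' -- the whole match, identical for both patterns
        print m.group(1)   # 'mi'   -- the group keeps only its last repetition,
                           #          which is what findall reports for (do|re|mi)+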

    Read the article

  • : in node causing KeyError in XML parsing using ElementTree

    - by kguckian
    Hi I'm using ElementTree to parse out an xml feed from Kuler. I'm only beginning in python but am stuck here. The parsing works fine until I attempt to retrieve any nodes containing ':' e.g kuler:swatchHexColor Below is a cut down version of the full feed but same structure: <rss xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:kuler="http://kuler.adobe.com/kuler/API/rss/" xmlns:rss="http://blogs.law.harvard.edu/tech/rss" version="2.0"> <channel> <title>kuler popular themes</title> <item> <title>Theme Title: Fresh Money</title> <description> &lt;img src="http://kuler-api.adobe.com/kuler/themeImages/theme_808366.png" /&gt;&lt;br /&gt; Artist: thesylph005&lt;br /&gt; ThemeID: 808366&lt;br /&gt; Posted: 03/02/2010&lt;br /&gt; Hex: 2F400D, 8CBF26, A8CA65, E8E5B0, 419184 </description> <kuler:themeItem> <kuler:themeID>808366</kuler:themeID> <kuler:themeTitle>Fresh Money</kuler:themeTitle> <kuler:themeImage>http://kuler-api.adobe.com/kuler/themeImages/theme_808366.png</kuler:themeImage> <kuler:themeAuthor> <kuler:authorID>370750</kuler:authorID> <kuler:authorLabel>thesylph005</kuler:authorLabel> </kuler:themeAuthor> <kuler:themeTags/> <kuler:themeRating>4</kuler:themeRating> <kuler:themeDownloadCount>708</kuler:themeDownloadCount> <kuler:themeCreatedAt>20100302</kuler:themeCreatedAt> <kuler:themeEditedAt>20100302</kuler:themeEditedAt> <kuler:themeSwatches> <kuler:swatch> <kuler:swatchHexColor>2F400D</kuler:swatchHexColor> <kuler:swatchColorMode>rgb</kuler:swatchColorMode> <kuler:swatchChannel1>0.183333</kuler:swatchChannel1> <kuler:swatchChannel2>0.25</kuler:swatchChannel2> <kuler:swatchChannel3>0.05</kuler:swatchChannel3> <kuler:swatchChannel4>0.0</kuler:swatchChannel4> <kuler:swatchIndex>0</kuler:swatchIndex> </kuler:swatch> <kuler:swatch> <kuler:swatchHexColor>8CBF26</kuler:swatchHexColor> <kuler:swatchColorMode>rgb</kuler:swatchColorMode> <kuler:swatchChannel1>0.55</kuler:swatchChannel1> <kuler:swatchChannel2>0.75</kuler:swatchChannel2> <kuler:swatchChannel3>0.15</kuler:swatchChannel3> <kuler:swatchChannel4>0.0</kuler:swatchChannel4> <kuler:swatchIndex>1</kuler:swatchIndex> </kuler:swatch> <kuler:swatch> <kuler:swatchHexColor>A8CA65</kuler:swatchHexColor> <kuler:swatchColorMode>rgb</kuler:swatchColorMode> <kuler:swatchChannel1>0.659722</kuler:swatchChannel1> <kuler:swatchChannel2>0.791667</kuler:swatchChannel2> <kuler:swatchChannel3>0.395833</kuler:swatchChannel3> <kuler:swatchChannel4>0.0</kuler:swatchChannel4> <kuler:swatchIndex>2</kuler:swatchIndex> </kuler:swatch> <kuler:swatch> <kuler:swatchHexColor>E8E5B0</kuler:swatchHexColor> <kuler:swatchColorMode>rgb</kuler:swatchColorMode> <kuler:swatchChannel1>0.91</kuler:swatchChannel1> <kuler:swatchChannel2>0.898047</kuler:swatchChannel2> <kuler:swatchChannel3>0.688705</kuler:swatchChannel3> <kuler:swatchChannel4>0.0</kuler:swatchChannel4> <kuler:swatchIndex>3</kuler:swatchIndex> </kuler:swatch> <kuler:swatch> <kuler:swatchHexColor>419184</kuler:swatchHexColor> <kuler:swatchColorMode>rgb</kuler:swatchColorMode> <kuler:swatchChannel1>0.254901</kuler:swatchChannel1> <kuler:swatchChannel2>0.57</kuler:swatchChannel2> <kuler:swatchChannel3>0.519034</kuler:swatchChannel3> <kuler:swatchChannel4>0.0</kuler:swatchChannel4> <kuler:swatchIndex>4</kuler:swatchIndex> </kuler:swatch> </kuler:themeSwatches> Tue, 30 Mar 2010 11:27:12 PST So if I do a findall on say each item's description, I get that back fine. 
    But the minute I try to retrieve anything with a : in the node name, I get Exception Type: KeyError, Exception Value: ':'. So this works: from elementtree.ElementTree import Element, SubElement, dump, parse def xml(): kulerurl = 'http://kuler-api.adobe.com/rss/get.cfm?listType=popular&startIndex=0&itemsPerPage=5&timeSpan=30&key=mykey' rss = parse(urllib.urlopen(kulerurl)).getroot() for element in rss.findall('channel/item'): print(element.findtext('description')) dump (rss) but this doesn't: def xml(): kulerurl = 'http://kuler-api.adobe.com/rss/get.cfm?listType=popular&startIndex=0&itemsPerPage=5&timeSpan=30&key=mykey' rss = parse(urllib.urlopen(kulerurl)).getroot() for element in rss.findall('channel/item/kuler:themeItem'): print(element.findtext('kuler:themeID')) dump (rss) I'm sure it's something simple; if anyone could point me to what I'm doing wrong here, I'd be most grateful. Thanks, Kieran
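    For what it's worth, a sketch of how ElementTree usually addresses namespaced elements: kuler: is a namespace prefix, and find/findall want the full {namespace-uri}localname form rather than the prefix, using the kuler URI declared in the feed.

        KULER = '{http://kuler.adobe.com/kuler/API/rss/}'

        for element in rss.findall('channel/item/' + KULER + 'themeItem'):
            print(element.findtext(KULER + 'themeID'))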

    Read the article

  • Display root node of Hierarchical Tree using ADF - EJB DC

    - by arul.wilson(at)oracle.com
    Displaying Employee (HR schema) records in a hierarchical tree can be achieved in ADF-BC by creating a custom VO and a view link for displaying the root node. With EJB-DC this can be done more easily, by just introducing a NamedQuery to get the root node. Here is how to get this scenario working. Create a DB connection based on the HR schema. Create an Entity Bean from the Employees table. Add a custom NamedQuery to the Employees.java bean; this named query is responsible for fetching the root node (King in this example). @NamedQueries({  @NamedQuery(name = "Employees.findAll", query = "select o from Employees o"),  @NamedQuery(name = "Employees.findRootEmp", query = "select p from Employees p where p.employees is null")}) Create a Stateless Session Bean and expose the named queries through the session facade. Create a data control from the SessionBean local interface. Create a jspx page in the ViewController project. Drop employeesFindRootEmp from the Data Controls palette as an ADF Tree. Add employeesList as the tree level rule. Run the page to see the hierarchical tree with 'King' as the root node.

    Read the article

  • Splitting a filename into words and numbers in Python

    - by danspants
    The following code splits a string into a list of words but does not include numbers: txt="there_once was,a-monkey.called phillip?09.txt" sep=re.compile(r"[\s\.,-_\?]+") sep.split(txt) ['there', 'once', 'was', 'a', 'monkey', 'called', 'phillip', 'txt'] This code gives me words and numbers but still includes "_" as a valid character: re.findall(r"\w+|\d+",txt) ['there_once', 'was', 'a', 'monkey', 'called', 'phillip', '09', 'txt'] What do I need to alter in either piece of code to end up with the desired result of: ['there', 'once', 'was', 'a', 'monkey', 'called', 'phillip', '09', 'txt']
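    A possible explanation with a short sketch: in the first pattern the unescaped ,-_ inside the character class forms a range that happens to swallow the digits, and in the second pattern \w explicitly includes the underscore; matching runs of letters or digits directly sidesteps both problems.

        import re

        txt = "there_once was,a-monkey.called phillip?09.txt"
        re.findall(r"[A-Za-z]+|[0-9]+", txt)
        # ['there', 'once', 'was', 'a', 'monkey', 'called', 'phillip', '09', 'txt']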

    Read the article

  • Linq-to-sql Compiled Query returning object NOT belonging to submitted DataContext

    - by Vladimir Kojic
    Compiled query: public static class Machines { public static readonly Func<OperationalDataContext, short, Machine> QueryMachineById = CompiledQuery.Compile((OperationalDataContext db, short machineID) => db.Machines.Where(m => m.MachineID == machineID).SingleOrDefault() ); public static Machine GetMachineById(IUnitOfWork unitOfWork, short id) { Machine machine; // Old code (working) //var machineRepository = unitOfWork.GetRepository<Machine>(); //machine = machineRepository.Find(m => m.MachineID == id).SingleOrDefault(); // New code (making problems) machine = QueryMachineById(unitOfWork.DataContext, id); return machine; } It looks like the compiled query is caching the Machine object and returning the same object even if the query is called from a new DataContext (I'm disposing the DataContext in the service, but I'm getting the Machine from the previous DataContext). I use POCOs and XML mapping. Revised: it looks like the compiled query is returning a result from the new data context and is not using the one that I passed into the compiled query. Therefore I cannot reuse the returned object and link it to another object obtained from the data context through non-compiled queries. [TestMethod] public void GetMachinesTest() { // Test Preparation (not important) using (var unitOfWork = IoC.Get<IUnitOfWork>()) { var machineRepository = unitOfWork.GetRepository<Machine>(); // GET ALL List<Machine> list = machineRepository.FindAll().ToList<Machine>(); VerifyIntegratedMachine(list[2], 3, "Machine 3", "333333", "G300PET", "MachineIconC.xaml", false, true, LicenseType.Licensed, "10.0.97.3", "10.0.97.3", 0); var machine = Machines.GetMachineById(unitOfWork, 3); Assert.AreSame(list[2], machine); // PASS !!!! } using (var unitOfWork = IoC.Get<IUnitOfWork>()) { var machineRepository = unitOfWork.GetRepository<Machine>(); // GET ALL List<Machine> list = machineRepository.FindAll().ToList<Machine>(); VerifyIntegratedMachine(list[2], 3, "Machine 3", "333333", "G300PET", "MachineIconC.xaml", false, true, LicenseType.Licensed, "10.0.97.3", "10.0.97.3", 0); var machine = Machines.GetMachineById(unitOfWork, 3); Assert.AreSame(list[2], machine); // FAIL !!!! } } If I run other (complex) unit tests I get, as expected: An attempt has been made to Attach or Add an entity that is not new, perhaps having been loaded from another DataContext.

    Read the article
