Manually extracting portions of strings contained in a list (parsing)

Posted by user1652011 on Stack Overflow
Published on 2012-11-30T09:43:25Z

I'm aware that there are third-party modules that simplify this task, but assuming I am running a base install of Python (standard modules only), how would I extract the following?

I have a list. This list is the contents, line by line, of a webpage. Here is a mock-up list (unformatted) for illustration:

<script>
    link = "/scripts/playlists/1/" + a.id + "/0-5417069212.asx";
</script>

"<a href="/apps/audio/?feedId=11065"><span class="px13">Eastern Metro Area Fire</span>"

From the above strings, I need the following extracted: the feedId (11065), which is incidentally a.id in the code above, plus "/scripts/playlists/1/" and "/0-5417069212.asx". Bearing in mind that each of these lines is just the content of one item in a list, how would I go about extracting that data?
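One way to pull these values out with only the standard library is the `re` module. The following is a minimal sketch, assuming the lines look like the mock-up above and that Python 3 is used; the names `LINK_RE`, `FEED_RE` and `extract_parts` are illustrative, and the real page may differ in spacing or quoting.

```python
import re

# Matches: link = "/scripts/playlists/1/" + a.id + "/0-5417069212.asx";
LINK_RE = re.compile(r'link\s*=\s*"([^"]*)"\s*\+\s*a\.id\s*\+\s*"([^"]*)"')
# Matches: <a href="/apps/audio/?feedId=11065"><span class="px13">Title</span>
FEED_RE = re.compile(r'feedId=(\d+)"><span class="px13">([^<]*)</span>')


def extract_parts(lines):
    """Return (prefix, suffix, feeds) found in an iterable of page lines.

    prefix and suffix are the two string literals around a.id (None if the
    script line is missing); feeds is a list of (feed_id, title) tuples.
    """
    prefix = suffix = None
    feeds = []
    for line in lines:
        link = LINK_RE.search(line)
        if link and prefix is None:
            prefix, suffix = link.groups()
        feeds.extend(FEED_RE.findall(line))
    return prefix, suffix, feeds


# Mock-up lines taken from the question (assumed representative).
sample = [
    '<script>',
    '    link = "/scripts/playlists/1/" + a.id + "/0-5417069212.asx";',
    '</script>',
    '<a href="/apps/audio/?feedId=11065"><span class="px13">Eastern Metro Area Fire</span>',
]
prefix, suffix, feeds = extract_parts(sample)
print(prefix, suffix, feeds)
```

Because the feed IDs and titles are returned as a list, the same call would also cover a page that lists several feeds, one tuple per matching anchor.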

Here is the full list:

contents = urllib2.urlopen("http://www.radioreference.com/apps/audio/?ctid=5586")
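Note that `urlopen` returns a file-like response object rather than a list, so the lines still have to be read out of it. A minimal sketch of that step, assuming Python 3 and a UTF-8 page (both assumptions, as is the helper name `read_lines`); a `BytesIO` object stands in for the real response so the example runs without network access:

```python
import io


def read_lines(response, encoding="utf-8"):
    """Decode a binary file-like response into a list of text lines."""
    return [raw.decode(encoding, errors="replace").rstrip("\r\n")
            for raw in response]


# Stand-in for a real HTTP response (hypothetical content).
fake_response = io.BytesIO(
    b'<script>\r\n    link = "/a/" + a.id + "/b.asx";\n</script>\n'
)
page_lines = read_lines(fake_response)
print(page_lines)
```

With a live page, the same function could be given the object returned by `urllib.request.urlopen(url)` in Python 3, or by `urllib2.urlopen(url)` in Python 2, where the lines are already byte strings and decoding is optional.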

Pseudo:

from urllib2 import urlopen as getpage
page_contents = getpage("http://www.radioreference.com/apps/audio/?ctid=5586")

feedID        = % in (page_contents.search() for "/apps/audio/?feedId=%")
titleID       = % in (page_contents.search() for "<span class="px13">%</span>")
playlistID    = % in (page_contents.search() for "link = "%" + a.id + "*.asx";")
asxID         = % in (page_contents.search() for "link = "*" + a.id + "%.asx";")

streamURL     = "http://www.radioreference.com" + playlistID + feedID + asxID + ".asx"

I plan to format it so that streamURL equals:

http://www.radioreference.com/scripts/playlists/1/11065/0-5417067072.asx
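Assembling that URL from the extracted pieces could look like the sketch below. It assumes the prefix already starts with a slash and the suffix already includes the `.asx` extension, as in the string literals of the mock-up; `build_stream_url` and its parameter names are illustrative.

```python
def build_stream_url(prefix, feed_id, suffix,
                     base="http://www.radioreference.com"):
    """Join base + prefix + feed ID + suffix into one stream URL.

    The trailing slash is stripped from base so that a prefix beginning
    with "/" does not produce a double slash.
    """
    return base.rstrip("/") + prefix + str(feed_id) + suffix


stream_url = build_stream_url("/scripts/playlists/1/", 11065,
                              "/0-5417067072.asx")
print(stream_url)
# http://www.radioreference.com/scripts/playlists/1/11065/0-5417067072.asx
```

If the suffix is captured without its extension instead, `.asx` would need to be appended once when joining.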
