Python regex help
Posted by Dormish on Stack Overflow
Published on 2010-12-28T15:45:57Z
Indexed on 2010/12/28 18:54 UTC
I am trying to write a regex that finds all names, URLs and phone numbers in an HTML page.
But I'm having trouble with the phone number part. I think the problem is that the phone number part keeps searching until it finds a </strong>, and in the process it skips people: instead of producing an empty string when a person has no phone number, it runs on into the next person. Put simply, instead of a list like this:
url1+name1+num1 | url2+name2+"" | url3+name3+num3
it returns a list like this:
url1+name1+num1 | url2+name2+num3
with url3+name3 deleted in the process.
This is the loop I'm using:
for url, name, pnumber in re.findall('Name"><div>(?:<a href="/si([^">]*)"> )?([^<]*)(?:.*?</strong>([^<]*))?',page):
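A minimal reproduction of the behaviour described above, assuming a hypothetical page built from the two rows in the question (mail cells dropped) plus an invented third person. The `/si/person-three/...` URL, the name "Person Three" and its number are placeholders, not real data. The lazy `.*?` stops at the *first* `</strong>` it reaches, but nothing prevents it from crossing a `</tr>`, so for the person without a phone number the optional group runs on into the next row, captures that row's number and swallows its name cell.

```python
import re

# Hypothetical page: the two rows from the question (mail cells removed)
# plus an invented third person. Name, URL and number are placeholders.
PAGE = (
    '<tr> <td class="lablinksName"><div><a href="/si/ivan-bratko/default.html"> dr. Ivan Bratko akad. prof.</a></div></td>'
    ' <td class="lablinksPhone"><div><strong>T:</strong> +386 1 4768 393 </div></td> </tr>'
    '<tr> <td class="lablinksName"><div> dr. Branko Matjaž Juric prof.</div></td>'
    ' <td class="lablinksPhone"><div> </div></td> </tr>'
    '<tr> <td class="lablinksName"><div><a href="/si/person-three/default.html"> Person Three</a></div></td>'
    ' <td class="lablinksPhone"><div><strong>T:</strong> +386 1 0000 000 </div></td> </tr>'
)

# The pattern from the question, unchanged.
ORIGINAL = r'Name"><div>(?:<a href="/si([^">]*)"> )?([^<]*)(?:.*?</strong>([^<]*))?'


def original_matches(page):
    """Return (url, name, phone) tuples from the original pattern, whitespace-stripped."""
    return [tuple(part.strip() for part in m) for m in re.findall(ORIGINAL, page)]


for row in original_matches(PAGE):
    print(row)
```

On this sample only two tuples come back: dr. Branko Matjaž Juric is paired with the third person's number, and the third person disappears entirely.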
I am searching for people in a single, very long line. Each person may or may not have a URL or a phone number. Here is an example of a person with both a URL and a phone number:
<tr> <td class="lablinksName"><div><a href="/si/ivan-bratko/default.html"> dr. Ivan Bratko akad. prof.</a></div></td> <td class="lablinksMail"><a href="javascript:void(cmPopup('sendMessage', '/si/ivan-bratko/mailer.html', true, 350, 350));"><img src="/Static/images/gui/mail.gif" height="8" width="11"></a></td> <td class="lablinksPhone"><div><strong>T:</strong> +386 1 4768 393 </div></td> </tr>
And here is an example of a person with neither a URL nor a phone number:
<tr> <td class="lablinksName"><div> dr. Branko Matjaž Juric prof.</div></td> <td class="lablinksMail"><a href="javascript:void(cmPopup('sendMessage', '/si/branko-matjaz-juric/mailer.html', true, 350, 350));"><img src="/Static/images/gui/mail.gif" height="8" width="11"></a></td> <td class="lablinksPhone"><div> </div></td> </tr>
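One possible fix, sketched under the assumption (taken from the two examples above) that every person sits in their own `<tr> ... </tr>` row: keep the original pattern but replace the filler `.*?` with a tempered token `(?:(?!</tr>).)*?`, which may consume any character except the start of `</tr>`. The phone search then fails inside a row that has no `</strong>`, the optional group is skipped, and `re.findall` returns `''` for it. The sample page and its third person are hypothetical, as are the names `PERSON_RE` and `find_people`.

```python
import re

# Hypothetical page: the two rows from the question (mail cells removed)
# plus an invented third person. Name, URL and number are placeholders.
PAGE = (
    '<tr> <td class="lablinksName"><div><a href="/si/ivan-bratko/default.html"> dr. Ivan Bratko akad. prof.</a></div></td>'
    ' <td class="lablinksPhone"><div><strong>T:</strong> +386 1 4768 393 </div></td> </tr>'
    '<tr> <td class="lablinksName"><div> dr. Branko Matjaž Juric prof.</div></td>'
    ' <td class="lablinksPhone"><div> </div></td> </tr>'
    '<tr> <td class="lablinksName"><div><a href="/si/person-three/default.html"> Person Three</a></div></td>'
    ' <td class="lablinksPhone"><div><strong>T:</strong> +386 1 0000 000 </div></td> </tr>'
)

# Same structure as the original pattern, but the filler before </strong>
# is a tempered dot: it never consumes the "<" that starts "</tr>", so the
# phone search cannot leave the current person's row.
PERSON_RE = re.compile(
    r'Name"><div>(?:<a href="/si([^">]*)"> )?([^<]*)'
    r'(?:(?:(?!</tr>).)*?</strong>([^<]*))?'
)


def find_people(page):
    """Return (url, name, phone) tuples; a missing url or phone comes back as ''."""
    return [tuple(part.strip() for part in m) for m in PERSON_RE.findall(page)]


for url, name, phone in find_people(PAGE):
    print(repr(url), repr(name), repr(phone))
```

If the real page ever contains line breaks inside a row, compile the pattern with `re.DOTALL` so `.` can cross them. For markup less regular than this, splitting the page into rows first, or using an HTML parser such as the standard library's `html.parser`, is usually more robust than a single regex.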
I hope I was clear enough. Any help would be appreciated.
© Stack Overflow or respective owner