Python regex help

Posted by Dormish on Stack Overflow
Published on 2010-12-28T15:45:57Z


I am trying to write a regex that finds all names, URLs and phone numbers in an HTML page, but I'm having trouble with the phone number part. I think the problem is that the phone number part keeps searching until it finds a </strong>, and in the process it skips people. Instead of producing an empty string when a person has no phone number, it takes the next person's number. Simply put, instead of a list like this: url1+name1+num1 | url2+name2+"" | url3+name3+num3 it returns a list like this: url1+name1+num1 | url2+name2+num3, with url3+name3 lost in the process.
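A minimal reproduction of the behaviour described above, assuming three shortened, hypothetical rows shaped like the real ones (persons A and C have a phone number, person B does not). The pattern is the one from the question, unchanged; `page` and `pattern` are illustrative names.

```python
import re

# Hypothetical, shortened rows shaped like the ones in the question:
# person A and person C have a phone number, person B does not.
page = (
    '<tr><td class="lablinksName"><div><a href="/si/a/default.html"> Person A</a></div></td>'
    '<td class="lablinksPhone"><div><strong>T:</strong> 111 </div></td></tr>'
    '<tr><td class="lablinksName"><div> Person B</div></td>'
    '<td class="lablinksPhone"><div> </div></td></tr>'
    '<tr><td class="lablinksName"><div><a href="/si/c/default.html"> Person C</a></div></td>'
    '<td class="lablinksPhone"><div><strong>T:</strong> 333 </div></td></tr>'
)

# The pattern from the question. For person B, the lazy ".*?" keeps going
# past B's own row until it reaches the </strong> in person C's row.
pattern = r'Name"><div>(?:<a href="/si([^">]*)"> )?([^<]*)(?:.*?</strong>([^<]*))?'

matches = re.findall(pattern, page)
for match in matches:
    print(match)
```

This prints two tuples, not three: person B is paired with person C's number `' 333 '`, and person C is consumed by that match and never reported.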

for url, name, pnumber in re.findall('Name"><div>(?:<a href="/si([^">]*)"> )?([^<]*)(?:.*?</strong>([^<]*))?', page):
    print(url, name, pnumber)
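One possible adjustment, sketched under the assumption that every person sits in its own row ending with a literal `</tr>`: replace the unbounded `.*?` in the optional phone group with a "tempered" token, `(?:(?!</tr>).)*?`, which is not allowed to step over a `</tr>`. If the current row has no `</strong>`, the optional group then matches nothing and `findall` reports `''`. The sample `page` is the same hypothetical data as before.

```python
import re

# Hypothetical, shortened rows: A and C have phone numbers, B does not.
page = (
    '<tr><td class="lablinksName"><div><a href="/si/a/default.html"> Person A</a></div></td>'
    '<td class="lablinksPhone"><div><strong>T:</strong> 111 </div></td></tr>'
    '<tr><td class="lablinksName"><div> Person B</div></td>'
    '<td class="lablinksPhone"><div> </div></td></tr>'
    '<tr><td class="lablinksName"><div><a href="/si/c/default.html"> Person C</a></div></td>'
    '<td class="lablinksPhone"><div><strong>T:</strong> 333 </div></td></tr>'
)

# Same pattern as the question, except that ".*?" became "(?:(?!</tr>).)*?":
# each character it consumes must not be the start of "</tr>", so the search
# for </strong> stops at the end of the current row.
pattern = (r'Name"><div>(?:<a href="/si([^">]*)"> )?([^<]*)'
           r'(?:(?:(?!</tr>).)*?</strong>([^<]*))?')

for url, name, pnumber in re.findall(pattern, page):
    print(repr(url), repr(name.strip()), repr(pnumber.strip()))
```

With this change person B comes back with an empty phone string and person C is no longer swallowed. It still relies on the rows being closed with `</tr>` exactly as in the samples.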

I am searching for people in a single, very long line. A person may or may not have a URL or a phone number. Here is an example of a person with both a URL and a phone number:
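Since everything is on one long line, another option is to cut the page into rows first and run the question's pattern on each row separately, so the phone part can never reach into the next person's row. This is a sketch assuming each row starts with a literal `<tr>` and ends with `</tr>`, as in the examples; the function name `people` and the sample `page` are hypothetical.

```python
import re

# The question's pattern, applied to one row at a time.
PERSON = re.compile(
    r'Name"><div>(?:<a href="/si([^">]*)"> )?([^<]*)(?:.*?</strong>([^<]*))?',
    re.DOTALL,
)


def people(page):
    """Yield (url, name, phone) per <tr> row; missing parts become ''."""
    # re.DOTALL lets ".*?" cross newlines in case the page is not one line.
    for row in re.findall(r'<tr>.*?</tr>', page, re.DOTALL):
        m = PERSON.search(row)
        if m:  # skip rows that are not person rows
            # search() reports unmatched optional groups as None.
            url, name, phone = (g or '' for g in m.groups())
            yield url, name.strip(), phone.strip()


# Hypothetical, shortened rows: A and C have phone numbers, B does not.
page = (
    '<tr><td class="lablinksName"><div><a href="/si/a/default.html"> Person A</a></div></td>'
    '<td class="lablinksPhone"><div><strong>T:</strong> 111 </div></td></tr>'
    '<tr><td class="lablinksName"><div> Person B</div></td>'
    '<td class="lablinksPhone"><div> </div></td></tr>'
    '<tr><td class="lablinksName"><div><a href="/si/c/default.html"> Person C</a></div></td>'
    '<td class="lablinksPhone"><div><strong>T:</strong> 333 </div></td></tr>'
)

for person in people(page):
    print(person)
```

Because each search only sees one row, the `.*?</strong>` part can no longer borrow a phone number from the next person; a row without `</strong>` simply yields an empty phone string.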

 <tr>  <td class="lablinksName"><div><a href="/si/ivan-bratko/default.html"> dr. Ivan Bratko  akad. prof.</a></div></td>  <td class="lablinksMail"><a href="javascript:void(cmPopup('sendMessage', '/si/ivan-bratko/mailer.html', true, 350, 350));"><img src="/Static/images/gui/mail.gif" height="8" width="11"></a></td> <td class="lablinksPhone"><div><strong>T:</strong> +386  1 4768 393 </div></td> </tr>

And here is an example of a person with neither a URL nor a phone number:

 <tr>  <td class="lablinksName"><div> dr. Branko Matjaž  Juric   prof.</div></td>  <td class="lablinksMail"><a href="javascript:void(cmPopup('sendMessage', '/si/branko-matjaz-juric/mailer.html', true, 350, 350));"><img src="/Static/images/gui/mail.gif" height="8" width="11"></a></td> <td class="lablinksPhone"><div> </div></td> </tr>
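A different technique, swapped in here rather than a regex: parse the rows with the standard library's `html.parser.HTMLParser`, which tracks which cell each piece of text belongs to. This is a sketch that assumes the `lablinksName` / `lablinksPhone` cell classes and the `<strong>T:</strong>` label seen in the two example rows; `LabLinksParser` and `_squash` are hypothetical names, and the whitespace collapsing is an illustrative choice.

```python
from html.parser import HTMLParser


def _squash(text):
    """Strip the text and collapse internal runs of whitespace to one space."""
    return ' '.join(text.split())


class LabLinksParser(HTMLParser):
    """Collect (url, name, phone) tuples from rows shaped like the examples."""

    def __init__(self):
        super().__init__()
        self.people = []
        self.row = None      # fields of the <tr> being parsed, or None
        self.cell = None     # class attribute of the current <td>, or None
        self.in_strong = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == 'tr':
            self.row = {'url': '', 'name': '', 'phone': ''}
        elif tag == 'td':
            self.cell = attrs.get('class')
        elif tag == 'a' and self.cell == 'lablinksName' and self.row is not None:
            self.row['url'] = attrs.get('href') or ''
        elif tag == 'strong':
            self.in_strong = True

    def handle_endtag(self, tag):
        if tag == 'strong':
            self.in_strong = False
        elif tag == 'td':
            self.cell = None
        elif tag == 'tr' and self.row is not None:
            self.people.append((self.row['url'], _squash(self.row['name']),
                                _squash(self.row['phone'])))
            self.row = None

    def handle_data(self, data):
        # Ignore text outside a row and the "T:" label inside <strong>.
        if self.row is None or self.in_strong:
            return
        if self.cell == 'lablinksName':
            self.row['name'] += data
        elif self.cell == 'lablinksPhone':
            self.row['phone'] += data


# The two example rows from the question.
page = ''' <tr>  <td class="lablinksName"><div><a href="/si/ivan-bratko/default.html"> dr. Ivan Bratko  akad. prof.</a></div></td>  <td class="lablinksMail"><a href="javascript:void(cmPopup('sendMessage', '/si/ivan-bratko/mailer.html', true, 350, 350));"><img src="/Static/images/gui/mail.gif" height="8" width="11"></a></td> <td class="lablinksPhone"><div><strong>T:</strong> +386  1 4768 393 </div></td> </tr>
 <tr>  <td class="lablinksName"><div> dr. Branko Matjaž  Juric   prof.</div></td>  <td class="lablinksMail"><a href="javascript:void(cmPopup('sendMessage', '/si/branko-matjaz-juric/mailer.html', true, 350, 350));"><img src="/Static/images/gui/mail.gif" height="8" width="11"></a></td> <td class="lablinksPhone"><div> </div></td> </tr>'''

parser = LabLinksParser()
parser.feed(page)
parser.close()
for person in parser.people:
    print(person)
```

Here the URL is the full `href` (for example `/si/ivan-bratko/default.html`), whereas the regex in the question captures only what follows `/si`.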

I hope I was clear enough. Can anyone help me?

© Stack Overflow or respective owner
