Pulling out two separate words from a string using regular expressions?

Posted by Marvin on Stack Overflow
Published on 2010-03-29T21:37:43Z

I need to improve a regular expression I'm using. Here is the current version:

^[a-zA-Z\s/-]+

I'm using it to pull out medication names from a variety of formulation strings, for example:

  • SULFAMETHOXAZOLE-TRIMETHOPRIM 200-40 MG/5ML PO SUSP
  • AMOX TR/POTASSIUM CLAVULANATE 125 mg-31.25 mg ORAL TABLET, CHEWABLE
  • AMOXICILLIN TRIHYDRATE 125 mg ORAL TABLET, CHEWABLE
  • AMOX TR/POTASSIUM CLAVULANATE 125 mg-31.25 mg ORAL TABLET, CHEWABLE
  • Amoxicillin 1000 MG / Clavulanate 62.5 MG Extended Release Tablet

The resulting matches on these examples are:

  • SULFAMETHOXAZOLE-TRIMETHOPRIM
  • AMOX TR/POTASSIUM CLAVULANATE
  • AMOXICILLIN TRIHYDRATE
  • AMOX TR/POTASSIUM CLAVULANATE
  • Amoxicillin

The first four are what I want, but on the fifth, I really need "Amoxicillin / Clavulanate".
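
For reference, the current behavior can be reproduced with a short Python sketch. Python is an assumption here, since the question does not name a language, and the names `CURRENT` and `formulations` are illustrative only. The pattern also consumes the whitespace before the first digit, so the sketch strips it.

```python
import re

# The pattern from the question, unchanged.
CURRENT = re.compile(r"^[a-zA-Z\s/-]+")

# The four distinct example strings from the question.
formulations = [
    "SULFAMETHOXAZOLE-TRIMETHOPRIM 200-40 MG/5ML PO SUSP",
    "AMOX TR/POTASSIUM CLAVULANATE 125 mg-31.25 mg ORAL TABLET, CHEWABLE",
    "AMOXICILLIN TRIHYDRATE 125 mg ORAL TABLET, CHEWABLE",
    "Amoxicillin 1000 MG / Clavulanate 62.5 MG Extended Release Tablet",
]

# The match stops at the first character outside the class (the first digit),
# so the last string yields only "Amoxicillin".
matches = [CURRENT.match(s).group().strip() for s in formulations]

for m in matches:
    print(m)
```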

How would I pull out patterns like "Amoxicillin / Clavulanate" (in the fifth row) without also matching patterns like "MG/5ML" (in the first row)?
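
One possible approach, sketched in Python (an assumption, since no language is named), relies on a difference visible in the examples: the slash that separates two ingredients in the fifth row has whitespace on both sides, while the slashes in "MG/5ML" and "AMOX TR/POTASSIUM" do not. The sketch splits on a whitespace-surrounded slash and applies the existing character class to the start of each piece. The helper name `medication_name` and both constants are hypothetical, and the rule has only been checked against the examples above.

```python
import re

# Same character class as the original pattern; re.match anchors at the start.
NAME = re.compile(r"[a-zA-Z\s/-]+")
# Assumption: ingredients are separated by a slash with whitespace on both sides.
SEPARATOR = re.compile(r"\s+/\s+")

def medication_name(formulation):
    """Return the leading name of each ingredient, joined with ' / '."""
    names = []
    for part in SEPARATOR.split(formulation):
        m = NAME.match(part)
        # Skip pieces that do not start with a name (for example, a bare dose).
        if m and m.group().strip():
            names.append(m.group().strip())
    return " / ".join(names)

print(medication_name(
    "Amoxicillin 1000 MG / Clavulanate 62.5 MG Extended Release Tablet"))
print(medication_name(
    "SULFAMETHOXAZOLE-TRIMETHOPRIM 200-40 MG/5ML PO SUSP"))
```

Strings that write the ingredient separator without surrounding spaces would still be treated as a single ingredient, so the split rule would need adjusting for such data.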

© Stack Overflow or respective owner