regex to break a string into "key" / "value" pairs when # of pairs is variable?

Posted by user141146 on Stack Overflow
Published on 2011-01-09T02:24:21Z

Hi, I'm using Ruby 1.9 and I'm wondering if there's a simple regex way to do this.

I have many strings that look like some variation of this:

str = "Allocation:  Random, Control:  Active Control, Endpoint Classification:  Safety Study, Intervention Model:  Parallel Assignment, Masking:  Double Blind (Subject, Caregiver, Investigator, Outcomes Assessor), Primary Purpose:  Treatment"

The idea is that I'd like to break this string into its functional components:

  • Allocation: Random
  • Control: Active Control
  • Endpoint Classification: Safety Study
  • Intervention Model: Parallel Assignment
  • Masking: Double Blind (Subject, Caregiver, Investigator, Outcomes Assessor)
  • Primary Purpose: Treatment

The "syntax" of the string is that each "key" consists of one or more words or other characters (e.g., Intervention Model) followed by a colon (:). Each key has a corresponding "value" (e.g., Parallel Assignment) that immediately follows the colon. The "value" can contain words, commas, or whatever else, but its end is signaled by a comma (or, for the last pair, by the end of the string).

The # of key/value pairs is variable. I'm also assuming that colons (:) aren't allowed to be part of the "value" and that commas (,) aren't allowed to be part of the "key".
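To make those rules concrete, here is a minimal sketch that assumes the separator between pairs is a comma followed by a run of non-comma, non-colon characters and then a colon (i.e., a comma that introduces the next key). The constant name `PAIR_SEPARATOR` is made up for illustration, and this is only one possible reading of the rules above:

```ruby
# Sample input copied from the question.
str = "Allocation:  Random, Control:  Active Control, Endpoint Classification:  Safety Study, Intervention Model:  Parallel Assignment, Masking:  Double Blind (Subject, Caregiver, Investigator, Outcomes Assessor), Primary Purpose:  Treatment"

# Hypothetical name. Matches a comma (plus trailing whitespace) only when the
# text after it looks like "key:", i.e. non-comma, non-colon characters
# followed by a colon. Commas inside a value fail the lookahead and are kept.
PAIR_SEPARATOR = /,\s*(?=[^,:]+:)/

pairs = str.split(PAIR_SEPARATOR).map do |chunk|
  # Split each "key:  value" chunk at the first colon only.
  key, value = chunk.split(/:\s*/, 2)
  [key.strip, value.strip]
end

pairs.each { |key, value| puts "#{key}: #{value}" }
```

Under these assumptions the commas inside the parentheses of the Masking value are left alone, because none of them is followed by something that looks like a key and a colon.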

One would think that there is a "regexy" way to break this into its component pieces, but my attempt at making an appropriate matching regex only picks up the first key/value pair and I'm not sure how to capture the others. Any thoughts on how to capture the other matches?

irb(main):138:0> regex = /(([^,]+?): ([^:]+?,))+?/
=> /(([^,]+?): ([^:]+?,))+?/
irb(main):139:0> str = "Allocation:  Random, Control:  Active Control, Endpoint Classification:  Safety Study, Intervention Model:  Parallel Assignment, Masking:  Double Blind (Subject, Caregiver, Investigator, Outcomes Assessor), Primary Purpose:  Treatment"
=> "Allocation:  Random, Control:  Active Control, Endpoint Classification:  Safety Study, Intervention Model:  Parallel Assignment, Masking:  Double Blind (Subject, Caregiver, Investigator, Outcomes Assessor), Primary Purpose:  Treatment"
irb(main):140:0> str.match regex
=> #<MatchData "Allocation:  Random," 1:"Allocation:  Random," 2:"Allocation" 3:" Random,">
irb(main):141:0> $1
=> "Allocation:  Random,"
irb(main):142:0> $2
=> "Allocation"
irb(main):143:0> $3
=> " Random,"
irb(main):144:0> $4
=> nil
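
For what it's worth, `String#match` only ever returns the first match, and the lazy `+?` on the outer group stops after a single repetition, which would explain why only the first pair shows up. `String#scan` returns every match instead. The sketch below is one possible approach rather than a definitive answer: the pattern and the names `PAIR_PATTERN`, `all_pairs`, and `lookup` are assumptions for illustration, and it relies on the stated rule that keys contain no commas and values contain no colons.

```ruby
# Sample input copied from the question.
str = "Allocation:  Random, Control:  Active Control, Endpoint Classification:  Safety Study, Intervention Model:  Parallel Assignment, Masking:  Double Blind (Subject, Caregiver, Investigator, Outcomes Assessor), Primary Purpose:  Treatment"

# Hypothetical pattern:
#   ([^,:]+)  key: anything up to the colon, with no commas or colons
#   :\s*      the colon and any whitespace after it
#   (.*?)     value: as little as possible ...
#   (?=...)   ... up to a comma that introduces the next key, or end of string
PAIR_PATTERN = /([^,:]+):\s*(.*?)(?=,\s*[^,:]+:|\z)/

# match stops at the first pair, just like in the irb session above.
first_only = str.match(PAIR_PATTERN)
puts first_only[1] # key of the first pair
puts first_only[2] # value of the first pair

# scan walks the whole string and returns one [key, value] array per match.
all_pairs = str.scan(PAIR_PATTERN).map { |key, value| [key.strip, value.strip] }

# Hash[] is used instead of Array#to_h so this also runs on Ruby 1.9.
lookup = Hash[all_pairs]
puts lookup["Masking"]
```

Because `scan` is given a pattern with two capture groups, each element of its result is a two-element array, which is why it can be handed straight to `Hash[]`.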

© Stack Overflow or respective owner
