Regex to break a string into "key" / "value" pairs when the number of pairs is variable?
Posted by user141146 on Stack Overflow
Published on 2011-01-09T02:24:21Z
Hi, I'm using Ruby 1.9 and I'm wondering if there's a simple regex way to do this.
I have many strings that look like some variation of this:
str = "Allocation: Random, Control: Active Control, Endpoint Classification: Safety Study, Intervention Model: Parallel Assignment, Masking: Double Blind (Subject, Caregiver, Investigator, Outcomes Assessor), Primary Purpose: Treatment"
The idea is that I'd like to break this string into its functional components:
- Allocation: Random
- Control: Active Control
- Endpoint Classification: Safety Study
- Intervention Model: Parallel Assignment
- Masking: Double Blind (Subject, Caregiver, Investigator, Outcomes Assessor)
- Primary Purpose: Treatment
The "syntax" of the string is that there is a "key", which consists of one or more "words or other characters" (e.g., Intervention Model), followed by a colon (:). Each key has a corresponding "value" (e.g., Parallel Assignment) that immediately follows the colon (:). The "value" may contain words, commas, or whatever, but the end of the "value" is signaled by a comma.
The number of key/value pairs is variable. I'm also assuming that colons (:) aren't allowed to be part of the "value" and that commas (,) aren't allowed to be part of the "key".
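Read literally, these rules imply that a pair boundary is a comma followed by text containing no comma or colon, and then a colon (i.e., a comma that introduces the next key). A minimal sketch of that idea, assuming the rules hold for every string; the names `pairs` and `fields` are illustrative only and are not part of the original attempt:

```ruby
str = "Allocation: Random, Control: Active Control, Endpoint Classification: Safety Study, Intervention Model: Parallel Assignment, Masking: Double Blind (Subject, Caregiver, Investigator, Outcomes Assessor), Primary Purpose: Treatment"

# Split only at a comma that is followed by a run of non-comma, non-colon
# characters and then a colon. Commas inside a value (such as the ones in
# the parenthesised list) are not followed by a key, so they are left alone.
pairs = str.split(/,\s*(?=[^,:]+:)/)

# Each piece is "key: value"; split on the first colon only.
# Hash[] is used because it is available in Ruby 1.9.
fields = Hash[pairs.map { |pair| pair.split(/:\s*/, 2) }]

puts pairs
puts fields["Masking"]
```

The lookahead `(?=...)` checks for the next key without consuming it, so the key text stays in the following piece.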
One would think that there is a "regexy" way to break this into its component pieces, but my attempt at making an appropriate matching regex only picks up the first key/value pair and I'm not sure how to capture the others. Any thoughts on how to capture the other matches?
regex = /(([^,]+?): ([^:]+?,))+?/
=> /(([^,]+?): ([^:]+?,))+?/
irb(main):139:0> str = "Allocation: Random, Control: Active Control, Endpoint Classification: Safety Study, Intervention Model: Parallel Assignment, Masking: Double Blind (Subject, Caregiver, Investigator, Outcomes Assessor), Primary Purpose: Treatment"
=> "Allocation: Random, Control: Active Control, Endpoint Classification: Safety Study, Intervention Model: Parallel Assignment, Masking: Double Blind (Subject, Caregiver, Investigator, Outcomes Assessor), Primary Purpose: Treatment"
irb(main):140:0> str.match regex
=> #<MatchData "Allocation: Random," 1:"Allocation: Random," 2:"Allocation" 3:" Random,">
irb(main):141:0> $1
=> "Allocation: Random,"
irb(main):142:0> $2
=> "Allocation"
irb(main):143:0> $3
=> " Random,"
irb(main):144:0> $4
=> nil
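One likely reading of the session above: `String#match` reports only the first match, and the lazy `+?` on the outer group is satisfied after a single repetition, so only the first pair is captured. The pattern also requires a trailing comma, which the last pair does not have. A hedged sketch using `String#scan`, which returns every match, might look like this; `pair_regex` and `result` are hypothetical names, and the pattern assumes the key/value rules stated in the question:

```ruby
str = "Allocation: Random, Control: Active Control, Endpoint Classification: Safety Study, Intervention Model: Parallel Assignment, Masking: Double Blind (Subject, Caregiver, Investigator, Outcomes Assessor), Primary Purpose: Treatment"

# Key:   one or more characters that are neither commas nor colons.
# Value: one or more non-colon characters, extended lazily until either
#        a comma that introduces the next key or the end of the string.
pair_regex = /([^,:]+):\s*([^:]+?)(?=,[^,:]+:|\z)/

# scan returns an array of [key, value] capture pairs, one per match.
# strip removes the space left in front of every key after the first.
result = Hash[str.scan(pair_regex).map { |key, value| [key.strip, value.strip] }]

result.each { |key, value| puts "#{key} => #{value}" }
```

Because the end of a value is detected with a lookahead rather than a consumed comma, the final pair is matched by the `\z` alternative.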
© Stack Overflow or respective owner