Perl Regex - Condensing groups of find/replace
Posted by brydgesk on Stack Overflow
Published on 2010-06-07T21:13:34Z
perl
I'm using Perl to perform some file cleansing, and am running into some performance issues. One of the major parts of my code involves standardizing name fields. I have several sections that look like this:
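sub substitute_titles
{
    # $inStr is a reference to the name string, which is modified in place.
    my ($inStr) = @_;
    # Periods are escaped so they match a literal "." rather than any character
    # (unescaped, / MR./ would also match the " MRS" in " MRS.").
    ${$inStr} =~ s/ PHD\./ PHD /;
    ${$inStr} =~ s/ P H D / PHD /;
    ${$inStr} =~ s/ PROF\./ PROF /;
    ${$inStr} =~ s/ P R O F / PROF /;
    ${$inStr} =~ s/ DR\./ DR /;
    ${$inStr} =~ s/ D\.R\./ DR /;
    ${$inStr} =~ s/ HON\./ HON /;
    ${$inStr} =~ s/ H O N / HON /;
    ${$inStr} =~ s/ MR\./ MR /;
    ${$inStr} =~ s/ MRS\./ MRS /;
    ${$inStr} =~ s/ M R S / MRS /;
    ${$inStr} =~ s/ MS\./ MS /;
    ${$inStr} =~ s/ MISS\./ MISS /;
}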
I'm passing the string by reference to try to get at least a little speed, but I fear that running so many specific string replacements (literally hundreds) on tens of thousands of records (likely hundreds of thousands eventually) is going to hurt performance.
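For illustration, a call by reference to a routine like this might look roughly as follows. This is a minimal sketch: the rule list is shortened, the periods are escaped so they match literally, and the sample name is a made-up assumption rather than real data.

```perl
use strict;
use warnings;

# Shortened stand-in for the routine in the question (only three rules).
sub substitute_titles {
    my ($inStr) = @_;                 # $inStr is a reference to the caller's string
    ${$inStr} =~ s/ PHD\./ PHD /;
    ${$inStr} =~ s/ P H D / PHD /;
    ${$inStr} =~ s/ DR\./ DR /;
    return;
}

# Hypothetical sample record, padded with spaces so the patterns can anchor on them.
my $name = ' DR. JANE SMITH P H D ';

# Pass a reference; the string is changed in place, nothing is returned.
substitute_titles(\$name);

print "[$name]\n";
```

Each rule replaces only the first occurrence it finds, since none of the substitutions use the `/g` modifier.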
Is there a better way to implement this kind of logic than what I'm doing currently?
Thanks
Edit: A quick note: not all of the replacements just remove periods and spaces. There are also string deletions, soundex groups, etc.
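One possible way to condense the simple title rules is a lookup table driven by a single precompiled alternation, so each record is scanned once rather than once per rule. The following is only a sketch under assumptions: the names `%TITLE_FOR`, `$TITLE_RE`, and `substitute_titles_table` are hypothetical, the table holds just a subset of the rules, and it covers only plain variant-to-canonical substitutions, not the deletions or soundex groups mentioned in the edit. Unlike the original, it uses `/g` and so replaces every occurrence. Whether it is actually faster on real data would need to be measured (for example with the core `Benchmark` module).

```perl
use strict;
use warnings;

# Hypothetical rule table: variant spelling (the text after the leading space
# in the original patterns) => canonical title.
my %TITLE_FOR = (
    'PHD.'     => 'PHD',
    'P H D '   => 'PHD',
    'PROF.'    => 'PROF',
    'P R O F ' => 'PROF',
    'DR.'      => 'DR',
    'D.R.'     => 'DR',
    'MR.'      => 'MR',
    'MRS.'     => 'MRS',
    'M R S '   => 'MRS',
);

# Build one alternation from the keys, longest first so longer variants are
# tried before shorter ones. quotemeta escapes the periods and spaces.
my $alternation = join '|',
    map  { quotemeta }
    sort { length($b) <=> length($a) || $a cmp $b }
    keys %TITLE_FOR;

# Compile once. The lookbehind requires a preceding space without consuming it.
my $TITLE_RE = qr/(?<= )($alternation)/;

sub substitute_titles_table {
    my ($inStr) = @_;                           # reference to the name string
    ${$inStr} =~ s/$TITLE_RE/$TITLE_FOR{$1} /g; # canonical title plus one space
    return;
}

# Hypothetical sample record.
my $record = ' DR. JANE SMITH P H D ';
substitute_titles_table(\$record);
print "[$record]\n";
```

Keeping the rules in a hash also means new variants are added as data rather than as another substitution line. Rules that need real regex features (character classes, optional parts) would not fit this literal-key table as written and would have to stay as separate substitutions.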
© Stack Overflow or respective owner