Perl Regex - Condensing groups of find/replace

Posted by brydgesk on Stack Overflow
Published on 2010-06-07T21:13:34Z

I'm using Perl to perform some file cleansing, and am running into some performance issues. One of the major parts of my code involves standardizing name fields. I have several sections that look like this:

sub substitute_titles
{
    # $inStr is a reference to the name string; it is modified in place.
    my ($inStr) = @_;
    # Periods are escaped so they match a literal "." rather than any character.
    ${$inStr} =~ s/ PHD\./ PHD /;
    ${$inStr} =~ s/ P H D / PHD   /;
    ${$inStr} =~ s/ PROF\./ PROF /;
    ${$inStr} =~ s/ P R O F / PROF    /;
    ${$inStr} =~ s/ DR\./ DR /;
    ${$inStr} =~ s/ D\.R\./ DR  /;
    ${$inStr} =~ s/ HON\./ HON /;
    ${$inStr} =~ s/ H O N / HON   /;
    ${$inStr} =~ s/ MR\./ MR /;
    ${$inStr} =~ s/ MRS\./ MRS /;
    ${$inStr} =~ s/ M R S / MRS   /;
    ${$inStr} =~ s/ MS\./ MS /;
    ${$inStr} =~ s/ MISS\./ MISS /;
}

I'm passing the string by reference to try to get at least a little speed, but I fear that running so many (literally hundreds of) specific string replacements on tens of thousands (likely hundreds of thousands eventually) of records is going to hurt performance.

Is there a better way to implement this kind of logic than what I'm doing currently?
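
One possible direction for the literal replacements, sketched under assumptions: keep the variants in a hash and compile them into a single alternation, so each record is scanned once instead of once per rule. The names `%TITLE_FOR` and `$TITLE_RE` are made up for illustration, the table holds only a few of the rules above, and the periods are treated as literal characters. Note that `/g` replaces every occurrence, whereas each statement above replaces only the first one, and two variants that share a delimiting space will not both match in one pass, so this is not a drop-in equivalent.

```perl
use strict;
use warnings;

# Hypothetical lookup table: literal variant => standardized form.
# Each replacement is padded to the same length as its variant, as in the question.
my %TITLE_FOR = (
    ' PHD.'     => ' PHD ',
    ' P H D '   => ' PHD   ',
    ' PROF.'    => ' PROF ',
    ' P R O F ' => ' PROF    ',
    ' DR.'      => ' DR ',
    ' D.R.'     => ' DR  ',
    ' MRS.'     => ' MRS ',
    ' MR.'      => ' MR ',
);

# Build one alternation: longest variants first, regex metacharacters escaped.
my $alternation = join '|',
    map  { quotemeta($_) }
    sort { length($b) <=> length($a) || $a cmp $b } keys %TITLE_FOR;

# Compile the pattern once, outside the per-record loop.
my $TITLE_RE = qr/($alternation)/;

sub substitute_titles {
    my ($str_ref) = @_;
    # One scan of the string; the matched variant selects its replacement.
    ${$str_ref} =~ s/$TITLE_RE/$TITLE_FOR{$1}/g;
    return;
}

my $name = 'SMITH JOHN DR. P H D ';
substitute_titles(\$name);
print "[$name]\n";    # prints "[SMITH JOHN DR  PHD   ]"
```

On Perl 5.10 and later, an alternation made only of literal strings should be compiled into a trie, which is why a single combined pattern tends to scale better than hundreds of separate `s///` statements. Benchmarking against real records is still the only way to confirm the gain.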

Thanks

Edit: A quick note: not all of the replacements just remove periods and spaces. Some are string deletions, soundex groups, etc.
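
For rules that are not plain literal swaps, one option is an ordered list of precompiled patterns paired with callbacks. This is only a sketch: `@RULES` and `apply_rules` are hypothetical names, and the " ESQ" deletion is a made-up rule to show the shape, not one from the actual code. Each rule still costs one pass over the string, but grouping similar titles into one pattern reduces the number of passes, and `qr//` compiles each pattern only once.

```perl
use strict;
use warnings;

# Hypothetical rule list: [ precompiled pattern, callback that builds the replacement ].
# Rules run in order, so later rules see the output of earlier ones.
my @RULES = (
    # Turn "TITLE." into "TITLE " for a whole group of titles with one pattern.
    [ qr/ (PHD|PROF|DR|HON|MRS|MR|MS|MISS)\./, sub { " $_[0] " } ],
    # Illustrative deletion: blank out a made-up " ESQ" suffix, keeping the field width.
    [ qr/ ESQ\b/, sub { '    ' } ],
);

sub apply_rules {
    my ($str_ref) = @_;
    for my $rule (@RULES) {
        my ($pattern, $build) = @{$rule};
        # /e calls the callback for each match; $1 is undef when the rule has no group.
        ${$str_ref} =~ s/$pattern/$build->($1)/ge;
    }
    return;
}

my $name = 'JONES MARY MRS. ESQ';
apply_rules(\$name);
print "[$name]\n";
```

Because the callbacks are ordinary subs, a rule could also call out to a soundex routine or any other lookup, at the cost of a function call per match.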

© Stack Overflow or respective owner
