Perl Regex - Condensing groups of find/replace

Posted by brydgesk on Stack Overflow
Published on 2010-06-07T21:13:34Z

Filed under: perl

I'm using Perl to perform some file cleansing, and am running into some performance issues. One of the major parts of my code involves standardizing name fields. I have several sections that look like this:

sub substitute_titles
{
    # $inStr is a reference to the name string, which is modified in place.
    my ($inStr) = @_;
    # Periods are escaped so they match a literal "." rather than any
    # character (unescaped, " MR." would also match " MRS").
    ${$inStr} =~ s/ PHD\./ PHD /;
    ${$inStr} =~ s/ P H D / PHD   /;
    ${$inStr} =~ s/ PROF\./ PROF /;
    ${$inStr} =~ s/ P R O F / PROF    /;
    ${$inStr} =~ s/ DR\./ DR /;
    ${$inStr} =~ s/ D\.R\./ DR  /;
    ${$inStr} =~ s/ HON\./ HON /;
    ${$inStr} =~ s/ H O N / HON   /;
    ${$inStr} =~ s/ MR\./ MR /;
    ${$inStr} =~ s/ MRS\./ MRS /;
    ${$inStr} =~ s/ M R S / MRS   /;
    ${$inStr} =~ s/ MS\./ MS /;
    ${$inStr} =~ s/ MISS\./ MISS /;
}

I'm passing by reference to try and get at least a little speed, but I fear that running so many (literally hundreds) of specific string replaces on tens of thousands (likely hundreds of thousands eventually) of records is going to hurt the performance.

Is there a better way to implement this kind of logic than what I'm doing currently?

Thanks

Edit: Quick note, not all the replace functions are just removing periods and spaces. There are string deletions, soundex groups, etc.
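
One possible way to condense a rule group like the one above, offered as a minimal sketch rather than a definitive answer: keep the literal variants in a hash and compile them into a single alternation, so each record is scanned once per group instead of once per rule. The names %TITLE_FOR and $TITLE_RE are hypothetical. The sketch assumes the periods are meant literally and that replacing every occurrence (/g) is acceptable, which differs from the one-replacement-per-rule behaviour of the original.

```perl
use strict;
use warnings;

# Hypothetical lookup table: variant spelling => standardized form.
# The entries mirror the substitutions shown in the question.
my %TITLE_FOR = (
    ' PHD.'     => ' PHD ',
    ' P H D '   => ' PHD   ',
    ' PROF.'    => ' PROF ',
    ' P R O F ' => ' PROF    ',
    ' DR.'      => ' DR ',
    ' D.R.'     => ' DR  ',
    ' HON.'     => ' HON ',
    ' H O N '   => ' HON   ',
    ' MR.'      => ' MR ',
    ' MRS.'     => ' MRS ',
    ' M R S '   => ' MRS   ',
    ' MS.'      => ' MS ',
    ' MISS.'    => ' MISS ',
);

# Build one alternation from the keys. quotemeta escapes the periods (and
# spaces) so every key is matched literally. Longer keys are listed first
# so that a longer variant is preferred if one key is a prefix of another.
my $alternation = join '|',
    map  { quotemeta }
    sort { length($b) <=> length($a) or $a cmp $b }
    keys %TITLE_FOR;

# Compile once, outside the per-record loop.
my $TITLE_RE = qr/($alternation)/;

sub substitute_titles
{
    my ($inStr) = @_;    # reference to the string, modified in place
    # One scan of the string; the matched variant selects its replacement.
    ${$inStr} =~ s/$TITLE_RE/$TITLE_FOR{$1}/g;
    return;
}
```

Whether this is actually faster depends on the data and the Perl version, so it is worth timing both versions (for example with the core Benchmark module) on a sample of real records. A single pass also does not rescan text it has already replaced, and a rule that ends in a space consumes that space, so rules that depend on each other's output, as well as the deletions and soundex groups mentioned in the edit, would still need separate handling.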
