How to replace custom IDs in the order of their appearance with a shell script?

Posted by Péter Török on Stack Overflow
Published on 2010-04-20T10:49:13Z Indexed on 2010/04/20 10:53 UTC

I have a pair of rather large log files with very similar content, except that some identifiers are different between the two. A couple of examples:

UnifiedClassLoader3@19518cc | UnifiedClassLoader3@d0357a
JBossRMIClassLoader@13c2d7f | JBossRMIClassLoader@191777e

That is, wherever the first file contains UnifiedClassLoader3@19518cc, the second contains UnifiedClassLoader3@d0357a, and so on.

I want to replace these with identical IDs so that I can spot the really important differences between the two files. That is, I want to replace all occurrences of both UnifiedClassLoader3@19518cc in file1 and UnifiedClassLoader3@d0357a in file2 with UnifiedClassLoader3@1, all occurrences of both JBossRMIClassLoader@13c2d7f in file1 and JBossRMIClassLoader@191777e in file2 with JBossRMIClassLoader@2, and so on.

Using the Cygwin shell, so far I managed to list all different identifiers occurring in one of the files with

grep -o -e 'ClassLoader[0-9]*@[0-9a-f][0-9a-f]*' file1.log | sort | uniq

However, the original order is now lost, so I don't know which ID corresponds to which ID in the other file. With grep -n I can get the line number, so the sort would preserve the order of appearance, but then I can't weed out the duplicate occurrences. Unfortunately, grep cannot print only the first match of each distinct identifier.
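One possible way around this, sketched here under the assumption that awk is available in the Cygwin installation, is to replace sort | uniq with an awk filter that prints a line only the first time it is seen. This keeps the identifiers in their order of first appearance. The function name first_ids is made up for illustration.

```shell
# Hypothetical helper: list each identifier once, in order of first appearance.
# $1 is the log file to scan.
first_ids() {
  grep -o -e 'ClassLoader[0-9]*@[0-9a-f][0-9a-f]*' "$1" |
    awk '!seen[$0]++'   # print a line only if it has not been seen before
}
```

It would be called as first_ids file1.log, and the same call on file2.log should list the matching identifiers in the same positions, provided the IDs first appear in the same order in both files.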

I figured I could save the list of identifiers produced by the above command into a file, then iterate over the patterns in the file with grep -n | head -n 1, concatenate the results and sort them again. The result would be something like

2 ClassLoader3@19518cc
137 ClassLoader@13c2d7f
563 ClassLoader3@1267649
...
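The plan described above could be sketched roughly as follows. This is only an illustration of the idea: the function name first_lines is hypothetical, and grep -F is used so that each identifier is searched as a fixed string (which means an identifier that is a prefix of a longer one would also match the longer one).

```shell
# Hypothetical helper: print "<line number> <identifier>" for the first
# occurrence of every identifier in the file $1, ordered by line number.
first_lines() {
  grep -o -e 'ClassLoader[0-9]*@[0-9a-f][0-9a-f]*' "$1" | sort -u |
    while IFS= read -r id; do
      # Line number of the first line containing this identifier.
      n=$(grep -n -F -e "$id" "$1" | head -n 1 | cut -d: -f1)
      printf '%s %s\n' "$n" "$id"
    done | sort -n
}
```

Note that this scans the whole file once per identifier, so it may be slow on large logs with many distinct IDs.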

Then I could (either manually or with sed itself) massage this into a sed command like

sed -e 's/ClassLoader3@19518cc/ClassLoader3@2/g' \
    -e 's/ClassLoader@13c2d7f/ClassLoader@137/g' \
    -e 's/ClassLoader3@1267649/ClassLoader3@563/g' \
    file1.log > file1_processed.log

and similarly for file2.
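As a possible alternative to building the sed command by hand, the renumbering could be done in a single awk pass that assigns 1, 2, 3, ... to identifiers in the order they first appear. This is a minimal sketch, not a tested solution for the real logs: the function name normalize_ids is hypothetical, and it assumes the identifiers first appear in the same order in both files.

```shell
# Hypothetical helper: rewrite every identifier in file $1 so that the part
# after "@" becomes a counter reflecting the order of first appearance.
normalize_ids() {
  awk '
  {
    out = ""
    rest = $0
    # Repeatedly find the next identifier in the unprocessed part of the line.
    while (match(rest, /ClassLoader[0-9]*@[0-9a-f]+/)) {
      id = substr(rest, RSTART, RLENGTH)
      if (!(id in num)) num[id] = ++count    # number IDs by first appearance
      name = id
      sub(/@.*/, "", name)                   # keep only the part before "@"
      out = out substr(rest, 1, RSTART - 1) name "@" num[id]
      rest = substr(rest, RSTART + RLENGTH)
    }
    print out rest
  }' "$1"
}
```

Usage would be along the lines of normalize_ids file1.log > file1_processed.log and the same for file2.log, after which the two processed files can be compared with diff. Unlike numbering by line number, the counter does not depend on where in each file an ID first shows up, only on the order.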

However, before I start, I would like to verify that my plan is the simplest possible working solution to this.

Is there any flaw in this approach? Is there a simpler way?
