How to replace custom IDs in the order of their appearance with a shell script?
Posted
by Péter Török
on Stack Overflow
Published on 2010-04-20T10:49:13Z
I have a pair of rather large log files with very similar content, except that some identifiers are different between the two. A couple of examples:
UnifiedClassLoader3@19518cc | UnifiedClassLoader3@d0357a
JBossRMIClassLoader@13c2d7f | JBossRMIClassLoader@191777e
That is, wherever the first file contains UnifiedClassLoader3@19518cc, the second contains UnifiedClassLoader3@d0357a, and so on.
I want to replace these with identical IDs so that I can spot the really important differences between the two files. I.e. I want to replace all occurrences of both UnifiedClassLoader3@19518cc in file1 and UnifiedClassLoader3@d0357a in file2 with UnifiedClassLoader3@1; all occurrences of both JBossRMIClassLoader@13c2d7f in file1 and JBossRMIClassLoader@191777e in file2 with JBossRMIClassLoader@2; etc.
Using the Cygwin shell, I have so far managed to list all the distinct identifiers occurring in one of the files with
grep -o -e 'ClassLoader[0-9]*@[0-9a-f][0-9a-f]*' file1.log | sort | uniq
However, the original order is now lost, so I cannot tell which ID in the other file each one pairs with. With grep -n I can get the line number, so the sort would preserve the order of appearance, but then I can't weed out the duplicate occurrences. Unfortunately grep cannot print only the first match of a pattern.
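One way to drop duplicates while keeping the order of first appearance is to filter the grep output through awk instead of sort | uniq. The following is a minimal sketch, assuming awk is available in the Cygwin install. The helper name first_seen_ids is hypothetical and the pattern is the one from the grep command above.

```shell
# List each identifier once, in the order of its first appearance.
# first_seen_ids is a hypothetical helper name, not an existing tool.
first_seen_ids() {
    # seen[$0]++ is 0 (false) the first time a line is met, so !seen[$0]++
    # lets only the first occurrence of each identifier through.
    grep -o -e 'ClassLoader[0-9]*@[0-9a-f][0-9a-f]*' "$1" | awk '!seen[$0]++'
}
```

It could be called as first_seen_ids file1.log, and likewise for the second file. The n-th line of each list would then form a pair, provided the identifiers first appear in the same order in both logs.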
I figured I could save the list of identifiers produced by the above command into a file, then iterate over the patterns in the file with grep -n | head -n 1, concatenate the results and sort them again. The result would be something like
2 ClassLoader3@19518cc
137 ClassLoader@13c2d7f
563 ClassLoader3@1267649
...
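A list of this shape can probably be produced in one pipeline, without iterating over a saved pattern file. This is a sketch under the assumption that grep accepts -n together with -o (GNU grep does) and prints each match as line:match. The helper name first_occurrences is hypothetical.

```shell
# Print "<line number> <identifier>" for the first occurrence of each identifier.
# first_occurrences is a hypothetical helper name.
first_occurrences() {
    # grep -n -o emits one "LINE:MATCH" record per match; the identifiers
    # contain no colon, so awk can split on ":" and de-duplicate on field 2.
    grep -n -o -e 'ClassLoader[0-9]*@[0-9a-f][0-9a-f]*' "$1" |
        awk -F: '!seen[$2]++ { print $1, $2 }'
}
```

Because grep reports matches in file order, the output is already sorted by line number and no second sort is needed.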
Then I could (either manually or with sed itself) massage this into a sed command like
sed -e 's/ClassLoader3@19518cc/ClassLoader3@2/g' \
-e 's/ClassLoader@13c2d7f/ClassLoader@137/g' \
-e 's/ClassLoader3@1267649/ClassLoader3@563/g' \
file1.log > file1_processed.log
and similarly for file2.
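The sed script itself could also be generated rather than written by hand. The sketch below numbers the identifiers 1, 2, 3, ... in order of first appearance instead of using line numbers, since line numbers may differ between the two files. It assumes GNU sed (the \> end-of-word anchor is a GNU extension) and that mktemp is available. The helper names make_sed_script and normalize_ids are hypothetical.

```shell
# Emit one "s/OLD\>/PREFIX@N/g" command per distinct identifier,
# numbered in order of first appearance.
# make_sed_script and normalize_ids are hypothetical helper names.
make_sed_script() {
    grep -o -e 'ClassLoader[0-9]*@[0-9a-f][0-9a-f]*' "$1" |
        awk '!seen[$0]++ {
            prefix = $0
            sub(/@.*/, "", prefix)   # keep the part before the "@"
            # \> (GNU sed) stops a short ID from matching inside a longer one
            printf "s/%s\\>/%s@%d/g\n", $0, prefix, ++n
        }'
}

# Apply the generated script to the same file and print the result.
normalize_ids() {
    script=$(mktemp) || return 1
    make_sed_script "$1" > "$script"
    sed -f "$script" "$1"
    rm -f "$script"
}
```

Usage might look like normalize_ids file1.log > file1_processed.log, repeated for file2.log, followed by a diff of the two processed files. The numbering lines up only if the identifiers first appear in the same order in both logs, and an original ID that happens to equal an already substituted one (for example a hash of exactly 1) would be rewritten twice.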
However, before I start, I would like to verify that my plan is the simplest possible working solution to this.
Is there any flaw in this approach? Is there a simpler way?