Need some ideas on how to accomplish this in Java (parsing strings)

Posted by Matt on Stack Overflow
Published on 2010-05-02T06:05:11Z Indexed on 2010/05/02 6:07 UTC

Sorry I couldn't think of a better title, but thanks for reading!

My ultimate goal is to read a .java file, parse it, pull out every identifier, and store them all in a list. There are two preconditions: the file contains no comments, and every identifier consists of letters only.

Right now I can read the file, split it on spaces, and store everything in a list. Any Java reserved word in the list is removed. I also remove any loose symbols that are not attached to anything (brackets and arithmetic symbols).
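For reference, the approach described above might look roughly like the sketch below. The class name `SpaceSplitSketch`, the method `roughTokens`, and the shortened reserved-word set are illustrative assumptions, not the actual code from my project.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class SpaceSplitSketch {
    // Assumption: abbreviated on purpose. A real run would need the full
    // reserved-word list from the Java Language Specification.
    static final Set<String> RESERVED = new HashSet<>(Arrays.asList(
            "class", "public", "static", "void", "int", "new", "return", "if", "else"));

    // Splits on whitespace, then drops reserved words and any token
    // that contains no letter at all (loose brackets, operators, etc.).
    static List<String> roughTokens(String source) {
        List<String> kept = new ArrayList<>();
        for (String token : source.trim().split("\\s+")) {
            if (token.isEmpty() || RESERVED.contains(token)) {
                continue;
            }
            if (!token.matches(".*[A-Za-z].*")) {
                continue; // loose symbol such as { or +
            }
            kept.add(token);
        }
        return kept;
    }

    public static void main(String[] args) {
        // prints [sum, a, b]
        System.out.println(roughTokens("public static int sum = a + b ;"));
    }
}
```

This only works cleanly when every symbol is surrounded by spaces. Anything written without spaces survives as a single token, which is where the trouble starts.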

Now I am left with a bunch of weird strings, but at least they contain no spaces. I know I will have to re-split everything on the . delimiter to pull the identifiers out of expressions like System.out.print, but what about strings like this example:

Logger.getLogger(MyHash.class.getName()).log(Level.SEVERE,

After re-parsing by . I will be left with more crazy strings like:

getLogger(MyHash

getName())

log(Level

SEVERE,

How can I pull out all the identifiers while leaving out all the trash? Should I just keep re-splitting on every symbol that could appear in Java code? That seems rather lame and time-consuming, and I am not even sure it would work completely. Can you suggest a better way of doing this?
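One alternative that might avoid the repeated re-splitting is to scan for runs of letters instead of splitting on delimiters. Under the preconditions above (no comments, identifiers made of letters only), a regular expression such as `[A-Za-z]+` should match each candidate word directly. The sketch below is only an illustration of that idea: the class name `IdentifierScanSketch`, the method `identifiers`, and the shortened reserved-word set are assumptions, and the contents of string literals would be picked up as well unless they are stripped first.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class IdentifierScanSketch {
    // Assumption: abbreviated list; use the full set of Java reserved
    // words and literals (null, true, false) for real input.
    static final Set<String> RESERVED = new HashSet<>(Arrays.asList(
            "class", "public", "static", "void", "int", "new", "return", "if", "else",
            "null", "true", "false"));

    // Matches maximal runs of ASCII letters, per the "letters only" precondition.
    static final Pattern LETTER_RUN = Pattern.compile("[A-Za-z]+");

    // Returns every letter run in the source that is not a reserved word,
    // in order of appearance (duplicates are kept).
    static List<String> identifiers(String source) {
        List<String> found = new ArrayList<>();
        Matcher m = LETTER_RUN.matcher(source);
        while (m.find()) {
            String word = m.group();
            if (!RESERVED.contains(word)) {
                found.add(word);
            }
        }
        return found;
    }

    public static void main(String[] args) {
        String line = "Logger.getLogger(MyHash.class.getName()).log(Level.SEVERE,";
        // prints [Logger, getLogger, MyHash, getName, log, Level, SEVERE]
        System.out.println(identifiers(line));
    }
}
```

Because the pattern describes what an identifier looks like rather than what separates identifiers, there is no need to enumerate every symbol that could appear in Java code. If the letters-only precondition were dropped, the pattern would have to change, or a real tokenizer would be the safer choice.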

© Stack Overflow or respective owner
