Java: How to get the positions of all matches in a String?

Posted by user692704 on Stack Overflow
Published on 2012-11-10T22:55:48Z Indexed on 2012/11/10 22:59 UTC

I have a text document and a query (the query could be more than one word). I want to find the position of all occurrences of the query in the document.

I thought of using documentText.indexOf(query) or a regular expression, but I could not make either work.
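For reference, the indexOf and regular-expression ideas mentioned above can be sketched roughly as follows. This is only an illustration, not code from the post: the class name MatchPositions and the method names indexOfAll and regexAll are made up for the example, and both methods report character offsets rather than word positions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, named for this sketch only.
class MatchPositions {

    // Collects the start offset of every occurrence using String.indexOf.
    static List<Integer> indexOfAll(String text, String query) {
        List<Integer> starts = new ArrayList<>();
        if (query.isEmpty()) {
            return starts; // an empty query would match at every offset
        }
        int from = 0;
        while (true) {
            int hit = text.indexOf(query, from);
            if (hit < 0) {
                break; // no further occurrence
            }
            starts.add(hit);
            from = hit + 1; // advance by one so overlapping matches are also found
        }
        return starts;
    }

    // Same idea with java.util.regex. Pattern.quote makes the query literal,
    // and CASE_INSENSITIVE is an assumption about the desired matching.
    static List<int[]> regexAll(String text, String query) {
        List<int[]> spans = new ArrayList<>();
        if (query.isEmpty()) {
            return spans;
        }
        Matcher m = Pattern.compile(Pattern.quote(query), Pattern.CASE_INSENSITIVE).matcher(text);
        while (m.find()) {
            spans.add(new int[] {m.start(), m.end()}); // end offset is exclusive
        }
        return spans;
    }
}
```

With this sketch, indexOfAll("the cat and the hat", "the") gives [0, 12]. Unlike the indexOf loop, Matcher.find does not report overlapping matches.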

I ended up with the following method:

First, I created a data type called QueryOccurrence:

import java.io.Serializable;

// Holds the position of the first and last word of one occurrence of the query.
public class QueryOccurrence implements Serializable {
  private int start;
  private int end;

  public QueryOccurrence() {}

  // nameText is accepted but not stored
  public QueryOccurrence(int nameStart, int nameEnd, String nameText) {
    start = nameStart;
    end = nameEnd;
  }

  public int getStart() {
    return start;
  }

  public int getEnd() {
    return end;
  }

  public void SetStart(int i) {
    start = i;
  }

  public void SetEnd(int i) {
    end = i;
  }
}

Then I used this data type in the following method:

import java.util.ArrayList;
import java.util.List;

public static List<QueryOccurrence> FindQueryPositions(String documentText, String query) {

    // Normalize does the following: lower case, trim, and remove punctuation
    String normalizedQuery = Normalize.Normalize(query);
    String normalizedDocument = Normalize.Normalize(documentText);

    String[] documentWords = normalizedDocument.split(" ");
    String[] queryArray = normalizedQuery.split(" ");

    List<QueryOccurrence> foundQueries = new ArrayList<>();
    QueryOccurrence foundQuery = new QueryOccurrence();

    // position of the current word in the document (a word index, not a character offset)
    int index = 0;

    for (String word : documentWords) {

        // the current word matches the first word of the query
        if (word.equals(queryArray[0])) {
            foundQuery.SetStart(index);
        }

        // the current word matches the last word of the query
        if (word.equals(queryArray[queryArray.length - 1])) {
            foundQuery.SetEnd(index);
            // only the distance between the first and last word is checked;
            // the words in between are not compared
            if ((foundQuery.getEnd() - foundQuery.getStart()) + 1 == queryArray.length) {

                // add the found query to the list
                foundQueries.add(foundQuery);
                // reset the foundQuery variable to use it again
                foundQuery = new QueryOccurrence();
            }
        }

        index++;
    }
    return foundQueries;
}

This method returns a list of all occurrences of the query in the document, each with its start and end word position.
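Note that the loop above compares only the first and last query words, so a span of the right length with different words in the middle would also be reported. A minimal sketch of a variant that compares every word is given below. It rests on assumptions: normalize is a hypothetical stand-in for Normalize.Normalize (which is not shown in the post), the class name PhraseFinder is made up, and plain int[] pairs are returned instead of QueryOccurrence so the example stays self-contained.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical class, named for this sketch only.
class PhraseFinder {

    // Assumed behaviour of the post's Normalize.Normalize helper:
    // lower-case, strip ASCII punctuation, trim.
    static String normalize(String s) {
        return s.toLowerCase(Locale.ROOT).replaceAll("\\p{Punct}", "").trim();
    }

    // Returns {firstWordIndex, lastWordIndex} for each place where all
    // query words match consecutive document words.
    static List<int[]> findQueryPositions(String documentText, String query) {
        List<int[]> found = new ArrayList<>();
        String normalizedQuery = normalize(query);
        if (normalizedQuery.isEmpty()) {
            return found; // nothing to search for
        }
        String[] queryWords = normalizedQuery.split("\\s+");
        String[] documentWords = normalize(documentText).split("\\s+");

        // Slide a window of queryWords.length over the document words.
        for (int i = 0; i + queryWords.length <= documentWords.length; i++) {
            boolean matches = true;
            for (int j = 0; j < queryWords.length; j++) {
                if (!documentWords[i + j].equals(queryWords[j])) {
                    matches = false;
                    break;
                }
            }
            if (matches) {
                found.add(new int[] {i, i + queryWords.length - 1});
            }
        }
        return found;
    }
}
```

For the document "The quick brown fox. The quick red fox!" and the query "the quick", this sketch reports the word spans {0, 1} and {4, 5}. The nested loop is simple rather than fast; it does work proportional to document length times query length in the worst case.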

Could you suggest an easier and faster way to accomplish this task?

Thanks

© Stack Overflow or respective owner
