Need Help Customizing a Grammar Checking Replace Rule in Java

Posted by user567785 on Stack Overflow See other posts from Stack Overflow or by user567785
Published on 2011-01-08T05:50:12Z Indexed on 2011/01/08 5:53 UTC
Read the original article Hit count: 458

Filed under:

java

Hello, I am currently adding the Khmer (Cambodian) language to LanguageTool, an opensource grammar checker for OpenOffice (http://www.languagetool.org).

I don't know enough Java to customize one of the scripts and wanted to make a request here asking if anyone would be willing to customize it for me (I can put link to your website at http://www.sbbic.org/lang/en-us/volunteer/ if you help).

Here is the script that needs customization KhmerWordCoherencyRule.java:

/* LanguageTool, a natural language style checker 
 * Copyright (C) 2005 Daniel Naber (http://www.danielnaber.de)
 * 
 * This library is free software; you can redistribute it and/or
 * modify it under the terms of the GNU Lesser General Public
 * License as published by the Free Software Foundation; either
 * version 2.1 of the License, or (at your option) any later version.
 *
 * This library is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
 * Lesser General Public License for more details.
 *
 * You should have received a copy of the GNU Lesser General Public
 * License along with this library; if not, write to the Free Software
 * Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA  02110-1301
 * USA
 */
package de.danielnaber.languagetool.rules.km;

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.ResourceBundle;

import de.danielnaber.languagetool.AnalyzedSentence;
import de.danielnaber.languagetool.AnalyzedToken;
import de.danielnaber.languagetool.AnalyzedTokenReadings;
import de.danielnaber.languagetool.JLanguageTool;
import de.danielnaber.languagetool.tools.StringTools;
import de.danielnaber.languagetool.rules.Category;
import de.danielnaber.languagetool.rules.RuleMatch;

/**
 * A Khmer rule that matches words or phrases which should not be used and suggests
 * correct ones instead. Loads the relevant words  from 
 * <code>rules/km/coherency.txt</code>, where km is a code of the language.
 * 
 * @author Andriy Rysin
 */
public abstract class KhmerWordCoherencyRule extends KhmerRule {

  private static final String FILE_ENCODING = "utf-8";

  private Map<String, String> wrongWords; // e.g. "????? -> "?????"

  private static final String FILE_NAME = "/km/coherency.txt";

  public abstract String getFileName();

  public String getEncoding() {
    return FILE_ENCODING;
  }

  /**
   * Indicates if the rule is case-sensitive. Default value is <code>true</code>.
   * @return true if the rule is case-sensitive, false otherwise.
   */
   //in Khmer there is no case
  public boolean isCaseSensitive() {
    return false;  
  }

  /**
   * @return the locale used for case conversion when {@link #isCaseSensitive()} is set to <code>false</code>.
   */
  public Locale getLocale() {
    return Locale.getDefault();
  }  

  public KhmerWordCoherencyRule(final ResourceBundle messages) throws IOException {
    if (messages != null) {
      super.setCategory(new Category(messages.getString("category_misc")));
    }
    wrongWords = loadWords(JLanguageTool.getDataBroker().getFromRulesDirAsStream(getFileName()));
  }

  public String getId() {
    return "KM_WORD_COHERENCY";
  }

  public String getDescription() {
    return "Checks for wrong words/phrases";
  }

  public String getSuggestion() {
    return " does not match your previous spelling of the word, use ";
  }

  public String getShort() {
    return "Use a consistant spelling throughout";
  }

  public final RuleMatch[] match(final AnalyzedSentence text) {
    final List<RuleMatch> ruleMatches = new ArrayList<RuleMatch>();
    final AnalyzedTokenReadings[] tokens = text.getTokensWithoutWhitespace();

    for (int i = 1; i < tokens.length; i++) {
      final String token = tokens[i].getToken();

      final String origToken = token;
      final String replacement = isCaseSensitive()?wrongWords.get(token):wrongWords.get(token.toLowerCase(getLocale()));
      if (replacement != null) {
        final String msg = token + getSuggestion() + replacement;
        final int pos = tokens[i].getStartPos();
        final RuleMatch potentialRuleMatch = new RuleMatch(this, pos, pos
            + origToken.length(), msg, getShort());
        if (!isCaseSensitive() && StringTools.startsWithUppercase(token)) {
          potentialRuleMatch.setSuggestedReplacement(StringTools.uppercaseFirstChar(replacement));
        } else {
          potentialRuleMatch.setSuggestedReplacement(replacement);
        }
        ruleMatches.add(potentialRuleMatch);
      }
    }
    return toRuleMatchArray(ruleMatches);
  }


  private Map<String, String> loadWords(final InputStream file) throws IOException {
    final Map<String, String> map = new HashMap<String, String>();
    InputStreamReader isr = null;
    BufferedReader br = null;
    try {
      isr = new InputStreamReader(file, getEncoding());
      br = new BufferedReader(isr);
      String line;

      while ((line = br.readLine()) != null) {
        line = line.trim();
        if (line.length() < 1) {
          continue;
        }
        if (line.charAt(0) == '#') { // ignore comments
          continue;
        }
        final String[] parts = line.split(";");
        if (parts.length != 2) {
          throw new IOException("Format error in file "
              + JLanguageTool.getDataBroker().getFromRulesDirAsUrl(getFileName()) + ", line: " + line);
        }
        map.put(parts[0], parts[1]);
      }

    } finally {
      if (br != null) {
        br.close();
      }
      if (isr != null) {
        isr.close();
      }
    }
    return map;
  }

  public void reset() {
  }  

}

Here is what I need the SimpleReplaceRule.java to do: 1 - Be able to have more than two spelling variations in the coherency.txt file (right now it can only be Word1;Word2). 2 - Find the first use of ANY of the spelling variations in a document that are found in coherency.txt and then make sure only that spelling is used throughout the document (ex. in the coherency.txt I have Word1;Word2;Word3 then in my document on the first line I write Word2. then on next line I write Word1 and Word 3 - then the grammar checker will flag Word1 and Word3 saying that I should use the spelling "Word2" instead...etc.).

If anyone can help I would be grateful! Thanks for your time, Nathan

Developer IT

Need Help Customizing a Grammar Checking Replace Rule in Java - Developer IT

Need Help Customizing a Grammar Checking Replace Rule in Java

java

Related posts about java

Tomcat 6: Access Control Exception?

Problem in creation MDB Queue connection at Jboss StartUp

failing to establish connection between Postgres db and gwt

failing to establish connection between postgre db and gwt

Migration and deployement problems JBoss 4.2.2.GA to JBoss 6.0.0.M2

Categories cloud