I have a program that does string manipulation on very large strings (around 100K characters). The first step in my program is to clean up the input string so that it only contains certain characters. Here is my method for this cleanup:
public static String analyzeString(String input) {
    String output = null;
    output = input.replaceAll("[-+.^:,]", "");
    output = output.replaceAll("(\\r|\\n)", "");
    output = output.toUpperCase();
    output = output.replaceAll("[^XYZ]", "");
    return output;
}
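To make the effect of the cleanup concrete, here is a minimal sketch that runs the same four steps on a tiny input. The class name `CleanupDemo` and the sample input are made up for illustration; only the body of `analyzeString` comes from the method above.

```java
class CleanupDemo {
    // Same four steps as analyzeString above.
    static String analyzeString(String input) {
        String output = input.replaceAll("[-+.^:,]", ""); // drop listed punctuation
        output = output.replaceAll("(\\r|\\n)", "");      // drop all line breaks
        output = output.toUpperCase();
        output = output.replaceAll("[^XYZ]", "");         // keep only X, Y and Z
        return output;
    }

    public static void main(String[] args) {
        // Hypothetical sample input spanning three lines.
        String input = "xy-z\r\nab,c\nzzy";
        String cleaned = analyzeString(input);
        System.out.println(cleaned);          // prints XYZZZY
        System.out.println(cleaned.length()); // prints 6
    }
}
```

Note that after the second step the result no longer contains any line breaks, so the cleaned-up string is always one single line, however long it is.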
When I print my 'input' string of length 97498, it prints successfully. My output string after cleanup has length 94788. I can print its size using output.length(), but when I try to print the string itself in Eclipse, the output is empty and I can see only the Eclipse output console header. Since this is not my final program, I ignored this and proceeded to the next method, which does pattern matching on this 'cleaned-up' string. Here is the code for pattern matching:
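// Required imports:
// import java.util.ArrayList;
// import java.util.List;
// import java.util.regex.Matcher;
// import java.util.regex.Pattern;
public static List<Integer> getIntervals(String input, String regex) {
    List<Integer> output = new ArrayList<Integer>();
    // Do pattern matching
    Pattern p1 = Pattern.compile(regex);
    Matcher m1 = p1.matcher(input);
    // For every match found, store its start (inclusive) and end (exclusive)
    while (m1.find()) {
        output.add(m1.start());
        output.add(m1.end());
    }
    return output;
}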
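As a small illustration of what this method returns, here is a sketch on a short string. The class name `IntervalsDemo`, the sample input and the regex `YZ+` are made up for the example; the list holds start/end pairs, where `Matcher.end()` is the index just past the match, so the pair can be passed directly to `substring`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class IntervalsDemo {
    // Same logic as getIntervals above.
    static List<Integer> getIntervals(String input, String regex) {
        List<Integer> output = new ArrayList<Integer>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            output.add(m.start()); // inclusive start index
            output.add(m.end());   // exclusive end index
        }
        return output;
    }

    public static void main(String[] args) {
        String cleaned = "XXYZXYZZ"; // hypothetical cleaned-up string
        List<Integer> intervals = getIntervals(cleaned, "YZ+");
        System.out.println(intervals); // prints [2, 4, 5, 8]

        // Walk the list two entries at a time to recover each match.
        for (int i = 0; i + 1 < intervals.size(); i += 2) {
            System.out.println(cleaned.substring(intervals.get(i), intervals.get(i + 1)));
        }
        // prints YZ, then YZZ
    }
}
```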
Based on this program, I identified the start and end of my pattern match as 12351 and 87314. I tried to print this match as output.substring(12351, 87314) and got only blank output. Numerous trial-and-error runs led to the conclusion that the biggest substring I can print has length 4679. If I try 4680, I again get blank output. My confusion is this: if I was able to print the original string (length 97498), why can't I print the cleaned-up string (length 94788) or any substring longer than 4679 characters? Is it due to the regular expression implementation, which may be causing some memory issues that my system is not able to handle? I have 4GB of installed memory.
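One difference between the two strings is that the cleanup removes every \r and \n, so the cleaned-up string is a single line of 94788 characters, whereas the original input presumably still contains line breaks. A possible way to check whether the string contents are intact and only the console display of one very long line is failing (this is an assumption, not something confirmed above) is to print the string in fixed-width chunks. The helper `chunk`, the class name `ChunkedPrintDemo` and the width of 80 are all hypothetical, and the test string is synthetic.

```java
import java.util.ArrayList;
import java.util.List;

class ChunkedPrintDemo {
    // Splits text into consecutive pieces of at most `width` characters.
    static List<String> chunk(String text, int width) {
        List<String> parts = new ArrayList<String>();
        for (int i = 0; i < text.length(); i += width) {
            parts.add(text.substring(i, Math.min(text.length(), i + width)));
        }
        return parts;
    }

    public static void main(String[] args) {
        // Synthetic stand-in for the cleaned-up string: 31596 * 3 = 94788 characters.
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 31596; i++) {
            sb.append("XYZ");
        }
        String cleaned = sb.toString();

        List<String> parts = chunk(cleaned, 80);
        System.out.println(cleaned.length()); // prints 94788
        System.out.println(parts.size());     // prints 1185

        // Each println now emits a short line instead of one 94788-character line.
        for (String part : parts) {
            System.out.println(part);
        }
    }
}
```

If the chunked output shows up while the single-line print does not, the string itself is fine and the problem lies in how the console handles very long lines rather than in the regular expressions or in available memory.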