Search Results

Search found 2372 results on 95 pages for 'significant whitespace'.


  • Optimizing a lot of Scanner.findWithinHorizon(pattern, 0) calls

    - by darvids0n
    I'm building a process which extracts data from 6 csv-style files and two poorly laid out .txt reports and builds output CSVs. I'm fully aware that there's going to be some overhead searching through all that whitespace thousands of times, but I never anticipated that converting about 50,000 records would take 12 hours.

    Excerpt of my manual matching code (I know it's horrible that I use lists of tokens like that, but it was the best thing I could think of):

        public static String lookup(List<String> tokensBefore, List<String> tokensAfter) {
            String result = null;
            while (_match(tokensBefore)) { // block until all input is read
                if (id.hasNext()) {
                    result = id.next(); // capture the next token that matches
                    if (_matchImmediate(tokensAfter)) // try to match tokensAfter to this result
                        return result;
                } else
                    return null; // end of file; no match
            }
            return null; // no matches
        }

        private static boolean _match(List<String> tokens) {
            return _match(tokens, true);
        }

        private static boolean _match(List<String> tokens, boolean block) {
            if (tokens != null && !tokens.isEmpty()) {
                if (id.findWithinHorizon(tokens.get(0), 0) == null)
                    return false;
                for (int i = 1; i <= tokens.size(); i++) {
                    if (i == tokens.size()) { // matches all tokens
                        return true;
                    } else if (id.hasNext() && !id.next().matches(tokens.get(i))) {
                        break; // break to blocking behaviour
                    }
                }
            } else {
                return true; // empty list always matches
            }
            if (block)
                return _match(tokens); // loop until we find something or nothing
            else
                return false; // return after just one attempted match
        }

        private static boolean _matchImmediate(List<String> tokens) {
            if (tokens != null) {
                for (int i = 0; i <= tokens.size(); i++) {
                    if (i == tokens.size()) { // matches all tokens
                        return true;
                    } else if (!id.hasNext() || !id.next().matches(tokens.get(i))) {
                        return false; // doesn't match, or end of file
                    }
                }
                return false; // we have some serious problems if this ever gets called
            } else {
                return true; // empty list always matches
            }
        }

    Basically wondering how I would work in an efficient string search (Boyer-Moore or similar). My Scanner id is scanning a java.lang.String; I figured buffering it to memory would reduce I/O, since the search here is being performed thousands of times on a relatively small file. The performance increase compared to scanning a BufferedReader(FileReader(File)) was probably less than 1%, and the process still looks to be taking a LONG time.

    I've also traced execution, and the slowness of my overall conversion process is definitely between the first and last line of the lookup method. In fact, so much so that I ran a shortcut process to count the number of occurrences of various identifiers in the .csv-style files (I use 2 lookup methods, this is just one of them), and the process completed indexing approx 4 different identifiers for 50,000 records in less than a minute. Compared to 12 hours, that's instant.

    Some notes (updated):

      * I don't necessarily need the pattern-matching behaviour. I only get the first field of a line of text, so I need to match line breaks or use Scanner.nextLine().
      * All ID numbers I need start at position 0 of a line and run through until the first block of whitespace, after which is the name of the corresponding object.
      * I would ideally want to return a String, not an int locating the line number or start position of the result, but if it's faster then it will still work just fine. If an int is being returned, however, then I would have to seek to that line again just to get the ID; storing the ID of every line that is searched sounds like a way around that.

    Anything to help me out, even if it saves 1ms per search, will help, so all input is appreciated. Thank you!

    Usage scenario 1: I have a list of objects in file A, which in the old-style system have an ID number that is not in file A. It is, however, POSSIBLY in another csv-style file (file B) or possibly still in a .txt report (file C), each of which also contains a bunch of other information that is not useful here. So file B needs to be searched for the object's full name (1 token, since it would reside within the second column of any given line), and then the first column should be the ID number. If that doesn't work, we then have to split the search token by whitespace into separate tokens before searching file C for those tokens as well.

    Generalised code:

        String field;
        for (/* each record in file A */) {
            /* construct the rest of this object from file A info */
            // now to find the ID, if we can
            List<String> objectName = new ArrayList<String>(1);
            objectName.add(Pattern.quote(thisObject.fullName));
            field = lookup(objectSearchToken, objectName); // search file B
            if (field == null) // not found in file B
            {
                lookupReset(false); // initialise scanner to check file C
                objectName.clear(); // not using the full name
                String[] tokens = thisObject.fullName.split(id.delimiter().pattern());
                for (String s : tokens)
                    objectName.add(Pattern.quote(s));
                field = lookup(objectSearchToken, objectName); // search file C
                lookupReset(true); // back to file B
            } else {
                /* found it, file B specific processing here */
            }
            if (field != null) // found it in B or C
                thisObject.ID = field;
        }

    The objectName tokens are all uppercase words with possible hyphens or apostrophes in them, separated by spaces. Much like a person's name. As per a comment, I will pre-compile the regex for my objectSearchToken, which is just [\r\n]+.

    What ends up happening in file C is that every single line is checked, even the 95% of lines which don't contain an ID number and object name at the start. Would it be quicker to use ^[\r\n]+.*(objectname) instead of two separate regexes? It may reduce the number of _match executions. The more general case of that would be to concatenate all tokensBefore with all tokensAfter and put a .* in the middle. It would need to match backwards through the file though, otherwise it would match the correct line but with a huge .* block in the middle spanning lots of lines.

    The above situation could be resolved if I could get java.util.Scanner to return the token previous to the current one after a call to findWithinHorizon. I have another usage scenario. Will put it up asap.
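
    The idea mentioned in the notes, storing the ID of every line so that each lookup no longer rescans the file, can be sketched as a one-pass index. This is only an illustration, written in Python rather than Java for brevity. It assumes every useful line has the form "ID, whitespace, object name"; the function name build_index and the sample lines are hypothetical and not part of the original post. The same shape would map onto a Java HashMap<String, String>.

```python
def build_index(lines):
    """Map object name -> ID for lines shaped like '<ID><whitespace><name>'.

    Assumption: the name is everything after the first run of whitespace.
    """
    index = {}
    for line in lines:
        parts = line.split(None, 1)  # split once, on the first whitespace run
        if len(parts) == 2:
            obj_id, name = parts
            # Keep the first ID seen for a name, like a first-match search would.
            index.setdefault(name.strip(), obj_id)
    return index


# Hypothetical sample data, for illustration only.
sample = [
    "1001   JOHN O'NEIL",
    "1002\tMARY-ANN SMITH",
    "no-id-line",
]
index = build_index(sample)
print(index.get("MARY-ANN SMITH"))
```

    Building the dictionary reads the file once; each later lookup is then a hash lookup instead of a fresh scan.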


  • TreeView Root Node Style

    - by ScG
    I have a tree view: <asp:TreeView ID="TreeView1" runat="server" DataSourceID="SiteMapDataSource1" ShowExpandCollapse="False"> </asp:TreeView> As you can see, the root node is hidden. However, when the tree node gets rendered, there is whitespace to the left of each leaf node. This is probably because the root node is hidden but still takes up space. How do I remove that whitespace?


  • Make HTML5 code look beautiful!

    - by blinry
    I'm looking for a command line program that pretty-prints (that is, indents, adds line breaks to, harmonizes the whitespace of) HTML5 code. It has to run under Linux (in case you're interested, I want to use it as a filter for nanoc). tidy does too much for me (heck, it alters my doctype!), vim too little. What do you use to make your HTML5 code look beautiful? Maybe there is a way to make tidy cooperate and not alter anything?


  • format string (postcode) in ruby

    - by noddy
    I need to re-format a list of UK postcodes and have started with the following to strip whitespace and capitalize: postcode.upcase.gsub(/\s/,'') I now need to change the postcode so that the new postcode will be in a format that matches the following regexp: ^([A-PR-UWYZ0-9][A-HK-Y0-9][AEHMNPRTVXY0-9]?[ABEHMNPRVWXY0-9]? {1,2}[0-9][ABD-HJLN-UW-Z]{2}|GIR 0AA)$ I would be grateful for any assistance.
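
    One possible approach, sketched here in Python rather than Ruby: the quoted regexp always ends with a digit followed by two letters, so after stripping whitespace a single space can be inserted before the last three characters and the result checked against the pattern. The function name format_postcode is hypothetical; the pattern is the one from the question.

```python
import re

# The validation pattern quoted in the question, unchanged.
POSTCODE_RE = re.compile(
    r"^([A-PR-UWYZ0-9][A-HK-Y0-9][AEHMNPRTVXY0-9]?[ABEHMNPRVWXY0-9]? {1,2}"
    r"[0-9][ABD-HJLN-UW-Z]{2}|GIR 0AA)$"
)


def format_postcode(raw):
    """Upper-case, drop whitespace, then put one space before the last 3 characters.

    Returns the formatted postcode, or None if it does not match the pattern.
    """
    compact = re.sub(r"\s", "", raw.upper())
    if len(compact) < 5:
        return None
    candidate = compact[:-3] + " " + compact[-3:]
    return candidate if POSTCODE_RE.match(candidate) else None


print(format_postcode("sw1a1aa"))
```

    The same two steps (strip and upper-case, then insert the space) carry over to Ruby string slicing; only the regexp check is specific to the pattern above.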


  • Where are VS/TFS 2010 DIFF options at?

    - by Jaxidian
    I'm newish to TFS and am working with VS and TFS 2010 RC releases. In every other DIFF tool I've used in the past I have had options for configuring how to treat whitespace differences, among other things. Where are these options when working with VS2010 and TFS2010? Thanks!


  • Regular expression for a phone number

    - by Zerobu
    Hello, I would like a regular expression that must match one of the following formats:

      * (###)###-####
      * ###-###-####
      * ###.###.####
      * ##########

    Strip all whitespace. Make sure it's a valid phone number, then (if necessary) translate it to the first format listed above.
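
    The three steps described (strip whitespace, check the format, translate to the first format) can be sketched in Python as follows. Here "valid" is taken to mean only that the string fits one of the four listed layouts, which is an assumption; the name normalize_phone is hypothetical.

```python
import re

# One alternative per accepted layout; each captures area code, prefix, line number.
PHONE_RE = re.compile(
    r"^(?:\((\d{3})\)(\d{3})-(\d{4})"   # (###)###-####
    r"|(\d{3})-(\d{3})-(\d{4})"         # ###-###-####
    r"|(\d{3})\.(\d{3})\.(\d{4})"       # ###.###.####
    r"|(\d{3})(\d{3})(\d{4}))$"         # ##########
)


def normalize_phone(raw):
    """Return the number as (###)###-####, or None if no layout matches."""
    compact = re.sub(r"\s+", "", raw)  # strip all whitespace first
    match = PHONE_RE.match(compact)
    if match is None:
        return None
    # Exactly one alternative matched, so exactly three groups are set.
    area, prefix, line = [g for g in match.groups() if g is not None]
    return "({}){}-{}".format(area, prefix, line)


print(normalize_phone("555.123.4567"))
```

    Mixed separators such as 555-123.4567 are rejected because each alternative fixes its own separator.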


  • Spring to understand properties in YAML

    - by litius
    Did Spring abandon YAML as an alternative to .properties / .xml because of this: [Spring Developer]: ...YAML was considered, but we thought that counting whitespace significant was a support nightmare in the making... [reference from Spring forum] I am confident YAML makes a lot of sense for properties, and I am currently using it on the project, but I have difficulty injecting properties in a <property name="productName" value="${client.product.name}" /> fashion. Am I missing anything, or should I create a custom YamlPropertyPlaceholderConfigurer? Thank you, /Anatoly


  • Building a regexp to split a string

    - by Kivin
    I'm seeking a solution for splitting a string which contains text in the following format: "abcd efgh 'ijklm no pqrs' tuv" which will produce the following results: ['abcd', 'efgh', 'ijklm no pqrs', 'tuv'] In other words, it splits by whitespace unless inside a single-quoted string. I think it could be done with .NET regexps using "lookaround" operators, particularly balancing operators. I'm not so sure about Perl.
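
    A sketch of one way to get that result without lookaround, shown in Python: instead of splitting on whitespace, match the tokens themselves, trying a single-quoted run first and a run of non-whitespace second. The name split_quoted is hypothetical, and the sketch assumes quotes are balanced and never appear in the middle of a word.

```python
import re

# Either a single-quoted run (quotes dropped) or a run of non-whitespace.
TOKEN_RE = re.compile(r"'([^']*)'|(\S+)")


def split_quoted(text):
    """Split on whitespace, keeping single-quoted sections together."""
    tokens = []
    for quoted, unquoted in TOKEN_RE.findall(text):
        # findall yields '' for the group that did not take part in the match.
        tokens.append(unquoted if unquoted else quoted)
    return tokens


print(split_quoted("abcd efgh 'ijklm no pqrs' tuv"))
```

    The pattern uses only alternation, a negated character class, and \S, so it should carry over to other regex flavours with little change.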


  • String format or REGEX.

    - by ThePower
    I need a simple way to check whether a string that is sent to my function is of the form:

        (x + n)(x + m)
        // the +'s can be minuses
        // n and m each represent a double
        // x represents the char 'x'

    Is there a simple string format that I can use to check that this is the form, as opposed to checking each character individually? The whitespace will be removed to save any confusion.

    Regards, Lloyd
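
    A regular expression is one way to check the whole form at once. The Python sketch below removes whitespace first, as the question describes, and assumes that a "double" is written as plain digits with an optional decimal part (no exponent, no leading sign); the name is_factored_form is hypothetical.

```python
import re

# Assumption: n and m look like 3 or 2.5 (digits, optional fractional part).
NUMBER = r"\d+(?:\.\d+)?"
FORM_RE = re.compile(r"^\(x[+-]{0}\)\(x[+-]{0}\)$".format(NUMBER))


def is_factored_form(text):
    """True if text, ignoring whitespace, looks like (x +/- n)(x +/- m)."""
    compact = "".join(text.split())  # drop all whitespace
    return FORM_RE.match(compact) is not None


print(is_factored_form("(x + 3)(x - 2.5)"))
```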


  • Getting text at clicked location in an HTML element

    - by Marc
    I have a div element containing some text. When the user clicks a word inside that div I'd like to highlight just that word. In order to do this I need to know what character position in the text the click occurred at, so I can then locate nearby whitespace and insert some formatting around the word. Finding out where the click occurred within the text is the trick here. Is that kind of thing possible?


  • Reducing size of a character array in Numpy

    - by Morgoth
    Given a character array:

        In [21]: x = np.array(['a ','bb ','cccc '])

    One can remove the whitespace using:

        In [22]: np.char.strip(x)
        Out[22]: array(['a', 'bb', 'cccc'], dtype='|S8')

    but is there a way to also shrink the width of the column to the minimum required size, in the above case |S4?


  • Do you have examples of un-helpful/Obscure error messages

    - by Wiretap
    Yesterday I got this error: The processing instruction target matching "[xX][mM][lL]" is not allowed. When I investigated, it was caused by whitespace at the very start of my XML document. Not difficult to solve, but I was struck by how unhelpful that particular error message was in identifying the actual problem. So what other examples of obscure errors do people have, and are you willing to admit to some of your own making?


  • Check for a string format...

    - by ThePower
    I need a simple way to check whether a string that is sent to my function is of the form:

        (x + n)(x + m)
        // the +'s can be minuses
        // n and m each represent a double
        // x represents the char 'x'

    Is there a simple string format that I can use to check that this is the form, as opposed to checking each character individually? The whitespace will be removed to save any confusion.

    Regards, Lloyd


  • Writing white space to CSV fields in Python?

    - by matt
    When I try to write a field that includes whitespace in it, it gets split into multiple fields on the space. What's causing this? It's driving me insane. Thanks

        data = open("file.csv", "wb")
        w = csv.writer(data)
        w.writerow(['word1', 'word2'])
        w.writerow(['word 1', 'word2'])
        data.close()

    I'll get 2 fields (word1, word2) for the first example and 3 (word, 1, word2) for the second.
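
    One way to narrow this down is to look at the raw text the writer emits rather than at how another program displays the file. The sketch below (Python 3, writing to an in-memory buffer instead of file.csv) shows that csv.writer with default settings keeps 'word 1' as a single comma-separated field. Whether the split in the question happens later, for example in a viewer that also treats spaces as delimiters, is only a guess about the asker's setup.

```python
import csv
import io

# Write to memory so the exact output can be inspected.
buffer = io.StringIO(newline="")
writer = csv.writer(buffer)
writer.writerow(["word1", "word2"])
writer.writerow(["word 1", "word2"])

raw = buffer.getvalue()
print(repr(raw))

# Reading the same text back gives two fields per row.
rows = list(csv.reader(io.StringIO(raw, newline="")))
print(rows)
```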


  • What's the fastest way to check if a word from one string is in another string?

    - by Mike Trpcic
    I have a string of words; let's call them bad:

        bad = "foo bar baz"

    I can keep this string as a whitespace-separated string, or as a list:

        bad = bad.split(" ");

    If I have another string, like so:

        str = "This is my first foo string"

    What's the fastest way to check if any word from the bad string is within my comparison string, and what's the fastest way to remove said word if it's found?

        # Find if a word is there
        bad.split(" ").each do |word|
          found = str.include?(word)
        end

        # Remove the word
        bad.split(" ").each do |word|
          str.gsub!(/#{word}/, "")
        end
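
    A common way to avoid rescanning the string once per bad word is to put the bad words in a set and test each word of the comparison string against it. The sketch below shows the idea in Python rather than Ruby. Note one behavioural difference from the code in the question: it compares whole whitespace-separated words, whereas include? and gsub! also match inside longer words. The names contains_bad and remove_bad are hypothetical.

```python
bad = "foo bar baz"
bad_words = set(bad.split())  # set membership tests are constant time on average


def contains_bad(text):
    """True if any whitespace-separated word of text is a bad word."""
    return any(word in bad_words for word in text.split())


def remove_bad(text):
    """Drop every bad word and rejoin the rest with single spaces."""
    return " ".join(word for word in text.split() if word not in bad_words)


sentence = "This is my first foo string"
print(contains_bad(sentence))
print(remove_bad(sentence))
```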


  • Python File Search Line And Return Specific Number of Lines after Match

    - by Simos Anderson
    I have a text file that has lines representing some data sets. The file itself is fairly long, but it contains certain sections of the following format:

        Series_Name INFO Number of teams : n1
        | Team | # | wins |
        | TeamName1 | x | y |
        .
        .
        .
        | TeamNamen1 | numn | numn |
        Some Irrelevant lines
        Series_Name2 INFO Number of teams : n1
        | Team | # | wins |
        | TeamName1 | num1 | num2 |
        .

    where each section has a header that begins with the Series_Name. Each Series_Name is different. The line with the header also includes the number of teams in that series, n1. Following the header line is a set of lines that represents a table of data. For each series there are n1+1 rows in the table, where each row shows an individual team name and associated stats.

    I have been trying to implement a function that will allow the user to search for a team name and then print out the line in the table associated with that team. However, certain team names show up under multiple series. To resolve this, I am currently trying to write my code so that the user can search for the header line with the series name first and then print out just the following n1+1 lines that represent the data associated with the series. Here's what I have come up with so far:

        import re
        print
        fname = raw_input("Enter filename: ")
        seriesname = raw_input("Enter series: ")

        def findcounter(fname, seriesname):
            logfile = open(fname, "r")
            pat = 'INFO Number of teams :'
            for line in logfile:
                if seriesname in line:
                    if pat in line:
                        s=line
            pattern = re.compile(r"""(?P<name>.*?)   #starting name
                                 \s*INFO   #whitespace and success
                                 \s*Number\s*of\s*teams   #whitespace and strings
                                 \s*\:\s*(?P<n1>.*)""",re.VERBOSE)
            match = pattern.match(s)
            name = match.group("name")
            n1 = int(match.group("n1"))
            print name + " has " + str(n1) + " teams"
            lcount = 0
            for line in logfile:
                if line.startswith(name):
                    if pat in line:
                        while lcount <= n1:
                            s.append(line)
                            lcount += 1
            return result

    The first part of my code works; it matches the header line that the person searches for, parses the line, and then prints out how many teams are in that series. Since the header line basically tells me how many lines are in the table, I thought that I could use that information to construct a loop that would continue printing each line until a set counter reached n1. But I've tried running it, and I realize that the way I've set it up so far isn't correct.

    So here's my question: How do you return a number of lines after a matched line when given the number of desired lines that follow the match? I'm new to programming, and I apologize if this question seems silly. I have been working on this quite diligently with no luck and would appreciate any help.
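
    One way to return a fixed number of lines after a match is to keep reading from the same iterator that found the header, for example with itertools.islice. The sketch below is Python 3 and is not the asker's code: the name table_for_series and the sample data are hypothetical, and it assumes the header line has exactly the layout shown in the question, with n1 written as an integer.

```python
import re
from itertools import islice

# Header layout from the question: "<name> INFO Number of teams : <n1>"
HEADER_RE = re.compile(
    r"^(?P<name>.*?)\s*INFO\s*Number\s*of\s*teams\s*:\s*(?P<n1>\d+)\s*$"
)


def table_for_series(lines, series_name):
    """Return the n1+1 lines that follow the header of series_name."""
    lines = iter(lines)  # one shared iterator: islice continues after the header
    for line in lines:
        match = HEADER_RE.match(line)
        if match and match.group("name") == series_name:
            count = int(match.group("n1")) + 1  # column row plus n1 team rows
            return [row.rstrip("\n") for row in islice(lines, count)]
    return []  # series not found


# Hypothetical sample data, for illustration only.
SAMPLE = """Alpha INFO Number of teams : 2
| Team | # | wins |
| Reds | 1 | 5 |
| Blues | 2 | 3 |
Some irrelevant line
Beta INFO Number of teams : 1
| Team | # | wins |
| Reds | 1 | 9 |
"""

for row in table_for_series(SAMPLE.splitlines(), "Beta"):
    print(row)
```

    Because a file object is also an iterator over its lines, passing open(fname) in place of SAMPLE.splitlines() should work the same way.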


  • Prepending character N followed by line numbers

    - by Denis
    Hi, I'm hand-editing CNC G-code text files and need a way to reference locations in the file and on the toolpath. I want to modify each line in the text file so that it begins with the upper-case N character followed by a line number which increments by ten for every successive line, then a whitespace, followed by the original text of the line. Can I do this in vi?


  • CSS image float div problem in IE6

    - by Ben Dauphinee
    In the bottom cap of this page (bottom with corners) I seem to be having a weird IE6 issue. I've tried Google with no luck, as really, how do you ask this question? In IE6, the corner images that are floated left and right seem to cause the whitespace to drop. http://www.duncanhadleytriathlon.ca/ Any suggestions as to why this may be?


  • I need to parameterize against sql injection in asp classic, what things should I take some time to

    - by Tchalvak
    I can already see that I'm not going to enjoy the experience, but I have to do some sql cleanup on this 1000 file asp classic web-app, and before I get to hacking away at it I'd like to be aware of any major gotchas to watch out for with asp classic/sql parameter preparing/asp whitespace altering. What are some good quick overview resources, and what should I watch out for?


  • contenteditable realtime replace youtube url

    - by pimz
    So the problem is, I have a contenteditable div with a keyup function bound to it. Every time somebody puts a YouTube URL in it, it has to be replaced by an embedded movie. I came up with a regex like this: content.match(/http:\/\/\w{0,3}.?youtube+\.\w{2,3}\/watch\?v=.*?(?=\s)/g); Firefox will do the replace after a whitespace, but in IE it won't work. Any suggestions? Thanks in advance!

