lookahead - Developer IT

Using lookahead assertions in regular expressions

- by Greg Jackson

I use regular expressions on a daily basis, as my daily work is 90% in Perl (legacy codebase, but that's a different issue). Despite this, I still find lookahead and lookbehind to be terribly confusing and often unreadable. Right now, if I were to get a code review with a lookahead or lookbehind, I would immediately send it back to see if the problem can be solved by using multiple regular expressions or a different approach. The following are the main reasons I tend not to like them: They can be terribly unreadable. Lookahead assertions, for example, start from the beginning of the string no matter where they are placed. That, among other things, can cause some very "interesting" and non-obvious behaviors. It used to be the case that many languages didn't support lookahead/lookbehind (or supported them as "experimental features"). This isn't the case quite as much, but there's still always the question as to how well it's supported. Quite frankly, they feel like a dirty hack. Regexps often already are, but they can also be quite elegant, and have gained widespread acceptance. I've gotten by without any need for them at all... sometimes I think that they're extraneous. Now, I'll freely admit that especially the last two reasons aren't really good ones, but I felt that I should enumerate what goes through my mind when I see one. I'm more than willing to change my mind about them, but I feel that they violate some of my core tenets of programming, including: Code should be as readable as possible without sacrificing functionality -- this may include doing something in a less efficient, but clearer was as long as the difference is negligible or unimportant to the application as a whole. Code should be maintainable -- if another programmer comes along to fix my code, non-obvious behavior can hide bugs or make functional code appear buggy (see readability) "The right tool for the right job" -- I'm sure you can come up with contrived examples that could use lookahead, but I've never come across something that really needs them in my real-world development work. Is there anything that they're really the best tool for, as opposed to, say, multiple regexps (or, alternatively, are they the best tool for most cases they're used for today). My question is this: Is it good practice to use lookahead/lookbehind in regular expressions, or are they simply a hack that have found their way into modern production code? I'd be perfectly happy to be convinced that I'm wrong about this, and simple examples are useful for examples or illustration, but by themselves, won't be enough to convince me.

Read the article

Naming convetion of regex,lookahead and lookbehind

- by user198729

Why is it counter intuitive? /(?<!\d)\d{8}(?!\d)/,here (?<!\d) comes first,but called lookbehind,(?!\d) next,but called lookahead.All are counter intuitive. What's the reason to name it this way?

Read the article

LookAhead Regex in .Net - unexpected result

- by AaronM

Hello, I am a bit puzzled with my Regex results (and still trying to get my head around the syntax). I have been using http://regexpal.com/ to test out my expression, and its works as intended there, however in C# its not as expected. Here is a test - an expression of the following: (?=<open>).*?(?=</open>) on an input string of: <open>Text 1 </open>Text 2 <open>Text 3 </open>Text 4 <open>Text 5 </open> I would expect a result back of <open>Text1 <open>Text 2 <open>Text 3... etc However when I do this in C# it only returns the first match of <open>Text1 How do I get all five 'results' back from the Regex? Regex exx = new Regex("(?=<open>).*?(?=</open>)", RegexOptions.IgnoreCase | RegexOptions.Singleline); string input = "<open>Text 1</open> Text 2 <open> Text 3 </open> Text 4 <open> Text 5 </open>"; string result = Regex.Match(input, exx.ToString(), exx.Options).ToString();

Read the article

How to make negative lookahead work with end of line text

- by bryan-rasmussen

Hi, I have a regex like the following: .{0,1000}(?!(xa7|para(graf))$) using Java. I was expecting that it would cause the following text to fail: blaparagraf because paragraf is found at the end

Read the article

Regex negative lookahead

- by Alyn

I need to modify this regex href=\"(.*)\" which matches this... href="./pothole_locator_map.aspx?lang=en-gb&lat=53.153977&lng=-3.533306" To NOT match this... href="./pothole_locator_map.aspx?lang=en-gb&lat=53.153977&lng=-3.533306&returnurl=AbandonedVehicles.aspx" Tried this, but with no luck href=\"(.*)\"(?!&returnurl=AbandonedVehicles.aspx) Any help would be much appreciated. Thanks, Al.

Read the article

C# regex: negative lookahead fails with the single line option

- by Sylverdrag

I am trying to figure out why a regex with negative look ahead fails when the "single line" option is turned on. Example (simplified): <source>Test 1</source> <source>Test 2</source> <target>Result 2</target> <source>Test 3</source> This: <source>(?!.*<source>)(.*?)</source>(?!\s*<target) will fail if the single line option is on, and will work if the single line option is off. For instance, this works (disables the single line option): (?-s:<source>(?!.*<source>)(.*?)</source>(?!\s*<target)) My understanding is that the single line mode simply allows the dot "." to match new lines, and I don't see why it would affect the expression above. Can anyone explain what I am missing here?

Read the article

Poor performance / speed of regex with lookahead

- by Hugo Zaragoza

I have been observing extremely slow execution times with expressions with several lookaheads. I suppose that this is due to underlying data structures, but it seems pretty extreme and I wonder if I do something wrong or if there are known work-arounds. The problem is determining if a set of words are present in a string, in any order. For example we want to find out if two terms "term1" AND "term2" are somewhere in a string. I do this with the expresion: (?=.*\bterm1\b)(?=.*\bterm2\b) But what I observe is that this is an order of magnitude slower than checking first just \bterm1\b and just then \bterm2\b This seems to indicate that I should use an array of patterns instead of a single pattern with lookaheads... is this right? it seems wrong... Here is an example test code and resulting times: public static void speedLookAhead() { Matcher m, m1, m2; boolean find; int its = 1000000; // create long non-matching string char[] str = new char[2000]; for (int i = 0; i < str.length; i++) { str[i] = 'x'; } String test = str.toString(); // First method: use one expression with lookaheads m = Pattern.compile("(?=.*\\bterm1\\b)(?=.*\\bterm2\\b)").matcher(test); long time = System.currentTimeMillis(); ; for (int i = 0; i < its; i++) { m.reset(test); find = m.find(); } time = System.currentTimeMillis() - time; System.out.println(time); // Second method: use two expressions and AND the results m1 = Pattern.compile("\\bterm1\\b").matcher(test); m2 = Pattern.compile("\\bterm2\\b").matcher(test); time = System.currentTimeMillis(); ; for (int i = 0; i < its; i++) { m1.reset(test); m2.reset(test); find = m1.find() && m2.find(); } time = System.currentTimeMillis() - time; System.out.println(time); } This outputs in my computer: 1754 150

Read the article

How to get lookahead symbol when constructing LR(1) NFA for parser?

- by greenoldman

I am reading an explanation (awesome "Parsing Techniques" by D.Grune and C.J.H.Jacobs; p.292 in the 2nd edition) about how to construct an LR(1) parser, and I am at the stage of building the initial NFA. What I don't understand is how to get/compute a lookahead symbol. Here is the example from the book, the grammar: S -> E E -> E - T E -> T T -> ( E ) T -> n n is terminal. The "weird" transitions for me are is the sequence: 1) S -> . E eof 2) E -> . E - T eof 3) E -> . E - T - 4) E -> E . - T - 5) E -> E - . T - (Note: In the above table, the state numbers are in front and the lookahead symbol is at the end.) What puzzles me is that transition from (4) to (5) means reading - token, right? So how is it that - is still a lookahead symbol and even more important why is it that eof is no longer a lookahead symbol? After all in an input such as n - n eof there is only one - symbol. My naive thinking tells me (5) should be written as: 5) E -> E - . T - eof And another thing -- n is terminal. Why it is not used at all as a lookahead symbol? I mean -- we expect to see - or (, it is ok, but lack of n means we are sure it won't appear in input? Update: after more reading I am only more confused ;-) I.e. what is really a lookahead? Because I see such state as (p.292, 2nd column, 2nd row): E -> E . - T eof Lookahead says eof but the incoming input says -. Isn't it a contradiction? And it is not only in this book.

Read the article

JavaScript regex - positive lookahead -- giving me syntax errors

- by Tourshi

This piece of regex (?<=href\=")[^]+?(?=#_) is supposed to match everything in a href value except the the hash value and what follows it within the href url. It appears to work fine under Regex debuggers/testers such as http://gskinner.com/RegExr/ but in javascript it appears to produce syntax error. If i remove the < from the (?<=) it works however, that's not the positive lookahead I am looking for. I am pulling my hair off, as usual, thanks to Regex lol Please help

Read the article

Regex: Using lookahead assertion to check if character exist at most a certain number of times

- by teggy

How do I use lookahead assertion to determine if a certain character exist at most a certain number of times in a string. For example, let's say I want to check a string that has at least one character to make sure that it contains "@" at most 2 times. Thanks in advance. Using python if that matters.

Read the article

Python Regular Expressions: Capture lookahead value (capturing text without consuming it)

- by Lattyware

I wish to use regular expressions to split words into groups of (vowels, not_vowels, more_vowels), using a marker to ensure every word begins and ends with a vowel. import re MARKER = "~" VOWELS = {"a", "e", "i", "o", "u", MARKER} word = "dog" if word[0] not in VOWELS: word = MARKER+word if word[-1] not in VOWELS: word += MARKER re.findall("([%]+)([^%]+)([%]+)".replace("%", "".join(VOWELS)), word) In this example we get: [('~', 'd', 'o')] The issue is that I wish the matches to overlap - the last set of vowels should become the first set of the next match. This appears possible with lookaheads, if we replace the regex as follows: re.findall("([%]+)([^%]+)(?=[%]+)".replace("%", "".join(VOWELS)), word) We get: [('~', 'd'), ('o', 'g')] Which means we are matching what I want. However, it now doesn't return the last set of vowels. The output I want is: [('~', 'd', 'o'), ('o', 'g', '~')] I feel this should be possible (if the regex can check for the second set of vowels, I see no reason it can't return them), but I can't find any way of doing it beyond the brute force method, looping through the results after I have them and appending the first character of the next match to the last match, and the last character of the string to the last match. Is there a better way in which I can do this? The two things that would work would be capturing the lookahead value, or not consuming the text on a match, while capturing the value - I can't find any way of doing either.

Read the article

Regular Expressions: Positive Lookahead and Word Border question

- by Inf.S

Hello again Stackoverflow people! Assume I have these words: smartphones, smartphone I want to match the substring "phone" from within them. However, in both case, I want only "phone" to be returned, not "phones" in the first case. In addition to this, I want matches only if the word "phone" is a suffix only, such that: fonephonetics (just an example) is not matched. I assumed that the regex (phone([?=s])?)\b would give me what I need, but it is currently matching "phones" and "phone", but not the "fonephonetics" one. I don't need "phones". I want "phone" for both cases. Any ideas about what is wrong, and what I can do? Thank you in advance!

Read the article

Lookahead regex produces unexpected group

- by Ivan Yatskevich

I'm trying to extract a page name and query string from a URL which should not contain .html Here is an example code in Java: public class TestRegex { public static void main(String[] args) { Pattern pattern = Pattern.compile("/test/(((?!\\.html).)+)\\?(.+)"); Matcher matcher = pattern.matcher("/test/page?param=value"); System.out.println(matcher.matches()); System.out.println(matcher.group(1)); System.out.println(matcher.group(2)); } } By running this code one can get the following output: true page e What's wrong with my regex so the second group contains the letter e instead of param=value?

Read the article

lookahead and group

- by Istao

Hi, In Java, on a text like foo <on> bar </on> thing <on> again</on> now, I should want a regex with groups wich give me with a find "foo", "bar", empty string, then "thing", "again", "now". If I do (.*?)<on>(.*?)</on>(?!<on>), I get only two group (foo bar, thing again, and I've not the end "now"). if I do (.*?)<on>(.*?)</on>((?!<on>)) I get foo bar empty string, then thing again and empty string (here I should want "now"). Please what is the magical formula ? Thanks.

Read the article

RegExp: want to find all links that do not end in ".html"

- by grovel

Hi, I'm a relative novice to regular expressions (although I've used them many times successfully). I want to find all links in a document that do not end in ".html" The regular expression I came up with is: href=\"([^"]*)(?<!html)\" In Notepad++, my editor, href=\"([^"]*)\" finds all the links (both those that end in "html" and those that do not). Why doesn't negative lookbehind work? I've also tried lookahead: href=\"[^"]*(?!html\") but that didn't work either. Can anybody help? Cheers, grovel

Read the article

RegEx: h1 followed by h2 without p in between

- by voodoo555

Hey everyone, I need a regular expression to find out whether or not a h1 tag is followed by a h2 tag, without any paragraph elements in between. I tried to use a negative lookahead but it doesn't work: <h1(.+?)</h1>(\s|(?!<p))*<h2(.+?)</h2>

Read the article

Regexp for selecting spaces between digits and decimal char

- by Tirithen

I want to remove spaces from strings where the space is preceeded by a digit or a "." and acceded by a digit or ".". I have strings like: "50 .10", "50 . 10", "50. 10" and I want them all to become "50.10" but with an unknown number of digits on either side. I'm trying with lookahead/lookbehind assertions like this: $row = str_replace("/(?<=[0-9]+$)\s*[.]\s*(?=[0-9]+$)/", "", $row); But it does not work...

Read the article

Does lookaround affect which languages can be matched by regular expressions?

- by sepp2k

There are some features in modern regex engines which allow you to match languages that couldn't be matched without that feature. For example the following regex using back references matches the language of all strings that consist of a word that repeats itself: (.+)\1. This language is not regular and can't be matched by a regex, which does not use back references. My question: Does lookaround also affect which languages can be matched by a regular expression? I.e. are there any languages that can be matched using lookaround, which couldn't be matched otherwise? If so, is this true for all flavors of lookaround (negative or positive lookahead or lookbehind) or just for some of them?

Read the article

Regex Replace Between " Encoding

- by Eric Hendrickson

I want to be able to replace style="STUFF" I keep thinking that this is the correct REGEX: style=(")(?!")*(") But for some reason that won't match. Any ideas?

Read the article

PHP - REGEX - use string for pattern but exclude it from being removed!

- by aSeptik

Hi All guys! i'm pretty new on regex, i have learned something by the way, but is still pour knowledge! so i want ask you for clarification on how it work! assuming i have the following strings, as you can see they can be formatted little different way one from another but they are very similar! DTSTART;TZID="America/Chicago":20030819T000000 DTEND;TZID="America/Chicago":20030819T010000 DTSTART;TZID=US/Pacific DTSTART;VALUE=DATE now i want replace everything between the first A-Z block and the colon so for example i would keep DTSTART:20030819T000000 DTEND:20030819T010000 DTSTART DTSTART so on my very noobs knowledge i have worked out this shitty regex! :-( preg_replace( '/^[A-Z](?!;[A-Z]=[\w\W]+):$/m' , '' , $data ); but why i'm sure this regex will not work!? :-) Pls help me! PS: the title of question is pretty explaned, i want also know how for example use a well know string block for match another... preg_replace( '/^[DTSTART](?!;[A-Z]=[\w\W]+):$/m' , '' , $data ); ..without delete DTSTART Thanks for the time! Regards Luca Filosofi

Read the article

Perl pattern matching with zero width assertion

- by Simone

Hi everyone, I can't get why this code work: $seq = 'GAGAGAGA'; my $regexp = '(?=((G[UCGA][GA]A)|(U[GA]CG)|(CUUG)))'; # zero width match while ($seq =~ /$regexp/g){ # globally my $pos = pos($seq) + 1; # position of a zero width matching print "$1 position $pos\n"; } I know this is a zero width match and it dosn't put the matched string in $&, but why does it put it in $1? thank you!

Read the article

UNIX-style RegExp Replace running extremely slowly under windows. Help? EDIT: Negative lookahead ass

- by John Sullivan

I'm trying to run a unix regEXP on every log file in a 1.12 GB directory, then replace the matched pattern with ''. Test run on a 4 meg file is took about 10 minutes, but worked. Obviously something is murdering performance by several orders of magnitude. Find: ^(?!.*155[0-2][0-9]{4}\s.*).*$ -- NOTE: match any line NOT starting 155[0-2]NNNN where in is a number 0-9. Replace with: ''. Is there some justifiable reason for my regExp to take this long to replace matching text, or is the program I am using (this is windows / a program called "grepWin") most likely poorly optimized? Thanks. UPDATE: I am noticing that searching for ^(155[0-2]).$ takes ~7 seconds in a 5.6 MB file with 77 matches. Adding the Negative Lookahead Assertion, ?=, so that the regExp becomes ^(?!155[0-2]).$ is causing it to take at least 5-10 minutes; granted, there will be thousands and thousands of matches. Should the negative lookahead assertion be extremely detrimental to performance, and/or a large quantity of matches?

Read the article

How lookaheads are propagated in "channel" method of building LALR parser?

- by greenoldman

The method is described in Dragon Book, however I read about it in ""Parsing Techniques" by D.Grune and C.J.H.Jacobs". I start from my understanding of building channels for NFA: channels are built once, they are like water channels with current you "drop" lookahead symbols in right places (sources) of the channel, and they propagate with "current" when symbol propagates, there are no barriers (the only sufficient things for propagation are presence of channel and direction/current); i.e. lookahead cannot just die out of the blue Is that right? If I am correct, then eof lookahead should be present in all states, because the source of it is the start production, and all other production states are reachable from start state. How the DFA is made out of this NFA is not perfectly clear for me -- the authors of the mentioned book write about preserving channels, but I see no purpose, if you propagated lookaheads. If the channels have to be preserved, are they cut off from the source if the DFA state does not include source NFA state? I assume no -- the channels still runs between DFA states, not only within given DFA state. In the effect eof should still be present in all items in all states. But when you take a look at DFA presented in book (pdf is from errata): DFA for LALR (fig. 9.34 in the book, p.301) you will see there are items without eof in lookahead. The grammar for this DFA is: S -> E E -> E - T E -> T T -> ( E ) T -> n So how it was computed, when eof was dropped, and on what condition? Update It is textual pdf, so two interesting states (in DFA; # is eof): State 1: S--- >•E[#] E--- >•E-T[#-] E--- >•T[#-] T--- >•n[#-] T--- >•(E)[#-] State 6: T--- >(•E)[#-)] E--- >•E-T[-)] E--- >•T[-)] T--- >•n[-)] T--- >•(E)[-)] Arc from 1 to 6 is labeled (.

Read the article

How to read reduce/shift conflicts in LR(1) DFA?

- by greenoldman

I am reading an explanation (awesome "Parsing Techniques" by D.Grune and C.J.H.Jacobs; p.293 in the 2nd edition) and I moved forward from my last question: How to get lookahead symbol when constructing LR(1) NFA for parser? Now I have such "problem" (maybe not a problem, but rather need of confirmation from some more knowledgeable people). The authors present state in LR(0) which has reduce/shift conflict. Then they build DFA for LR(1) for the same grammar. And now they say it does not have a conflict (lookaheads at the end): S -> E . eof E -> E . - T eof E -> E . - T - and there is an edge from this state labeled - but no labeled eof. Authors says, that on eof there will be reduce, on - there will be shift. However eof is for shift as well (as lookahead). So my personal understanding of LR(1) DFA is this -- you can drop lookaheads for shifts, because they serve no purpose now -- shifts rely on input, not on lookaheads -- and after that, remove duplicates. S -> E . eof E -> E . - T So the lookahead for reduce serves as input really, because at this stage (all required input is read) it is really incoming symbol right now. For shifts, the input symbols are on the edges. So my question is this -- am I actually right about dropping lookaheads for shifts (after fully constructing DFA)?

Read the article

SOA Suite 11g Native Format Builder Complex Format Example

- by bob.webster

This rather long posting details the steps required to process a grouping of fixed length records using Format Builder. If it’s 10 pm and you’re feeling beat you might want to leave this until tomorrow. But if it’s 10 pm and you need to get a Format Builder Complex template done, read on… The goal is to process individual orders from a file using the 11g File Adapter and Format Builder Sample Data =========== 001Square Widget 0245.98 102Triagular Widget 1120.00 403Circular Widget 0099.45 ORD8898302/01/2011 301Hexagon Widget 1150.98 ORD6735502/01/2011 The records are fixed length records representing a number of logical Order records. Each order record consists of a number of item records starting with a 3 digit number, followed by a single Summary Record which starts with the constant ORD. How can this file be processed so that the first poll returns the first order? 001Square Widget 0245.98 102Triagular Widget 1120.00 403Circular Widget 0099.45 ORD8898302/01/2011 And the second poll returns the second order? 301Hexagon Widget 1150.98 ORD6735502/01/2011 Note: if you need more than one order per poll, that’s also possible, see the “Multiple Messages” field in the “File Adapter Step 6 of 9” snapshot further down. To follow along with this example you will need - Studio Edition Version 11.1.1.4.0 with the - SOA Extension for JDeveloper 11.1.1.4.0 installed Both can be downloaded from here: http://www.oracle.com/technetwork/middleware/soasuite/downloads/index.html You will not need a running WebLogic Server domain to complete the steps and Format Builder tests in this article. Start with a SOA Composite containing a File Adapter The Format Builder is part of the File Adapter so start by creating a new SOA Project and Composite. Here is a quick summary for those not familiar with these steps - Start JDeveloper - From the Main Menu choose File->New - In the New Gallery window that opens Expand the “General” category and Select the Applications node. Then choose SOA Application from the Items section on the right. Finally press the OK button. - In Step 1 of the “Create SOA Application wizard” that appears enter an Application Name and an Directory of your choice, then press the Next button. - In Step 2 of the “Create SOA Application wizard”, press the Next button leaving all entries as defaulted. - In Step 3 of the “Create SOA Application wizard”, Enter a composite name of your choice and Press the Finish Button These steps result in a new Application and SOA Project. The SOA Project contains a composite.xml file which is opened and shown below. For our example we have not defined a Mediator or a BPEL process to minimize the steps, but one or the other would eventually be needed to use the File Adapter we are about to create. Drag and drop the File Adapter icon from the Component Pallette onto either the LEFT side of the diagram under “Exposed Services” or the right side under “External References”. (See the Green Circle in the image below). Placing the adapter on the left side would indicate the file being processed is inbound to the composite, if the adapter is placed on the right side then the data is outbound to a file. Note that the same Format Builder definition can be used in both directions. For example we could use the format with a File Adapter on the left side of the composite to parse fixed data into XML, modify the data in our Composite or BPEL process and then use the same Format Builder definition with a File adapter on the right side of the composite to write the data back out in the same fixed data format When the File Adapter is dropped on the Composite the File Adapter Wizard Appears. Skip Past the first page, Step 1 of 9 by pressing the Next button. In Step 2 enter a service name of your choice as shown below, then press Next When the Native Format Builder appears, skip the welcome page by pressing next. Also press the Next button to accept the settings on Step 3 of 9 On Step 4, select Read File and press the Next button as shown below. On Step 5 enter a directory that will contain a file with the input data, then Press the Next button as shown below. In step 6, enter *.txt or another file format to select input files from the input directory mentioned in step 5. ALSO check the “Files contain Multiple Messages” checkbox and set the “Publish Messages in Batches of” field to 1. The value can be set higher to increase the number of logical order group records returned on each poll of the file adapter. In other words, it determines the number of Orders that will be sent to each instance of a Mediator or Composite processing using the File Adapter. Skip Step 7 by pressing the Next button In Step 8 press the Gear Icon on the right side to load the Native Format Builder. Native Format Builder appears Before diving into the format, here is an overview of the process. Approach - Bottom up Assuming an Order is a grouping of item records and a summary record…. - Define a separate Complex Type for each Record Type found in the group. (One for itemRecord and one for summaryRecord) - Define a Complex Type to contain the Group of Record types defined above (LogicalOrderRecord) - Define a top level element to represent an order. (order) The order element will be of type LogicalOrderRecord Defining the Format In Step 1 select “Create new” and “Complex Type” and “Next” In Step two browse to and select a file containing the test data shown at the start of this article. A link is provided at the end of this article to download a file containing the test data. Press the Next button In Step 3 Complex types must be define for each type of input record. Select the Root-Element and Click on the Add Complex Type icon This creates a new empty complex type definition shown below. The fastest way to create the definition is to highlight the first line of the Sample File data and drag the line onto the <new_complex_type> Format Builder introspects the data and provides a grid to define additional fields. Change the “Complex Type Name” to “itemRecord” Then click on the ruler to indicate the position of fixed columns. Drag the red triangle icons to the exact columns if necessary. Double click on an existing red triangle to remove an unwanted entry. In the case below fields are define in columns 0-3, 4-28, 29-eol When the field definitions are correct, press the “Generate Fields” button. Field entries named C1, C2 and C3 will be created as shown below. Click on the field names and rename them from C1->itemNum, C2->itemDesc and C3->itemCost When all the fields are correctly defined press OK to save the complex type. Next, the process is repeated to define a Complex Type for the SummaryRecord. Select the Root-Element in the schema tree and press the new complex type icon Then highlight and drag the Summary Record from the sample data onto the <new_complex_type> Change the complex type name to “summaryRecord” Mark the fixed fields for Order Number and Order Date. Press the Generate Fields button and rename C1 and C2 to itemNum and orderDate respectively. The last complex type to be defined is a type to hold the group of items and the summary record. Select the Root-Element in the schema tree and click the new complex type icon Select the “<new_complex_type>” entry and click the pencil icon On the Complex Type Details page change the name and type of each input field. Change line 1 to be named item and set the Type to “itemRecord” Change line 2 to be named summary and set the Type to “summaryRecord” We also need to indicate that itemRecords repeat in the input file. Click the pencil icon at the right side of the item line. On the Edit Details page change the “Max Occurs” entry from 1 to UNBOUNDED. We also need to indicate how to identify an itemRecord. Since each item record has “.” in column 32 we can use this fact to differentiate an item record from a summary record. Change the “Look Ahead” field to value 32 and enter a period in the “Look For” field Press the OK button to save entry. Finally, its time to create a top level element to represent an order. Select the “Root-Element” in the schema tree and press the New element icon Click on the <new_element> and press the pencil icon. Set the Element Name to “order” and change the Data Type to “logicalOrderRecord” Press the OK button to save the element definition. The final definition should match the screenshot below. Press the Next Button to view the definition source. Press the Test Button to test the definition Press the Green Triangle Icon to run the test. And we are presented with an unwelcome error. The error states that the processor ran out of data while working through the definition. The processor was unable to differentiate between itemRecords and summaryRecords and therefore treated the entire file as a list of itemRecords. At end of file, the “summary” portion of the logicalOrderRecord remained unprocessed but mandatory. This root cause of this error is the loss of our “lookAhead” definition used to identify itemRecords. This appears to be a bug in the Native Format Builder 11.1.1.4.0 Luckily, a simple workaround exists. Press the Cancel button and return to the “Step 4 of 4” Window. Manually add nxsd:lookAhead="32" nxsd:lookFor="." attributes after the maxOccurs attribute of the item element. as shown in the highlighted text below. When the lookAhead and lookFor attributes have been added Press the Test button and on the Test page press the Green Triangle. The test is now successful, the first order in the file is returned by the File Adapter. Below is a complete listing of the Result XML from the right column of the screen above Try running it The downloaded input test file and completed schema file can be used for testing without following all the Native Format Builder steps in this example. Use the following link to download a file containing the sample data. Download Sample Input Data This is the best approach rather than cutting and pasting the input data at the top of the article. Since the data is fixed length it’s very important to watch out for trailing spaces in the data and to ensure an eol character at the end of every line. The download file is correctly formatted. The final schema definition can be downloaded at the following link Download Completed Schema Definition - Save the inputData.txt file to a known location like the xsd folder in your project. - Save the inputData_6.xsd file to the xsd folder in your project. - At step 1 in the Native Format Builder wizard (as shown above) check the “Edit existing” radio button, then browse and select the inputData_6.xsd file - At step 2 of the Format Builder configuration Wizard (as shown above) supply the path and filename for the inputData.txt file. - You can then proceed to the test page and run a test. - Remember the wizard bug will drop the lookAhead and lookFor attributes, you will need to manually add nxsd:lookAhead="32" nxsd:lookFor="." after the maxOccurs attribute of the item element in the LogicalOrderRecord Complex Type. (as shown above) Good Luck with your Format Project

Search Results

Search found 56 results on 3 pages for 'lookahead'.

Page 1/3 | 1 2 3 | Next Page >

- by Greg Jackson

- by user198729

- by AaronM

- by bryan-rasmussen

- by Alyn

- by Sylverdrag

- by Hugo Zaragoza

- by greenoldman

- by Tourshi

- by teggy

- by Lattyware

- by Inf.S

- by Ivan Yatskevich

- by Istao

- by grovel

- by voodoo555

- by Tirithen

- by sepp2k

- by Eric Hendrickson

- by aSeptik

- by Simone

- by John Sullivan

- by greenoldman

- by greenoldman

- by bob.webster

1 2 3 | Next Page >