Search Results

Search found 126 results on 6 pages for 'lexer'.

Page 1/6 | 1 2 3 4 5 6 | Next Page >

Communication between lexer and parser

- by FredOverflow

Every time I write a simple lexer and parser, I stumble upon the same question: how should the lexer and the parser communicate? I see four different approaches: The lexer eagerly converts the entire input string into a vector of tokens. Once this is done, the vector is fed to the parser which converts it into a tree. This is by far the simplest solution to implement, but since all tokens are stored in memory, it wastes a lot of space. Each time the lexer finds a token, it invokes a function on the parser, passing the current token. In my experience, this only works if the parser can naturally be implemented as a state machine like LALR parsers. By contrast, I don't think it would work at all for recursive descent parsers. Each time the parser needs a token, it asks the lexer for the next one. This is very easy to implement in C# due to the yield keyword, but quite hard in C++ which doesn't have it. The lexer and parser communicate through an asynchronous queue. This is commonly known under the title "producer/consumer", and it should simplify the communication between the lexer and the parser a lot. Does it also outperform the other solutions on multicores? Or is lexing too trivial? Is my analysis sound? Are there other approaches I haven't thought of? What is used in real-world compilers? It would be really cool if compiler writers like Eric Lippert could shed some light on this issue.

Read the article
Error in running script [closed]

- by SWEngineer

I'm trying to run heathusf_v1.1.0.tar.gz found here I installed tcsh to make build_heathusf work. But, when I run ./build_heathusf, I get the following (I'm running that on a Fedora Linux system from Terminal): $ ./build_heathusf Compiling programs to build a library of image processing functions. convexpolyscan.c: In function ‘cdelete’: convexpolyscan.c:346:5: warning: incompatible implicit declaration of built-in function ‘bcopy’ [enabled by default] myalloc.c: In function ‘mycalloc’: myalloc.c:68:16: error: invalid storage class for function ‘store_link’ myalloc.c: In function ‘mymalloc’: myalloc.c:101:16: error: invalid storage class for function ‘store_link’ myalloc.c: In function ‘myfree’: myalloc.c:129:27: error: invalid storage class for function ‘find_link’ myalloc.c:131:12: warning: assignment makes pointer from integer without a cast [enabled by default] myalloc.c: At top level: myalloc.c:150:13: warning: conflicting types for ‘store_link’ [enabled by default] myalloc.c:150:13: error: static declaration of ‘store_link’ follows non-static declaration myalloc.c:91:4: note: previous implicit declaration of ‘store_link’ was here myalloc.c:164:24: error: conflicting types for ‘find_link’ myalloc.c:131:14: note: previous implicit declaration of ‘find_link’ was here Building the mammogram resizing program. gcc -O2 -I. -I../common mkimage.o -o mkimage -L../common -lmammo -lm ../common/libmammo.a(aggregate.o): In function `aggregate': aggregate.c:(.text+0x7fa): undefined reference to `mycalloc' aggregate.c:(.text+0x81c): undefined reference to `mycalloc' aggregate.c:(.text+0x868): undefined reference to `mycalloc' ../common/libmammo.a(aggregate.o): In function `aggregate_median': aggregate.c:(.text+0xbc5): undefined reference to `mymalloc' aggregate.c:(.text+0xbfb): undefined reference to `mycalloc' aggregate.c:(.text+0xc3c): undefined reference to `mycalloc' ../common/libmammo.a(aggregate.o): In function `aggregate': aggregate.c:(.text+0x9b5): undefined reference to `myfree' ../common/libmammo.a(aggregate.o): In function `aggregate_median': aggregate.c:(.text+0xd85): undefined reference to `myfree' ../common/libmammo.a(optical_density.o): In function `linear_optical_density': optical_density.c:(.text+0x29e): undefined reference to `mymalloc' optical_density.c:(.text+0x342): undefined reference to `mycalloc' optical_density.c:(.text+0x383): undefined reference to `mycalloc' ../common/libmammo.a(optical_density.o): In function `log10_optical_density': optical_density.c:(.text+0x693): undefined reference to `mymalloc' optical_density.c:(.text+0x74f): undefined reference to `mycalloc' optical_density.c:(.text+0x790): undefined reference to `mycalloc' ../common/libmammo.a(optical_density.o): In function `map_with_ushort_lut': optical_density.c:(.text+0xb2e): undefined reference to `mymalloc' optical_density.c:(.text+0xb87): undefined reference to `mycalloc' optical_density.c:(.text+0xbc6): undefined reference to `mycalloc' ../common/libmammo.a(optical_density.o): In function `linear_optical_density': optical_density.c:(.text+0x4d9): undefined reference to `myfree' ../common/libmammo.a(optical_density.o): In function `log10_optical_density': optical_density.c:(.text+0x8f1): undefined reference to `myfree' ../common/libmammo.a(optical_density.o): In function `map_with_ushort_lut': optical_density.c:(.text+0xd0d): undefined reference to `myfree' ../common/libmammo.a(virtual_image.o): In function `deallocate_cached_image': virtual_image.c:(.text+0x3dc6): undefined reference to `myfree' virtual_image.c:(.text+0x3dd7): undefined reference to `myfree' ../common/libmammo.a(virtual_image.o):virtual_image.c:(.text+0x3de5): more undefined references to `myfree' follow ../common/libmammo.a(virtual_image.o): In function `allocate_cached_image': virtual_image.c:(.text+0x4233): undefined reference to `mycalloc' virtual_image.c:(.text+0x4253): undefined reference to `mymalloc' virtual_image.c:(.text+0x4275): undefined reference to `mycalloc' virtual_image.c:(.text+0x42e7): undefined reference to `mycalloc' virtual_image.c:(.text+0x44f9): undefined reference to `mycalloc' virtual_image.c:(.text+0x47a9): undefined reference to `mycalloc' virtual_image.c:(.text+0x4a45): undefined reference to `mycalloc' virtual_image.c:(.text+0x4af4): undefined reference to `myfree' collect2: error: ld returned 1 exit status make: *** [mkimage] Error 1 Building the breast segmentation program. gcc -O2 -I. -I../common breastsegment.o segment.o -o breastsegment -L../common -lmammo -lm breastsegment.o: In function `render_segmentation_sketch': breastsegment.c:(.text+0x43): undefined reference to `mycalloc' breastsegment.c:(.text+0x58): undefined reference to `mycalloc' breastsegment.c:(.text+0x12f): undefined reference to `mycalloc' breastsegment.c:(.text+0x1b9): undefined reference to `myfree' breastsegment.c:(.text+0x1c6): undefined reference to `myfree' breastsegment.c:(.text+0x1e1): undefined reference to `myfree' segment.o: In function `find_center': segment.c:(.text+0x53): undefined reference to `mycalloc' segment.c:(.text+0x71): undefined reference to `mycalloc' segment.c:(.text+0x387): undefined reference to `myfree' segment.o: In function `bordercode': segment.c:(.text+0x4ac): undefined reference to `mycalloc' segment.c:(.text+0x546): undefined reference to `mycalloc' segment.c:(.text+0x651): undefined reference to `mycalloc' segment.c:(.text+0x691): undefined reference to `myfree' segment.o: In function `estimate_tissue_image': segment.c:(.text+0x10d4): undefined reference to `mycalloc' segment.c:(.text+0x14da): undefined reference to `mycalloc' segment.c:(.text+0x1698): undefined reference to `mycalloc' segment.c:(.text+0x1834): undefined reference to `mycalloc' segment.c:(.text+0x1850): undefined reference to `mycalloc' segment.o:segment.c:(.text+0x186a): more undefined references to `mycalloc' follow segment.o: In function `estimate_tissue_image': segment.c:(.text+0x1bbc): undefined reference to `myfree' segment.c:(.text+0x1c4a): undefined reference to `mycalloc' segment.c:(.text+0x1c7c): undefined reference to `mycalloc' segment.c:(.text+0x1d8e): undefined reference to `myfree' segment.c:(.text+0x1d9b): undefined reference to `myfree' segment.c:(.text+0x1da8): undefined reference to `myfree' segment.c:(.text+0x1dba): undefined reference to `myfree' segment.c:(.text+0x1dc9): undefined reference to `myfree' segment.o:segment.c:(.text+0x1dd8): more undefined references to `myfree' follow segment.o: In function `estimate_tissue_image': segment.c:(.text+0x20bf): undefined reference to `mycalloc' segment.o: In function `segment_breast': segment.c:(.text+0x24cd): undefined reference to `mycalloc' segment.o: In function `find_center': segment.c:(.text+0x3a4): undefined reference to `myfree' segment.o: In function `bordercode': segment.c:(.text+0x6ac): undefined reference to `myfree' ../common/libmammo.a(aggregate.o): In function `aggregate': aggregate.c:(.text+0x7fa): undefined reference to `mycalloc' aggregate.c:(.text+0x81c): undefined reference to `mycalloc' aggregate.c:(.text+0x868): undefined reference to `mycalloc' ../common/libmammo.a(aggregate.o): In function `aggregate_median': aggregate.c:(.text+0xbc5): undefined reference to `mymalloc' aggregate.c:(.text+0xbfb): undefined reference to `mycalloc' aggregate.c:(.text+0xc3c): undefined reference to `mycalloc' ../common/libmammo.a(aggregate.o): In function `aggregate': aggregate.c:(.text+0x9b5): undefined reference to `myfree' ../common/libmammo.a(aggregate.o): In function `aggregate_median': aggregate.c:(.text+0xd85): undefined reference to `myfree' ../common/libmammo.a(cc_label.o): In function `cc_label': cc_label.c:(.text+0x20c): undefined reference to `mycalloc' cc_label.c:(.text+0x6c2): undefined reference to `mycalloc' cc_label.c:(.text+0xbaa): undefined reference to `myfree' ../common/libmammo.a(cc_label.o): In function `cc_label_0bkgd': cc_label.c:(.text+0xe17): undefined reference to `mycalloc' cc_label.c:(.text+0x12d7): undefined reference to `mycalloc' cc_label.c:(.text+0x17e7): undefined reference to `myfree' ../common/libmammo.a(cc_label.o): In function `cc_relabel_by_intensity': cc_label.c:(.text+0x18c5): undefined reference to `mycalloc' ../common/libmammo.a(cc_label.o): In function `cc_label_4connect': cc_label.c:(.text+0x1cf0): undefined reference to `mycalloc' cc_label.c:(.text+0x2195): undefined reference to `mycalloc' cc_label.c:(.text+0x26a4): undefined reference to `myfree' ../common/libmammo.a(cc_label.o): In function `cc_relabel_by_intensity': cc_label.c:(.text+0x1b06): undefined reference to `myfree' ../common/libmammo.a(convexpolyscan.o): In function `polyscan_coords': convexpolyscan.c:(.text+0x6f0): undefined reference to `mycalloc' convexpolyscan.c:(.text+0x75f): undefined reference to `mycalloc' convexpolyscan.c:(.text+0x7ab): undefined reference to `myfree' convexpolyscan.c:(.text+0x7b8): undefined reference to `myfree' ../common/libmammo.a(convexpolyscan.o): In function `polyscan_poly_cacheim': convexpolyscan.c:(.text+0x805): undefined reference to `mycalloc' convexpolyscan.c:(.text+0x894): undefined reference to `myfree' ../common/libmammo.a(mikesfileio.o): In function `read_segmentation_file': mikesfileio.c:(.text+0x1e9): undefined reference to `mycalloc' mikesfileio.c:(.text+0x205): undefined reference to `mycalloc' ../common/libmammo.a(optical_density.o): In function `linear_optical_density': optical_density.c:(.text+0x29e): undefined reference to `mymalloc' optical_density.c:(.text+0x342): undefined reference to `mycalloc' optical_density.c:(.text+0x383): undefined reference to `mycalloc' ../common/libmammo.a(optical_density.o): In function `log10_optical_density': optical_density.c:(.text+0x693): undefined reference to `mymalloc' optical_density.c:(.text+0x74f): undefined reference to `mycalloc' optical_density.c:(.text+0x790): undefined reference to `mycalloc' ../common/libmammo.a(optical_density.o): In function `map_with_ushort_lut': optical_density.c:(.text+0xb2e): undefined reference to `mymalloc' optical_density.c:(.text+0xb87): undefined reference to `mycalloc' optical_density.c:(.text+0xbc6): undefined reference to `mycalloc' ../common/libmammo.a(optical_density.o): In function `linear_optical_density': optical_density.c:(.text+0x4d9): undefined reference to `myfree' ../common/libmammo.a(optical_density.o): In function `log10_optical_density': optical_density.c:(.text+0x8f1): undefined reference to `myfree' ../common/libmammo.a(optical_density.o): In function `map_with_ushort_lut': optical_density.c:(.text+0xd0d): undefined reference to `myfree' ../common/libmammo.a(virtual_image.o): In function `deallocate_cached_image': virtual_image.c:(.text+0x3dc6): undefined reference to `myfree' virtual_image.c:(.text+0x3dd7): undefined reference to `myfree' ../common/libmammo.a(virtual_image.o):virtual_image.c:(.text+0x3de5): more undefined references to `myfree' follow ../common/libmammo.a(virtual_image.o): In function `allocate_cached_image': virtual_image.c:(.text+0x4233): undefined reference to `mycalloc' virtual_image.c:(.text+0x4253): undefined reference to `mymalloc' virtual_image.c:(.text+0x4275): undefined reference to `mycalloc' virtual_image.c:(.text+0x42e7): undefined reference to `mycalloc' virtual_image.c:(.text+0x44f9): undefined reference to `mycalloc' virtual_image.c:(.text+0x47a9): undefined reference to `mycalloc' virtual_image.c:(.text+0x4a45): undefined reference to `mycalloc' virtual_image.c:(.text+0x4af4): undefined reference to `myfree' collect2: error: ld returned 1 exit status make: *** [breastsegment] Error 1 Building the mass feature generation program. gcc -O2 -I. -I../common afumfeature.o -o afumfeature -L../common -lmammo -lm afumfeature.o: In function `afum_process': afumfeature.c:(.text+0xd80): undefined reference to `mycalloc' afumfeature.c:(.text+0xd9c): undefined reference to `mycalloc' afumfeature.c:(.text+0xe80): undefined reference to `mycalloc' afumfeature.c:(.text+0x11f8): undefined reference to `myfree' afumfeature.c:(.text+0x1207): undefined reference to `myfree' afumfeature.c:(.text+0x1214): undefined reference to `myfree' ../common/libmammo.a(aggregate.o): In function `aggregate': aggregate.c:(.text+0x7fa): undefined reference to `mycalloc' aggregate.c:(.text+0x81c): undefined reference to `mycalloc' aggregate.c:(.text+0x868): undefined reference to `mycalloc' ../common/libmammo.a(aggregate.o): In function `aggregate_median': aggregate.c:(.text+0xbc5): undefined reference to `mymalloc' aggregate.c:(.text+0xbfb): undefined reference to `mycalloc' aggregate.c:(.text+0xc3c): undefined reference to `mycalloc' ../common/libmammo.a(aggregate.o): In function `aggregate': aggregate.c:(.text+0x9b5): undefined reference to `myfree' ../common/libmammo.a(aggregate.o): In function `aggregate_median': aggregate.c:(.text+0xd85): undefined reference to `myfree' ../common/libmammo.a(convexpolyscan.o): In function `polyscan_coords': convexpolyscan.c:(.text+0x6f0): undefined reference to `mycalloc' convexpolyscan.c:(.text+0x75f): undefined reference to `mycalloc' convexpolyscan.c:(.text+0x7ab): undefined reference to `myfree' convexpolyscan.c:(.text+0x7b8): undefined reference to `myfree' ../common/libmammo.a(convexpolyscan.o): In function `polyscan_poly_cacheim': convexpolyscan.c:(.text+0x805): undefined reference to `mycalloc' convexpolyscan.c:(.text+0x894): undefined reference to `myfree' ../common/libmammo.a(mikesfileio.o): In function `read_segmentation_file': mikesfileio.c:(.text+0x1e9): undefined reference to `mycalloc' mikesfileio.c:(.text+0x205): undefined reference to `mycalloc' ../common/libmammo.a(optical_density.o): In function `linear_optical_density': optical_density.c:(.text+0x29e): undefined reference to `mymalloc' optical_density.c:(.text+0x342): undefined reference to `mycalloc' optical_density.c:(.text+0x383): undefined reference to `mycalloc' ../common/libmammo.a(optical_density.o): In function `log10_optical_density': optical_density.c:(.text+0x693): undefined reference to `mymalloc' optical_density.c:(.text+0x74f): undefined reference to `mycalloc' optical_density.c:(.text+0x790): undefined reference to `mycalloc' ../common/libmammo.a(optical_density.o): In function `map_with_ushort_lut': optical_density.c:(.text+0xb2e): undefined reference to `mymalloc' optical_density.c:(.text+0xb87): undefined reference to `mycalloc' optical_density.c:(.text+0xbc6): undefined reference to `mycalloc' ../common/libmammo.a(optical_density.o): In function `linear_optical_density': optical_density.c:(.text+0x4d9): undefined reference to `myfree' ../common/libmammo.a(optical_density.o): In function `log10_optical_density': optical_density.c:(.text+0x8f1): undefined reference to `myfree' ../common/libmammo.a(optical_density.o): In function `map_with_ushort_lut': optical_density.c:(.text+0xd0d): undefined reference to `myfree' ../common/libmammo.a(virtual_image.o): In function `deallocate_cached_image': virtual_image.c:(.text+0x3dc6): undefined reference to `myfree' virtual_image.c:(.text+0x3dd7): undefined reference to `myfree' ../common/libmammo.a(virtual_image.o):virtual_image.c:(.text+0x3de5): more undefined references to `myfree' follow ../common/libmammo.a(virtual_image.o): In function `allocate_cached_image': virtual_image.c:(.text+0x4233): undefined reference to `mycalloc' virtual_image.c:(.text+0x4253): undefined reference to `mymalloc' virtual_image.c:(.text+0x4275): undefined reference to `mycalloc' virtual_image.c:(.text+0x42e7): undefined reference to `mycalloc' virtual_image.c:(.text+0x44f9): undefined reference to `mycalloc' virtual_image.c:(.text+0x47a9): undefined reference to `mycalloc' virtual_image.c:(.text+0x4a45): undefined reference to `mycalloc' virtual_image.c:(.text+0x4af4): undefined reference to `myfree' collect2: error: ld returned 1 exit status make: *** [afumfeature] Error 1 Building the mass detection program. make: Nothing to be done for `all'. Building the performance evaluation program. gcc -O2 -I. -I../common DDSMeval.o polyscan.o -o DDSMeval -L../common -lmammo -lm ../common/libmammo.a(mikesfileio.o): In function `read_segmentation_file': mikesfileio.c:(.text+0x1e9): undefined reference to `mycalloc' mikesfileio.c:(.text+0x205): undefined reference to `mycalloc' collect2: error: ld returned 1 exit status make: *** [DDSMeval] Error 1 Building the template creation program. gcc -O2 -I. -I../common mktemplate.o polyscan.o -o mktemplate -L../common -lmammo -lm Building the drawimage program. gcc -O2 -I. -I../common drawimage.o -o drawimage -L../common -lmammo -lm ../common/libmammo.a(mikesfileio.o): In function `read_segmentation_file': mikesfileio.c:(.text+0x1e9): undefined reference to `mycalloc' mikesfileio.c:(.text+0x205): undefined reference to `mycalloc' collect2: error: ld returned 1 exit status make: *** [drawimage] Error 1 Building the compression/decompression program jpeg. gcc -O2 -DSYSV -DNOTRUNCATE -c lexer.c lexer.c:41:1: error: initializer element is not constant lexer.c:41:1: error: (near initialization for ‘yyin’) lexer.c:41:1: error: initializer element is not constant lexer.c:41:1: error: (near initialization for ‘yyout’) lexer.c: In function ‘initparser’: lexer.c:387:21: warning: incompatible implicit declaration of built-in function ‘strlen’ [enabled by default] lexer.c: In function ‘MakeLink’: lexer.c:443:16: warning: incompatible implicit declaration of built-in function ‘malloc’ [enabled by default] lexer.c:447:7: warning: incompatible implicit declaration of built-in function ‘exit’ [enabled by default] lexer.c:452:7: warning: incompatible implicit declaration of built-in function ‘exit’ [enabled by default] lexer.c:455:34: warning: incompatible implicit declaration of built-in function ‘calloc’ [enabled by default] lexer.c:458:7: warning: incompatible implicit declaration of built-in function ‘exit’ [enabled by default] lexer.c:460:3: warning: incompatible implicit declaration of built-in function ‘strcpy’ [enabled by default] lexer.c: In function ‘getstr’: lexer.c:548:26: warning: incompatible implicit declaration of built-in function ‘malloc’ [enabled by default] lexer.c:552:4: warning: incompatible implicit declaration of built-in function ‘exit’ [enabled by default] lexer.c:557:21: warning: incompatible implicit declaration of built-in function ‘calloc’ [enabled by default] lexer.c:557:28: warning: incompatible implicit declaration of built-in function ‘strlen’ [enabled by default] lexer.c:561:7: warning: incompatible implicit declaration of built-in function ‘exit’ [enabled by default] lexer.c: In function ‘parser’: lexer.c:794:21: warning: incompatible implicit declaration of built-in function ‘calloc’ [enabled by default] lexer.c:798:8: warning: incompatible implicit declaration of built-in function ‘exit’ [enabled by default] lexer.c:1074:21: warning: incompatible implicit declaration of built-in function ‘calloc’ [enabled by default] lexer.c:1078:8: warning: incompatible implicit declaration of built-in function ‘exit’ [enabled by default] lexer.c:1116:21: warning: incompatible implicit declaration of built-in function ‘calloc’ [enabled by default] lexer.c:1120:8: warning: incompatible implicit declaration of built-in function ‘exit’ [enabled by default] lexer.c:1154:25: warning: incompatible implicit declaration of built-in function ‘calloc’ [enabled by default] lexer.c:1158:5: warning: incompatible implicit declaration of built-in function ‘exit’ [enabled by default] lexer.c:1190:5: warning: incompatible implicit declaration of built-in function ‘exit’ [enabled by default] lexer.c:1247:25: warning: incompatible implicit declaration of built-in function ‘calloc’ [enabled by default] lexer.c:1251:5: warning: incompatible implicit declaration of built-in function ‘exit’ [enabled by default] lexer.c:1283:5: warning: incompatible implicit declaration of built-in function ‘exit’ [enabled by default] lexer.c: In function ‘yylook’: lexer.c:1867:9: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast] lexer.c:1867:20: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast] lexer.c:1877:12: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast] lexer.c:1877:23: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast] make: *** [lexer.o] Error 1

Read the article
lexer/parser ambiguity

- by John Leidegren

How does a lexer solve this ambiguity? /*/*/ How is it that it doesn't just say, oh yeah, that's the begining of a multi-line comment, followed by another multi-line comment. Wouldn't a greedy lexer just return the following tokens? /* /* / I'm in the midst of writing a shift-reduce parser for CSS and yet this simple comment thing is in my way. You can read this question if you wan't some more background information.

Read the article
problem string recursion antlr lexer token

- by barrozz

How do I build a token in lexer that can handle recursion inside as this string: ${*anythink*${*anything*}*anythink*} ? thanks

Read the article
lexer skips a token

- by Eugene Strizhok

I am trying to do basic ANTLR-based scanning. I have a problem with a lexer not matching wanted tokens. lexer grammar DefaultLexer; ALPHANUM : (LETTER | DIGIT)+; ACRONYM : LETTER '.' (LETTER '.')+; HOST : ALPHANUM (('.' | '-') ALPHANUM)+; fragment LETTER : UNICODE_CLASS_LL | UNICODE_CLASS_LM | UNICODE_CLASS_LO | UNICODE_CLASS_LT | UNICODE_CLASS_LU; fragment DIGIT : UNICODE_CLASS_ND | UNICODE_CLASS_NL; For the grammar above, hello. world string given as an input results in world only. Whereas I would expect to get both hello and world. What am I missing? Thanks.

Read the article
Poor man's "lexer" for C#

- by Paul Hollingsworth

I'm trying to write a very simple parser in C#. I need a lexer -- something that lets me associate regular expressions with tokens, so it reads in regexs and gives me back symbols. It seems like I ought to be able to use Regex to do the actual heavy lifting, but I can't see an easy way to do it. For one thing, Regex only seems to work on strings, not streams (why is that!?!?). Basically, I want an implementation of the following interface: interface ILexer : IDisposable { /// <summary> /// Return true if there are more tokens to read /// </summary> bool HasMoreTokens { get; } /// <summary> /// The actual contents that matched the token /// </summary> string TokenContents { get; } /// <summary> /// The particular token in "tokenDefinitions" that was matched (e.g. "STRING", "NUMBER", "OPEN PARENS", "CLOSE PARENS" /// </summary> object Token { get; } /// <summary> /// Move to the next token /// </summary> void Next(); } interface ILexerFactory { /// <summary> /// Create a Lexer for converting a stream of characters into tokens /// </summary> /// <param name="reader">TextReader that supplies the underlying stream</param> /// <param name="tokenDefinitions">A dictionary from regular expressions to their "token identifers"</param> /// <returns>The lexer</returns> ILexer CreateLexer(TextReader reader, IDictionary<string, object> tokenDefinitions); } So, pluz send the codz... No, seriously, I am about to start writing an implementation of the above interface yet I find it hard to believe that there isn't some simple way of doing this in .NET (2.0) already. So, any suggestions for a simple way to do the above? (Also, I don't want any "code generators". Performance is not important for this thing and I don't want to introduce any complexity into the build process.)

Read the article
Writing re-entrant lexer with Flex

- by Viet

I'm newbie to flex. I'm trying to write a simple re-entrant lexer/scanner with flex. The lexer definition goes below. I get stuck with compilation errors as shown below (yyg issue): reentrant.l: /* Definitions */ digit [0-9] letter [a-zA-Z] alphanum [a-zA-Z0-9] identifier [a-zA-Z_][a-zA-Z0-9_]+ integer [0-9]+ natural [0-9]*[1-9][0-9]* decimal ([0-9]+\.|\.[0-9]+|[0-9]+\.[0-9]+) %{ #include <stdio.h> #define ECHO fwrite(yytext, yyleng, 1, yyout) int totalNums = 0; %} %option reentrant %option prefix="simpleit_" %% ^(.*)\r?\n printf("%d\t%s", yylineno++, yytext); %% /* Routines */ int yywrap(yyscan_t yyscanner) { return 1; } int main(int argc, char* argv[]) { yyscan_t yyscanner; if(argc < 2) { printf("Usage: %s fileName\n", argv[0]); return -1; } yyin = fopen(argv[1], "rb"); yylex(yyscanner); return 0; } Compilation errors: vietlq@mylappie:~/Desktop/parsers/reentrant$ gcc lex.simpleit_.c reentrant.l: In function ‘main’: reentrant.l:44: error: ‘yyg’ undeclared (first use in this function) reentrant.l:44: error: (Each undeclared identifier is reported only once reentrant.l:44: error: for each function it appears in.)

Read the article
Lexer antlr3 token problem

- by nioo

Can I construct a token ENDPLUS: '+' (options (greedy = false;):.) * '+' ; being considered by the lexer only if it is preceded by a token PREwithout including in ENDPLUS? PRE: '<<' ; Thanks.

Read the article
Antlr Lexer Quoted String Problem

- by Loki

I'm trying to build a lexer to tokenize lone words and quoted strings. I got the following: STRING: QUOTE (options {greedy=false;} : . )* QUOTE ; WS : SPACE+ { $channel = HIDDEN; } ; WORD : ~(QUOTE|SPACE)+ ; For the corner cases, it needs to parse: "string" word1" word2 As three tokens: "string" as STRING and word1" and word2 as WORD. Basically, if there is a last quote, it needs to be part of the WORD were it is. If the quote is surrounded by white spaces, it should be a WORD. I tried this rule for WORD, without success: WORD: ~(QUOTE|SPACE)+ | (~(QUOTE|SPACE)* QUOTE ~QUOTE*)=> ~(QUOTE|SPACE)* QUOTE ~(QUOTE|SPACE)* ;

Read the article
Antlr Lexer Quoted String Predicate

- by Loki

I'm trying to build a lexer to tokenize lone words and quoted strings. I got the following: STRING: QUOTE (options {greedy=false;} : . )* QUOTE ; WS : SPACE+ { $channel = HIDDEN; } ; WORD : ~(QUOTE|SPACE)+ ; For the corner cases, it needs to parse: "string" word1" word2 As three tokens: "string" as STRING and word1" and word2 as WORD. Basically, if there is a last quote, it needs to be part of the WORD were it is. If the quote is surrounded by white spaces, it should be a WORD. I tried this rule for WORD, without success: WORD: ~(QUOTE|SPACE)+ | (~(QUOTE|SPACE)* QUOTE ~QUOTE*)=> ~(QUOTE|SPACE)* QUOTE ~(QUOTE|SPACE)* ;

Read the article
grammar parser lexer antlr letteral

- by BB

What's the difference between this grammar: ... if_statement : 'if' condition 'then' statement 'else' statement 'end_if'; ... and this: ... if_statement : IF condition THEN statement ELSE statement END_IF; ... IF : 'if'; THEN: 'then'; ELSE: 'else'; END_IF: 'end_if'; .... ? If there is any difference, as this impacts on performance ... Thanks

Read the article
ANTLR lexer mismatches tokens

- by Barry Brown

I have a simple ANTLR grammar, which I have stripped down to its bare essentials to demonstrate this problem I'm having. I am using ANTLRworks 1.3.1. grammar sample; assignment : IDENT ':=' NUM ';' ; IDENT : ('a'..'z')+ ; NUM : ('0'..'9')+ ; WS : (' '|'\n'|'\t'|'\r')+ {$channel=HIDDEN;} ; Obviously, this statement is accepted by the grammar: x := 99; But this one also is: x := @!$()()%99***; Output from the ANTLRworks Interpreter: What am I doing wrong? Even other sample grammars that come with ANTLR (such as the CMinus grammar) exhibit this behavior.

Read the article
Lexer written in Javascript?

- by Phobis

I have a project where a user needs to define a set of instructions for a ui that is completely written in javascript. I need to have the ability to parse a string of instructions and then translate them into instructions. Is there any libraries out there for parsing that are 100% javascript? Or a generator that will generate in javascript? Thanks!

Read the article
Does C# have (direct) flex/yacc port? Or what lexer/parser people use for C#?

- by prosseek

I might be wrong, but it looks like that there's no direct flex/bison (lex/yacc) port for C#/.NET so far. For LALR parser, I found GPPG/GPLEX, and for LL parser, there is the famous ANTLR. But, I want to reuse my flex/bison grammar as much as possible. Is there any direct port of flex/bison for C#? What lexer/parser people normally use for C#? Is there any reason for that choice?

Read the article
Understanding hand written lexers

- by Cole Johnson

I am going to make a compiler for C (C99; I own the standards PDF), written in C (go figure) and looking up on how compilers work on Wikipedia has told me a lot. However, after reading up on lexers has confused me. The Wikipedia page states that: the GNU Compiler Collection (gcc) uses hand-written lexers I have tried googling what a hand written lexer and have come up with nothing except for "making a flowchart that describes how it should function", however, isn't that how all software development should be done? So my question is: "What is a hand written lexer?"

Read the article
Force CL-Lex to read whole word

- by Flávio Cruz

I'm using CL-Lex to implement a lexer (as input for CL-YACC) and my language has several keywords such as "let" and "in". However, while the lexer recognizes such keywords, it does too much. When it finds words such as "init", it returns the first token as IN, while it should return a "CONST" token for the "init" word. This is a simple version of the lexer: (define-string-lexer lexer (...) ("in" (return (values :in $@))) ("[a-z]([a-z]|[A-Z]|\_)" (return (values :const $@)))) How do I force the lexer to fully read the whole word until some whitespace appears?

Read the article
How to efficently build an interpreter (lexer+parser) in C?

- by Rizo

I'm trying to make a meta-language for writing markup code (such as xml and html) wich can be directly embedded into C/C++ code. Here is a simple sample written in this language, I call it WDI (Web Development Interface): /* * Simple wdi/html sample source code */ #include <mySite> string name = "myName"; string toCapital(string str); html { head { title { mySiteTitle; } link(rel="stylesheet", href="style.css"); } body(id="default") { // Page content wrapper div(id="wrapper", class="some_class") { h1 { "Hello, " + toCapital(name) + "!"; } // Lists post ul(id="post_list") { for(post in posts) { li { a(href=post.getID()) { post.tilte; } } } } } } } Basically it is a C source with a user-friendly interface for html. As you can see the traditional tag-based style is substituted by C-like, with blocks delimited by curly braces. I need to build an interpreter to translate this code to html and posteriorly insert it into C, so that it can be compiled. The C part stays intact. Inside the wdi source it is not necessary to use prints, every return statement will be used for output (in printf function). The program's output will be clean html code. So, for example a heading 1 tag would be transformed like this: h1 { "Hello, " + toCapital(name) + "!"; } // would become: printf("<h1>Hello, %s!</h1>", toCapital(name)); My main goal is to create an interpreter to translate wdi source to html like this: tag(attributes) {content} = <tag attributes>content</tag> Secondly, html code returned by the interpreter has to be inserted into C code with printfs. Variables and functions that occur inside wdi should also be sorted in order to use them as printf parameters (the case of toCapital(name) in sample source). I am searching for efficient (I want to create a fast parser) way to create a lexer and parser for wdi. Already tried flex and bison, but as I am not sure if they are the best tools. Are there any good alternatives? What is the best way to create such an interpreter? Can you advise some brief literature on this issue?

Read the article
Is it possible to create a single tokenizer to parse this?

- by Adrian

This extends off this other Q&A thread, but is going into details that are out of scope from the original question. I am generating a parser that is to parse a context-sensitive grammar which can take in the following subset of symbols: ,, [, ], {, }, m/[a-zA-Z_][a-zA-Z_0-9]*/, m/[0-9]+/ The grammar can take in the following string { abc[1] }, } and parse it as ({, abc[1], }, }). Another example would be to take: { abc[1] [, } and parse it as ({, abc[1], [,, }). This is similar to the grammar used in Perl for the qw() syntax. The braces indicate that the contents are to be whitespace tokenized. A closing brace must be on its own to indicate the end of the whitespace tokenized group. Can this be done using a single lexer/tokenizer, or would it be necessary to have a separate tokenizer when parsing this group?

Read the article
What functions a lexer needs to provide?

- by M28

I am making a lexer, don't tell me to not do because I already did most of it. Currently it makes an array of tokens and that's it. I would like to know, what functions the lexer needs to provide and a brief explanation of what each function needs to do. I'll accept the most complete list. An example function would be: next: Consume the current token and return it

Read the article
What follows after lexical analysis?

- by madflame991

I'm working on a toy compiler (for some simple language like PL/0) and I have my lexer up and running. At this point I should start working on building the parse tree, but before I start I was wondering: How much information can one gather from just the string of tokens? Here's what I gathered so far: One can already do syntax highlighting having only the list of tokens. Numbers and operators get coloured accordingly and keywords also. Autoformatting (indenting) should also be possible. How? Specify for each token type how many white spaces or new line characters should follow it. Also when you print tokens modify an alignment variable (when the code printer reads "{" increment the alignment variable by 1, and decrement by 1 for "}". Whenever it starts printing on a new line the code printer will align according to this alignment variable) In languages without nested subroutines one can get a complete list of subroutines and their signature. How? Just read what follows after the "procedure" or "function" keyword until you hit the first ")" (this should work fine in a Pascal language with no nested subroutines) In languages like Pascal you can even determine local variables and their types, as they are declared in a special place (ok, you can't handle initialization as well, but you can parse sequences like: "var a, b, c: integer") Detection of recursive functions may also be possible, or even a graph representation of which subroutine calls who. If one can identify the body of a function then one can also search if there are any mentions of other function's names. Gathering statistics about the code, like number of lines, instructions, subroutines EDIT: I clarified why I think some processes are possible. As I read comments and responses I realise that the answer depends very much on the language that I'm parsing.

Read the article
How are "Json.org"-like specs graphs called and how can I generate them?

- by Sebastián Grignoli

In http://www.json.org Douglas Crockford shows the specs of the JSON format in two interesting ways: In the right side column he lists a text spec that looks like a YACC or LEX listing. In the main body of the homepage, he put several images that gives us a simple way to visually understand the valid sequences that composes a JSON string. Those images look like a description of the path that a finite state automaton would follow when parsing the JSON string. Wich are the names (if any) of that listing format and that kind of graphics? Is there any software that renders a source file containing the specification into that kind of images?

Read the article
ANTLRv3 How to set a custom Lexer and Parser Class names

- by Filip

Is there a way to specify custom class name (meaning independent of the grammar name) for the ANTLRv3 generated Parser and Lexer classes? So in the case of grammar MDD; //other stufff Automtaically it would crate MDDParser and MDDLexer, but i would like to have them for example as MDDBaseParser and MDDLexer.

Read the article
error in coding a lexer in c

- by mekasperasky

#include<stdio.h> #include<ctype.h> #include<string.h> /* this is a lexer which recognizes constants , variables ,symbols, identifiers , functions , comments and also header files . It stores the lexemes in 3 different files . One file contains all the headers and the comments . Another file will contain all the variables , another will contain all the symbols. */ int main() { int i=0,j,k,count=0; char a,b[100],c[10000],d[100]; memset ( d, 0, 100 ); j=30; FILE *fp1,*fp2; fp1=fopen("source.txt","r"); //the source file is opened in read only mode which will passed through the lexer fp2=fopen("lext.txt","w"); //now lets remove all the white spaces and store the rest of the words in a file if(fp1==NULL) { perror("failed to open source.txt"); //return EXIT_FAILURE; } i=0; k=0; while(!feof(fp1)) { a=fgetc(fp1); if(a!=' '&&a!='\n') { if (!isalpha(a)) { switch(a) { case '+':{fprintf(fp2,"+ ----> PLUS \n"); i=0;break;} case '-':{fprintf(fp2,"- ---> MINUS \n"); i=0;break;} case '*':{fprintf(fp2, "* --->MULT \n"); i=0;break;} case '/':{fprintf(fp2, "/ --->DIV \n"); i=0;break;} //case '+=':fprintf(fp2, "%.20s\n", "ADD_ASSIGN"); //case '-=':fprintf(fp2, "%.20s\n", "SUB_ASSIGN"); case '=':{fprintf(fp2, "= ---> ASSIGN \n"); i=0;break;} case '%':{fprintf(fp2, "% ---> MOD \n"); i=0;break;} case '<':{fprintf(fp2, "< ---> LESSER_THAN \n"); i=0;break;} case '>':{fprintf(fp2, "> --> GREATER_THAN \n"); i=0;break;} //case '++':fprintf(fp2, "%.20s\n", "INCREMENT"); //case '--':fprintf(fp2, "%.20s\n", "DECREMENT"); //case '==':fprintf(fp2, "%.20s\n", "ASSIGNMENT"); case ';':{fprintf(fp2, "; --->SEMI_COLUMN \n"); i=0;break;} case ':':{fprintf(fp2, ": --->COLUMN \n"); i=0;break;} case '(':{fprintf(fp2, "( --->LPAR \n"); i=0;break;} case ')':{fprintf(fp2, ") --->RPAR \n"); i=0;break;} case '{':{fprintf(fp2, "{ --->LBRACE \n"); i=0;break;} case '}':{fprintf(fp2, "} ---> RBRACE \n"); i=0;break;} } } else { d[i]=a; //printf("%c\n",d[i]); i=i+1; } //} /* we can make the lexer more complex by including even more depths of checks for the symbols*/ } else { d[i+1]='\0'; printf("\n"); if((strcmp(d,"if ")==0)){fprintf(fp2,"if ----> IDENTIFIER \n"); //printf("%s \n",d); memset ( d, 0, 100 ); //printf("%s \n",d); count=count+1;} else if(strcmp(d,"then")==0){fprintf(fp2,"then ----> IDENTIFIER \n"); count=count+1;} else if(strcmp(d,"else")==0){fprintf(fp2,"else ----> IDENTIFIER \n"); count=count+1;} else if(strcmp(d,"switch")==0){fprintf(fp2,"switch ----> IDENTIFIER \n"); count=count+1;} else if(strcmp(d,"printf")==0){fprintf(fp2,"prtintf ----> IDENTIFIER \n"); count=count+1;} else if(strcmp(d,"scanf")==0){fprintf(fp2,"scanf ----> IDENTIFIER \n"); count=count+1;} else if(strcmp(d,"NULL")==0){fprintf(fp2,"NULL ----> IDENTIFIER \n"); count=count+1;} else if(strcmp(d,"int")==0){fprintf(fp2,"INT ----> IDENTIFIER \n"); count=count+1;} else if(strcmp(d,"char")==0){fprintf(fp2,"char ----> IDENTIFIER \n"); count=count+1;} else if(strcmp(d,"float")==0){fprintf(fp2,"float ----> IDENTIFIER \n"); count=count+1;} else if(strcmp(d,"long")==0){fprintf(fp2,"long ----> IDENTIFIER \n"); count=count+1;} else if(strcmp(d,"double")==0){fprintf(fp2,"double ----> IDENTIFIER \n"); count=count+1;} else if(strcmp(d,"const")==0){fprintf(fp2,"const ----> IDENTIFIER \n"); count=count+1;} else if(strcmp(d,"continue")==0)fprintf(fp2,"continue ----> IDENTIFIER \n"); else if(strcmp(d,"size of")==0){fprintf(fp2,"size of ----> IDENTIFIER \n"); count=count+1;} else if(strcmp(d,"register")==0){fprintf(fp2,"register ----> IDENTIFIER \n"); count=count+1;} else if(strcmp(d,"short")==0){fprintf(fp2,"short ----> IDENTIFIER \n"); count=count+1;} else if(strcmp(d,"auto")==0){fprintf(fp2,"auto ----> IDENTIFIER \n"); count=count+1;} else if(strcmp(d,"while")==0){fprintf(fp2,"while ----> IDENTIFIER \n"); count=count+1;} else if(strcmp(d,"do")==0){fprintf(fp2,"do ----> IDENTIFIER \n"); count=count+1;} else if(strcmp(d,"case")==0){fprintf(fp2,"case ----> IDENTIFIER \n"); count=count+1;} else if (isdigit(d[i])) { fprintf(fp2,"%s ---->NUMBER",d); } else if (isalpha(a)) { fprintf(fp2,"%s ----> Variable",d); //printf("%s",d); // memset ( d, 0, 100 );} //fprintf(fp2, "s\n", b); i=0; k=k+1; continue; } i=i+1; k=k+1; } fclose(fp1); fclose(fp2); printf("%d",count); return 0; } In this code , my source.txt has if (a+b) stored . But only ( , + and ) is getting written into lext.txt and not the identifier if or the variable a and b . Any particular reason why?

Read the article
Python/YACC Lexer: Token priority?

- by Rosarch

I'm trying to use reserved words in my grammar: reserved = { 'if' : 'IF', 'then' : 'THEN', 'else' : 'ELSE', 'while' : 'WHILE', } tokens = [ 'DEPT_CODE', 'COURSE_NUMBER', 'OR_CONJ', 'ID', ] + list(reserved.values()) t_DEPT_CODE = r'[A-Z]{2,}' t_COURSE_NUMBER = r'[0-9]{4}' t_OR_CONJ = r'or' t_ignore = ' \t' def t_ID(t): r'[a-zA-Z_][a-zA-Z_0-9]*' if t.value in reserved.values(): t.type = reserved[t.value] return t return None However, the t_ID rule somehow swallows up DEPT_CODE and OR_CONJ. How can I get around this? I'd like those two to take higher precedence than the reserved words.

Read the article
VB6 Parser/Lexer/Scripter

- by rlb.usa

I've got a game in VB6 and it works great and all, but I have been toying with the idea of creating a scripting engine. Ii'm thinking I'd like VB6 to read in flat text script files for me and then lex/parse/execute them. I have good programming experience, and I've built a simple C compiler, as well as a LOGO emulator before. My question is: Are there any tools that I can use, like Lexx/Yakk/Bison to help me? How should I approach this problem in regards to lexing, parsing, and feeding the commands back to VB6 so I can handle them? Is this idea a BAD IDEA in the sense that there are too many obstacles in the way (For example, building minesweeper in assembly, though not impossible, is very difficult, and a bad idea.)?

Read the article

1 2 3 4 5 6 | Next Page >