tokenizing - Developer IT

Tokenizing a string with unequal number of spaces between fields

- by gatechgrad

I am tryint to tokenize entries from a file. However I am not able to use the line.split("") option because of unequal number of spaces between files. I am copying a few lines from my file below: "08-09-2010 21:21:46 00:22:7f:a6:9b:69 -79" "08-09-2010 21:21:46 04:4f:aa:b4:49:49 -79" "08-09-2010 21:21:46 04:4f:aa:31:4e:59 tikona 18002090044 -83" "08-09-2010 21:21:46 00:22:7f:26:9b:69 tikona 18002090044 -74" "08-09-2010 21:21:46 04:4f:aa:34:0d:c9 tikona 18002090044 -82" "08-09-2010 21:21:46 04:4f:aa:71:4e:59 -85" "08-09-2010 21:21:46 04:4f:aa:34:21:89 tikona 18002090044 -75" "08-09-2010 21:21:46 04:4f:aa:34:49:49 tikona 18002090044 -77" "08-09-2010 21:21:46 04:4f:aa:74:0d:c9 -85" "08-09-2010 21:22:47 18 APs were seen " I need to access the first column (which is a datetime object) the second column (00:22...) and the last column (-79 etc.). I have no trouble accessing the first and second columns, but not the last column. When I do a info=line.spilt(""), since the third column might or might no entries, I am not able to determine the token number. How do i access the 4th column? Is there a way i can use info[i].contains(" -")?

Read the article

Tokenizing Twitter Posts in Lucene

- by Amaç Herdagdelen

Hello, My question in a nutshell: Does anyone know of a TwitterAnalyzer or TwitterTokenizer for Lucene? More detailed version: I want to index a number of tweets in Lucene and keep the terms like @user or #hashtag intact. StandardTokenizer does not work because it discards the punctuation (but it does other useful stuff like keeping domain names, email addresses or recognizing acronyms). How can I have an analyzer which does everything StandardTokenizer does but does not touch terms like @user and #hashtag? My current solution is to preprocess the tweet text before feeding it into the analyzer and replace the characters by other alphanumeric strings. For example, String newText = newText.replaceAll("#", "hashtag"); newText = newText.replaceAll("@", "addresstag"); Unfortunately this method breaks legitimate email addresses but I can live with that. Does that approach make sense? Thanks in advance! Amaç

Read the article

Parsing/Tokenizing a String Containing a SQL Command

- by Alan Storm

Are there any open source libraries (any language, python/PHP preferred) that will tokenize/parse an ANSI SQL string into its various components? That is, if I had the following string SELECT a.foo, b.baz, a.bar FROM TABLE_A a LEFT JOIN TABLE_B b ON a.id = b.id WHERE baz = 'snafu'; I'd get back a data structure/object something like //fake PHPish $results['select-columns'] = Array[a.foo,b.baz,a.bar]; $results['tables'] = Array[TABLE_A,TABLE_B]; $results['table-aliases'] = Array[a=TABLE_A, b=TABLE_B]; //etc... Restated, I'm looking for the code in a database package that teases the SQL command apart so that the engine knows what to do with it. Searching the internet turns up a lot of results on how to parse a string WITH SQL. That's not what I want. I realize I could glop through an open source database's code to find what I want, but I was hoping for something a little more ready made, (although if you know where in the MySQL, PostgreSQL, SQLite source to look, feel free to pass it along) Thanks!

Read the article

Tokenizing numbers for a parser

- by René Nyffenegger

I am writing my first parser and have a few questions conerning the tokenizer. Basically, my tokenizer exposes a nextToken() function that is supposed to return the next token. These tokens are distinguished by a token-type. I think it would make sense to have the following token-types: SYMBOL (such as <, :=, ( and the like REMARK (or a comment) NUMBER IDENT (such as the name of a function or a variable) STRING (Something enclosed between "....") Now, do you think this makes sense? Also, I am struggling with the NUMBER token-type. Do you think it makes more sense to further split it up into a NUMBER and a FLOAT token-type? Without a FLOAT token-type, I'd receive NUMBER (eg 402), a SYMBOL (.) followed by another NUMBER (eg 203) if I were about to parse a float. Finally, what do you think makes more sense for the tokenizer to return when it encounters a -909? Should it return the SYMBOL - first, followed by the NUMBER 909 or should it return a NUMBER -909 right away?

Read the article

Tokenizing JavaScript: A look at what’s left after minification

- by InfinitiesLoop

Minifiers JavaScript minifiers are popular these days. Closure , YUI Compressor , Microsoft Ajax Minifier , to name a few. Using one is essential for any site that uses more than a little script and cares about performance. Each tool of course has advantages and disadvantages. But they all do a pretty good job. The results vary only slightly in the grand scheme of things. Not enough to make so much of a difference that I’d say you should always use one over the other – use whatever fits in with your...(read more)

Read the article

Tokenizing a string with variable whitespace

- by Ron Holcomb

I've read through a few threads detailing how to tokenize strings, but I'm apparently too thick to adapt their suggestions and solutions into my program. What I'm attempting to do is tokenize each line from a large (5k+) line file into two strings. Here's a sample of the lines: 0 -0.11639404 9.0702948e-05 0.00012207031 0.0001814059 0.051849365 0.00027210884 0.062103271 0.00036281179 0.034423828 0.00045351474 0.035125732 The difference I'm finding between my lines and the other sample input from other threads is that I have a variable amount of whitespace between the parts that I want to tokenize. Anyways, here's my attempt at tokenizing: #include <iostream> #include <iomanip> #include <fstream> #include <string> using namespace std; int main(int argc, char *argv[]) { ifstream input; ofstream output; string temp2; string temp3; input.open(argv[1]); output.open(argv[2]); if (input.is_open()) { while (!input.eof()) { getline(input, temp2, ' '); while (!isspace(temp2[0])) getline(input, temp2, ' '); getline (input, temp3, '\n'); } input.close(); cout << temp2 << endl; cout << temp3 << endl; return 0; } I've clipped it some, since the troublesome bits are here. The issue that I'm having is that temp2 never seems to catch a value. Ideally, it should get populated with the first column of numbers, but it doesn't. Instead, it is blank, and temp3 is populated with the entire line. Unfortunately, in my course we haven't learned about vectors, so I'm not quite sure how to implement them in the other solutions for this I've seen, and I'd like to not just copy-paste code for assignments to get things work without actually understanding it. So, what's the extremely obvious/already been answered/simple solution I'm missing? I'd like to stick to standard libraries that g++ uses if at all possible.

Read the article

How add default value JQuery Tokenizing Autocomplete ?

- by TeknoSeyfo

I am using jquery autocomplete. I want, add default value when loaded page. I added input value tag. But, did not work :( How can I do ?

Read the article

StringTokenizer problem of tokenizing

- by Mr CooL

String a ="the STRING TOKENIZER CLASS ALLOWS an APPLICATION to BREAK a STRING into TOKENS. "; StringTokenizer st = new StringTokenizer(a); while (st.hasMoreTokens()){ System.out.println(st.nextToken()); Given above codes, the output is following, the STRING TOKENIZER CLASS ALLOWS an APPLICATION to BREAK a STRING into TOKENS. My only question is why the "STRING TOKENIZER CLASS" has been combined into one token???????? When I try to run this code, System.out.println("STRING TOKENIZER CLASS".contains(" ")); It printed funny result, FALSE It sound not logical right? I've no idea what went wrong. I found out the reason, the space was not recognized as valid space by Java somehow. But, I don't know how it turned up to be like that from the front processing up to the code that I've posted.

Read the article

Performance of tokenizing CSS in PHP

- by Boldewyn

This is a noob question from someone who hasn't written a parser/lexer ever before. I'm writing a tokenizer/parser for CSS in PHP (please don't repeat with 'OMG, why in PHP?'). The syntax is written down by the W3C neatly here (CSS2.1) and here (CSS3, draft). It's a list of 21 possible tokens, that all (but two) cannot be represented as static strings. My current approach is to loop through an array containing the 21 patterns over and over again, do an if (preg_match()) and reduce the source string match by match. In principle this works really good. However, for a 1000 lines CSS string this takes something between 2 and 8 seconds, which is too much for my project. Now I'm banging my head how other parsers tokenize and parse CSS in fractions of seconds. OK, C is always faster than PHP, but nonetheless, are there any obvious D'Oh! s that I fell into? I made some optimizations, like checking for '@', '#' or '"' as the first char of the remaining string and applying only the relevant regexp then, but this hadn't brought any great performance boosts. My code (snippet) so far: $TOKENS = array( 'IDENT' => '...regexp...', 'ATKEYWORD' => '@...regexp...', 'String' => '"...regexp..."|\'...regexp...\'', //... ); $string = '...CSS source string...'; $stream = array(); // we reduce $string token by token while ($string != '') { $string = ltrim($string, " \t\r\n\f"); // unconsumed whitespace at the // start is insignificant but doing a trim reduces exec time by 25% $matches = array(); // loop through all possible tokens foreach ($TOKENS as $t => $p) { // The '&' is used as delimiter, because it isn't used anywhere in // the token regexps if (preg_match('&^'.$p.'&Su', $string, $matches)) { $stream[] = array($t, $matches[0]); $string = substr($string, strlen($matches[0])); // Yay! We found one that matches! continue 2; } } // if we come here, we have a syntax error and handle it somehow } // result: an array $stream consisting of arrays with // 0 => type of token // 1 => token content

Read the article

reading a line, tokenizing and assigning to struct in C

- by Dervin Thunk

line is fgets'd, and running in a while loop with counter n, d is a struct with 2 char arrays, p and q. Basically, in a few words, I want to read a line, separate it into 2 strings, one up until the first space, and the rest of the line. I clean up afterwards (\n from the file becomes \'0'). The code works, but is there a more idiomatic way to do this? What errors am I running into "unknowingly"? int spc = strcspn(line," "); strncpy(d[n].p, line, spc); d[n].p[spc+1]='\0'; int l = strlen(line)-spc; strncpy(d[n].q, line+spc+1, l); char* nl = strchr(d[n].q, '\n'); if(nl){ *nl='\0'; } n++; Thanks.

Read the article

Modify loopj jQuery Plugin: Tokenizing Autocomplete Text Entry

- by dev.e.loper

I'm looking into updating LoopJ's autocomplete plug-in. I want to override one of the methods (highlight_term to be exact) however it looks like that method is private.

Read the article

How to get a Token from a Lucene TokenStream?

- by FarmBoy

I'm trying to use Apache Lucene for tokenizing, and I am baffled at the process to obtain Tokens from a TokenStream. The worst part is that I'm looking at the comments in the JavaDocs that address my question. http://lucene.apache.org/java/3_0_1/api/core/org/apache/lucene/analysis/TokenStream.html#incrementToken%28%29 Somehow, an AttributeSource is supposed to be used, rather than Tokens. I'm totally at a loss. Can anyone explain how to get token-like information from a TokenStream?

Read the article

Parsing data from txt file in J2ME

- by CSFYPMAIL

Basically I'm creating an indoor navigation system in J2ME. I've put the location details in a .txt file i.e. Locations names and their coordinates. Edges with respective start node and end node as well as the weight (length of the node). I put both details in the same file so users dont have to download multiple files to get their map working (it could become time consuming and seem complex). So what i did is to seperate the deferent details by typing out location Names and coordinates first, After that I seperated that section from the next section which is the edges by drawing a line with multiple underscores. Now the problem I'm having is parsing the different details into seperate arrays by setting up a command (while manually tokenizing the input stream) to check wether the the next token is an underscore. If it is, (in pseudocode terms), move to the next line in the stream, create a new array and fill it up with the next set of details. I found a some explanation/code HERE that does something similar but still parses into one array, although it manually tokenizes the input. Any ideas on what to do? Thanks Text File Explanation The text has the following format... <--1stSection-- /** * Section one has the following format * xCoordinate;yCoordinate;LocationName */ 12;13;New York City 40;12;Washington D.C. ...e.t.c _________________________ <--(underscore divider) <--2ndSection-- /** * Its actually an adjacency list but indirectly provides "edge" details. * Its in this form * StartNode/MainReferencePoint;Endnode1;distance2endNode1;Endnode2;distance2endNode2;...e.t.c */ philadelphia;Washington D.C.;7;New York City;2 New York City;Florida;24;Illinois;71 ...e.t.c

Read the article

Token replacement

- by ClarkeyBoy

Hey, I currently have a system on my website whereby I can put something like "[cfe]" anywhere in the site and, when the page is rendered, it will replace it with the root to the customer front end (same for "[afe]" and admin front end - so in the admin front end I can put "[cfe]/Default.aspx" to link to the homepage on the customer front end. This is in place as I have a development version of the site, then a test and a live version too. All 3 may have different roots to each section (for example the way the website is set up, the root to the admin front end in test is "/test/Administration/", but in live and development it is just "/Administration/"). Which version it is depends on the URL - all my development sites are in a folder called "development", whereas test is in a folder called "test" and any live urls do not contain either of these. There are also 3 different databases - one for each. All 3, obviously, require a different connection string. I also have a string replacement function in place which can change, for example, "[Product:Cards]" to point to the Cards catalogue page. Problem is that for this I go through all the products and do a replacement on "[Product:" & Product.Name() & "]". However I would like to take this further. I would like to pick up these custom strings when the page is rendered so it picks up "[Product:Cards]" and then goes off to find product "Cards" and replaces the string with a link to the Cards page, rather than looping through all the products and doing a replace just on the off chance that there are any replacements to make. One use for this, which I may start using in the future if I can figure out how to do this, is like on Wikipedia where you put the title of the page you want to point to, then a divider (think its a pipe from memory) then the link text. I would like to apply this to the above situation. This way broken links can also be picked up, and reported to admin (a major advantage as they can then locate them and remove the link or add the product / page that it refers to). I would like to take this to the stage where content of entire pages can be rearranged (kinda like web parts, but not as advanced as that). I mean like so you can put [layout type="3columnImageTopRight" image="imageurl"]Content here[/layout]. This will display, as specified, an image in the top right (with padding at the left and bottom) and 3 columns - maybe with the image spanning one or two columns). The imageurl can be specified as another token: maybe like [Image:imagename.gif] or something. This replaces it with the root to the image folder and then the specified filename. I have not really looked into how I am going to split the content into 3 columns yet, but this would be something to look at for my dissertation and then implement after my project deadline at least. Does anyone have any ideas or pointers which could help me with this? Also if this is not strictly token replacement then please point me to what it is, so I can further develop this. Thanks in advance, Regards, Richard Clarke

Read the article

ANTLR lexer mismatches tokens

- by Barry Brown

I have a simple ANTLR grammar, which I have stripped down to its bare essentials to demonstrate this problem I'm having. I am using ANTLRworks 1.3.1. grammar sample; assignment : IDENT ':=' NUM ';' ; IDENT : ('a'..'z')+ ; NUM : ('0'..'9')+ ; WS : (' '|'\n'|'\t'|'\r')+ {$channel=HIDDEN;} ; Obviously, this statement is accepted by the grammar: x := 99; But this one also is: x := @!$()()%99***; Output from the ANTLRworks Interpreter: What am I doing wrong? Even other sample grammars that come with ANTLR (such as the CMinus grammar) exhibit this behavior.

Read the article

GNU Flex, multiline rule

- by Simone Margaritelli

Hi there i have a flex rule inside my lexer definition : operators "[]"|"[]="|"[]<"|".."|"."|".="|"+"|"+="|"-"|"-="|"/"|"/="|"*"|"*="|"%"|"%="|"++"|"--"|"^"|"^="|"~"|"&"|"&="|"|"|"|="|"<<"|"<<="|">>"|"!"|"<"|">"|">="|"<="|"=="|"!="|"&&"|"||"|"~=" Is there any way to split this ruole on more lines to keep it clearer? I tried with \ just like macros but it does not seem to be accepted by flex :( PS: I don't want to split the rule in more sub-rules, but only split its regex in more lines to keep the code clearer.

Read the article

Calculation Expression Parser with Nesting and Variables in ActionScript

- by yuletide

Hi There, I'm trying to enable dynamic fields in the configuration file for my mapping app, but I can't figure out how to parse the "equation" passed in by the user, at least not without writing a whole parser from scratch! I'm sure there is some easier way to do this, and so I'm asking for ideas! Basic idea: public var testString:String = "(#TOTPOP_CY#-#HISPOP_CY#)/#TOTPOP_CY#"; public var valueObject:Object = {TOTPOP_CY:1000, HISPOP_CY:100}; public function calcParse(eq:String):String { // do calculations return calculatedValue } So far, I was thinking of splitting the expression by either the operators, or maybe the variable tokens, but that gets rid of the parenthetical nesting. Alternatively, use a series of regex to search and replace each piece of the expression with its value, recursively running until only a number is left. But I don't think regex does math (i.e. replace "\d + \d" with the sum of the two numbers) Ideally, I'd just do a find/replace all variable names with their values, then run an eval(), but there's no eval in AS... eesh I downloaded some course materials for a course on compiler design, so maybe I'll just write a full-fledged calculator language and parser and port it over from the OTHER flex (the parser generator) :-D

Read the article

valgrind complains doing a very simple strtok in c

- by monkeyking

Hi I'm trying to tokenize a string by loading an entire file into a char[] using fread. For some strange reason it is not always working, and valgrind complains in this very small sample program. Given an input like test.txt first second And the following program #include <stdio.h> #include <string.h> #include <stdlib.h> #include <sys/stat.h> //returns the filesize in bytes size_t fsize(const char* fname){ struct stat st ; stat(fname,&st); return st.st_size; } int main(int argc, char *argv[]){ FILE *fp = NULL; if(NULL==(fp=fopen(argv[1],"r"))){ fprintf(stderr,"\t-> Error reading file:%s\n",argv[1]); return 0; } char buffer[fsize(argv[1])]; fread(buffer,sizeof(char),fsize(argv[1]),fp); char *str = strtok(buffer," \t\n"); while(NULL!=str){ fprintf(stderr,"token is:%s with strlen:%lu\n",str,strlen(str)); str = strtok(NULL," \t\n"); } return 0; } compiling like gcc test.c -std=c99 -ggdb running like ./a.out test.txt thanks

Read the article

How to read values from file. tokenizer

- by user69514

I have a file in which each line contains two numbers. The problem is that the two number are separated by a space, but the space can be any number of blank spaces. either one, two, or more. I want to read the line and store each of the numbers in a variable, but I'm not sure how to tokenize it. i.e 1 5 3 2 5 6 3 4 83 54 23 23 32 88 8 203

Read the article

JavaCC: Please help me understand token ambiguity.

- by java.is.for.desktop

Hello, everyone! I had already many problems with understanding, how ambiguous tokens can be handled elegantly (or somehow at all) in JavaCC. Let's take this example: I want to parse XML processing instruction. The format is: "<?" <target> <data> "?>": target is an XML name, data can be anything except ?>, because it's the closing tag. So, lets define this in JavaCC: (I use lexical states, in this case DEFAULT and PROC_INST) TOKEN : <#NAME : (very-long-definition-from-xml-1.1-goes-here) > TOKEN : <WSS : (" " | "\t")+ > // WSS = whitespaces <DEFAULT> TOKEN : {<PI_START : "<?" > : PROC_INST} <PROC_INST> TOKEN : {<PI_TARGET : <NAME> >} <PROC_INST> TOKEN : {<PI_DATA : ~[] >} // accept everything <PROC_INST> TOKEN : {<PI_END : "?>" > : DEFAULT} Now the part which recognizes processing instructions: void PROC_INSTR() : {} { ( <PI_START> (t=<PI_TARGET>){System.out.println("target: " + t.image);} <WSS> (t=<PI_DATA>){System.out.println("data: " + t.image);} <PI_END> ) {} } The problem is (i guess): hence <PI_DATA> recognizes "everything", my definition is wrong. Let's test it with <?mytarget here-goes-some-data?>: The target is recognized: "target: mytarget". But now I get my favorite JavaCC parsing error: !! procinstparser.ParseException: Encountered "" at line 1, column 15. !! Was expecting one of: !! Encountered nothing? Was expecting nothing? Or what? Thank you, JavaCC! I know, that I could use the MORE keyword of JavaCC, but this would give me the whole processing instruction as one token, so I'd had to parse/tokenize it further by myself. Why should I do that? Am I writing a parser that does not parse? What I would need is telling JavaCC to recognize "everything until ?>" as processing instruction data. How can it be done? Thank you.

Read the article

How to make the tokenizer detect empty spaces while using strtok()

- by Shadi Al Mahallawy

I am designing a c++ program, somewhere in the program i need to detect if there is a blank(empty token) next to the token used know eg. if(token1==start) { token2=strtok(NULL," "); if(token2==NULL) {LCCTR=0;} else {LCCTR=atoi(token2);} so in the previous peice token1 is pointing to start , and i want to check if there is anumber next to the start , so I used token2=strtok(NULL," ") to point to the next token but unfortunattly the strtok function cannot detect empty spaces so it gives me an error at run time"INVALID NULL POINTER" how can i fix it or is there another function to use to detect empty spaces #include <iostream> #include<string> #include<map> #include<iomanip> #include<fstream> #include<ctype.h> using namespace std; const int MAX=300; int LCCTR; int START(char* token1); char* PASS1(char*token1); void tokinizer() { ifstream in; ofstream out; char oneline[MAX]; in.open("infile.txt"); out.open("outfile.txt"); if(in.is_open()) { char *token1; in.getline(oneline,MAX); token1 = strtok(oneline," \t"); START (token1); //cout<<'\t'; while(token1!=NULL) { //PASS1(token1); //cout<<token1<<" "; token1=strtok(NULL," \t"); if(NULL==token1) {//cout<<endl; //cout<<LCCTR<<'\t'; in.getline(oneline,MAX); token1 = strtok(oneline," \t"); } } } in.close(); out.close(); } int START(char* token1) { string start("START"); char*token2; if(token1 != start) {LCCTR=0;} else if(token1==start) { token2=strchr(token1+2,' '); cout<<token2; if(token2==NULL) {LCCTR=0;} else {LCCTR=atoi(token2); if(atoi(token2)>9999||atoi(token2)<0){cout<<"IVALID STARTING ADDRESS"<<endl;exit(1);} } } return LCCTR; } char* PASS1 (char*token1) { map<string,int> operations; map<string,int>symtable; map<string,int>::iterator it; pair<map<string,int>::iterator,bool> ret; char*token3=NULL; char*token2=NULL; string test; string comp(" "); string start("START"); string word("WORD"); string byte("BYTE"); string resb("RESB"); string resw("RESW"); string end("END"); operations["ADD"] = 18; operations["AND"] = 40; operations["COMP"] = 28; operations["DIV"] = 24; operations["J"] = 0X3c; operations["JEQ"] =30; operations["JGT"] =34; operations["JLT"] =38; operations["JSUB"] =48; operations["LDA"] =00; operations["LDCH"] =50; operations["LDL"] =55; operations["LDX"] =04; operations["MUL"] =20; operations["OR"] =44; operations["RD"] =0xd8; operations["RSUB"] =0x4c; operations["STA"] =0x0c; operations["STCH"] =54; operations["STL"] =14; operations["STSW"] =0xe8; operations["STX"] =10; operations["SUB"] =0x1c; operations["TD"] =0xe0; operations["TIX"] =0x2c; operations["WD"] =0xdc; if(operations.find("ADD")->first==token1) { token2=strtok(NULL," "); //test=token2; cout<<token2; //if(test.empty()){cout<<"MISSING OPERAND"<<endl;exit(1);} //else{LCCTR=LCCTR+3;} } /*else if(operations.find("AND")->first==token1) { token2=strtok(NULL," "); test=token2; if(test.empty()){cout<<"MISSING OPERAND"<<endl;exit(1);} else{LCCTR=LCCTR+3;} } else if(operations.find("COMP")->first==token1) { token2=token1+5; test=token2; if(test.empty()){cout<<"MISSING OPERAND"<<endl;exit(1);} else{LCCTR=LCCTR+3;} } else if(operations.find("DIV")->first==token1) { token2=token1+4; test=token2; if(test.empty()){cout<<"MISSING OPERAND"<<endl;exit(1);} else{LCCTR=LCCTR+3;} } else if(operations.find("J")->first==token1) { token2=token1+2; test=token2; if(test.empty()){cout<<"MISSING OPERAND"<<endl;exit(1);} else{LCCTR=LCCTR+3;} } else if(operations.find("JEQ")->first==token1) { token2=token1+5; test=token2; if(test.empty()){cout<<"MISSING OPERAND"<<endl;exit(1);} else{LCCTR=LCCTR+3;} } else if(operations.find("JGT")->first==token1) { token2=strtok(NULL," "); test=token2; if(test.empty()){cout<<"MISSING OPERAND"<<endl;exit(1);} else{LCCTR=LCCTR+3;} } else if(operations.find("JLT")->first==token1) { token2=token1+6; test=token2; if(test.empty()){cout<<"MISSING OPERAND"<<endl;exit(1);} else{LCCTR=LCCTR+3;} } else if(operations.find("JSUB")->first==token1) { token2=token1+6; test=token2; if(test.empty()){cout<<"MISSING OPERAND"<<endl;exit(1);} else{LCCTR=LCCTR+3;} } else if(operations.find("LDA")->first==token1) { token2=token1+6; test=token2; if(test.empty()){cout<<"MISSING OPERAND"<<endl;exit(1);} else{LCCTR=LCCTR+3;} } else if(operations.find("LDCH")->first==token1) { token2=token1+6; test=token2; if(test.empty()){cout<<"MISSING OPERAND"<<endl;exit(1);} else{LCCTR=LCCTR+3;} } else if(operations.find("LDL")->first==token1) { token2=token1+6; test=token2; if(test.empty()){cout<<"MISSING OPERAND"<<endl;exit(1);} else{LCCTR=LCCTR+3;} } else if(operations.find("LDX")->first==token1) { token2=token1+6; test=token2; if(test.empty()){cout<<"MISSING OPERAND"<<endl;exit(1);} else{LCCTR=LCCTR+3;} } else if(operations.find("MUL")->first==token1) { token2=token1+6; test=token2; if(test.empty()){cout<<"MISSING OPERAND"<<endl;exit(1);} else{LCCTR=LCCTR+3;} } else if(operations.find("OR")->first==token1) { token2=token1+6; test=token2; if(test.empty()){cout<<"MISSING OPERAND"<<endl;exit(1);} else{LCCTR=LCCTR+3;} } else if(operations.find("RD")->first==token1) { token2=token1+6; test=token2; if(test.empty()){cout<<"MISSING OPERAND"<<endl;exit(1);} else{LCCTR=LCCTR+3;} } else if(operations.find("RSUB")->first==token1) { token2=token1+6; test=token2; if(test.empty()){cout<<"MISSING OPERAND"<<endl;exit(1);} else{LCCTR=LCCTR+3;} } else if(operations.find("STA")->first==token1) { token2=token1+6; test=token2; if(test.empty()){cout<<"MISSING OPERAND"<<endl;exit(1);} else{LCCTR=LCCTR+3;} } else if(operations.find("STCH")->first==token1) { token2=token1+6; test=token2; if(test.empty()){cout<<"MISSING OPERAND"<<endl;exit(1);} else{LCCTR=LCCTR+3;} } else if(operations.find("STL")->first==token1) { token2=token1+6; test=token2; if(test.empty()){cout<<"MISSING OPERAND"<<endl;exit(1);} else{LCCTR=LCCTR+3;} } else if(operations.find("STSW")->first==token1) { token2=token1+6; test=token2; if(test.empty()){cout<<"MISSING OPERAND"<<endl;exit(1);} else{LCCTR=LCCTR+3;} } else if(operations.find("STX")->first==token1) { token2=token1+6; test=token2; if(test.empty()){cout<<"MISSING OPERAND"<<endl;exit(1);} else{LCCTR=LCCTR+3;} } else if(operations.find("SUB")->first==token1) { token2=token1+6; test=token2; if(test.empty()){cout<<"MISSING OPERAND"<<endl;exit(1);} else{LCCTR=LCCTR+3;} } else if(operations.find("TD")->first==token1) { token2=token1+6; test=token2; if(test.empty()){cout<<"MISSING OPERAND"<<endl;exit(1);} else{LCCTR=LCCTR+3;} } else if(operations.find("TIX")->first==token1) { token2=token1+6; test=token2; if(test.empty()){cout<<"MISSING OPERAND"<<endl;exit(1);} else{LCCTR=LCCTR+3;} } else if(operations.find("WD")->first==token1) { token2=token1+6; test=token2; if(test.empty()){cout<<"MISSING OPERAND"<<endl;exit(1);} else{LCCTR=LCCTR+3;} }*/ //else if( if(word==token1) {LCCTR=LCCTR+3;} else if(byte==token1) {string test; token2=token1+7; test=token2; if(test[0]=='C') {token3=token1+10; test=token3; if(test.length()>15) {cout<<"ERROR"<<endl; exit(1);} } else if(test[0]=='X') {token3=token1+10; test=token3; if(test.length()>14) {cout<<"ERROR"<<endl; exit(1);} } LCCTR=LCCTR+test.length(); } else if(resb==token1) {token3=token1+5; LCCTR=LCCTR+atoi(token3);} else if(resw==token1) {token3=token1+5; LCCTR=LCCTR+3*atoi(token3);} else if(end==token1) {exit(1);} /*else { test=token1; int last=test.length(); if(token1==start||test[0]=='C'||test[0]=='X'||ispunct(test[last])||isdigit(test[0])||isdigit(test[1])||isdigit(test[2])||isdigit(test[3])){} else { token2=strtok(NULL," "); //test=token2; cout<<token2; if(token2!=NULL) { symtable.insert( pair<string,int>(token1,LCCTR)); for(it=symtable.begin() ;it!=symtable.end() ;++it) {/*cout<<"symbol: "<<it->first<<" LCCTR: "<<it->second<<endl;} } else{} } }*/ return token3; } int main() { tokinizer(); return 0; }

Read the article

Does PL/SQL have an equivalent StringTokenizer to Java's?

- by dacracot

I use java.util.StringTokenizer for simple parsing of delimited strings in java. I have a need for the same type of mechanism in pl/sql. I could write it, but if it already exists, I would prefer to use that. Anyone know of a pl/sql implementation? Some useful alternative?

Read the article

parsing a string of ascii text into separate variables

- by jml

Hi there, I have a piece of text that gets handed to me like: here is line one\n\nhere is line two\n\nhere is line three What I would like to do is break this string up into three separate variables. I'm not quite sure how one would go about accomplishing this in python. Thanks for any help, jml

Read the article

Tokenize problem in Java with separator ". "

- by user112976

I need to split a text using the separator ". ". For example I want this string : Washington is the U.S Capital. Barack is living there. To be cut into two parts: Washington is the U.S Capital. Barack is living there. Here is my code : // Initialize the tokenizer StringTokenizer tokenizer = new StringTokenizer("Washington is the U.S Capital. Barack is living there.", ". "); while (tokenizer.hasMoreTokens()) { System.out.println(tokenizer.nextToken()); } And the output is unfortunately : Washington is the U S Capital Barack is living there Can someone explain what's going on?

Read the article

JavaCC: How can one exclude a string from a token? (A.k.a. understanding token ambiguity.)

- by java.is.for.desktop

Hello, everyone! I had already many problems with understanding, how ambiguous tokens can be handled elegantly (or somehow at all) in JavaCC. Let's take this example: I want to parse XML processing instruction. The format is: "<?" <target> <data> "?>": target is an XML name, data can be anything except ?>, because it's the closing tag. So, lets define this in JavaCC: (I use lexical states, in this case DEFAULT and PROC_INST) TOKEN : <#NAME : (very-long-definition-from-xml-1.1-goes-here) > TOKEN : <WSS : (" " | "\t")+ > // WSS = whitespaces <DEFAULT> TOKEN : {<PI_START : "<?" > : PROC_INST} <PROC_INST> TOKEN : {<PI_TARGET : <NAME> >} <PROC_INST> TOKEN : {<PI_DATA : ~[] >} // accept everything <PROC_INST> TOKEN : {<PI_END : "?>" > : DEFAULT} Now the part which recognizes processing instructions: void PROC_INSTR() : {} { ( <PI_START> (t=<PI_TARGET>){System.out.println("target: " + t.image);} <WSS> (t=<PI_DATA>){System.out.println("data: " + t.image);} <PI_END> ) {} } Let's test it with <?mytarget here-goes-some-data?>: The target is recognized: "target: mytarget". But now I get my favorite JavaCC parsing error: !! procinstparser.ParseException: Encountered "" at line 1, column 15. !! Was expecting one of: !! Encountered nothing? Was expecting nothing? Or what? Thank you, JavaCC! I know, that I could use the MORE keyword of JavaCC, but this would give me the whole processing instruction as one token, so I'd had to parse/tokenize it further by myself. Why should I do that? Am I writing a parser that does not parse? The problem is (i guess): hence <PI_DATA> recognizes "everything", my definition is wrong. I should tell JavaCC to recognize "everything except ?>" as processing instruction data. But how can it be done? NOTE: I can only exclude single characters using ~["a"|"b"|"c"], I can't exclude strings such as ~["abc"] or ~["?>"]. Another great anti-feature of JavaCC. Thank you.

Search Results

Search found 52 results on 3 pages for 'tokenizing'.

Page 1/3 | 1 2 3 | Next Page >

- by gatechgrad

- by Amaç Herdagdelen

- by Alan Storm

- by René Nyffenegger

- by InfinitiesLoop

- by Ron Holcomb

- by TeknoSeyfo

- by Mr CooL

- by Boldewyn

- by Dervin Thunk

- by dev.e.loper

- by FarmBoy

- by CSFYPMAIL

- by ClarkeyBoy

- by Barry Brown

- by Simone Margaritelli

- by yuletide

- by monkeyking

- by user69514

- by java.is.for.desktop

- by Shadi Al Mahallawy

- by dacracot

- by jml

- by user112976

- by java.is.for.desktop

1 2 3 | Next Page >