Search Results

Search found 4222 results on 169 pages for 'dtd parsing'.

Page 39/169 | < Previous Page | 35 36 37 38 39 40 41 42 43 44 45 46 | Next Page >

Technique to remove common words(and their plural versions) from a string

- by Jake M

I am attempting to find tags(keywords) for a recipe by parsing a long string of text. The text contains the recipe ingredients, directions and a short blurb. What do you think would be the most efficient way to remove common words from the tag list? By common words, I mean words like: 'the', 'at', 'there', 'their' etc. I have 2 methodologies I can use, which do you think is more efficient in terms of speed and do you know of a more efficient way I could do this? Methodology 1: - Determine the number of times each word occurs(using the library Collections) - Have a list of common words and remove all 'Common Words' from the Collection object by attempting to delete that key from the Collection object if it exists. - Therefore the speed will be determined by the length of the variable delims import collections from Counter delim = ['there','there\'s','theres','they','they\'re'] # the above will end up being a really long list! word_freq = Counter(recipe_str.lower().split()) for delim in set(delims): del word_freq[delim] return freq.most_common() Methodology 2: - For common words that can be plural, look at each word in the recipe string, and check if it partially contains the non-plural version of a common word. Eg; For the string "There's a test" check each word to see if it contains "there" and delete it if it does. delim = ['this','at','them'] # words that cant be plural partial_delim = ['there','they',] # words that could occur in many forms word_freq = Counter(recipe_str.lower().split()) for delim in set(delims): del word_freq[delim] # really slow for delim in set(partial_delims): for word in word_freq: if word.find(delim) != -1: del word_freq[delim] return freq.most_common()

Read the article
PyParsing: Not all tokens passed to setParseAction()

- by Rosarch

I'm parsing sentences like "CS 2110 or INFO 3300". I would like to output a format like: [[("CS" 2110)], [("INFO", 3300)]] To do this, I thought I could use setParseAction(). However, the print statements in statementParse() suggest that only the last tokens are actually passed: >>> statement.parseString("CS 2110 or INFO 3300") Match [{Suppress:("or") Re:('[A-Z]{2,}') Re:('[0-9]{4}')}] at loc 7(1,8) string CS 2110 or INFO 3300 loc: 7 tokens: ['INFO', 3300] Matched [{Suppress:("or") Re:('[A-Z]{2,}') Re:('[0-9]{4}')}] -> ['INFO', 3300] (['CS', 2110, 'INFO', 3300], {'Course': [(2110, 1), (3300, 3)], 'DeptCode': [('CS', 0), ('INFO', 2)]}) I expected all the tokens to be passed, but it's only ['INFO', 3300]. Am I doing something wrong? Or is there another way that I can produce the desired output? Here is the pyparsing code: from pyparsing import * def statementParse(str, location, tokens): print "string %s" % str print "loc: %s " % location print "tokens: %s" % tokens DEPT_CODE = Regex(r'[A-Z]{2,}').setResultsName("DeptCode") COURSE_NUMBER = Regex(r'[0-9]{4}').setResultsName("CourseNumber") OR_CONJ = Suppress("or") COURSE_NUMBER.setParseAction(lambda s, l, toks : int(toks[0])) course = DEPT_CODE + COURSE_NUMBER.setResultsName("Course") statement = course + Optional(OR_CONJ + course).setParseAction(statementParse).setDebug()

Read the article
Why am I getting a ParseException when using SimpleDateFormat to format a date and then parse it?

- by Greg

I have been debugging some existing code for which unit tests are failing on my system, but not on colleagues' systems. The root cause is that SimpleDateFormat is throwing ParseExceptions when parsing dates that should be parseable. I created a unit test that demonstrates the code that is failing on my system: import java.text.DateFormat; import java.text.ParseException; import java.text.SimpleDateFormat; import java.util.Date; import java.util.TimeZone; import junit.framework.TestCase; public class FormatsTest extends TestCase { public void testParse() throws ParseException { DateFormat formatter = new SimpleDateFormat("yyyyMMddHHmmss.SSS Z"); formatter.setTimeZone(TimeZone.getDefault()); formatter.setLenient(false); formatter.parse(formatter.format(new Date())); } } This test throws a ParseException on my system, but runs successfully on other systems. java.text.ParseException: Unparseable date: "20100603100243.118 -0600" at java.text.DateFormat.parse(DateFormat.java:352) at FormatsTest.testParse(FormatsTest.java:16) I have found that I can setLenient(true) and the test will succeed. The setLenient(false) is what is used in the production code that this test mimics, so I don't want to change it.

Read the article
How to transform a production to LL(1) for a list separated by a semicolon?

- by Subb

Hi, I'm reading this introductory book on parsing (which is pretty good btw) and one of the exercice is to "build a parser for your favorite language." Since I don't want to die today, I thought I could do a parser for something relatively simple, ie a simplified CSS. Note: This book teach you how to right a LL(1) parser using the recursive-descent algorithm. So, as a sub-exercice, I am building the grammar from what I know of CSS. But I'm stuck on a production that I can't transform in LL(1) : //EBNF block = "{", declaration, {";", declaration}, [";"], "}" //BNF <block> =:: "{" <declaration> "}" <declaration> =:: <single-declaration> <opt-end> | <single-declaration> ";" <declaration> <opt-end> =:: "" | ";" This describe a CSS block. Valid block can have the form : { property : value } { property : value; } { property : value; property : value } { property : value; property : value; } ... The problem is with the optional ";" at the end, because it overlap with the starting character of {";", declaration}, so when my parser meet a semicolon in this context, it doesn't know what to do. The book talk about this problem, but in its example, the semicolon is obligatory, so the rule can be modified like this : block = "{", declaration, ";", {declaration, ";"}, "}" So, Is it possible to achieve what I'm trying to do using a LL(1) parser?

Read the article
How to parse phpDoc style comment block with php?

- by Reveller

Please consider the following code with which I'm trying to parse only the first phpDoc style comment (noy using any other libraries) in a file (file contents put in $data variable for testing purposes): $data = " /** * @file A lot of info about this file * Could even continue on the next line * @author [email protected] * @version 2010-05-01 * @todo do stuff... */ /** * Comment bij functie bar() * @param Array met dingen */ function bar($baz) { echo $baz; } "; $data = trim(preg_replace('/\r?\n *\* */', ' ', $data)); preg_match_all('/@([a-z]+)\s+(.*?)\s*(?=$|@[a-z]+\s)/s', $data, $matches); $info = array_combine($matches[1], $matches[2]); print_r($info) This almose works, except for the fact that everything after @todo (including the bar() comment block and code) is considered the value of @todo: Array ( [file] => A lot of info about this file Could even continue on the next line [author] => [email protected] [version] => 2010-05-01 [todo] => do stuff... / /** Comment bij functie bar() [param] => Array met dingen / function bar() { echo ; } ) How does my code need to be altered so that only the first comment block is being parsed (in other words: parsing should stop after the first "*/" encountered?

Read the article
How to parse the second child node from xml page in iphone

- by Warrior

I am new to iphone development.I want parse an you-tube XML page and retrieve its contents and display in a RSS feed. my xml page is <entry> <id>xxxxx</id> <title>xxx xxxx xxxx</title> <content>xxxxxxxxxxx</content> <media:group> <media:thumbnail url="http://tiger.jpg"/> </media:group> </entry> To retrieve the content i am using xml parsing. - (void)parser:(NSXMLParser *)parser didStartElement:(NSString *)elementName namespaceURI:(NSString *)namespaceURI qualifiedName:(NSString *)qName attributes:(NSDictionary *)attributeDict{ currentElement = [elementName copy]; if ([elementName isEqualToString:@"entry"]) { entry = [[NSMutableDictionary alloc] init]; currentTitle = [[NSMutableString alloc] init]; currentcontent = [[NSMutableString alloc] init]; } } - (void)parser:(NSXMLParser *)parser didEndElement:(NSString *)elementName namespaceURI:(NSString *)namespaceURI qualifiedName:(NSString *)qName{ if ([elementName isEqualToString:@"entry"]) { [entry setObject:currentTitle forKey:@"title"]; [entry setObject:currentDate forKey:@"content"]; [stories addObject:[entry copy]]; }} - (void)parser:(NSXMLParser *)parser foundCharacters:(NSString *)string{ if ([currentElement isEqualToString:@"title"]) { [currentTitle appendString:string]; } else if ([currentElement isEqualToString:@"content"]) { [currentLink appendString:string]; } } I am able to retrieve id , title and content value and display it in a table-view.How can i retrieve tiger image URL and display it in table-view.Please help me out.Thanks.

Read the article
Pull specific information from a long list with Perl

- by melignus

The file that I've got to work with here is the result of an LDAP extraction but I need to ultimately get the information formatted over to something that a spreadsheet can use. So, the data is as follows: DataDataDataDataDataDataDataDataDataDataDataDataDataDataDataData DataDataDataDataDataDataDataDataDataDataDataDataDataDataDataData displayName: John Doe name: ##userName DataDataDataDataDataDataDataDataDataDataDataDataDataDataDataData DataDataDataDataDataDataDataDataDataDataDataDataDataDataDataData displayName: Jane Doe name: ##userName DataDataDataDataDataDataDataDataDataDataDataDataDataDataDataData DataDataDataDataDataDataDataDataDataDataDataDataDataDataDataData displayName: Ted Doe name: ##userName The format that I need to export to is: firstName lastName userName firstName lastName userName firstName lastName userName Where the spaces are tabs so I can then impor that file into a database. I have experience doing this in VBScript but I'm trying to switch over to using Perl for as much server administration as possible. I'm not sure on the syntax for what I want which is basically while not endoffile{ detect "displayName: " & $firstName & " " & $lastName detect "name: ##" & $userName write $firstName tab $lastName tab $userName to file } Also if someone could point me to a resource specifically on the text parsing syntax that Perl uses, I'd be very grateful. Most of the resources that I've come across haven't been very helpful.

Read the article
Which technology should I use to transform my latex documents into html documents

- by Matthias Günther

Hey, I want to write a little program that transforms my TeX files into HTML. I want to parse the documents and turn the macros (the build-in and of course my own) into HTML pieces. Here are my requirements: predefined rules (e.g. begin{itemize} \item text \end{itemize} = <br> <p>text </p> <br/>) defining own CSS style ability to convert formulars (extract the formulars, load them in an imagecreator and then save the jpg/png) easy to maintain and concise I know there are several technologies out there, but I don't exactly know which is the best for me. Here are the technologies which flow into my mind Ruby (I/O is easy, formular loading via webrat), XML XSLT (I don't think that I need just overhead) perl (there are many libs out there but I'm not quite familiar with it) bash (I worked with sed and was surprised how easy it was to work with regular expressions) latex2html ... (these converters won't work for me and they don't give me freedom in parsing) Any suggestions, hints and comments are welcome. Thanks for your time, folks.

Read the article
Replace string with incremented value

- by Andrei

Hello, I'm trying to write a CSS parser to automatically dispatch URLs in background images to different subdomains in order to parallelize downloads. Basically, I want to replace things like url(/assets/some-background-image.png) with url(http://assets[increment].domain.com/assets/some-background-image.png) I'm using this inside a class that I eventually want to evolve into doing various CSS parsing tasks. Here are the relevant parts of the class : private function parallelizeDownloads(){ static $counter = 1; $newURL = "url(http://assets".$counter.".domain.com"; The counter needs to be reset when it reaches 4 in order to limit to 4 subdomains. if ($counter == 4) { $counter = 1; } $counter ++; return $newURL; } public function replaceURLs() { This is mostly nonsense, but I know the code I'm looking for looks somewhat like this. Note : $this-css contains the CSS string. preg_match("/url/i",$this->css,$match); foreach($match as $URL) { $newURL = self::parallelizeDownloads(); $this->css = str_replace($match, $newURL,$this->css); } }

Read the article
App-Engine Parse a UrlFetch UTF-8 encoded stream

- by Davidrd91

I am trying to parse an XML from a URL using the xml.sax parser. I know there are other libraries to use but coming from Java this is the one I am most familiar with and seems the least complicated to me. The code I'm using to parse is as follows: parser = xml.sax.make_parser() handler = MangaHandler() parser.setContentHandler(handler) url = urlfetch.Fetch('http://www.mangapanda.com/alphabetical', allow_truncated = False, follow_redirects = False, deadline = False) xml.sax.parseString(url.content, handler) This returns a SaxException (invalid token) once the parser reaches the first & sign: SAXParseException: <unknown>:582:34: not well-formed (invalid token) Because urlfetch returns a string and not a stream I cannot use the parse() (which only works with streams) and am left to use parseString() instead. To see if parsing as a stream would fix this I tried: parser.parse(io.StringIO(url.content).encode('utf-8')) but this returns: TypeError: initial_value must be unicode or None, not str I have also tried to use the urllib2 libraries which do return a stream instead of urlfetch but the file is too large and is automatically truncated, leaving me with missing data. Any Sort of work-around for this would be greatly appreciated as I've spent days getting around one obstacle just to be stopped by another.

Read the article
Robust DateTime parser library for .NET

- by Frank Krueger

Hello, I am writing an RSS and Mail reader app in C# (technically MonoTouch). I have run into the issue of parsing DateTimes. I see a lot of variance in how dates are presented in the wild and have begun writing a function like this: public static DateTime ParseTime(string timeStr) { var formats = new string[] { "ddd, d MMM yyyy H:mm:ss \"GMT+00:00\"", "d MMM yyyy H:mm:ss \"EST\"", "yyyy-MM-dd\"T\"HH:mm:ss\"Z\"", "ddd MMM d HH:mm:ss \"+0000\" yyyy", }; try { return DateTime.Parse(timeStr); } catch (Exception) { } foreach (var f in formats) { try { var t = DateTime.ParseExact(timeStr, f, CultureInfo.InvariantCulture); return t; } catch (Exception) { } } return DateTime.MinValue; } This, well, makes me sick. Three points. (1) It's silly of me to think that I can actually collect a format list that will cover everything out there. (2) It's wrong! Notice that I'm treating an EST date time as UTC (since .NET seems oblivious to time zones). (3) I don't like using exceptions for logic. I am looking for an existing library (source only please) that is known to handle a bunch of these formats. Also, I would like to keep using UTC DateTimes throughout my code so whatever library is suggested should be able to produce DateTimes. Is there anything out there like this?

Read the article
How to transform a production to LL(1) grammar for a list separated by a semicolon?

- by Subb

Hi, I'm reading this introductory book on parsing (which is pretty good btw) and one of the exercice is to "build a parser for your favorite language." Since I don't want to die today, I thought I could do a parser for something relatively simple, ie a simplified CSS. Note: This book teach you how to right a LL(1) parser using the recursive-descent algorithm. So, as a sub-exercice, I am building the grammar from what I know of CSS. But I'm stuck on a production that I can't transform in LL(1) : //EBNF block = "{", declaration, {";", declaration}, [";"], "}" //BNF <block> =:: "{" <declaration> "}" <declaration> =:: <single-declaration> <opt-end> | <single-declaration> ";" <declaration> <opt-end> =:: "" | ";" This describe a CSS block. Valid block can have the form : { property : value } { property : value; } { property : value; property : value } { property : value; property : value; } ... The problem is with the optional ";" at the end, because it overlap with the starting character of {";", declaration}, so when my parser meet a semicolon in this context, it doesn't know what to do. The book talk about this problem, but in its example, the semicolon is obligatory, so the rule can be modified like this : block = "{", declaration, ";", {declaration, ";"}, "}" So, Is it possible to achieve what I'm trying to do using a LL(1) parser?

Read the article
Delphi: Transporting Objects to remote computers

- by pr0wl

Hallo. I am writing a tier2 ordering software for network usage. So we have client and server. On the client I create Objects of TBest in which the Product ID, the amount and the user who orders it are saved. (So this is a item of an Order). An order can have multiple items and those are saved in an array to later send the created order to the server. The class that holds the array is called TBestellung. So i created both TBest.toString: string; and TBest.fromString(source: string): TBest; Now, I send the toString result to the server via socket and on the server I create the object using fromString (its parsing the attributes received). This works as intended. Question: Is there a better and more elegant way to do that? Serialisation is a keyword, yes, but isn't that awful / difficult when you serialize an object (TBestellung in this case) that contains an Array of other Objects (TBest in this case)? //Small amendment: Before it gets asked. Yes I should create an extra (static) class for toString and fromString because otherwise the server needs to create an "empty" TBest in order to be able to use fromString.

Read the article
How to cache code in PHP?

- by Janis Peisenieks

I am creating a custom form building system, which includes various tokens. These tokens are found using Regular Expressions, and depending on the type of toke, parsed. Some require simple replacement, some require cycles, and so forth. Now I know, that RegExp is quite resource and time consuming, so I would like to be able to parse the code for the form once, creating a php code, and then save the PHP code, for next uses. How would I go about doing this? So far I have only seen output caching. Is there a way to cache commands like echo and cycles like foreach()? Because of misunderstandings, I'll create an example. Unparsed template data: Thank You for Your interest, [*Title*] [*Firstname*] [*Lastname*]. Here are the details of Your order! [*KeyValuePairs*] Here is the link to Your request: [*LinkToRequest*]. Parsed template: "Thank You for Your interest, <?php echo $data->title;?> <?php echo $data->firstname;?> <?php echo $data->lastname;?>. Here are the details of Your order! <?php foreach($data->values as $key=>$value){ echo $key."-".$value }?> Here is the link to Your request: <?php echo $data->linkToRequest;?>. I would then save the parsed template, and instead of parsing the template every time, just pass the $data variable to the already parsed one, which would generate an output.

Read the article
If Html File Has No Ending "/tr" Tag OR "/td" Tag Then HTML Agility Pack Does Not Read That Informat

- by Harikrishna

I am using HTML Agility Pack to parse html content. I am using parsing to extract table information. It works. But if there is no ending "/tr" tag or "/td" tag then it does not parse that information perfectly.(in which there is no ending tr tag or td tag.) Like <TABLE border=0><TBODY><TR height=20><TD class=xl27boL noWrap width="7%">01890345 </TD> <TD class=xl27boL noWrap width="4%">1416</TD> <TD class=xl27boL noWrap width="7%">kjlkjkls </TD><TD class=xl27boL noWrap width="4%">14:01:57 </TD> <TD class=xl27boL noWrap align=left width="15%">Football</TD><TD class=xl27boL noWrap align=right width="5%"> </TD> <TD class=xl27boL noWrap align=right width="5%">50 </TD> <TD class=xl27boL noWrap align=right width="5%">4997.2500</TD><TD class=xl27boL noWrap align=right width="7%">249862.50 </TD><TD class=xl27boL noWrap align=right width="5%"> </TD><TD class=xl27boL noWrap align=right width="5%"> </TD><TD class=xl27boRLnoWrap align=right width="8%">249612.64 </TD><TD class=xl27boL noWrap align=right width="5%">4997.2500</TD><TD class=xl27boL noWrap align=right width="7%">249862.50 </TD><TD class=xl27boL noWrap align=right width="5%">249.86</TD><TD class=xl27boL noWrap align=right width="5%">4992.2528</TD><TD class=xl27boL noWrap align=right width="5%"> </TD><TD class=xl27boL noWrap align=right width="5%"> </TD> <TD class=xl27boRL noWrap align=right width="8%">249612.64 </TD> </table> So for that what should I do ?

Read the article
Error when feeding a mysql db with a python-parsed data

- by Barnabe

I use this bit of code to feed some data i have parsed from a web page to a mysql database c=db.cursor() c.executemany( """INSERT INTO data (SID, Time, Value1, Level1, Value2, Level2, Value3, Level3, Value4, Level4, Value5, Level5, ObsDate) VALUES (%s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s)""", clean_data ) The parsed data looks like this (there are several hundred such lines) clean_data = [(161,00:00:00,8.19,1,4.46,4,7.87,4,6.54,null,4.45,6,2010-04-12),(162,00:00:00,7.55,1,9.52,1,1.90,1,4.76,null,0.14,1,2010-04-12),(164,00:00:00,8.01,1,8.09,1,0,null,8.49,null,0.20,2,2010-04-12),(166,00:00:00,8.30,1,4.77,4,10.99,5,9.11,null,0.36,2,2010-04-12)] if i hard code the data as above mySQL accepts my request (except for some quibbles about formatting) but if the variable clean_data is instead defined as the result of the parsing code, like this: cleaner = [(""" $!!'""", ')]'),(' $!!', ') etc etc] def processThis(str,lst): for find, replace in lst: str = str.replace(find, replace) return str clean_data = processThis(data,cleaner) then i get the dreaded "TypeError: not enough arguments for format string" After playing with formatting options for a few hours (I am very new to this) I am confused... what is the difference between the hard coded data and the result of the processThis function as fas as mySQL is concerned? Any idea greatly appreciated...

Read the article
How to deal with unknown entity references?

- by Chris

I'm parsing (a lot of) XML files that contain entity references which i dont know in advance (can't change that fact). For example: xml = "<tag>I'm content with &funny; &entity; &references;.</tag>" when i try to parse this using the following code: final DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance(); final DocumentBuilder db = dbf.newDocumentBuilder(); final InputSource is = new InputSource(new StringReader(xml)); final Document d = db.parse(is); i get the following exception: org.xml.sax.SAXParseException: The entity "funny" was referenced, but not declared. but, what i do want to achieve is, that the parser replaces every entity that is not declared (unknown to the parser) with an empty String ''. Or even better, is there a way to pass a map to the parser like: Map<String,String> entityMapping = ... entityMapping.put("funny","very"); entityMapping.put("entity","important"); entityMapping.put("references","stuff"); so that i could do the following: final DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance(); final DocumentBuilder db = dbf.newDocumentBuilder(); final InputSource is = new InputSource(new StringReader(xml)); db.setEntityResolver(entityMapping); final Document d = db.parse(is); if i would obtain the text from the document using this example code i should receive: I'm content with very important stuff. Any suggestions? Of course, i already would be happy to just replace the unknown entity's with empty strings. Thanks,

Read the article
How would I sort files to directories based on filenames?

- by gnomed

I have a huge number of files to sort all named in some terrible convention. Here are some examples: (4)_mr__mcloughlin____.txt 12__sir_john_farr____.txt (b)mr__chope____.txt dame_elaine_kellett-bowman____.txt dr__blackburn______.txt These names are supposed to be a different person (speaker) each. Someone in another IT department produced these from a ton of XML files using some script but the naming is unfathomably stupid as you can see. I need to sort literally tens of thousands of these files with multiple files of text for each person; each with something stupid making the filename different, be it more underscores or some random number. They need to be sorted by speaker. This would be easier with a script to do most of the work then I could just go back and merge folders that should be under the same name or whatever. There are a number of ways I was thinking about doing this. parse the names from each file and sort them into folders for each unique name. get a list of all the unique names from the filenames, then look through this simplified list of unique names for similar ones and ask me whether they are the same, and once it has determined this it will sort them all accordingly. I plan on using Perl, but I can try a new language if it's worth it. I'm not sure how to go about reading in each filename in a directory one at a time into a string for parsing into an actual name. I'm not completely sure how to parse with regex in perl either, but that might be googleable. For the sorting, I was just gonna use the shell command: `cp filename.txt /example/destination/filename.txt` but just cause that's all I know so it's easiest. I dont even have a pseudocode idea of what im going to do either so if someone knows the best sequence of actions, im all ears. I guess I am looking for a lot of help, I am open to any suggestions. Many many many thanks to anyone who can help. B.

Read the article
HTML Purifier: Removing an element conditionally based on its attributes

- by pinkgothic

As per the HTML Purifier smoketest, 'malformed' URIs are occasionally discarded to leave behind an attribute-less anchor tag, e.g. <a href="javascript:document.location='http://www.google.com/'">XSS</a> becomes <a>XSS</a> ...as well as occasionally being stripped down to the protocol, e.g. <a href="http://1113982867/">XSS</a> becomes <a href="http:/">XSS</a> While that's unproblematic, per se, it's a bit ugly. Instead of trying to strip these out with regular expressions, I was hoping to use HTML Purifier's own library capabilities / injectors / plug-ins / whathaveyou. Point of reference: Handling attributes Conditionally removing an attribute in HTMLPurifier is easy. Here the library offers the class HTMLPurifier_AttrTransform with the method confiscateAttr(). While I don't personally use the functionality of confiscateAttr(), I do use an HTMLPurifier_AttrTransform as per this thread to add target="_blank" to all anchors. // more configuration stuff up here $htmlDef = $htmlPurifierConfiguration->getHTMLDefinition(true); $anchor = $htmlDef->addBlankElement('a'); $anchor->attr_transform_post[] = new HTMLPurifier_AttrTransform_Target(); // purify down here HTMLPurifier_AttrTransform_Target is a very simple class, of course. class HTMLPurifier_AttrTransform_Target extends HTMLPurifier_AttrTransform { public function transform($attr, $config, $context) { // I could call $this->confiscateAttr() here to throw away an // undesired attribute $attr['target'] = '_blank'; return $attr; } } That part works like a charm, naturally. Handling elements Perhaps I'm not squinting hard enough at HTMLPurifier_TagTransform, or am looking in the wrong place(s), or generally amn't understanding it, but I can't seem to figure out a way to conditionally remove elements. Say, something to the effect of: // more configuration stuff up here $htmlDef = $htmlPurifierConfiguration->getHTMLDefinition(true); $anchor = $htmlDef->addElementHandler('a'); $anchor->elem_transform_post[] = new HTMLPurifier_ElementTransform_Cull(); // add target as per 'point of reference' here // purify down here With the Cull class extending something that has a confiscateElement() ability, or comparable, wherein I could check for a missing href attribute or a href attribute with the content http:/. HTMLPurifier_Filter I understand I could create a filter, but the examples (Youtube.php and ExtractStyleBlocks.php) suggest I'd be using regular expressions in that, which I'd really rather avoid, if it is at all possible. I'm hoping for an onboard or quasi-onboard solution that makes use of HTML Purifier's excellent parsing capabilities. Returning null in a child-class of HTMLPurifier_AttrTransform unfortunately doesn't cut it. Anyone have any smart ideas, or am I stuck with regexes? :)

Read the article
Haskell: Left-biased/short-circuiting function

- by user2967411

Two classes ago, our professor presented to us a Parser module. Here is the code: module Parser (Parser,parser,runParser,satisfy,char,string,many,many1,(+++)) where import Data.Char import Control.Monad import Control.Monad.State type Parser = StateT String [] runParser :: Parser a -> String -> [(a,String)] runParser = runStateT parser :: (String -> [(a,String)]) -> Parser a parser = StateT satisfy :: (Char -> Bool) -> Parser Char satisfy f = parser $ \s -> case s of [] -> [] a:as -> [(a,as) | f a] char :: Char -> Parser Char char = satisfy . (==) alpha,digit :: Parser Char alpha = satisfy isAlpha digit = satisfy isDigit string :: String -> Parser String string = mapM char infixr 5 +++ (+++) :: Parser a -> Parser a -> Parser a (+++) = mplus many, many1 :: Parser a -> Parser [a] many p = return [] +++ many1 p many1 p = liftM2 (:) p (many p) Today he gave us an assignment to introduce "a left-biased, or short-circuiting version of (+++)", called (<++). His hint was for us to consider the original implementation of (+++). When he first introduced +++ to us, this was the code he wrote, which I am going to call the original implementation: infixr 5 +++ (+++) :: Parser a -> Parser a -> Parser a p +++ q = Parser $ \s -> runParser p s ++ runParser q s I have been having tons of trouble since we were introduced to parsing and so it continues. I have tried/am considering two approaches. 1) Use the "original" implementation, as in p +++ q = Parser $ \s - runParser p s ++ runParser q s 2) Use the final implementation, as in (+++) = mplus Here are my questions: 1) The module will not compile if I use the original implementation. The error: Not in scope: data constructor 'Parser'. It compiles fine using (+++) = mplus. What is wrong with using the original implementation that is avoided by using the final implementation? 2) How do I check if the first Parser returns anything? Is something like (not (isNothing (Parser $ \s - runParser p s) on the right track? It seems like it should be easy but I have no idea. 3) Once I figure out how to check if the first Parser returns anything, if I am to base my code on the final implementation, would it be as easy as this?: -- if p returns something then p <++ q = mplus (Parser $ \s -> runParser p s) mzero -- else (<++) = mplus Best, Jeff

Read the article
Parse XML and populate in List Box

- by cedar715

I've posted the same question here and I've also got couple of good answers as well. While I was trying the same answers, I was getting compilation errors. Later I got to know that we are using .NET 2.0 and our existing application has no references to LINQ files. After searching in SO, i tried to figured out partly: public partial class Item { public object CHK { get; set; } public int SEL { get; set; } public string VALUE { get; set; } } Parsing: XmlDocument doc = new XmlDocument(); doc.LoadXml("<LISTBOX_ST> <item><CHK></CHK><SEL>00001</SEL><VALUE>val01</VALUE></item> <item><CHK></CHK><SEL>00002</SEL><VALUE>val02</VALUE></item> <item><CHK></CHK><SEL>00003</SEL><VALUE>val03</VALUE></item> <item><CHK></CHK><SEL>00004</SEL><VALUE>val04</VALUE></item> <item><CHK></CHK><SEL>00005</SEL><VALUE>val05</VALUE></item> </LISTBOX_ST>"); List<Item> _lbList = new List<Item>(); foreach (XmlNode node in doc.DocumentElement.ChildNodes) { string text = node.InnerText; //or loop through its children as well //HOW - TO - POPULATE THE ITEM OBJECT ?????? } listBox1.DataSource = _lbList; listBox1.DisplayMember = "VALUE"; listBox1.ValueMember = "SEL"; How to read two child nodes - SEL and VALUE of node and populate the same in the new Item DTO??

Read the article
next line character a huge influence on xmlparser?

- by jovany

I have question about a basic xml file I'm parsing and just putting in simple nextlines(Enters). I'll try to explain my problem with this next example. I'm( still) building an xml tree and all it has to do ( this is a testtree ) is put the summary in an itemlist. I then export it to a plist so I can see if everything is done correctly. A method that does this is in the parser which looks like this if([elementName isEqualToString:@"Book"]) { [appDelegate.books addObject:aBook]; [aBook release]; aBook = nil; } else { [aBook setValue:currentElementValue forKey:elementName]; NSString *directions = [NSString stringWithFormat:currentElementValue]; [directionTree = setObject:directions forKey:@"directions"]; } [currentElementValue release]; currentElementValue = nil; } the export for the plistfile happens at the endtag of books. Below is the first xmlfile <?xml version="1.0" encoding="UTF-8"?> <Books><Book id="1"><summary>Ero adn the ancient quest to measure the globe.</summary></Book><Book id="2"><summary>how the scientific revolution began.</summary></Book></Books> This is my output http://img139.imageshack.us/img139/9175/picture6rtn.png If I make some adjustments like here <?xml version="1.0" encoding="UTF-8"?> <Books><Book id="1"> <summary>Ero adn the ancient quest to measure the globe.</summary> </Book> <Book id="2"> <summary>how the scientific revolution began.</summary> </Book> </Books> My directions key with type string remains empty... http://img248.imageshack.us/img248/5838/picture7y.png I never knew that if I just put in an enter it would have such an influence. Does anyone know a solution to this since my real xml file looks like this. ps. the funny thing is I can actually see ( when debugging)my directions string (NSString directions ) fill up with the currentElementValue in both cases.

Read the article
How to read XML parent node tag value

- by kaibuki

HI Guys, I have a java code to read XML nodes, I want to add in the addition and want to read the parent node value also. my XML file sample is below: <breakfast_menu><food id=1><name> Belgian Waffles </name><price> $5.95 </price><description> two of our famous Belgian Waffles with plenty of real maple syrup </description><calories> 650 </calories></food><food id=2><name>Strawberry Belgian waffles</name><price>$7.95</price><description>light Belgian waffles covered with strawberries and whipped cream</description><calories>900</calories></food></breakfast_menu> and my code for parsing xml is : public static String getProductItem(String pid, String item) { try { url = new URL(""); urlConnection = url.openConnection(); } catch (MalformedURLException e) { e.printStackTrace(); } catch (IOException e) { } try { dBuilder = dbFactory.newDocumentBuilder(); } catch (ParserConfigurationException e) { } try { doc = dBuilder.parse(urlConnection.getInputStream()); } catch (SAXException e) { } catch (IOException e) { } doc.getDocumentElement().normalize(); NodeList nList = doc.getElementsByTagName("food"); for (int temp = 0; temp < nList.getLength(); temp++) { Node nNode = nList.item(temp); if (nNode.getNodeType() == Node.ELEMENT_NODE) { Element eElement = (Element) nNode; data = getTagValue(item, eElement); } } doc = null; dBuilder = null; return data; } private static String getTagValue(String sTag, Element eElement) { NodeList nlList = eElement.getElementsByTagName(sTag).item(0) .getChildNodes(); Node nValue = (Node) nlList.item(0); return nValue.getNodeValue(); } What I want to do is to read the "id" value of food, so if if I am searching for a food, it only checks those food nodes, whose id matched the food node id. thanks in advance. cheers.. kai

Read the article
How can I create the XML::Simple data structure using a Perl XML SAX parser?

- by DVK

Summary: I am looking a fast XML parser (most likely a wrapper around some standard SAX parser) which will produce per-record data structure 100% identical to those produced by XML::Simple. Details: We have a large code infrastructure which depends on processing records one-by-one and expects the record to be a data structure in a format produced by XML::Simple since it always used XML::Simple since early Jurassic era. An example simple XML is: <root> <rec><f1>v1</f1><f2>v2</f2></rec> <rec><f1>v1b</f1><f2>v2b</f2></rec> <rec><f1>v1c</f1><f2>v2c</f2></rec> </root> And example rough code is: sub process_record { my ($obj, $record_hash) = @_; # do_stuff } my $records = XML::Simple->XMLin(@args)->{root}; foreach my $record (@$records) { $obj->process_record($record) }; As everyone knows XML::Simple is, well, simple. And more importantly, it is very slow and a memory hog—due to being a DOM parser and needing to build/store 100% of data in memory. So, it's not the best tool for parsing an XML file consisting of large amount of small records record-by-record. However, re-writing the entire code (which consist of large amount of "process_record"-like methods) to work with standard SAX parser seems like an big task not worth the resources, even at the cost of living with XML::Simple. I'm looking for an existing module which will probably be based on a SAX parser (or anything fast with small memory footprint) which can be used to produce $record hashrefs one by one based on the XML pictured above that can be passed to $obj->process_record($record) and be 100% identical to what XML::Simple's hashrefs would have been. I don't care much what the interface of the new module is; e.g whether I need to call next_record() or give it a callback coderef accepting a record.

Read the article
Python: How best to parse a simple grammar?

- by Rosarch

Ok, so I've asked a bunch of smaller questions about this project, but I still don't have much confidence in the designs I'm coming up with, so I'm going to ask a question on a broader scale. I am parsing pre-requisite descriptions for a course catalog. The descriptions almost always follow a certain form, which makes me think I can parse most of them. From the text, I would like to generate a graph of course pre-requisite relationships. (That part will be easy, after I have parsed the data.) Some sample inputs and outputs: "CS 2110" => ("CS", 2110) # 0 "CS 2110 and INFO 3300" => [("CS", 2110), ("INFO", 3300)] # 1 "CS 2110, INFO 3300" => [("CS", 2110), ("INFO", 3300)] # 1 "CS 2110, 3300, 3140" => [("CS", 2110), ("CS", 3300), ("CS", 3140)] # 1 "CS 2110 or INFO 3300" => [[("CS", 2110)], [("INFO", 3300)]] # 2 "MATH 2210, 2230, 2310, or 2940" => [[("MATH", 2210), ("MATH", 2230), ("MATH", 2310)], [("MATH", 2940)]] # 3 If the entire description is just a course, it is output directly. If the courses are conjoined ("and"), they are all output in the same list If the course are disjoined ("or"), they are in separate lists Here, we have both "and" and "or". One caveat that makes it easier: it appears that the nesting of "and"/"or" phrases is never greater than as shown in example 3. What is the best way to do this? I started with PLY, but I couldn't figure out how to resolve the reduce/reduce conflicts. The advantage of PLY is that it's easy to manipulate what each parse rule generates: def p_course(p): 'course : DEPT_CODE COURSE_NUMBER' p[0] = (p[1], int(p[2])) With PyParse, it's less clear how to modify the output of parseString(). I was considering building upon @Alex Martelli's idea of keeping state in an object and building up the output from that, but I'm not sure exactly how that is best done. def addCourse(self, str, location, tokens): self.result.append((tokens[0][0], tokens[0][1])) def makeCourseList(self, str, location, tokens): dept = tokens[0][0] new_tokens = [(dept, tokens[0][1])] new_tokens.extend((dept, tok) for tok in tokens[1:]) self.result.append(new_tokens) For instance, to handle "or" cases: def __init__(self): self.result = [] # ... self.statement = (course_data + Optional(OR_CONJ + course_data)).setParseAction(self.disjunctionCourses) def disjunctionCourses(self, str, location, tokens): if len(tokens) == 1: return tokens print "disjunction tokens: %s" % tokens How does disjunctionCourses() know which smaller phrases to disjoin? All it gets is tokens, but what's been parsed so far is stored in result, so how can the function tell which data in result corresponds to which elements of token? I guess I could search through the tokens, then find an element of result with the same data, but that feel convoluted... What's a better way to approach this problem?

Read the article

Search Results

Search found 4222 results on 169 pages for 'dtd parsing'.

Page 39/169 | < Previous Page | 35 36 37 38 39 40 41 42 43 44 45 46 | Next Page >

- by Jake M

- by Rosarch

- by Greg

- by Subb

- by Reveller

- by Warrior

- by melignus

- by Matthias Günther

- by Andrei

- by Davidrd91

- by Frank Krueger

- by Subb

- by pr0wl

- by Janis Peisenieks

- by Harikrishna

- by Barnabe

- by Chris

- by gnomed

- by pinkgothic

- by user2967411

- by cedar715

- by jovany

- by kaibuki

- by DVK

- by Rosarch

< Previous Page | 35 36 37 38 39 40 41 42 43 44 45 46 | Next Page >