dtd parsing - Page 15 - Developer IT

Generalized Bottom up Parser Combinators in Haskell

- by Panini Sai

I am wondered why there is no generalized parser combinators for Bottom-up parsing in Haskell like a Parsec combinators for top down parsing. ( I could find some research work went during 2004 but nothing after https://haskell-functional-parsing.googlecode.com/files/Ljunglof-2002a.pdf http://www.di.ubi.pt/~jpf/Site/Publications_files/technicalReport.pdf ) Is there any specific reason for not achieving it?

Read the article

Idea of an algorithm to detect a website's navigation structure?

- by Uwe Keim

Currently I am in the process of developing an importer of any existing, arbitrary (static) HTML website into the upcoming release of our CMS. While the downloading the files is solved successfully, I'm pulling my hair off when it comes to detect a site structure (pages and subpages) purely from the HTML files, without the user specifying additional hints. Basically I want to get a tree like: + Root page 1 + Child page 1 + Child page 2 + Child child page1 + Child page 3 + Root page 2 + Child page 4 + Root page 3 + ... I.e. I want to be able to detect the menu structure from the links inside the pages. This has not to be 100% accurate, but at least I want to achieve more than just a flat list. I thought of looking at multiple pages to see similar areas and identify these as menu areas and parse the links there, but after all I'm not that satisfied with this idea. My question: Can you imagine any algorithm when it comes to detecting such a structure? Update 1: What I'm looking for is not a web spider, but an algorithm do create a logical tree of the relationship of the pages to be able to create pages and subpages inside my CMS when importing them. Update 2: As of Robert's suggestion I'll solve this by starting at the root page, and then simply parse links as you go and treat every link inside a page simply as a child page. Probably I'll recurse not in a deep-first manner but rather in a breadth-first manner to get a more balanced navigation structure.

Read the article

How should I implement a command processing application?

- by Nini Michaels

I want to make a simple, proof-of-concept application (REPL) that takes a number and then processes commands on that number. Example: I start with 1. Then I write "add 2", it gives me 3. Then I write "multiply 7", it gives me 21. Then I want to know if it is prime, so I write "is prime" (on the current number - 21), it gives me false. "is odd" would give me true. And so on. Now, for a simple application with few commands, even a simple switch would do for processing the commands. But if I want extensibility, how would I need to implement the functionality? Do I use the command pattern? Do I build a simple parser/interpreter for the language? What if I want more complex commands, like "multiply 5 until >200" ? What would be an easy way to extend it (add new commands) without recompiling? Edit: to clarify a few things, my end goal would not be to make something similar to WolframAlpha, but rather a list (of numbers) processor. But I want to start slowly at first (on single numbers). I'm having in mind something similar to the way one would use Haskell to process lists, but a very simple version. I'm wondering if something like the command pattern (or equivalent) would suffice, or if I have to make a new mini-language and a parser for it to achieve my goals?

Read the article

Persisting NLP parsed data

- by tjb1982

I've recently started experimenting with NLP using Stanford's CoreNLP, and I'm wondering what are some of the standard ways to store NLP parsed data for something like a text mining application? One way I thought might be interesting is to store the children as an adjacency list and make good use of recursive queries (postgres supports this and I've found it works really well). Something like this: Component ( id, POS, parent_id ) Word ( id, raw, lemma, POS, NER ) CW_Map ( component_id, word_id, position int ) But I assume there are probably many standard ways to do this depending on what kind of analysis is being done that have been adopted by people working in the field over the years. So what are the standard persistence strategies for NLP parsed data and how are they used?

Read the article

How can I best manage making open source code releases from my company's confidential research code?

- by DeveloperDon

My company (let's call them Acme Technology) has a library of approximately one thousand source files that originally came from its Acme Labs research group, incubated in a development group for a couple years, and has more recently been provided to a handful of customers under non-disclosure. Acme is getting ready to release perhaps 75% of the code to the open source community. The other 25% would be released later, but for now, is either not ready for customer use or contains code related to future innovations they need to keep out of the hands of competitors. The code is presently formatted with #ifdefs that permit the same code base to work with the pre-production platforms that will be available to university researchers and a much wider range of commercial customers once it goes to open source, while at the same time being available for experimentation and prototyping and forward compatibility testing with the future platform. Keeping a single code base is considered essential for the economics (and sanity) of my group who would have a tough time maintaining two copies in parallel. Files in our current base look something like this: > // Copyright 2012 (C) Acme Technology, All Rights Reserved. > // Very large, often varied and restrictive copyright license in English and French, > // sometimes also embedded in make files and shell scripts with varied > // comment styles. > > > ... Usual header stuff... > > void initTechnologyLibrary() { > nuiInterface(on); > #ifdef UNDER_RESEARCH > holographicVisualization(on); > #endif > } And we would like to convert them to something like: > // GPL Copyright (C) Acme Technology Labs 2012, Some rights reserved. > // Acme appreciates your interest in its technology, please contact [email protected] > // for technical support, and www.acme.com/emergingTech for updates and RSS feed. > > ... Usual header stuff... > > void initTechnologyLibrary() { > nuiInterface(on); > } Is there a tool, parse library, or popular script that can replace the copyright and strip out not just #ifdefs, but variations like #if defined(UNDER_RESEARCH), etc.? The code is presently in Git and would likely be hosted somewhere that uses Git. Would there be a way to safely link repositories together so we can efficiently reintegrate our improvements with the open source versions? Advice about other pitfalls is welcome.

Read the article

What is the simplest human readable configuration file format?

- by Juha

Current configuration file is as follows: mainwindow.title = 'test' mainwindow.position.x = 100 mainwindow.position.y = 200 mainwindow.button.label = 'apply' mainwindow.button.size.x = 100 mainwindow.button.size.y = 30 logger.datarate = 100 logger.enable = True logger.filename = './test.log' This is read with python to a nested dictionary: { 'mainwindow':{ 'button':{ 'label': {'value':'apply'}, ... }, 'logger':{ datarate: {'value': 100}, enable: {'value': True}, filename: {'value': './test.log'} }, ... } Is there a better way of doing this? The idea is to get XML type of behavior and avoid XML as long as possible. The end user is assumed almost totally computer illiterate and basically uses notepad and copy-paste. Thus the python standard "header + variables" type is considered too difficult. The dummy user edits the config file, able programmers handle the dictionaries. Nested dictionary is chosen for easy splitting (logger does not need or even cannot have/edit mainwindow parameters).

Read the article

What is this algorithm for converting strings into numbers called?

- by CodexArcanum

I've been doing some work in Parsec recently, and for my toy language I wanted multi-based fractional numbers to be expressible. After digging around in Parsec's source a bit, I found their implementation of a floating-point number parser, and copied it to make the needed modifications. So I understand what this code does, and vaguely why (I haven't worked out the math fully yet, but I think I get the gist). But where did it come from? This seems like a pretty clever way to turn strings into floats and ints, is there a name for this algorithm? Or is it just something basic that's a hole in my knowledge? Did the folks behind Parsec devise it? Here's the code, first for integers: number' :: Integer -> Parser Integer number' base = do { digits <- many1 ( oneOf ( sigilRange base )) ; let n = foldl (\x d -> base * x + toInteger (convertDigit base d)) 0 digits ; seq n (return n) } So the basic idea here is that digits contains the string representing the whole number part, ie "192". The foldl converts each digit individually into a number, then adds that to the running total multiplied by the base, which means that by the end each digit has been multiplied by the correct factor (in aggregate) to position it. The fractional part is even more interesting: fraction' :: Integer -> Parser Double fraction' base = do { digits <- many1 ( oneOf ( sigilRange base )) ; let base' = fromIntegral base ; let f = foldr (\d x -> (x + fromIntegral (convertDigit base d))/base') 0.0 digits ; seq f (return f) Same general idea, but now a foldr and using repeated division. I don't quite understand why you add first and then divide for the fraction, but multiply first then add for the whole. I know it works, just haven't sorted out why. Anyway, I feel dumb not working it out myself, it's very simple and clever looking at it. Is there a name for this algorithm? Maybe the imperative version using a loop would be more familiar?

Read the article

How to add precedence to LALR parser like in YACC?

- by greenoldman

Please note, I am asking about writing LALR parser, not writing rules for LALR parser. What I need is... ...to mimic YACC precedence definitions. I don't know how it is implemented, and below I describe what I've done and read so far. For now I have basic LALR parser written. Next step -- adding precedence, so 2+3*4 could be parsed as 2+(3*4). I've read about precedence parsers, however I don't see how to fit such model into LALR. I don't understand two points: how to compute when insert parenthesis generator how to compute how many parenthesis the generator should create I insert generators when the symbols is taken from input and put at the stack, right? So let's say I have something like this (| denotes boundary between stack and input): ID = 5 | + ..., at this point I add open, so it gives ID = < 5 | + ..., then I read more input ID = < 5 + | 5 ... and more ID = < 5 + 5 | ; ... and more ID = < 5 + 5 ; | ... At this point I should have several reduce moves in normal LALR, but the open parenthesis does not match so I continue reading more input. Which does not make sense. So this was when problem. And about count, let's say I have such data < 2 + < 3 * 4 >. As human I can see that the last generator should create 2 parenthesis, but how to compute this? After all there could be two scenarios: ( 2 + ( 3 *4 )) -- parenthesis is used to show the outcome of generator or (2 + (( 3 * 4 ) ^ 5) because there was more input Please note that in both cases before 3 was open generator, and after 4 there was close generator. However in both cases, after reading 4 I have to reduce, so I have to know what generator "creates".

Read the article

Persisting natural language processing parsed data

- by tjb1982

I've recently started experimenting with natural language processing (NLP) using Stanford's CoreNLP, and I'm wondering what are some of the standard ways to store NLP parsed data for something like a text mining application? One way I thought might be interesting is to store the children as an adjacency list and make good use of recursive queries (Postgres supports this and I've found it works really well). But I assume there are probably many standard ways to do this depending on what kind of analysis is being done that have been adopted by people working in the field over the years. So what are the standard persistence strategies for NLP parsed data and how are they used?

Read the article

Custom Dreamweaver DocTypes

- by Hugh Guiney

Dreamweaver CS5 with Dreamweaver HTML5 Pack 1.2.7 Windows 7 x64 When I go to create a new document and select the HTML5 DocType, Dreamweaver gives me the legacy encoding/character set declaration: <meta http-equiv="Content-Type" content="text/html; charset=utf-8"> I want to replace it with the new, abbreviated style: <meta charset="utf-8"> The relevant file seems to be %ProgramFiles(x86)%\Adobe\Adobe Dreamweaver CS5\configuration\DocumentTypes\NewDocuments\Default.html, which has a blank charset, that is then apparently replaced with the appropriate character set dynamically: <meta http-equiv="Content-Type" content="text/html; charset="> I changed it, but then new documents show up like this: <meta charset=""> <title>Untitled Document</title> <meta http-equiv="Content-Type" content="text/html; charset=utf-8"> It seems Dreamweaver added the legacy declaration back in after my modification—and as far as I can tell, there's no way to specify that the charset definition should go in-between the quotes, either. Additionally, any modifications to Default.html apply to every DocType, whereas I only want this change to apply to the HTML5 DocType. Is there anything in the configuration files that would allow me to make any of these customizations? If not, is there an extension that does it?

Read the article

LL(8) and left-recursion

- by Peregring-lk

I want to understand the relation between LL/LR grammars and the left-recursion problem (for any question I know parcially the answer, but I ask them as I don't know nothing, because I am a little confused now, and prefer complete answers) I'm happy with sintetized or short and direct answers (or just links solving it unambiguously): What type of language isn't LL(8) languages? LL(K) and LL(8) have problems with left-recursion? Or only LL(k) parsers? LALR(1) parser have troubles with left or right recursion? What type of troubles? Only in terms of the LL/LALR comparision. What is better, Bison (LALR(1)) or Boost.Spirit (LL(8))? (Let's suppose other features of them are irrelevant in this question) Why GCC use a (hand-made) LL(8) parser? Only for the "handling-error" problem?

Read the article

Generating Wrappers for REST APIs

- by Kyle

Would it be feasible to generate wrappers for REST APIs? An earlier question asked about machine readable descriptions of RESTful services addressed how we could write (and then read) API specifications in a standardized way which would lend itself well to generated wrappers. Could a first pass parser generate a decent wrapper that human intervention could fix up? Perhaps the first pass wouldn't be consistent, but would remove a lot of the grunt work and make it easy to flesh out the rest of the API and types. What would need to be considered? What's stopping people from doing this? Has it already been done and my google fu is weak for the day?

Read the article

Getting data from a webpage in a stable and efficient way

- by Mike Heremans

Recently I've learned that using a regex to parse the HTML of a website to get the data you need isn't the best course of action. So my question is simple: What then, is the best / most efficient and a generally stable way to get this data? I should note that: There are no API's There is no other source where I can get the data from (no databases, feeds and such) There is no access to the source files. (Data from public websites) Let's say the data is normal text, displayed in a table in a html page I'm currently using python for my project but a language independent solution/tips would be nice. As a side question: How would you go about it when the webpage is constructed by Ajax calls?

Read the article

Template syntax for users - is there a right way to do it?

- by RickM

Ok, I'm in the middle of building a saas system, and as part of that, the hosted clients need to be able to edit certain layout templates, baqsically just html, css and javascript files. I'm obviously going to be wanting to use a template syntax here as it would be dumb to let people execute PHP code, so in this instance template syntax does need to be used. I know that in the grand scale of things, this is a very minor thing, but what template syntax do you use, and why? Is there one that's considered better than others? I've seen all sorts being used with no real consistency, for example: Smarty Style: {$someVar} {foreach from="foo" item="bar"} {$bar.food} {/foreach} ASP Style: {% someVar %} {% foreach foo as bar %} {% bar.food %} {% endforeach %} HTML Style: <someVar> <foreach from="foo" item="bar"> <bar:food> </foreach> PyroCMS/FuelPHP "LEX" Style: {{ someVar }} {{ foreach from="foo" item="bar" }} {{ bar:food }} {{ endforeach }} Obviously these arent 100% accurate (for example, LEX is used alongside PHP for loops), and are only to give you an example of what I mean. What, in your opinion would be the best one (if any) to go with. I ask this bearing in mind that people using this are likely to be novice users. I did look around at a bunch of hosted CMS and E-Commerce systems as these seem to make use of user-editable templates, and most seem to be using some form of their own syntax. I should note that whatever style I end up going with, it will be with a custom template handler due to the complexity of the system and how template files are stored. Plus I'd not want to touch the likes of Smarty with a barge pole!

Read the article

First and Follow Sets for a Grammar

- by Aimee Jones

I'm studying for a Compiler Construction module I'm doing and I have a sample question as follows: Calculate the FIRST and FOLLOW sets for the following grammar.. S -> uBDz B -> Bv B -> w D -> EF E -> y E -> e F -> x F -> e I have tried to figure it out so far but I'm a bit unsure if I'm correct. Could someone verify if I'm doing it right, and if not, what am I missing? My answer is below: FIRST | FOLLOW S | {u} | {$} B | {w} | {y,x,v,z} D | {y,e,x} | {z} E | {y,e} | {x,z} F | {x,e} | {z}

Read the article

Basic questions while making a toy calculator

- by Jwan622

I am making a calculator to better understand how to program and I had a question about the following lines of code: I wanted to make my equals sign with this C# code: private void btnEquals_Click(object sender, EventArgs e) { if (plusButtonClicked == true) { total2 = total1 + Convert.ToDouble(txtDisplay.Text); //double.Parse(txtDisplay.Text); } else if (minusButtonClicked == { total2 = total1 - double.Parse(txtDisplay.Text) } } txtDisplay.Text = total2.ToString(); total1 = 0; However, my friend said this way of writing code was superior, with changes in the minus sign. private void btnEquals_Click(object sender, EventArgs e) { if (plusButtonClicked == true) { total2 = total1 + Convert.ToDouble(txtDisplay.Text); //double.Parse(txtDisplay.Text); } else if (minusButtonClicked == true) { double d1; if(double.TryParse(txtDisplay.Text, out d1)) { total2 = total1 - d1; } } txtDisplay.Text = total2.ToString(); total1 = 0; My questions: 1) What does the "out d1" section of this minus sign code mean? 2) My assumption here is that the "TryParse" code results in fewer systems crashes? If I just use "Double.Parse" and I don't put anything in the textbox, the program will crash sometimes right?

Read the article

Making a sldprt to PDB file converter?

- by user122083

I wanted to create a parser that can read a solidworks file and turn it into a protein data bank file. This has already been done in a program called DiamondCAD. http://www.zyvex.com/Research/DiamondCAD.html I waant to make a parser that can parse the data and then visualize it the same way as DiamondCAD. I have downloaded and opened solidworks files before and they make no sense to me with never before seen symbols and looks like ancient writing. Does anyone know how a sldprt. file is structured and how it can be parsed into a PDB file? (A software called VMD converts a PDB to Obj. file so it is proof of concept)

Read the article

Parsing XHTML with inline tags

- by user290796

Hi, I'm trying to parse an XHTML document using TBXML on the iPhone (although I would be happy to use either libxml2 or NSXMLParser if it would be easier). I need to extract the content of the body as a series of paragraphs and maintain the inline tags, for example: <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd"> <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en"> <head> <title>Title</title> <link rel="stylesheet" href="css/style.css" type="text/css"/> <meta http-equiv="Content-Type" content="application/xhtml+xml; charset=utf-8"/> </head> <body> <div class="body"> <div> <h3>Title</h3> <p>Paragraph with <em>inline</em> tags</p> <img src="image.png" /> </div> </div> </body> </html> I need to extract the paragraph but maintain the <em>inline</em> content with the paragraph, all my testing so far has extracted that as a subelement without me knowing exactly where it fitted in the paragraph. Can anyone suggest a way to do this? Thanks.

Read the article

Parsing cookie headers in J2ME/BlackBerry apps

- by Marc Novakowski

When using an HttpConnection in a BlackBerry app, you often get HTTP cookies in the response headers. Unfortunately there are no built-in APIs to assist with the parsing of the cookie headers. Has anyone found a third-party library to assist with the parsing of the cookie header(s) into a more useful data object? Creating some custom code that just parses out the name and value of the cookie isn't too difficult, but I'd like to also consider other fields within the cookie such as the expiration and domain fields.

Read the article

iPhone SDK:HTML parsing standard way or example:

- by Clave Martin

Hi, I am accessing a web page using NSURLConnection and have a HTML data downloaded programmatically in my iPhone client application. I want to parse and pick some description data from the HTML tags..It is too dirty tags and my data is also here and there. I would like to ask you, is there a standard or easy way of parsing of HTML data on iPhone development. P.S: I know about XML parsing. thanks. Clave/

Read the article

Parsing date in Pascal

- by mnn

Hello, what's the better way of parsing date in Pascal than manually parsing character after character? Date should be in mm.dd.yyyy format.

Read the article

query about xml parsing

- by shishir.bobby

hi all ,i just want to knw,is there any boundations in xml parsing with characters like can we parse a word containing some characters like "frühe" containing "ü" "böser" containing "ö" while i am parsing my xml,which is few different languages, some characters are like the above. and wen i saw in console, it get interpted,exaactly wen it reacher "ü" becoz at console it prints "fr" so can someone provide me some ideas about this thing regards shishir

Read the article

jquery-like HTML parsing in Python?

- by Roy Tang

Is there any Python library that allows me to parse an HTML document similar to what jQuery does? i.e. I'd like to be able to use CSS selector syntax to grab an arbitrary set of nodes from the document, read their content/attributes, etc. The only Python HTML parsing lib I've used before was BeautifulSoup, and even though it's fine I keep thinking it would be faster to do my parsing if I had jQuery syntax available. :D

Read the article

Problems Running Cherokee Web Server Admin - config_reader.c:249 - Parsing error

- by Sebastian

I'm running Cherokee web server 0.99.30 on (Ubuntu Hardy) and I have been having some issues getting the admin to run property. When I run sudo cherokee-admin -b Login: User: admin One-time Password: {password} Web Interface: URL: http://localhost:9090/ [20/11/2009 22:57:29.733] (error) config_reader.c:249 - Parsing error Cherokee Web Server 0.99.30 (Nov 20 2009): Listening on port ALL:9090, TLS disabled, IPv6 disabled, using epoll, 4096 fds system limit, max. 2041 connections, caching I/O, single thread When I go to the admin page I get a 503 Service Unavailable error page. Any idea about how I could fix this? Thanks

Read the article

iPhone and Mac Twitter App Not Parsing HTML in Feeds

- by otakrosak

I've come across this problem where my tweets on the iPhone and Mac Twitter app are not parsing the HTML properly. In any browser, the HTML is parsed fine. It only happens to the apostrophe character (') and on the iPhone and Mac Twitter app. Also, I'm using the dlvr.it service to push my Drupal blog posts to my Twitter feed. My initial guess was that it was the RSS generated by Drupal, but if that were the case, then the feed displayed on the browser would be affected too. Any ideas anyone? Any help would be very much appreciated. P.S: I apologize beforehand if this question already exists. I did search for it but based on what I queried, nothing came up.

Search Results

Search found 4222 results on 169 pages for 'dtd parsing'.

Page 15/169 | < Previous Page | 11 12 13 14 15 16 17 18 19 20 21 22 | Next Page >

- by Panini Sai

- by Uwe Keim

- by Nini Michaels

- by tjb1982

- by DeveloperDon

- by Juha

- by CodexArcanum

- by greenoldman

- by tjb1982

- by Hugh Guiney

- by Peregring-lk

- by Kyle

- by Mike Heremans

- by RickM

- by Aimee Jones

- by Jwan622

- by user122083

- by user290796

- by Marc Novakowski

- by Clave Martin

- by mnn

- by shishir.bobby

- by Roy Tang

- by Sebastian

- by otakrosak

< Previous Page | 11 12 13 14 15 16 17 18 19 20 21 22 | Next Page >