Search Results

Search found 37381 results on 1496 pages for 'string parsing'.

Page 66/1496 | < Previous Page | 62 63 64 65 66 67 68 69 70 71 72 73 | Next Page >

Idea of an algorithm to detect a website's navigation structure?

- by Uwe Keim

Currently I am in the process of developing an importer of any existing, arbitrary (static) HTML website into the upcoming release of our CMS. While the downloading the files is solved successfully, I'm pulling my hair off when it comes to detect a site structure (pages and subpages) purely from the HTML files, without the user specifying additional hints. Basically I want to get a tree like: + Root page 1 + Child page 1 + Child page 2 + Child child page1 + Child page 3 + Root page 2 + Child page 4 + Root page 3 + ... I.e. I want to be able to detect the menu structure from the links inside the pages. This has not to be 100% accurate, but at least I want to achieve more than just a flat list. I thought of looking at multiple pages to see similar areas and identify these as menu areas and parse the links there, but after all I'm not that satisfied with this idea. My question: Can you imagine any algorithm when it comes to detecting such a structure? Update 1: What I'm looking for is not a web spider, but an algorithm do create a logical tree of the relationship of the pages to be able to create pages and subpages inside my CMS when importing them. Update 2: As of Robert's suggestion I'll solve this by starting at the root page, and then simply parse links as you go and treat every link inside a page simply as a child page. Probably I'll recurse not in a deep-first manner but rather in a breadth-first manner to get a more balanced navigation structure.

Read the article
How should I implement a command processing application?

- by Nini Michaels

I want to make a simple, proof-of-concept application (REPL) that takes a number and then processes commands on that number. Example: I start with 1. Then I write "add 2", it gives me 3. Then I write "multiply 7", it gives me 21. Then I want to know if it is prime, so I write "is prime" (on the current number - 21), it gives me false. "is odd" would give me true. And so on. Now, for a simple application with few commands, even a simple switch would do for processing the commands. But if I want extensibility, how would I need to implement the functionality? Do I use the command pattern? Do I build a simple parser/interpreter for the language? What if I want more complex commands, like "multiply 5 until >200" ? What would be an easy way to extend it (add new commands) without recompiling? Edit: to clarify a few things, my end goal would not be to make something similar to WolframAlpha, but rather a list (of numbers) processor. But I want to start slowly at first (on single numbers). I'm having in mind something similar to the way one would use Haskell to process lists, but a very simple version. I'm wondering if something like the command pattern (or equivalent) would suffice, or if I have to make a new mini-language and a parser for it to achieve my goals?

Read the article
Persisting NLP parsed data

- by tjb1982

I've recently started experimenting with NLP using Stanford's CoreNLP, and I'm wondering what are some of the standard ways to store NLP parsed data for something like a text mining application? One way I thought might be interesting is to store the children as an adjacency list and make good use of recursive queries (postgres supports this and I've found it works really well). Something like this: Component ( id, POS, parent_id ) Word ( id, raw, lemma, POS, NER ) CW_Map ( component_id, word_id, position int ) But I assume there are probably many standard ways to do this depending on what kind of analysis is being done that have been adopted by people working in the field over the years. So what are the standard persistence strategies for NLP parsed data and how are they used?

Read the article
How can I best manage making open source code releases from my company's confidential research code?

- by DeveloperDon

My company (let's call them Acme Technology) has a library of approximately one thousand source files that originally came from its Acme Labs research group, incubated in a development group for a couple years, and has more recently been provided to a handful of customers under non-disclosure. Acme is getting ready to release perhaps 75% of the code to the open source community. The other 25% would be released later, but for now, is either not ready for customer use or contains code related to future innovations they need to keep out of the hands of competitors. The code is presently formatted with #ifdefs that permit the same code base to work with the pre-production platforms that will be available to university researchers and a much wider range of commercial customers once it goes to open source, while at the same time being available for experimentation and prototyping and forward compatibility testing with the future platform. Keeping a single code base is considered essential for the economics (and sanity) of my group who would have a tough time maintaining two copies in parallel. Files in our current base look something like this: > // Copyright 2012 (C) Acme Technology, All Rights Reserved. > // Very large, often varied and restrictive copyright license in English and French, > // sometimes also embedded in make files and shell scripts with varied > // comment styles. > > > ... Usual header stuff... > > void initTechnologyLibrary() { > nuiInterface(on); > #ifdef UNDER_RESEARCH > holographicVisualization(on); > #endif > } And we would like to convert them to something like: > // GPL Copyright (C) Acme Technology Labs 2012, Some rights reserved. > // Acme appreciates your interest in its technology, please contact [email protected] > // for technical support, and www.acme.com/emergingTech for updates and RSS feed. > > ... Usual header stuff... > > void initTechnologyLibrary() { > nuiInterface(on); > } Is there a tool, parse library, or popular script that can replace the copyright and strip out not just #ifdefs, but variations like #if defined(UNDER_RESEARCH), etc.? The code is presently in Git and would likely be hosted somewhere that uses Git. Would there be a way to safely link repositories together so we can efficiently reintegrate our improvements with the open source versions? Advice about other pitfalls is welcome.

Read the article
C# String.format extension method

- by Paul Roe

With the addtion of Extension methods to C# we've seen a lot of them crop up in our group. One debate revolves around extension methods like this one: public static class StringExt { /// <summary> /// Shortcut for string.Format. /// </summary> /// <param name="str"></param> /// <param name="args"></param> /// <returns></returns> public static string Format(this string str, params object[] args) { if (str == null) return null; return string.Format(str, args); } } Does this extension method break any programming best practices that you can name? Would you use it anyway, if not why? If I renamed the function to "F" but left the xml comments would that be epic fail or just a wonderful savings of keystrokes?

Read the article
Why does glGetString returns a NULL string

- by snape

I am trying my hands at GLFW library. I have written a basic program to get OpenGL renderer and vendor string. Here is the code #include <GL/glew.h> #include <GL/glfw.h> #include <cstdio> #include <cstdlib> #include <string> using namespace std; void shutDown(int returnCode) { printf("There was an error in running the code with error %d\n",returnCode); GLenum res = glGetError(); const GLubyte *errString = gluErrorString(res); printf("Error is %s\n", errString); glfwTerminate(); exit(returnCode); } int main() { // start GL context and O/S window using GLFW helper library if (glfwInit() != GL_TRUE) shutDown(1); if (glfwOpenWindow(0, 0, 0, 0, 0, 0, 0, 0, GLFW_WINDOW) != GL_TRUE) shutDown(2); // start GLEW extension handler glewInit(); // get version info const GLubyte* renderer = glGetString (GL_RENDERER); // get renderer string const GLubyte* version = glGetString (GL_VERSION); // version as a string printf("Renderer: %s\n", renderer); printf("OpenGL version supported %s\n", version); // close GL context and any other GLFW resources glfwTerminate(); return 0; } I googled this error and found out that we have to initialize the OpenGL context before calling glGetString(). Although I have initialized OpenGL context using glfwInit() but still the function returns a NULL string. Any ideas? Edit I have updated the code with error checking mechanisms. This code on running outputs the following There was an error in running the code with error 2 Error is no error

Read the article
What is the simplest human readable configuration file format?

- by Juha

Current configuration file is as follows: mainwindow.title = 'test' mainwindow.position.x = 100 mainwindow.position.y = 200 mainwindow.button.label = 'apply' mainwindow.button.size.x = 100 mainwindow.button.size.y = 30 logger.datarate = 100 logger.enable = True logger.filename = './test.log' This is read with python to a nested dictionary: { 'mainwindow':{ 'button':{ 'label': {'value':'apply'}, ... }, 'logger':{ datarate: {'value': 100}, enable: {'value': True}, filename: {'value': './test.log'} }, ... } Is there a better way of doing this? The idea is to get XML type of behavior and avoid XML as long as possible. The end user is assumed almost totally computer illiterate and basically uses notepad and copy-paste. Thus the python standard "header + variables" type is considered too difficult. The dummy user edits the config file, able programmers handle the dictionaries. Nested dictionary is chosen for easy splitting (logger does not need or even cannot have/edit mainwindow parameters).

Read the article
How to add precedence to LALR parser like in YACC?

- by greenoldman

Please note, I am asking about writing LALR parser, not writing rules for LALR parser. What I need is... ...to mimic YACC precedence definitions. I don't know how it is implemented, and below I describe what I've done and read so far. For now I have basic LALR parser written. Next step -- adding precedence, so 2+3*4 could be parsed as 2+(3*4). I've read about precedence parsers, however I don't see how to fit such model into LALR. I don't understand two points: how to compute when insert parenthesis generator how to compute how many parenthesis the generator should create I insert generators when the symbols is taken from input and put at the stack, right? So let's say I have something like this (| denotes boundary between stack and input): ID = 5 | + ..., at this point I add open, so it gives ID = < 5 | + ..., then I read more input ID = < 5 + | 5 ... and more ID = < 5 + 5 | ; ... and more ID = < 5 + 5 ; | ... At this point I should have several reduce moves in normal LALR, but the open parenthesis does not match so I continue reading more input. Which does not make sense. So this was when problem. And about count, let's say I have such data < 2 + < 3 * 4 >. As human I can see that the last generator should create 2 parenthesis, but how to compute this? After all there could be two scenarios: ( 2 + ( 3 *4 )) -- parenthesis is used to show the outcome of generator or (2 + (( 3 * 4 ) ^ 5) because there was more input Please note that in both cases before 3 was open generator, and after 4 there was close generator. However in both cases, after reading 4 I have to reduce, so I have to know what generator "creates".

Read the article
Persisting natural language processing parsed data

- by tjb1982

I've recently started experimenting with natural language processing (NLP) using Stanford's CoreNLP, and I'm wondering what are some of the standard ways to store NLP parsed data for something like a text mining application? One way I thought might be interesting is to store the children as an adjacency list and make good use of recursive queries (Postgres supports this and I've found it works really well). But I assume there are probably many standard ways to do this depending on what kind of analysis is being done that have been adopted by people working in the field over the years. So what are the standard persistence strategies for NLP parsed data and how are they used?

Read the article
DataContractSerializer truncated string when used with MemoryStream,but works with StringWriter

- by Michael Freidgeim

We've used the following DataContractSerializeToXml method for a long time, but recently noticed, that it doesn't return full XML for a long object, but truncated it and returns XML string with the length of multiple-of-1024 , but the reminder is not included. internal static string DataContractSerializeToXml<T>(T obj) { string strXml = ""; Type type= obj.GetType();//typeof(T) DataContractSerializer serializer = new DataContractSerializer(type); System.IO.MemoryStream aMemStr = new System.IO.MemoryStream(); System.Xml.XmlTextWriter writer = new System.Xml.XmlTextWriter(aMemStr, null); serializer.WriteObject(writer, obj); strXml = System.Text.Encoding.UTF8.GetString(aMemStr.ToArray()); return strXml; } I tried to debug and searched Google for similar problems, but didn't find explanation of the error. The most closed http://forums.codeguru.com/showthread.php?309479-MemoryStream-allocates-size-multiple-of-1024-( talking about incorrect length, but not about truncated string.fortunately replacing MemoryStream to StringWriter according to http://billrob.com/archive/2010/02/09/datacontractserializer-converting-objects-to-xml-string.aspxfixed the issue. 1: var serializer = new DataContractSerializer(tempData.GetType()); 2: using (var backing = new System.IO.StringWriter()) 3: using (var writer = new System.Xml.XmlTextWriter(backing)) 4: { 5: serializer.WriteObject(writer, tempData); 6: data.XmlData = backing.ToString(); 7: }v

Read the article
How to find the occurrence of particular character in string - CHARINDEX

- by Vipin

Many times while writing SQL, we need to find if particular character is present in the column data. SQL server possesses an in-built function to do this job - CHARINDEX(character_to_search, string, [starting_position]) Returns the position of the first occurrence of the character in the string. NOTE - index starts with 1. So, if character is at the starting position, this function would return 1. Returns 0 if character is not found. Returns 0 if 'string' is empty. Returns NULL if string is NULL. A working example of the function is SELECT CHARINDEX('a', fname) a_First_occurence, CHARINDEX('a', fname, CHARINDEX('a', fname)) a_Second_occurrence FROM Users WHERE fname = 'aka unknown' OUTPUT ------- a_First_occurence a_Second_occurrence 1 3

Read the article
Removing ocurrances of characters in a string

- by DmainEvent

I am reading this book, programming Interviews exposed by John Wiley and sons and in chapter 6 they are discussing removing all instances of characters in a src string using a removal string... so removeChars(string str, string remove) In there writeup they sey the steps to accomplish this are to have a boolean lookup array with all values initially set to false, then loop through each character in remove setting the corresponding value in the lookup array to true (note: this could also be a hash if the possible character set where huge like Unicode-16 or something like that or if str and remove are both relatively small... < 100 characters I suppose). You then iterate through the str with a source and destination index, copying each character only if its corresponding value in the lookup array is false... Which makes sense... I don't understand the code that they use however... They have for(src = 0; src < len; ++src){ flags[r[src]] == true; } which is turning the flag value at the remove string indexed at src to true... so if you start out with PLEASE HELP as your str and LEA as your remove you will be setting in your flag table at 0,1,2... t|t|t but after that you will get an out of bounds exception because r doesn't have have anything greater than 2 in it... even using there example you get an out of bounds exception... Am is there code example unworkable?

Read the article
LL(8) and left-recursion

- by Peregring-lk

I want to understand the relation between LL/LR grammars and the left-recursion problem (for any question I know parcially the answer, but I ask them as I don't know nothing, because I am a little confused now, and prefer complete answers) I'm happy with sintetized or short and direct answers (or just links solving it unambiguously): What type of language isn't LL(8) languages? LL(K) and LL(8) have problems with left-recursion? Or only LL(k) parsers? LALR(1) parser have troubles with left or right recursion? What type of troubles? Only in terms of the LL/LALR comparision. What is better, Bison (LALR(1)) or Boost.Spirit (LL(8))? (Let's suppose other features of them are irrelevant in this question) Why GCC use a (hand-made) LL(8) parser? Only for the "handling-error" problem?

Read the article
Generating Wrappers for REST APIs

- by Kyle

Would it be feasible to generate wrappers for REST APIs? An earlier question asked about machine readable descriptions of RESTful services addressed how we could write (and then read) API specifications in a standardized way which would lend itself well to generated wrappers. Could a first pass parser generate a decent wrapper that human intervention could fix up? Perhaps the first pass wouldn't be consistent, but would remove a lot of the grunt work and make it easy to flesh out the rest of the API and types. What would need to be considered? What's stopping people from doing this? Has it already been done and my google fu is weak for the day?

Read the article
Getting data from a webpage in a stable and efficient way

- by Mike Heremans

Recently I've learned that using a regex to parse the HTML of a website to get the data you need isn't the best course of action. So my question is simple: What then, is the best / most efficient and a generally stable way to get this data? I should note that: There are no API's There is no other source where I can get the data from (no databases, feeds and such) There is no access to the source files. (Data from public websites) Let's say the data is normal text, displayed in a table in a html page I'm currently using python for my project but a language independent solution/tips would be nice. As a side question: How would you go about it when the webpage is constructed by Ajax calls?

Read the article
Template syntax for users - is there a right way to do it?

- by RickM

Ok, I'm in the middle of building a saas system, and as part of that, the hosted clients need to be able to edit certain layout templates, baqsically just html, css and javascript files. I'm obviously going to be wanting to use a template syntax here as it would be dumb to let people execute PHP code, so in this instance template syntax does need to be used. I know that in the grand scale of things, this is a very minor thing, but what template syntax do you use, and why? Is there one that's considered better than others? I've seen all sorts being used with no real consistency, for example: Smarty Style: {$someVar} {foreach from="foo" item="bar"} {$bar.food} {/foreach} ASP Style: {% someVar %} {% foreach foo as bar %} {% bar.food %} {% endforeach %} HTML Style: <someVar> <foreach from="foo" item="bar"> <bar:food> </foreach> PyroCMS/FuelPHP "LEX" Style: {{ someVar }} {{ foreach from="foo" item="bar" }} {{ bar:food }} {{ endforeach }} Obviously these arent 100% accurate (for example, LEX is used alongside PHP for loops), and are only to give you an example of what I mean. What, in your opinion would be the best one (if any) to go with. I ask this bearing in mind that people using this are likely to be novice users. I did look around at a bunch of hosted CMS and E-Commerce systems as these seem to make use of user-editable templates, and most seem to be using some form of their own syntax. I should note that whatever style I end up going with, it will be with a custom template handler due to the complexity of the system and how template files are stored. Plus I'd not want to touch the likes of Smarty with a barge pole!

Read the article
First and Follow Sets for a Grammar

- by Aimee Jones

I'm studying for a Compiler Construction module I'm doing and I have a sample question as follows: Calculate the FIRST and FOLLOW sets for the following grammar.. S -> uBDz B -> Bv B -> w D -> EF E -> y E -> e F -> x F -> e I have tried to figure it out so far but I'm a bit unsure if I'm correct. Could someone verify if I'm doing it right, and if not, what am I missing? My answer is below: FIRST | FOLLOW S | {u} | {$} B | {w} | {y,x,v,z} D | {y,e,x} | {z} E | {y,e} | {x,z} F | {x,e} | {z}

Read the article
Basic questions while making a toy calculator

- by Jwan622

I am making a calculator to better understand how to program and I had a question about the following lines of code: I wanted to make my equals sign with this C# code: private void btnEquals_Click(object sender, EventArgs e) { if (plusButtonClicked == true) { total2 = total1 + Convert.ToDouble(txtDisplay.Text); //double.Parse(txtDisplay.Text); } else if (minusButtonClicked == { total2 = total1 - double.Parse(txtDisplay.Text) } } txtDisplay.Text = total2.ToString(); total1 = 0; However, my friend said this way of writing code was superior, with changes in the minus sign. private void btnEquals_Click(object sender, EventArgs e) { if (plusButtonClicked == true) { total2 = total1 + Convert.ToDouble(txtDisplay.Text); //double.Parse(txtDisplay.Text); } else if (minusButtonClicked == true) { double d1; if(double.TryParse(txtDisplay.Text, out d1)) { total2 = total1 - d1; } } txtDisplay.Text = total2.ToString(); total1 = 0; My questions: 1) What does the "out d1" section of this minus sign code mean? 2) My assumption here is that the "TryParse" code results in fewer systems crashes? If I just use "Double.Parse" and I don't put anything in the textbox, the program will crash sometimes right?

Read the article
Generate GUID from any string using C#

- by Haitham Khedre

Some times you need to generate GUID from a string which is not valid for GUID constructor . so what we will do is to get a valid input from string that the GUID constructor will accept it. It is recommended to be sure that the string that you will generate a GUID from it some how unique. The Idea is simple is to convert the string to 16 byte Array which the GUID constructor will accept it. The code will talk : using System; using System.Text; namespace StringToGUID { class Program { static void Main(string[] args) { int tokenLength = 32; int guidByteSize = 16; string token = "BSNAItOawkSl07t77RKnMjYwYyG4bCt0g8DVDBv5m0"; byte[] b = new UTF8Encoding().GetBytes(token.Substring(token.Length - tokenLength, tokenLength).ToCharArray(), 0, guidByteSize); Guid g = new Guid(b); Console.WriteLine(g.ToString()); token = "BSNePf57YwhzeE9QfOyepPfIPao4UD5UohG_fI-#eda7d"; b = new UTF8Encoding().GetBytes(token.Substring(token.Length - tokenLength, tokenLength).ToCharArray(), 0, guidByteSize); g = new Guid(b); Console.WriteLine(g.ToString()); Console.Read(); } } } .csharpcode, .csharpcode pre { font-size: small; color: black; font-family: consolas, "Courier New", courier, monospace; background-color: #ffffff; /*white-space: pre;*/ } .csharpcode pre { margin: 0em; } .csharpcode .rem { color: #008000; } .csharpcode .kwrd { color: #0000ff; } .csharpcode .str { color: #006080; } .csharpcode .op { color: #0000c0; } .csharpcode .preproc { color: #cc6633; } .csharpcode .asp { background-color: #ffff00; } .csharpcode .html { color: #800000; } .csharpcode .attr { color: #ff0000; } .csharpcode .alt { background-color: #f4f4f4; width: 100%; margin: 0em; } .csharpcode .lnum { color: #606060; } And The output: 37306c53-3774-5237-4b6e-4d6a59775979 66513945-794f-7065-5066-4950616f3455

Read the article
Making a sldprt to PDB file converter?

- by user122083

I wanted to create a parser that can read a solidworks file and turn it into a protein data bank file. This has already been done in a program called DiamondCAD. http://www.zyvex.com/Research/DiamondCAD.html I waant to make a parser that can parse the data and then visualize it the same way as DiamondCAD. I have downloaded and opened solidworks files before and they make no sense to me with never before seen symbols and looks like ancient writing. Does anyone know how a sldprt. file is structured and how it can be parsed into a PDB file? (A software called VMD converts a PDB to Obj. file so it is proof of concept)

Read the article
Analysis of various string reversal algorithms in C#

Analysis of various string reversal algorithms in C#

Read the article
Convert japaneese string to Romaji.

Convert Hiragana and Katakana string to Romaji.

Read the article
Where are strings more useful than a StringBuilder?

- by DJay

Lot of questions has been already asked about the differences between string and string builder and most of the people suggest that string builder is faster than string. I am curious to know if string builder is too good so why string is there? Moreover, can some body give me an example where string will be more usefull than string builder?

Read the article
Find all occurrences of a substring in Python

- by cru3l

Python has string.find() and string.rfind() to get the index of a substring in string. I wonder, maybe there is something like string.find_all() which can return all founded indexes (not only first from beginning or first from end)? For example: string = "test test test test" print string.find('test') # 0 print string.rfind('test') # 15 #that's the goal print string.find_all('test') # [0,5,10,15]

Read the article
Need help regarding one LALR(1) parsing.

- by AppleGrew

I am trying to parse a context-free language, called Context Free Art. I have created its parser in Javascript using a YACC-like JS LALR(1) parser generator JSCC. Take the example of following CFA (Context Free Art) code. This code is a valid CFA. startshape A rule A { CIRCLE { s 1} } Notice the A and s in above. s is a command to scale the CIRCLE, but A is just a name of this rule. In the language's grammar I have set s as token SCALE and A comes under token STRING (I have a regular expression to match string and it is at the bottom of of all tokens). This works fine, but in the below case it breaks. startshape s rule s { CIRCLE { s 1} } This too is a perfectly valid code, but since my parser marks s after rule as SCALE token so it errors out saying that it was expecting STRING. Now my question is, if there is any way to re-write the production rules of the parser to account for this? The related production rule is:- rule: RULE STRING '{' buncha_replacements '}' [* rule(%2, 1) *] | RULE STRING RATIONAL '{' buncha_replacements '}' [* rule(%2, 1*%3) *] ; One simple solution I can think of is create a copy of above rule with STRING replaced by SCALE, but this is just one of the many similar rules which would need such fixing. Furthermore there are many other terminals which can get matched to STRING. So that means way too many rules!

Read the article

< Previous Page | 62 63 64 65 66 67 68 69 70 71 72 73 | Next Page >