url parsing - Page 52 - Developer IT

Am I correctly handling duplicate URLs for my homepage?

- by Rob Goldstein

I own a job search site named www.conservationjobboard.com and have a concern about how the domain is viewed by search engines. When the site was first designed, the default page was left as default.php, but the homepage was actually JobBoard.php. To handle this, default.php redirected to JobBoard.php whenever www.conservationjobboard.com/ was requested. The main problem was that the redirect was temporary, which caused search engines to index conservationjobboard.com/ and conservationjobboard.com/JobBoard.php as two separate pages. This has since been corrected in the .htaccess file so that JobBoard.php is now the default file for the root directory, eliminating the need for the redirect. The problem is that search engines still show both URLs in search results (one including JobBoard.php and one ending with /). Another potential problem is that some of my early backlinks point to conservationjobboard.com/JobBoard.php, while the rest point to conservationjobboard.com. The two outstanding questions are as follows:

1. Is my domain still being penalized by search engines like Google for having duplicate homepage URLs?
2. Are all of the backlinks to my homepage now being counted as pointing to the same page, or is the total number of backlinks being split between the two URLs?

If you think there are still issues with how we have this set up, I'd appreciate advice on what we should do differently. Thanks.
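The usual remedy for duplicate homepage URLs is a permanent (301) redirect from the duplicate to the canonical URL, which search engines generally treat as a signal to consolidate the two. One subtlety: with JobBoard.php configured as the directory index, the server internally serves / from JobBoard.php, so the redirect has to test the path the visitor actually requested rather than the file the server resolved it to, or / would redirect to itself (with Apache mod_rewrite this is commonly done by matching %{THE_REQUEST}). The Python sketch below shows only that decision logic, framework-agnostic; the alias set and the function name are hypothetical.

```python
from typing import Optional, Tuple

# Hypothetical: request paths that duplicate the homepage, all folded into "/".
HOMEPAGE_ALIASES = {"/JobBoard.php", "/default.php"}


def canonical_redirect(requested_path: str, query: str = "") -> Optional[Tuple[int, str]]:
    """Return (301, location) if the *requested* path duplicates the homepage.

    requested_path must come from the original request line, not from the
    file the server mapped "/" to internally; otherwise "/" would loop.
    """
    if requested_path in HOMEPAGE_ALIASES:
        location = "/" + ("?" + query if query else "")
        return 301, location
    return None  # already canonical: serve the page normally
```

Any query string is carried over so that links with parameters keep working after the redirect.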

Read the article

Idea of an algorithm to detect a website's navigation structure?

- by Uwe Keim

Currently I am developing an importer that brings any existing, arbitrary (static) HTML website into the upcoming release of our CMS. While downloading the files is solved, I'm pulling my hair out when it comes to detecting a site structure (pages and subpages) purely from the HTML files, without the user specifying additional hints. Basically I want to get a tree like:

    + Root page 1
      + Child page 1
      + Child page 2
        + Child child page 1
      + Child page 3
    + Root page 2
      + Child page 4
    + Root page 3
    + ...

I.e. I want to be able to detect the menu structure from the links inside the pages. This doesn't have to be 100% accurate, but I at least want to achieve more than just a flat list. I thought of looking at multiple pages to find similar areas, identifying these as menu areas and parsing the links there, but I'm not that satisfied with this idea. My question: can you think of any algorithm for detecting such a structure?

Update 1: What I'm looking for is not a web spider, but an algorithm to create a logical tree of the relationships between pages, so that I can create pages and subpages inside my CMS when importing them.

Update 2: Following Robert's suggestion, I'll solve this by starting at the root page, parsing links as I go, and treating every link inside a page as a child page. I'll probably traverse not depth-first but breadth-first, to get a more balanced navigation structure.
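The approach in Update 2 can be sketched as a breadth-first traversal that records, for each page, the page from which it was first discovered. Because BFS reaches every page at its shallowest link depth, a page linked both from the home page and from a deeper page stays near the top, whereas depth-first order could bury it under whichever page happened to be visited first. This sketch assumes the links have already been extracted from the HTML into a dict of page to linked pages; the function names and file names are hypothetical.

```python
from collections import deque
from typing import Dict, List, Optional


def build_parent_map(root: str, links: Dict[str, List[str]]) -> Dict[str, Optional[str]]:
    """Breadth-first walk from the root page; each page's parent is the page
    from which it was first reached."""
    parent: Dict[str, Optional[str]] = {root: None}
    queue = deque([root])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in parent:  # first (shallowest) discovery wins
                parent[target] = page
                queue.append(target)
    return parent


def children_map(parent: Dict[str, Optional[str]]) -> Dict[str, List[str]]:
    """Invert the parent map into page -> list of child pages."""
    tree: Dict[str, List[str]] = {}
    for page, par in parent.items():
        tree.setdefault(page, [])
        if par is not None:
            tree.setdefault(par, []).append(page)
    return tree
```

For example, if index.html links to a.html and b.html, and a.html also links to b.html, BFS keeps b.html as a root-level page rather than nesting it under a.html.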

Read the article

How should I implement a command processing application?

- by Nini Michaels

I want to make a simple, proof-of-concept application (a REPL) that takes a number and then processes commands on that number. Example: I start with 1. Then I write "add 2", and it gives me 3. Then I write "multiply 7", and it gives me 21. Then I want to know whether it is prime, so I write "is prime" (on the current number, 21), and it gives me false. "is odd" would give me true. And so on. For a simple application with few commands, even a simple switch would do for processing the commands. But if I want extensibility, how should I implement the functionality? Do I use the command pattern? Do I build a simple parser/interpreter for the language? What if I want more complex commands, like "multiply 5 until >200"? What would be an easy way to extend it (add new commands) without recompiling? Edit: to clarify a few things, my end goal is not to make something similar to WolframAlpha, but rather a list (of numbers) processor. But I want to start slowly (on single numbers). I have in mind something similar to the way one would use Haskell to process lists, but a very simple version. I'm wondering whether something like the command pattern (or an equivalent) would suffice, or whether I have to make a new mini-language and a parser for it to achieve my goals.
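A lightweight form of the command pattern is a registry that maps command names to handler functions, so adding a command means registering one more function rather than editing a switch. The Python sketch below assumes commands are plain words followed by space-separated arguments, and that predicates such as "is prime" report a value without replacing the current number; all names are illustrative. Extension without recompiling could then come from loading extra modules at startup that register more handlers, while something like "multiply 5 until >200" would start to call for a small parser on top of this.

```python
from typing import Callable, Dict, List, Union

Result = Union[int, bool]
COMMANDS: Dict[str, Callable[..., Result]] = {}


def command(name: str):
    """Register a handler under a command name (new commands = new functions)."""
    def register(handler: Callable[..., Result]) -> Callable[..., Result]:
        COMMANDS[name] = handler
        return handler
    return register


@command("add")
def add(n: int, amount: str) -> int:
    return n + int(amount)


@command("multiply")
def multiply(n: int, factor: str) -> int:
    return n * int(factor)


@command("is prime")
def is_prime(n: int) -> bool:
    if n < 2:
        return False
    d = 2
    while d * d <= n:  # trial division up to the square root
        if n % d == 0:
            return False
        d += 1
    return True


@command("is odd")
def is_odd(n: int) -> bool:
    return n % 2 == 1


def execute(current: int, line: str) -> Result:
    # Try longer names first so a multi-word command is not shadowed by a prefix.
    for name in sorted(COMMANDS, key=len, reverse=True):
        if line == name or line.startswith(name + " "):
            args = line[len(name):].split()
            return COMMANDS[name](current, *args)
    raise ValueError(f"unknown command: {line!r}")


def run(start: int, lines: List[str]) -> List[Result]:
    """Apply each line in turn; predicates report a value without replacing it."""
    current, outputs = start, []
    for line in lines:
        result = execute(current, line)
        if not isinstance(result, bool):
            current = result
        outputs.append(result)
    return outputs
```

Replaying the session from the question, `run(1, ["add 2", "multiply 7", "is prime", "is odd"])` yields `[3, 21, False, True]`.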

Read the article

Persisting NLP parsed data

- by tjb1982

I've recently started experimenting with NLP using Stanford's CoreNLP, and I'm wondering what some of the standard ways are to store NLP parsed data for something like a text mining application. One way I thought might be interesting is to store the children as an adjacency list and make good use of recursive queries (Postgres supports this and I've found it works really well). Something like this:

    Component ( id, POS, parent_id )
    Word ( id, raw, lemma, POS, NER )
    CW_Map ( component_id, word_id, position int )

But I assume there are probably many standard ways to do this, depending on what kind of analysis is being done, that have been adopted by people working in the field over the years. So what are the standard persistence strategies for NLP parsed data and how are they used?
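To make the adjacency-list idea concrete, the Python sketch below stores a toy parse using the three tables from the question and retrieves all words under a constituent with a recursive common table expression. It uses SQLite only so the example is self-contained; the WITH RECURSIVE query is written in syntax that PostgreSQL also accepts. Treating position as a sentence-level token index, and the sample parse itself, are assumptions for illustration.

```python
import sqlite3

SCHEMA = """
CREATE TABLE Component (id INTEGER PRIMARY KEY, POS TEXT,
                        parent_id INTEGER REFERENCES Component(id));
CREATE TABLE Word (id INTEGER PRIMARY KEY, raw TEXT, lemma TEXT, POS TEXT, NER TEXT);
CREATE TABLE CW_Map (component_id INTEGER REFERENCES Component(id),
                     word_id INTEGER REFERENCES Word(id), position INTEGER);
"""

# All words under one constituent, in sentence order, via a recursive CTE.
SUBTREE_WORDS = """
WITH RECURSIVE subtree(id) AS (
    SELECT id FROM Component WHERE id = ?
    UNION ALL
    SELECT c.id FROM Component AS c JOIN subtree AS s ON c.parent_id = s.id
)
SELECT w.raw
FROM subtree
JOIN CW_Map AS m ON m.component_id = subtree.id
JOIN Word AS w ON w.id = m.word_id
ORDER BY m.position
"""


def load_example(conn: sqlite3.Connection) -> None:
    """Store a toy parse of "The cat sat": (S (NP The cat) (VP sat))."""
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO Component VALUES (?, ?, ?)",
                     [(1, "S", None), (2, "NP", 1), (3, "VP", 1)])
    conn.executemany("INSERT INTO Word VALUES (?, ?, ?, ?, ?)",
                     [(1, "The", "the", "DT", "O"), (2, "cat", "cat", "NN", "O"),
                      (3, "sat", "sit", "VBD", "O")])
    conn.executemany("INSERT INTO CW_Map VALUES (?, ?, ?)",
                     [(2, 1, 0), (2, 2, 1), (3, 3, 2)])


def words_under(conn: sqlite3.Connection, component_id: int) -> list:
    """Return the raw tokens of the subtree rooted at component_id."""
    return [raw for (raw,) in conn.execute(SUBTREE_WORDS, (component_id,))]
```

Querying the S node returns the whole sentence, while querying the NP node returns only its two words.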

Read the article

How can I best manage making open source code releases from my company's confidential research code?

- by DeveloperDon

My company (let's call them Acme Technology) has a library of approximately one thousand source files that originally came from its Acme Labs research group, was incubated in a development group for a couple of years, and has more recently been provided to a handful of customers under non-disclosure agreements. Acme is getting ready to release perhaps 75% of the code to the open source community. The other 25% would be released later, but for now it is either not ready for customer use or contains code related to future innovations they need to keep out of the hands of competitors. The code is presently formatted with #ifdefs that let the same code base work with the pre-production platforms, which will be available to university researchers and a much wider range of commercial customers once the code goes open source, while at the same time remaining available for experimentation, prototyping, and forward-compatibility testing with the future platform. Keeping a single code base is considered essential for the economics (and sanity) of my group, who would have a tough time maintaining two copies in parallel. Files in our current base look something like this:

> // Copyright 2012 (C) Acme Technology, All Rights Reserved.
> // Very large, often varied and restrictive copyright license in English and French,
> // sometimes also embedded in make files and shell scripts with varied
> // comment styles.
>
> ... Usual header stuff...
>
> void initTechnologyLibrary() {
>     nuiInterface(on);
> #ifdef UNDER_RESEARCH
>     holographicVisualization(on);
> #endif
> }

And we would like to convert them to something like:

> // GPL Copyright (C) Acme Technology Labs 2012, Some rights reserved.
> // Acme appreciates your interest in its technology, please contact [email protected]
> // for technical support, and www.acme.com/emergingTech for updates and RSS feed.
>
> ... Usual header stuff...
>
> void initTechnologyLibrary() {
>     nuiInterface(on);
> }

Is there a tool, parsing library, or popular script that can replace the copyright and strip out not just #ifdefs, but also variations like #if defined(UNDER_RESEARCH)? The code is presently in Git and would likely be hosted somewhere that uses Git. Would there be a way to safely link the repositories together so we can efficiently reintegrate our improvements with the open source versions? Advice about other pitfalls is welcome.
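For the #ifdef part there is a purpose-built tool, unifdef, which removes the branches for macros you declare undefined (for example `unifdef -UUNDER_RESEARCH file.c`) and understands #if defined(...) as well as #else and #elif. If you would rather script it, the Python sketch below shows the core idea for a single macro: it drops #ifdef MACRO and #if defined(MACRO) blocks, keeps their #else bodies, and passes other conditionals through untouched. It deliberately does not handle #elif, #ifndef MACRO (left in place), or comments after directives, and the function names are hypothetical.

```python
import re

DIRECTIVE = re.compile(r"^\s*#\s*(\w+)(.*)$")


def is_target(keyword: str, rest: str, macro: str) -> bool:
    """True for '#ifdef MACRO', '#if defined(MACRO)' or '#if defined MACRO'."""
    m = re.escape(macro)
    if keyword == "ifdef":
        return re.fullmatch(r"\s+" + m + r"\s*", rest) is not None
    if keyword == "if":
        pattern = r"\s+defined\s*(?:\(\s*" + m + r"\s*\)|\s" + m + r")\s*"
        return re.fullmatch(pattern, rest) is not None
    return False


def strip_undefined(source: str, macro: str) -> str:
    """Remove code guarded by `macro`, as if the macro were undefined."""
    out = []
    stack = []  # per open #if*: "drop"/"else" for target blocks, "other" otherwise
    for line in source.splitlines(keepends=True):
        d = DIRECTIVE.match(line)
        kw, rest = (d.group(1), d.group(2)) if d else ("", "")
        hidden = "drop" in stack  # currently inside a branch being removed?
        if kw in ("if", "ifdef", "ifndef"):
            target = is_target(kw, rest, macro)
            stack.append("drop" if target else "other")
            if not target and not hidden:
                out.append(line)
        elif kw == "else" and stack and stack[-1] in ("drop", "else"):
            stack[-1] = "else"  # keep the #else body, drop the directive itself
        elif kw == "endif" and stack:
            if stack.pop() == "other" and not hidden:
                out.append(line)
        elif not hidden:
            out.append(line)
    return "".join(out)
```

Run on the example above, it turns the research version of initTechnologyLibrary() into the public one; rewriting the varied copyright headers would still need a separate pass.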

Read the article

Will adding top level directories with similar structure to existing directories change the SEO of my site?

- by Russell Sims

I've been pointed this way for SEO-related questions, and this one has had me pondering for a while now. I'm recreating a site's structure. The website's content is generated through several feeds, and unless I want to place each and every one of the roughly 10,000 venues into its own category manually, I can't avoid categorising each item by its address. The current structure looks like this:

Homepage > region > county > city/town > venue page

and the URL looks like domain/region/county/city/venue/. I'm relatively happy with this structure, as it's not too convoluted. However, we also promote deals, and we also group the venues into their respective franchises, which leads to URLs such as domain/groups and domain/deals. My question is: how would the directory structure look with these new additions? Would I have a URL that looks like domain/deals/region/county/city/venue or domain/group/region/county/city/venue, and just put a 301 or a canonical link tag on the page to prevent the duplicate pages from competing with each other? Or am I worrying about it needlessly, and should I just link straight from domain/deals to the venue page URL domain/region/county/city/venue? That bothers me a bit, though, as the deals and groups will not appear in the breadcrumbs.

Read the article

What is the simplest human readable configuration file format?

- by Juha

My current configuration file is as follows:

    mainwindow.title = 'test'
    mainwindow.position.x = 100
    mainwindow.position.y = 200
    mainwindow.button.label = 'apply'
    mainwindow.button.size.x = 100
    mainwindow.button.size.y = 30
    logger.datarate = 100
    logger.enable = True
    logger.filename = './test.log'

This is read with Python into a nested dictionary:

    {
        'mainwindow': {
            'button': {
                'label': {'value': 'apply'},
                ...
            },
            ...
        },
        'logger': {
            'datarate': {'value': 100},
            'enable': {'value': True},
            'filename': {'value': './test.log'}
        },
        ...
    }

Is there a better way of doing this? The idea is to get XML-like behavior while avoiding XML for as long as possible. The end user is assumed to be almost totally computer illiterate and basically uses Notepad and copy-paste, so the standard Python "header + variables" format is considered too difficult. The non-technical user edits the config file; capable programmers handle the dictionaries. A nested dictionary was chosen for easy splitting (the logger does not need, and should not be able to edit, the mainwindow parameters).
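For reference, the existing dotted-key format can be parsed into the nested dictionary in a few lines. The Python sketch below assumes one `dotted.name = value` assignment per line, treats lines starting with # as comments (an assumption, since the current format shows none), and uses ast.literal_eval so that values can only be Python literals such as strings, numbers and True/False, never executable code.

```python
import ast
from typing import Any, Dict

SAMPLE = """\
mainwindow.title = 'test'
mainwindow.position.x = 100
mainwindow.position.y = 200
mainwindow.button.label = 'apply'
mainwindow.button.size.x = 100
mainwindow.button.size.y = 30
logger.datarate = 100
logger.enable = True
logger.filename = './test.log'
"""


def parse_config(text: str) -> Dict[str, Any]:
    """Turn 'a.b.c = literal' lines into {'a': {'b': {'c': {'value': literal}}}}."""
    root: Dict[str, Any] = {}
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and (assumed) comment lines
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"line {lineno}: expected 'name = value', got {raw!r}")
        *parents, leaf = [part.strip() for part in key.split(".")]
        node = root
        for part in parents:
            node = node.setdefault(part, {})
        # literal_eval accepts only literals ('text', 100, True, ...), never code.
        node[leaf] = {"value": ast.literal_eval(value.strip())}
    return root
```

Each subsystem can then be handed only its own branch, e.g. `parse_config(SAMPLE)["logger"]`. An unquoted string such as `apply` makes literal_eval raise an error, which a real tool would catch to show the user a friendly message.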

Read the article

Persisting natural language processing parsed data

- by tjb1982

I've recently started experimenting with natural language processing (NLP) using Stanford's CoreNLP, and I'm wondering what are some of the standard ways to store NLP parsed data for something like a text mining application? One way I thought might be interesting is to store the children as an adjacency list and make good use of recursive queries (Postgres supports this and I've found it works really well). But I assume there are probably many standard ways to do this depending on what kind of analysis is being done that have been adopted by people working in the field over the years. So what are the standard persistence strategies for NLP parsed data and how are they used?

Read the article

What is this algorithm for converting strings into numbers called?

- by CodexArcanum

I've been doing some work in Parsec recently, and for my toy language I wanted fractional numbers in multiple bases to be expressible. After digging around in Parsec's source a bit, I found their implementation of a floating-point number parser and copied it to make the needed modifications. So I understand what this code does, and vaguely why (I haven't worked out the math fully yet, but I think I get the gist). But where did it come from? This seems like a pretty clever way to turn strings into floats and ints. Is there a name for this algorithm? Or is it just something basic that's a hole in my knowledge? Did the folks behind Parsec devise it? Here's the code, first for integers:

    number' :: Integer -> Parser Integer
    number' base =
        do { digits <- many1 (oneOf (sigilRange base))
           ; let n = foldl (\x d -> base * x + toInteger (convertDigit base d)) 0 digits
           ; seq n (return n)
           }

So the basic idea here is that digits contains the string representing the whole-number part, i.e. "192". The foldl converts each digit individually into a number, then adds that to the running total multiplied by the base, which means that by the end each digit has been multiplied by the correct factor (in aggregate) to position it. The fractional part is even more interesting:

    fraction' :: Integer -> Parser Double
    fraction' base =
        do { digits <- many1 (oneOf (sigilRange base))
           ; let base' = fromIntegral base
           ; let f = foldr (\d x -> (x + fromIntegral (convertDigit base d)) / base') 0.0 digits
           ; seq f (return f)
           }

Same general idea, but now with a foldr and repeated division. I don't quite understand why you add first and then divide for the fraction, but multiply first and then add for the whole part. I know it works; I just haven't sorted out why. Anyway, I feel dumb for not working it out myself; it's very simple and clever looking at it. Is there a name for this algorithm? Maybe the imperative version using a loop would be more familiar?
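Both folds are instances of Horner's method (also called Horner's scheme or Horner's rule) for evaluating a polynomial, here the digit polynomial of a positional numeral. For the whole part, each step multiplies the running total by the base, shifting the digits seen so far one place left, before adding the next digit in the units place. For the fraction, digits are consumed from the right: adding a digit and then dividing by the base shifts everything one place to the right of the point, so 0.d1 d2 d3 is computed as (d1 + (d2 + d3/b)/b)/b. The imperative Python version below mirrors that; digit_value is a hypothetical stand-in for the question's convertDigit, accepting 0-9 and a-z for bases up to 36.

```python
def digit_value(ch: str) -> int:
    """Hypothetical stand-in for convertDigit: '0'-'9' then 'a'-'z' (base <= 36)."""
    return int(ch, 36)


def parse_whole(digits: str, base: int) -> int:
    n = 0
    for ch in digits:  # left to right, like foldl
        n = n * base + digit_value(ch)  # shift previous digits left, then add
    return n


def parse_fraction(digits: str, base: int) -> float:
    f = 0.0
    for ch in reversed(digits):  # right to left, like foldr
        f = (f + digit_value(ch)) / base  # add, then shift right of the point
    return f
```

For instance, the binary fraction .101 is evaluated as ((1/2 + 0)/2 + 1)/2 = 0.625.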

Read the article

How to add precedence to LALR parser like in YACC?

- by greenoldman

Please note, I am asking about writing an LALR parser, not writing rules for an LALR parser. What I need is to mimic YACC precedence definitions. I don't know how they are implemented; below I describe what I've done and read so far. For now I have a basic LALR parser written. The next step is adding precedence, so that 2+3*4 is parsed as 2+(3*4). I've read about precedence parsers, but I don't see how to fit such a model into LALR. I don't understand two points: how to compute when to insert a parenthesis generator, and how to compute how many parentheses the generator should create. I insert generators when a symbol is taken from the input and put on the stack, right? So let's say I have something like this (| denotes the boundary between stack and input): ID = 5 | + ...; at this point I add an open generator, which gives ID = < 5 | + ...; then I read more input: ID = < 5 + | 5 ..., and more: ID = < 5 + 5 | ; ..., and more: ID = < 5 + 5 ; | ... At this point a normal LALR parser would perform several reduce moves, but the open parenthesis has not been matched, so I continue reading more input, which does not make sense. So that was the "when" problem. As for the count, let's say I have the data < 2 + < 3 * 4 >. As a human I can see that the last generator should create 2 parentheses, but how do I compute this? After all, there could be two scenarios: ( 2 + ( 3 * 4 )), where the parentheses show the outcome of the generator, or ( 2 + (( 3 * 4 ) ^ 5 )), because there was more input. Note that in both cases there was an open generator before 3 and a close generator after 4. However, in both cases I have to reduce after reading 4, so I have to know what the generator "creates".
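For comparison, yacc and Bison do not insert parentheses at all: precedence is applied while the LALR table is built. As their documentation describes it, each rule takes the precedence of its last terminal (or of a %prec override), and whenever a state has a shift/reduce conflict between reducing a rule and shifting a lookahead token, the two precedences are compared; equal levels fall back to the token's associativity, and with no precedence information the default is to shift. That is what makes 2+3*4 shift the * instead of reducing 2+3. The Python sketch below shows just that decision function; the table and names are hypothetical, and a table generator would call it for each conflicting table entry.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical precedence table, as yacc would derive it from
#   %left '+' '-'
#   %left '*' '/'
#   %right '^'
# (later declarations bind tighter, so they get higher levels).
PRECEDENCE: Dict[str, Tuple[int, str]] = {
    "+": (1, "left"), "-": (1, "left"),
    "*": (2, "left"), "/": (2, "left"),
    "^": (3, "right"),
}


def rule_precedence(rhs: List[str], prec_token: Optional[str] = None) -> Optional[Tuple[int, str]]:
    """A rule takes the precedence of its last terminal, unless %prec overrides it."""
    if prec_token is not None:
        return PRECEDENCE[prec_token]
    for symbol in reversed(rhs):
        if symbol in PRECEDENCE:
            return PRECEDENCE[symbol]
    return None


def resolve_shift_reduce(rhs: List[str], lookahead: str) -> str:
    """Decide a shift/reduce conflict between reducing rhs and shifting lookahead."""
    rule_prec = rule_precedence(rhs)
    token_prec = PRECEDENCE.get(lookahead)
    if rule_prec is None or token_prec is None:
        return "shift"  # yacc's default; it also reports the conflict
    rule_level, token_level = rule_prec[0], token_prec[0]
    if token_level > rule_level:
        return "shift"
    if token_level < rule_level:
        return "reduce"
    assoc = token_prec[1]  # same level: associativity decides
    if assoc == "left":
        return "reduce"
    if assoc == "right":
        return "shift"
    return "error"  # %nonassoc: "a op b op c" becomes a syntax error
```

With this table, the state holding E + E with lookahead * shifts, while E * E with lookahead + reduces, which yields 2+(3*4) without any explicit parenthesis generators.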

Read the article

LL(∞) and left-recursion

- by Peregring-lk

I want to understand the relationship between LL/LR grammars and the left-recursion problem. (I partially know the answer to each question, but I'm asking as if I knew nothing, because I'm a little confused now and prefer complete answers.) I'm happy with summarized, short and direct answers (or just links that settle it unambiguously):

- What type of language is not an LL(∞) language?
- Do both LL(k) and LL(∞) have problems with left recursion, or only LL(k) parsers?
- Do LALR(1) parsers have trouble with left or right recursion? What type of trouble? (Only in terms of the LL/LALR comparison.)
- Which is better, Bison (LALR(1)) or Boost.Spirit (LL(∞))? (Let's suppose their other features are irrelevant to this question.)
- Why does GCC use a (hand-written) LL(∞) parser? Only for error handling?
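Left recursion is a problem for top-down (LL) parsing whatever the lookahead, because a rule like E -> E + T makes the procedure for E call itself before consuming any input; LR-family parsers such as LALR(1) accept left-recursive rules directly. The textbook workaround on the LL side is to rewrite immediate left recursion into right recursion, sketched below in Python. It handles only immediate (not indirect) left recursion, and naming the new nonterminal A' is just a convention.

```python
from typing import Dict, List

Production = List[str]  # an empty list stands for epsilon


def remove_immediate_left_recursion(nt: str, productions: List[Production]) -> Dict[str, List[Production]]:
    """Rewrite A -> A a1 | ... | b1 | ...  as  A -> b1 A' | ...,  A' -> a1 A' | ... | epsilon."""
    recursive = [p[1:] for p in productions if p and p[0] == nt]  # the a_i tails
    others = [p for p in productions if not p or p[0] != nt]      # the b_i alternatives
    if not recursive:
        return {nt: productions}
    tail = nt + "'"
    return {
        nt: [beta + [tail] for beta in others],
        tail: [alpha + [tail] for alpha in recursive] + [[]],
    }
```

For example, E -> E + T | T becomes E -> T E' and E' -> + T E' | epsilon; the rewritten grammar derives the same strings, but the parse tree now leans right, so left associativity has to be restored when building the AST.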

Read the article

Getting data from a webpage in a stable and efficient way

- by Mike Heremans

Recently I've learned that using a regex to parse the HTML of a website to get the data you need isn't the best course of action. So my question is simple: what, then, is the best, most efficient and generally stable way to get this data? I should note that:

- There are no APIs.
- There is no other source where I can get the data from (no databases, feeds and such).
- There is no access to the source files (the data comes from public websites).

Let's say the data is normal text, displayed in a table in an HTML page. I'm currently using Python for my project, but language-independent solutions/tips would be nice. As a side question: how would you go about it when the webpage is constructed by Ajax calls?
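In Python, the standard library's html.parser module is one regex-free option: it reports tags and text as events, so a table can be rebuilt row by row. The sketch below assumes a simple table whose cells contain plain text and whose tr, td and th tags are all closed; third-party parsers such as BeautifulSoup or lxml are more forgiving of messy real-world markup. For Ajax-built pages, the data usually arrives through background requests that the browser's developer tools will show, and those responses can often be fetched and parsed directly.

```python
from html.parser import HTMLParser
from typing import List, Optional


class TableExtractor(HTMLParser):
    """Collects the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self) -> None:
        super().__init__()
        self.rows: List[List[str]] = []
        self._row: Optional[List[str]] = None
        self._cell: Optional[List[str]] = None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":  # html.parser passes tag names in lowercase
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def extract_rows(html: str) -> List[List[str]]:
    parser = TableExtractor()
    parser.feed(html)
    parser.close()
    return parser.rows
```

Unlike a regex, this keeps working when attributes are reordered or whitespace changes, although it still depends on the page keeping its table layout.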

Read the article

Generating Wrappers for REST APIs

- by Kyle

Would it be feasible to generate wrappers for REST APIs? An earlier question about machine-readable descriptions of RESTful services addressed how we could write (and then read) API specifications in a standardized way, which would lend itself well to generated wrappers. Could a first-pass parser generate a decent wrapper that human intervention could then fix up? Perhaps the first pass wouldn't be consistent, but it would remove a lot of the grunt work and make it easy to flesh out the rest of the API and types. What would need to be considered? What's stopping people from doing this? Has it already been done, and my Google-fu is just weak today?
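As a rough illustration of the idea, the Python sketch below turns a tiny endpoint description into a client class with one method per endpoint. The SPEC format, the base URL and the injected send function are all hypothetical; a real generator would read a standardized description document and would also have to handle query parameters, request bodies, authentication and error types, which is where much of the human fix-up would likely go.

```python
from typing import Any, Callable, Dict

# Hypothetical machine-readable API description; a real generator would read
# this from a spec document instead of a hand-written dict.
SPEC: Dict[str, Dict[str, str]] = {
    "get_user": {"method": "GET", "path": "/users/{user_id}"},
    "list_repos": {"method": "GET", "path": "/users/{user_id}/repos"},
    "delete_repo": {"method": "DELETE", "path": "/repos/{repo_id}"},
}


def generate_client(base_url: str, spec: Dict[str, Dict[str, str]],
                    send: Callable[[str, str], Any]) -> Any:
    """Build a class with one method per endpoint and return an instance.

    send(method, url) performs the actual HTTP call; it is injected so the
    generated code stays independent of any particular HTTP library.
    """
    methods = {}
    for name, endpoint in spec.items():
        def call(self, _endpoint=endpoint, **params):
            # Fill {placeholders} in the path; a missing one raises KeyError.
            url = base_url + _endpoint["path"].format(**params)
            return send(_endpoint["method"], url)
        call.__name__ = name
        methods[name] = call
    return type("GeneratedClient", (), methods)()
```

Passing a fake send function, as in `generate_client("https://api.example.com", SPEC, lambda m, u: (m, u))`, makes the generated methods easy to test without any network access.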

Read the article

First and Follow Sets for a Grammar

- by Aimee Jones

I'm studying for a Compiler Construction module I'm doing and I have a sample question as follows: Calculate the FIRST and FOLLOW sets for the following grammar:

    S -> uBDz
    B -> Bv
    B -> w
    D -> EF
    E -> y
    E -> e
    F -> x
    F -> e

I have tried to figure it out so far but I'm a bit unsure if I'm correct. Could someone verify if I'm doing it right, and if not, what am I missing? My answer is below:

        | FIRST     | FOLLOW
    S   | {u}       | {$}
    B   | {w}       | {y,x,v,z}
    D   | {y,e,x}   | {z}
    E   | {y,e}     | {x,z}
    F   | {x,e}     | {z}
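The sets can be checked mechanically with the usual fixed-point iteration: keep applying the FIRST and FOLLOW rules until no set grows. The Python sketch below encodes the question's grammar, writing the empty string as "e" just as the question does; the function names are illustrative. Running it gives exactly the sets in the table above.

```python
from typing import Dict, List, Set

EPS, END = "e", "$"  # "e" is the empty string, as in the question

GRAMMAR: Dict[str, List[List[str]]] = {
    "S": [["u", "B", "D", "z"]],
    "B": [["B", "v"], ["w"]],
    "D": [["E", "F"]],
    "E": [["y"], [EPS]],
    "F": [["x"], [EPS]],
}


def first_of_seq(seq: List[str], first: Dict[str, Set[str]]) -> Set[str]:
    """FIRST of a symbol sequence; contains EPS only if every symbol can vanish."""
    out: Set[str] = set()
    for sym in seq:
        if sym == EPS:
            continue
        sym_first = first.get(sym, {sym})  # a terminal's FIRST is itself
        out |= sym_first - {EPS}
        if EPS not in sym_first:
            return out
    out.add(EPS)
    return out


def compute_first(grammar: Dict[str, List[List[str]]]) -> Dict[str, Set[str]]:
    first: Dict[str, Set[str]] = {nt: set() for nt in grammar}
    changed = True
    while changed:  # iterate until no FIRST set grows
        changed = False
        for nt, prods in grammar.items():
            for prod in prods:
                new = first_of_seq(prod, first)
                if not new <= first[nt]:
                    first[nt] |= new
                    changed = True
    return first


def compute_follow(grammar, first, start: str) -> Dict[str, Set[str]]:
    follow: Dict[str, Set[str]] = {nt: set() for nt in grammar}
    follow[start].add(END)
    changed = True
    while changed:  # iterate until no FOLLOW set grows
        changed = False
        for nt, prods in grammar.items():
            for prod in prods:
                for i, sym in enumerate(prod):
                    if sym not in grammar:
                        continue  # terminals and EPS have no FOLLOW set
                    rest = first_of_seq(prod[i + 1:], first)
                    new = rest - {EPS}
                    if EPS in rest:  # everything after sym can vanish
                        new |= follow[nt]
                    if not new <= follow[sym]:
                        follow[sym] |= new
                        changed = True
    return follow
```

Note how B picks up v from its own left-recursive rule B -> Bv, and x, y and z from S -> uBDz because D can derive the empty string.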

Read the article

Template syntax for users - is there a right way to do it?

- by RickM

OK, I'm in the middle of building a SaaS system, and as part of that, the hosted clients need to be able to edit certain layout templates, basically just HTML, CSS and JavaScript files. I obviously want to use a template syntax here, as it would be dumb to let people execute PHP code. I know that in the grand scale of things this is a very minor point, but what template syntax do you use, and why? Is there one that's considered better than the others? I've seen all sorts being used with no real consistency, for example:

Smarty style:
    {$someVar} {foreach from="foo" item="bar"} {$bar.food} {/foreach}
ASP style:
    {% someVar %} {% foreach foo as bar %} {% bar.food %} {% endforeach %}
HTML style:
    <someVar> <foreach from="foo" item="bar"> <bar:food> </foreach>
PyroCMS/FuelPHP "LEX" style:
    {{ someVar }} {{ foreach from="foo" item="bar" }} {{ bar:food }} {{ endforeach }}

Obviously these aren't 100% accurate (for example, LEX is used alongside PHP for loops); they are only meant to give you an example of what I mean. In your opinion, which would be the best one (if any) to go with? I ask this bearing in mind that the people using this are likely to be novice users. I did look at a bunch of hosted CMS and e-commerce systems, as these seem to make use of user-editable templates, and most seem to use some form of their own syntax. I should note that whatever style I end up going with will be implemented with a custom template handler, due to the complexity of the system and how template files are stored. Plus I wouldn't touch the likes of Smarty with a barge pole!
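Whichever syntax is chosen, a custom handler can be restricted to looking values up rather than executing anything, which keeps novice-edited templates safe. A minimal Python sketch of such a variable-substitution handler follows, assuming LEX-like `{{ name }}` and `{{ parent:child }}` tags and a plain nested-dict context; loops and conditionals are deliberately left out, and HTML-escaping every value is a design choice of this sketch rather than part of any of the syntaxes listed.

```python
import html
import re
from typing import Any, Dict

# {{ name }} or {{ parent:child }}, with optional spaces inside the braces.
TAG = re.compile(r"\{\{\s*([A-Za-z_]\w*(?::[A-Za-z_]\w*)*)\s*\}\}")


def render(template: str, context: Dict[str, Any]) -> str:
    """Replace variable tags with HTML-escaped values; nothing is ever executed."""
    def lookup(match):
        value: Any = context
        for part in match.group(1).split(":"):
            if not isinstance(value, dict) or part not in value:
                return ""  # unknown names render as empty text
            value = value[part]
        return html.escape(str(value))
    return TAG.sub(lookup, template)
```

Tags that are not plain names, such as `{{ foreach ... }}`, do not match the pattern and are left untouched, so block constructs could be layered on later by a separate pass.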

Read the article

Basic questions while making a toy calculator

- by Jwan622

I am making a calculator to better understand how to program, and I have a question about the following lines of code. I wrote my equals button handler with this C# code:

    private void btnEquals_Click(object sender, EventArgs e)
    {
        if (plusButtonClicked == true)
        {
            total2 = total1 + Convert.ToDouble(txtDisplay.Text); //double.Parse(txtDisplay.Text);
        }
        else if (minusButtonClicked == true)
        {
            total2 = total1 - double.Parse(txtDisplay.Text);
        }
        txtDisplay.Text = total2.ToString();
        total1 = 0;
    }

However, my friend said this way of writing the code was superior, with changes in the minus branch:

    private void btnEquals_Click(object sender, EventArgs e)
    {
        if (plusButtonClicked == true)
        {
            total2 = total1 + Convert.ToDouble(txtDisplay.Text); //double.Parse(txtDisplay.Text);
        }
        else if (minusButtonClicked == true)
        {
            double d1;
            if (double.TryParse(txtDisplay.Text, out d1))
            {
                total2 = total1 - d1;
            }
        }
        txtDisplay.Text = total2.ToString();
        total1 = 0;
    }

My questions:
1) What does the "out d1" part of this minus-branch code mean?
2) My assumption here is that the "TryParse" code results in fewer crashes? If I just use double.Parse and don't put anything in the textbox, the program will crash sometimes, right?
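In C#, `out` means TryParse writes its parsed result into the caller's variable d1, while the method's bool return value reports whether parsing succeeded, so an empty or non-numeric textbox produces false instead of an exception (double.Parse throws a FormatException for such input). Python has no out parameters; the rough analogue sketched below returns a (success, value) pair instead. The function name is made up for illustration, and this is a comparison aid, not a C# API.

```python
from typing import Tuple


def try_parse_float(text: str) -> Tuple[bool, float]:
    """Rough analogue of C#'s double.TryParse: report failure instead of raising."""
    try:
        return True, float(text)
    except ValueError:  # float("") and float("abc") raise ValueError
        return False, 0.0  # TryParse likewise sets its out value to 0 on failure
```

The caller then branches on the flag, just as the friend's version only subtracts when TryParse returns true.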

Read the article

Making a sldprt to PDB file converter?

- by user122083

I want to create a parser that can read a SolidWorks file and turn it into a Protein Data Bank (PDB) file. This has already been done in a program called DiamondCAD: http://www.zyvex.com/Research/DiamondCAD.html I want to make a parser that can parse the data and then visualize it the same way DiamondCAD does. I have downloaded and opened SolidWorks files before, and they make no sense to me: they are full of never-before-seen symbols and look like ancient writing. Does anyone know how a .sldprt file is structured and how it can be parsed into a PDB file? (A program called VMD converts a PDB file to an .obj file, so there is a proof of concept.)

Read the article

Getting a Web Resource Url in non WebForms Applications

- by Rick Strahl

WebResources in ASP.NET are a pretty useful feature. WebResources are resources that are embedded into a .NET assembly and can be loaded from the assembly via a special resource URL. WebForms includes a method on the ClientScriptManager (Page.ClientScript) and the ScriptManager object to retrieve URLs to these resources. For example you can do: ClientScript.GetWebResourceUrl(typeof(ControlResources), ControlResources.JQUERY_SCRIPT_RESOURCE); GetWebResourceUrl requires a type (which is used to look up the assembly in which to find the resource) and the resource id to look up. GetWebResourceUrl() then returns a nasty old long URL like this: WebResource.axd?d=-b6oWzgbpGb8uTaHDrCMv59VSmGhilZP5_T_B8anpGx7X-PmW_1eu1KoHDvox-XHqA1EEb-Tl2YAP3bBeebGN65tv-7-yAimtG4ZnoWH633pExpJor8Qp1aKbk-KQWSoNfRC7rQJHXVP4tC0reYzVw2&t=634533278261362212 While lately excessive resource usage has been frowned upon, especially by MVC developers who tend to opt for content distributed as files, I still think that Web Resources have their place even in non-WebForms applications. Also, if you have existing assemblies that include resources like scripts and common image links, it sure would be nice to access them from non-WebForms pages like MVC views or even plain old Razor Web Pages.

Where's my Page object, Dude?

Unfortunately, ASP.NET natively doesn't have a mechanism for retrieving WebResource URLs outside of the WebForms engine. It's a feature that's specifically baked into WebForms and that relies specifically on the Page HttpHandler implementation. Both Page.ClientScript (obviously) and ScriptManager rely on a hosting Page object in order to work, and the various methods on these objects require control instances to be passed. The reason for this is that the script managers can inject scripts and links into Page content (think RegisterXXXX methods), and for that a Page instance is required.
However, for many other methods, like GetWebResourceUrl(), that simply return resources or resource links, the Page reference is really irrelevant. While there's a separate ClientScriptManager class, it's marked as sealed and doesn't have any public constructors, so you can't create your own instance (without Reflection). Even if it did, the internal constructor it does have requires a Page reference. No good… So, can we get access to a WebResource URL generically without running in a WebForms Page instance? We just have to create a Page instance ourselves and use it internally. There's nothing intrinsic about the use of the Page class in ClientScript, at least for retrieving resources and resource URLs, so it's easy to create an instance of a Page, for example in a static method. For our needs of retrieving resource URLs, or even actually retrieving script resources, we can use a canned, non-configured Page instance we create on our own. The following works just fine:

    public static string GetWebResourceUrl(Type type, string resource)
    {
        Page page = new Page();
        return page.ClientScript.GetWebResourceUrl(type, resource);
    }

A slight optimization for this might be to cache the created Page instance.
Page tends to be a pretty heavy object to create each time a URL is required, so you might want to cache the instance:

    public class WebUtils
    {
        private static Page CachedPage
        {
            get
            {
                if (_CachedPage == null)
                    _CachedPage = new Page();
                return _CachedPage;
            }
        }
        private static Page _CachedPage;

        public static string GetWebResourceUrl(Type type, string resource)
        {
            return CachedPage.ClientScript.GetWebResourceUrl(type, resource);
        }
    }

You can now use GetWebResourceUrl in a Razor page like this:

    <!DOCTYPE html>
    <html>
    <head>
        <script src="@WebUtils.GetWebResourceUrl(typeof(ControlResources),ControlResources.JQUERY_SCRIPT_RESOURCE)"></script>
    </head>
    <body>
        <div class="errordisplay">
            <img src="@WebUtils.GetWebResourceUrl(typeof(ControlResources),ControlResources.WARNING_ICON_RESOURCE)" />
            This is only a Test!
        </div>
    </body>
    </html>

And voila: there you have WebResources served from a non-Page-based application. WebResources may be on the way out, but legacy apps have them embedded, and for some situations, like fallback scripts and some common image resources, I still like to use them. Being able to use them from non-WebForms applications should have been built into the core ASP.NET platform IMHO, but seeing that it's not, this workaround is easy enough to implement.

© Rick Strahl, West Wind Technologies, 2005-2011. Posted in ASP.NET MVC

Read the article

[PHP] Sanitizing strings to make them URL and filename safe?

- by Xeoncross

I am trying to come up with a function that does a good job of sanitizing certain strings so that they are safe to use in a URL (like a post slug) and also safe to use as file names. For example, when someone uploads a file I want to make sure that I remove all dangerous characters from the name. So far I have come up with the following function, which I hope solves this problem and also allows foreign UTF-8 data:

    /**
     * Convert a string to the file/URL safe "slug" form
     *
     * @param string $string the string to clean
     * @param bool $is_filename TRUE will allow additional filename characters
     * @return string
     */
    function sanitize($string = '', $is_filename = FALSE)
    {
        // Replace all weird characters with dashes
        $string = preg_replace('/[^\w\-'. ($is_filename ? '*~_\.' : ''). ']+/u', '-', $string);

        // Only allow one dash separator at a time (and make string lowercase)
        return mb_strtolower(preg_replace('/--+/u', '-', $string), 'UTF-8');
    }

Does anyone have any tricky sample data I can run against this, or know of a better way to safeguard our apps from bad names?
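For experimenting with tricky inputs, here is a Python port of the same idea, with a few such inputs in the tests (directory traversal, leading dots, Unicode letters, runs of separators). Collapsing ".." and stripping leading or trailing dots and dashes are extra safeguards added in this sketch, not part of the original function, and `*` is left out of the filename set because it is not a legal character in Windows file names. Note that an input made entirely of symbols sanitizes to an empty string, which the caller has to handle.

```python
import re


def sanitize(string: str = "", is_filename: bool = False) -> str:
    """Convert a string to a lowercase URL/file-safe slug (Unicode-aware)."""
    # Every run of characters outside the allowed set becomes one dash.
    # In Python 3, \w on a str pattern matches Unicode letters and digits.
    disallowed = r"[^\w\-~.]+" if is_filename else r"[^\w\-]+"
    slug = re.sub(disallowed, "-", string)
    slug = re.sub(r"-{2,}", "-", slug)   # only one dash separator at a time
    slug = re.sub(r"\.{2,}", ".", slug)  # no ".." path components
    slug = slug.strip("-.")              # no leading dot (hidden file) or stray dashes
    return slug.lower()
```

Traversal attempts such as `../../etc/passwd` come out as the harmless name `etc-passwd`.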

Read the article

Android: how to parse URL String with spaces to URI object?

- by Mannaz

I have a string representing a URL containing spaces and want to convert it to a URI object. If I simply try String myString = "http://myhost.com/media/mp3s/9/Agenda of swine - 13. Persecution Ascension_ leave nothing standing.mp3"; URI myUri = new URI(myString); it gives me java.net.URISyntaxException: Illegal character in path at index X, where index X is the position of the first space in the URL string. How can I parse myString into a URI object?
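The underlying fix is to percent-encode the path before parsing it strictly, so each space becomes %20. In Java, the multi-argument java.net.URI constructors, such as new URI(scheme, host, path, fragment), quote illegal characters themselves, whereas the single-string constructor only parses. The Python sketch below shows the same idea with urllib.parse; it assumes the path is not already encoded, since an existing % would be encoded again as %25.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def encode_url_path(url: str) -> str:
    """Percent-encode characters such as spaces in the path, leaving the
    scheme, host, query and fragment alone."""
    parts = urlsplit(url)
    # quote() keeps "/" by default, so the path structure is preserved.
    return urlunsplit(parts._replace(path=quote(parts.path)))
```

Characters that are already legal in a path, such as -, . and _, pass through unchanged; only the spaces are encoded.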

Read the article

How can I validate a website URL in Perl?

- by rekha-sri

I need a regular expression for validating the website URL using Perl.
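Instead of a hand-written regular expression, a common alternative is to split the URL with a real URL parser and then check the parts. The sketch below does that with Python's urllib.parse, which swaps in a different technique from the Perl regex requested. Treating only absolute http/https URLs with a host name (and a valid port, if one is given) as "website URLs" is an assumption.

```python
from urllib.parse import urlsplit


def is_website_url(text: str) -> bool:
    """True for absolute http/https URLs that name a host (and a valid port, if any)."""
    try:
        parts = urlsplit(text.strip())
        parts.port  # accessing .port raises ValueError for a malformed port
    except ValueError:
        return False
    return parts.scheme in ("http", "https") and bool(parts.hostname)
```

Because the parser does the splitting, the rules stay readable, and they are easy to tighten later (for example, requiring a dot in the host name).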

Read the article

url with question mark considered as new http request?

- by Navin Leon

I am optimizing my web page by implementing caching, so if I want the browser not to take data from the cache, I append a dynamic number as a query value, e.g. google.com?val=823746. But sometimes, when I want the browser to take the data from the cache for the URL below, it makes a new HTTP request to the server instead of using the cache. Is that because of the question mark in the URL? E.g. http://google.com? Please provide a link to some reference documentation. Thanks in advance. Regards, Navin

Read the article

IIS7: URL Rewrite - can it be used to hide a CDN path?

- by Wild Thing

Hi, I am using Rackspace Cloud CDN (Limelight CDN) for my website. The URLs of the CDN are in the format http://cxxxxxx.cdn.cloudfiles.rackspacecloud.com/something.jpg My domain is mydomain.com. Can I use IIS URL rewriting to show http://cxxxxxx.cdn.cloudfiles.rackspacecloud.com/something.jpg as http://images.mydomain.com/something.jpg? Or is this impossible without the CDN setup accepting my CNAME? If so, can you please help create the URL rewrite rule? (Sorry, don't know how to use regular expressions) Thanks, WT

Read the article

How to open PHP socket onto some URL? (like www.ex.com:8080/mySock/)

- by Ole Jak

How to open PHP socket onto some URL? (like www.ex.com:8080/mySock/)
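A socket connects to a host and a port only; the /mySock/ part is not something a socket can open, but an HTTP path that is sent inside a request once the connection exists. PHP's fsockopen follows the same connect-then-write pattern. The Python sketch below splits such a URL into those pieces and sends a minimal request; it assumes plain HTTP without TLS, and build_request and fetch are hypothetical names.

```python
import socket
from typing import Tuple
from urllib.parse import urlsplit


def build_request(url: str) -> Tuple[str, int, bytes]:
    """Split a URL into the host and port to connect to and the HTTP request to send."""
    parts = urlsplit(url if "://" in url else "http://" + url)
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    request = (
        f"GET {path} HTTP/1.1\r\n"
        f"Host: {parts.netloc}\r\n"
        "Connection: close\r\n\r\n"
    ).encode("ascii")
    return parts.hostname, parts.port or 80, request


def fetch(url: str, timeout: float = 5.0) -> bytes:
    """Open a TCP socket to host:port, send the request, read until the server closes."""
    host, port, request = build_request(url)
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(request)
        chunks = []
        while True:
            chunk = sock.recv(4096)
            if not chunk:
                break
            chunks.append(chunk)
    return b"".join(chunks)
```

For the example in the title, the socket goes to www.ex.com on port 8080, and /mySock/ only appears in the request line.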

Read the article

Simple line matching using Regex

- by Joan Venge

I have this string stream: "do=whoposted&t=1934067" rel=nofollow>61</A></TD><TD class=alt2 align=middle>5,286</TD></TR><TR><TD id=td_threadstatusicon_1911046 class=alt1><IMG id=thread_statusicon_1911046 border=0 alt="" src="http://url.com/forum/images/statusicon/thread_new.gif"> </TD><TD class=alt2><IMG title=Node border=0 alt=Node src="http://url.com/forum/images/icons/new.png"></TD><TD id=td_threadtitle_1911046 class=alt1 title="http://lulzimg.com/i14/7bd11b.jpg 
 
Complete name : cool-thread...."><DIV><A id=thread_gotonew_1911046 href="http://url.com/forum/f80/cool-topic-new/"><IMG class=inlineimg title="Go to first new post" border=0 alt="Go to first new post" src="http://url.com/forum/images/buttons/firstnew.gif"></A> [MULTI] <A style="FONT-WEIGHT: bold" id=thread_title_1911046 href="http://url.com/forum/f80/cool-topic-name-1911046/">Cool Topic Name</A> </DIV><DIV class=smallfont><SPAN style="CURSOR: pointer" onclick="window.open('http://url.com/forum/members/u2031889/', '_self')">m3no</SPAN> </DIV></TD><TD class=alt2 title="Replies: 11, Views: 1,554"><DIV style="TEXT-ALIGN: right; WHITE-SPACE: nowrap" class=smallfont>Today <SPAN class=time>08:04 AM</SPAN><BR>by <A href="http://url.com/forum/members/u1131830/" rel=nofollow>karetsos</A> <A "

The lines I am interested in are similar to this:

<A style="FONT-WEIGHT: bold" id=thread_title_1911046 href="http://url.com/forum/f80/cool-topic-name-1911046/">Cool Topic Name</A>

From this, all I am trying to extract is:

Thread id: 1911046 (could be from either location in the string)
Thread name: "Cool Topic Name"
Thread link: "http://url.com/forum/f80/cool-topic-name-1911046/"

Currently I use this:

    Regex pattern = new Regex ( "<A\\s+href=\"([^\"]*)\">([^\\x00]*?)\\s+id=thread_title_(\\S+)</A>" );
    MatchCollection matches = pattern.Matches ( doc.ToString ( ) );
    foreach ( Match match in matches )
    {
        int id = Convert.ToInt32 ( match.Groups [ 1 ].Value );
        string name = match.Groups [ 3 ].Value;
        string link = match.Groups [ 2 ].Value;
        ...
    }

I would appreciate it if someone could help me fix the pattern so that it matches. This used to work, but now it returns 0 matches.
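The pattern in the question expects href to follow <A immediately and id=thread_title_ to appear after the link text, but in the sample markup the anchor starts with style and id, and href comes after them, which would explain the zero matches. The group numbers also don't line up with the values being read (group 1 is the href, not the id). Below is a Python sketch that matches the attributes in the order they appear in the sample; the question uses C#, but this regex syntax carries over to .NET largely unchanged (with RegexOptions.IgnoreCase and Singleline). It relies on this exact markup, so an HTML parser would be sturdier if the forum's output changes.

```python
import re

# id=thread_title_ comes before href in the sample markup, so the pattern
# expects that order; [^>]* keeps each match inside a single <A ...> tag.
THREAD_TITLE = re.compile(
    r'<A[^>]*\bid=thread_title_(\d+)[^>]*\bhref="([^"]*)"[^>]*>(.*?)</A>',
    re.IGNORECASE | re.DOTALL,
)


def extract_threads(html: str):
    """Return (thread_id, name, link) for every thread-title anchor."""
    return [
        (int(thread_id), name.strip(), link)
        for thread_id, link, name in THREAD_TITLE.findall(html)
    ]
```

The "go to first new post" anchor is skipped because its id is thread_gotonew_, not thread_title_.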

Search Results

Search found 21350 results on 854 pages for 'url parsing'.
