Search Results

Search found 21350 results on 854 pages for 'url parsing'.

Page 130/854 | < Previous Page | 126 127 128 129 130 131 132 133 134 135 136 137  | Next Page >

  • Best way to get back to using the power of lxml after having to use a regex to find something in an

    - by PyNEwbie
    I am trying to rip some text out of a large number of html documents (numbers in the hundreds of thousands). The documents are really forms but they are prepared by a very large group of different organizations so there is significant variation in how they create the document. For example, the documents are divided into chapters. I might want to extract the contents of Chapter 5 from every document so I can analyze the content of the chapter. Initially I thought this would be easy but it turns out that the authors might use a set of non-nested tables throughout the document to hold the content so that Chapter n could be displayed using td tags inside a table. Or they might use other elements such as p tags H tags, div tags or any other block level element. After trying repeatedly to use lxml to help me identify the beginning and end of each chapter I have determined that it is a lot cleaner to use a regular expression because in every case, no matter what the enclosing html element is the chapter label is always in the form of >Chapter # It is a little more complicated in that there might be some white space or non-breaking space represented in different ways (  or   or just spaces). Nonetheless it was trivial to write a regular expression to identify the beginning of each section. (The beginning of one section is the end of the previous section.) But now I want to use lxml to get the text out. My thought is that I have really no choice but to walk along my string to find the close tag for the element that encloses the text I am using to find the relevant section. That is here is one example where the element holding the Chapter name is a div <div style="DISPLAY: block; MARGIN-LEFT: 0pt; TEXT-INDENT: 0pt; MARGIN-RIGHT: 0pt" align="left"><font style="DISPLAY: inline; FONT-WEIGHT: bold; FONT-SIZE: 10pt; FONT-FAMILY: Times New Roman">Chapter 1.&#160;&#160;&#160;Our Beginnings.</font></div> So I am imagining that I would begin at the location where I found the match for chapter 1 and set up a regular expressions to find the next </div|</td|</p|</h1 . . . So at this point I have identified the type of element holding my chapter heading I can use the same logic to find all of the text that is within that element that is set up a regular expression to help me mark from >Chapter 1.&#160;&#160;&#160;Our Beginnings.< So I have identified where my Chapter 1 begins I can do the same for chapter 2 (which is where Chapter 1 ends) Now I am imagining that I am going to snip the document beginning at the opening of the element that I identified as the element the indicates where chapter 1 begins and ending just before the opening of the element that I identified as the element that indicates where Chapter 2 begins. The string that I have identified will then be fed to lxml to use its power to get the content. I am going to all of this trouble because I have read over and over - never use a regular expression to extract content from html documents and I have not hit on a way to be as accurate with lxml to identify the starting and ending locations for the text I want to extract. For example, I can never be certain that the subtitle of Chapter 1 is Our Beginnings it could be Our Red Canary. Let me say that I spent two solid days trying with lxml to be confident that I had the beginning and ending elements and I could only be accurate <60% of the time but a very short regular expression has given me better than 95% success. I have a tendency to make things more complicated than necessary so I am wondering if anyone has seen or solved a similar problems and if they had an approach (not the details mind you) that they would like to offer.

    Read the article

  • how can I parse and group/do stats on user agent strings?

    - by user151841
    I have a database that has the various user-agent strings of visitors to our site. I'd like to do a 'survey' of them to see what browsers our users are using, so that I can know what features I can use in future development. Is there a tool to parse and run statistics on user-agent strings, or a bunch of strings like this? Ideally, I'd like to see a hierarchical grouping of the stats. For instance: Opera/9.80 (Windows Mobile; WCE; Opera Mobi/WMD-50301; U; en) Presto/2.4.13 Version/10.00 Opera/9.80 (J2ME/MIDP; Opera Mini/4.2.14912/1280; U; en) Presto/2.2.0 Opera/9.80 (J2ME/MIDP; Opera Mini/4.2.13918/812; U; en) Presto/2.2.0 Opera/9.64 (Macintosh; Intel Mac OS X; U; en) Presto/2.1.1 Opera/9.60 (J2ME/MIDP; Opera Mini/4.2.13918/786; U; en) Presto/2.2.0 Opera/9.60 (J2ME/MIDP; Opera Mini/4.0.10992/432; U; en) Presto/2.2.0 I'd like to see 6 entries for Opera, broken down into 3 for 9.80, 1 for 9.64 and 2 for 9.60, and so forth for all browsers. Other dimensions, such as OS, would cross the boundaries of the browser version hierarchy, but it might be nice to see also.

    Read the article

  • Is a switch statement ok for 30 or so conditions?

    - by DeanMc
    I am in the final stages of creating an MP4 tag parser in .Net. For those who have experience with tagging music you would be aware that there are an average of 30 or so tags. If tested out different types of loops and it seems that a switch statement with Const values seems to be the way to go with regard to catching the tags in binary. The switch allows me to search the binary without the need to know which order the tags are stored or if there are some not present but I wonder if anyone would be against using a switch statement for so many conditionals. Any insight is much appreciated.

    Read the article

  • Pros and Cons of Java HTML to XML cleaners

    - by cjavapro
    I am looking to allow HTML emails (and other HTML uploads) without letting in scripts and stuff. I plan to have a white list of safe tags and attributes as well as a whitelist of CSS tags and value regexes (to prevent automatic return receipt). I asked a question: Parse a badly formatted XML document (like an HTML file) I found there are many many ways to do this. Some systems have built in sanitizers (which I don't care so much about). I will post some answers and say Community Wiki. Please post any other options you like and say Community Wiki so they can be voted on. Also any comments or wiki edits on what part of a certain product is better and what is not would be greatly appreciated. This page is a very nice listing page but I get kinda lost http://java-source.net/open-source/html-parsers

    Read the article

  • PHP Fomatting Regex - BBCode

    - by Wayne
    To be honest, I suck at regex so much, I would use RegexBuddy, but I'm working on my Mac and sometimes it doesn't help much (for me). Well, for what I need to do is a function in php function replaceTags($n) { $n = str_replace("[[", "<b>", $n); $n = str_replace("]]", "</b>", $n); } Although this is a bad example in case someone didn't close the tag by using ]] or [[, anyway, could you help with regex of: [[ ]] = Bold format ** ** = Italic format (( )) = h2 heading Those are all I need, thanks :) P.S - Is there any software like RegexBuddy available for Mac (Snow Leopard)?

    Read the article

  • Amazon access key showing in URL for Carrierwave and Fog

    - by kcurtin
    I just switched from storing my images uploaded via Carrierwave locally to using Amazon s3 via the fog gem in my Rails 3.1 app. While images are being added, when I click on an image in my application, the URL is providing my access key and a signature. Here is a sample URL (XXX replaced the string with the info): https://s3.amazonaws.com/bucketname/uploads/photo/image/2/IMG_4842.jpg?AWSAccessKeyId=XXX&Signature=XXX%3D&Expires=1332093418 This is happening in development (localhost:3000) and when I am using heroku for production. Here is my uploader: class ImageUploader < CarrierWave::Uploader::Base include CarrierWave::RMagick storage :fog def store_dir "uploads/#{model.class.to_s.underscore}/#{mounted_as}/#{model.id}" end process :convert => :jpg process :resize_to_limit => [640, 640] version :thumb do process :convert => :jpg process :resize_to_fill => [280, 205] end version :avatar do process :convert => :jpg process :resize_to_fill => [120, 120] end end And my config/initializers/fog.rb : CarrierWave.configure do |config| config.fog_credentials = { :provider => 'AWS', :aws_access_key_id => 'XXX', :aws_secret_access_key => 'XXX', } config.fog_directory = 'bucketname' config.fog_public = false end Anyone know how to make sure this information isn't available?

    Read the article

  • SQL error - Cannot convert nvarchar to decimal

    - by jakesankey
    I have a C# application that simply parses all of the txt documents within a given network directory and imports the data to a SQL server db. Everything was cruising along just fine until about the 1800th file when it happend to have a few blanks in columns that are called out as DBType.Decimal (and the value is usually zero in the files, not blank). So I got this error, "cannot convert nvarchar to decimal". I am wondering how I could tell the app to simply skip the lines that have this issue?? Perhaps I could even just change the column type to varchar even tho values are numbers (what problems could this create?) Thanks for any help! using System; using System.Data; using System.Data.SQLite; using System.IO; using System.Text.RegularExpressions; using System.Threading; using System.Collections.Generic; using System.Linq; using System.Data.SqlClient; namespace JohnDeereCMMDataParser { internal class Program { public static List<string> GetImportedFileList() { List<string> ImportedFiles = new List<string>(); using (SqlConnection connect = new SqlConnection(@"Server=FRXSQLDEV;Database=RX_CMMData;Integrated Security=YES")) { connect.Open(); using (SqlCommand fmd = connect.CreateCommand()) { fmd.CommandText = @"SELECT FileName FROM CMMData;"; fmd.CommandType = CommandType.Text; SqlDataReader r = fmd.ExecuteReader(); while (r.Read()) { ImportedFiles.Add(Convert.ToString(r["FileName"])); } } } return ImportedFiles; } private static void Main(string[] args) { Console.Title = "John Deere CMM Data Parser"; Console.WriteLine("Preparing CMM Data Parser... done"); Console.WriteLine("Scanning for new CMM data..."); Console.ForegroundColor = ConsoleColor.Gray; using (SqlConnection con = new SqlConnection(@"Server=FRXSQLDEV;Database=RX_CMMData;Integrated Security=YES")) { con.Open(); using (SqlCommand insertCommand = con.CreateCommand()) { Console.WriteLine("Connecting to SQL server..."); SqlCommand cmdd = con.CreateCommand(); string[] files = Directory.GetFiles(@"C:\Documents and Settings\js91162\Desktop\CMM WENZEL\", "*_*_*.txt", SearchOption.AllDirectories); List<string> ImportedFiles = GetImportedFileList(); insertCommand.Parameters.Add(new SqlParameter("@FeatType", DbType.String)); insertCommand.Parameters.Add(new SqlParameter("@FeatName", DbType.String)); insertCommand.Parameters.Add(new SqlParameter("@Axis", DbType.String)); insertCommand.Parameters.Add(new SqlParameter("@Actual", DbType.Decimal)); insertCommand.Parameters.Add(new SqlParameter("@Nominal", DbType.Decimal)); insertCommand.Parameters.Add(new SqlParameter("@Dev", DbType.Decimal)); insertCommand.Parameters.Add(new SqlParameter("@TolMin", DbType.Decimal)); insertCommand.Parameters.Add(new SqlParameter("@TolPlus", DbType.Decimal)); insertCommand.Parameters.Add(new SqlParameter("@OutOfTol", DbType.Decimal)); foreach (string file in files.Except(ImportedFiles)) { var FileNameExt1 = Path.GetFileName(file); cmdd.Parameters.Clear(); cmdd.Parameters.Add(new SqlParameter("@FileExt", FileNameExt1)); cmdd.CommandText = @" IF (EXISTS (SELECT * FROM INFORMATION_SCHEMA.TABLES WHERE TABLE_SCHEMA = 'RX_CMMData' AND TABLE_NAME = 'CMMData')) BEGIN SELECT COUNT(*) FROM CMMData WHERE FileName = @FileExt; END"; int count = Convert.ToInt32(cmdd.ExecuteScalar()); con.Close(); con.Open(); if (count == 0) { Console.WriteLine("Preparing to parse CMM data for SQL import..."); if (file.Count(c => c == '_') > 5) continue; insertCommand.CommandText = @" INSERT INTO CMMData (FeatType, FeatName, Axis, Actual, Nominal, Dev, TolMin, TolPlus, OutOfTol, PartNumber, CMMNumber, Date, FileName) VALUES (@FeatType, @FeatName, @Axis, @Actual, @Nominal, @Dev, @TolMin, @TolPlus, @OutOfTol, @PartNumber, @CMMNumber, @Date, @FileName);"; string FileNameExt = Path.GetFullPath(file); string RNumber = Path.GetFileNameWithoutExtension(file); int index2 = RNumber.IndexOf("~"); Match RNumberE = Regex.Match(RNumber, @"^(R|L)\d{6}(COMP|CRIT|TEST|SU[1-9])(?=_)", RegexOptions.IgnoreCase); Match RNumberD = Regex.Match(RNumber, @"(?<=_)\d{3}[A-Z]\d{4}|\d{3}[A-Z]\d\w\w\d(?=_)", RegexOptions.IgnoreCase); Match RNumberDate = Regex.Match(RNumber, @"(?<=_)\d{8}(?=_)", RegexOptions.IgnoreCase); string RNumE = Convert.ToString(RNumberE); string RNumD = Convert.ToString(RNumberD); if (RNumberD.Value == @"") continue; if (RNumberE.Value == @"") continue; if (RNumberDate.Value == @"") continue; if (index2 != -1) continue; DateTime dateTime = DateTime.ParseExact(RNumberDate.Value, "yyyyMMdd", Thread.CurrentThread.CurrentCulture); string cmmDate = dateTime.ToString("dd-MMM-yyyy"); string[] lines = File.ReadAllLines(file); bool parse = false; foreach (string tmpLine in lines) { string line = tmpLine.Trim(); if (!parse && line.StartsWith("Feat. Type,")) { parse = true; continue; } if (!parse || string.IsNullOrEmpty(line)) { continue; } Console.WriteLine(tmpLine); foreach (SqlParameter parameter in insertCommand.Parameters) { parameter.Value = null; } string[] values = line.Split(new[] { ',' }); for (int i = 0; i < values.Length - 1; i++) { if (i = "" || i = null) continue; SqlParameter param = insertCommand.Parameters[i]; if (param.DbType == DbType.Decimal) { decimal value; param.Value = decimal.TryParse(values[i], out value) ? value : 0; } else { param.Value = values[i]; } } insertCommand.Parameters.Add(new SqlParameter("@PartNumber", RNumE)); insertCommand.Parameters.Add(new SqlParameter("@CMMNumber", RNumD)); insertCommand.Parameters.Add(new SqlParameter("@Date", cmmDate)); insertCommand.Parameters.Add(new SqlParameter("@FileName", FileNameExt)); insertCommand.ExecuteNonQuery(); insertCommand.Parameters.RemoveAt("@PartNumber"); insertCommand.Parameters.RemoveAt("@CMMNumber"); insertCommand.Parameters.RemoveAt("@Date"); insertCommand.Parameters.RemoveAt("@FileName"); } } } Console.WriteLine("CMM data successfully imported to SQL database..."); } con.Close(); } } } }

    Read the article

  • how to detect an escape sequence in a string

    - by mix
    Given a string named line whose raw version has this value: \rRAWSTRING how can I detect if it has the escape character \r? What I've tried is: if repr(line).startswith('\r'): blah... but it doesn't catch it. I also tried find, such as: if repr(line).find('\r') != -1: blah doesn't work either. What am I missing? thx! EDIT: thanks for all the replies and the corrections re terminolgy and sorry for the confusion. OK, if i do this print repr(line) then what it prints is: '\rSET ENABLE ACK\n' (including the single quotes). i have tried all the suggestions, including: line.startswith(r'\r') line.startswith('\\r') each of which returns False. also tried: line.find(r'\r') line.find('\\r') each of which returns -1

    Read the article

  • nokogiri vs hpricot?

    - by roshan
    Which one would you choose? My important attributes are (not in order) Support & Future enhancements Community & general knowledge base (on the Internet) Comprehensive (i.e proven to parse a wide range of *.*ml pages) Performance Memory Footprint (runtime, not the code-base)

    Read the article

  • Input string was not in the correct format using int.Parse

    - by JDWebs
    I have recently been making a login 'representation' which is not secure. So before answering, please note I am aware of security risks etc., and this will not be on a live site. Also note I am a beginner :P. For my login representation, I am using LINQ to compare values of a DDL to select a username and a Textbox to enter a password, when a login button is clicked. However, an error is thrown 'Input string was not in the correct format', when using int.Parse. Front End: <%@ Page Language="C#" AutoEventWireup="true" CodeFile="Login_Test.aspx.cs" Inherits="Login_Login_Test" %> <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"> <html xmlns="http://www.w3.org/1999/xhtml"> <head runat="server"> <title>Login Test</title> </head> <body> <form id="LoginTest" runat="server"> <div> <asp:DropDownList ID="DDL_Username" runat="server" Height="20px" DataTextField="txt"> </asp:DropDownList> <br /> <asp:TextBox ID="TB_Password" runat="server" TextMode="Password"></asp:TextBox> <br /> <asp:Button ID="B_Login" runat="server" onclick="B_Login_Click" Text="Login" /> <br /> <asp:Literal ID="LI_Result" runat="server"></asp:Literal> </div> </form> </body> </html> Back End: using System; using System.Collections.Generic; using System.Linq; using System.Web; using System.Web.UI; using System.Web.UI.WebControls; public partial class Login_Login_Test : System.Web.UI.Page { protected void Page_Load(object sender, EventArgs e) { if (!Page.IsPostBack) { Binder(); } } private void Binder() { using (DataClassesDataContext db = new DataClassesDataContext()) { DDL_Username.DataSource = from x in db.DT_Honeys select new { id = x.UsernameID, txt = x.Username }; DDL_Username.DataValueField = "id"; DDL_Username.DataTextField = "txt"; DDL_Username.DataBind(); } } protected void B_Login_Click(object sender, EventArgs e) { if (TB_Password.Text != "") { using (DataClassesDataContext db = new DataClassesDataContext()) { DT_Honey blah = new DT_Honey(); blah = db.DT_Honeys.SingleOrDefault(x => x.UsernameID == int.Parse(DDL_Username.SelectedValue.ToString())); if (blah == null) { LI_Result.Text = "Something went wrong :/"; } if (blah.Password == TB_Password.Text) { LI_Result.Text = "Credentials recognised :-)"; } else { LI_Result.Text = "Error with credentials :-("; } } } } } I am aware this problem is very common, but none of the help I have found online is useful/relevant. Any help/suggestions appreciated; thank you for your time :-).

    Read the article

  • How to write a bison grammer for WDI?

    - by Rizo
    I need some help in bison grammar construction. From my another question: I'm trying to make a meta-language for writing markup code (such as xml and html) wich can be directly embedded into C/C++ code. Here is a simple sample written in this language, I call it WDI (Web Development Interface): /* * Simple wdi/html sample source code */ #include <mySite> string name = "myName"; string toCapital(string str); html { head { title { mySiteTitle; } link(rel="stylesheet", href="style.css"); } body(id="default") { // Page content wrapper div(id="wrapper", class="some_class") { h1 { "Hello, " + toCapital(name) + "!"; } // Lists post ul(id="post_list") { for(post in posts) { li { a(href=post.getID()) { post.tilte; } } } } } } } Basically it is a C source with a user-friendly interface for html. As you can see the traditional tag-based style is substituted by C-like, with blocks delimited by curly braces. I need to build an interpreter to translate this code to html and posteriorly insert it into C, so that it can be compiled. The C part stays intact. Inside the wdi source it is not necessary to use prints, every return statement will be used for output (in printf function). The program's output will be clean html code. So, for example a heading 1 tag would be transformed like this: h1 { "Hello, " + toCapital(name) + "!"; } // would become: printf("<h1>Hello, %s!</h1>", toCapital(name)); My main goal is to create an interpreter to translate wdi source to html like this: tag(attributes) {content} = <tag attributes>content</tag> Secondly, html code returned by the interpreter has to be inserted into C code with printfs. Variables and functions that occur inside wdi should also be sorted in order to use them as printf parameters (the case of toCapital(name) in sample source). Here are my flex/bison files: id [a-zA-Z_]([a-zA-Z0-9_])* number [0-9]+ string \".*\" %% {id} { yylval.string = strdup(yytext); return(ID); } {number} { yylval.number = atoi(yytext); return(NUMBER); } {string} { yylval.string = strdup(yytext); return(STRING); } "(" { return(LPAREN); } ")" { return(RPAREN); } "{" { return(LBRACE); } "}" { return(RBRACE); } "=" { return(ASSIGN); } "," { return(COMMA); } ";" { return(SEMICOLON); } \n|\r|\f { /* ignore EOL */ } [ \t]+ { /* ignore whitespace */ } . { /* return(CCODE); Find C source */ } %% %start wdi %token LPAREN RPAREN LBRACE RBRACE ASSIGN COMMA SEMICOLON CCODE QUOTE %union { int number; char *string; } %token <string> ID STRING %token <number> NUMBER %% wdi : /* empty */ | blocks ; blocks : block | blocks block ; block : head SEMICOLON | head body ; head : ID | ID attributes ; attributes : LPAREN RPAREN | LPAREN attribute_list RPAREN ; attribute_list : attribute | attribute COMMA attribute_list ; attribute : key ASSIGN value ; key : ID {$$=$1} ; value : STRING {$$=$1} /*| NUMBER*/ /*| CCODE*/ ; body : LBRACE content RBRACE ; content : /* */ | blocks | STRING SEMICOLON | NUMBER SEMICOLON | CCODE ; %% I am having difficulties on defining a proper grammar for the language, specially in splitting WDI and C code . I just started learning language processing techniques so I need some orientation. Could someone correct my code or give some examples of what is the right way to solve this problem?

    Read the article

  • Sanitize Content: removing markup from Amazon's content

    - by StackOverflowNewbie
    I'm using Amazon Web Service to get product descriptions of various items. The problem is that Amazon's content contains mark up that is sometimes destructive to the layout of my web page (e.g. unclosed DIVs, etc.). I want to sanitize the content I get from Amazon. My solution would be to do the following (my initial list so far): Remove unnecessary tags such as div, span, etc. while keeping tags like p, ul, ol, etc. Remove all attributes from all the tags (e.g. seems like there are style attributes in some of the tags) Remove excess white space (e.g. multiple spaces, carriage returns, new lines, tabs, etc.) Etc. Before I go off trying to build my solution, I'm wondering if anyone has a better idea (or an already existing solution). Thanks.

    Read the article

  • What should the standard be for ReSTful URLS?

    - by gargantaun
    Since I can't find a chuffing job, I've been reading up on ReST and creating web services. The way I've interpreted it, the future is all about creating a web service for all your data before you build the web app. Which seems like a good idea. However, there seems to be a lot of contradictory thoughts on what the best scheme is for ReSTful URLs. Some people advocate simple pretty urls http://api.myapp.com/resource/1 In addition, some people like to add the API version to the url like so http://api.myapp.com/v1/resource/1 And to make things even more confusing, some people advocate adding the content-type to get requests http://api.myapp.com/v1/resource/1.xml http://api.myapp.com/v1/resource/1.json http://api.myapp.com/v1/resource/1.txt Whereas others think the content-type should be sent in the HTTP header. Soooooooo.... That's a lot of variation, which has left me unsure of what the best URL scheme is. I personally see the merits of the most comprehensive URL that includes a version number, resource locator and content-type, but I'm new to this so I could be wrong. On the other hand, you could argue that you should do "whatever works best for you". But that doesn't really fit with the ReST mentality as far as I can tell since the aim is to have a standard. And since a lot of you people will have more experience than me with ReST, I thought I'd ask for some guidance. So, with all that in mind... What should the standard be for ReSTful URLS?

    Read the article

  • java version of python-dateutil

    - by elhefe
    Python has a very handy package that can parse nearly any unambiguous date and provides helpful error messages on a parse failure, python-dateutil. Comparison to the SimpleDateFormat class is not favorable - AFAICT SimpleDateFormat can only handle one exact date format and the error messages have no granularity. I've looked through the Joda API but it appears Joda is the same way - only one explicit format can be parsed at a time. Is there any package or library that reproduces the python-dateutil behavior? Or am I missing something WRT Joda/SimpleDateFormat?

    Read the article

  • Linux: shell builtin string matching

    - by gmatt
    I am trying to become more familiar with using the builtin string matching stuff available in shells in linux. I came across this guys posting, and he showed an example a="abc|def" echo ${a#*|} # will yield "def" echo ${a%|*} # will yield "abc" I tried it out and it does what its advertised to do, but I don't understand what the $,{},#,*,| are doing, I tried looking for some reference online or in the manuals but I couldn't find anything. Can anyone explain to me what's going on here?

    Read the article

  • Parse usable Street Address, City, State, Zip from a string

    - by Rob Allen
    Problem: I have an address field from an Access database which has been converted to Sql Server 2005. This field has everything all in one field. I need to parse out the individual sections of the address into their appropriate fields in a normalized table. I need to do this for approximately 4,000 records and it needs to be repeatable. Here are the rules for this exercise: 1 - no whining about how this should have been separate fields in the first place, we are often confronted with less than ideal situations and have to make the best of them 2- for this post, use any language you want 3- feel free to play code golf 4 - Assume an address in the US (for now) 5 - assume that the input string will sometimes contain an addressee (the person being addressed) and/or a second street address (i.e. Suite B) 6 - states may be abbreviated 7 - zip code could be standard 5 digit or zip+4 8 - there are typos in some instances UPDATE: In response to the questions posed, standards were not universally followed, I need need to store the individual values, not just geocode and errors means typo (corrected above) Sample Data: A. P. Croll & Son 2299 Lewes-Georgetown Hwy, Georgetown, DE 19947 11522 Shawnee Road, Greenwood DE 19950 144 Kings Highway, S.W. Dover, DE 19901 Intergrated Const. Services 2 Penns Way Suite 405 New Castle, DE 19720 Humes Realty 33 Bridle Ridge Court, Lewes, DE 19958 Nichols Excavation 2742 Pulaski Hwy Newark, DE 19711 2284 Bryn Zion Road, Smyrna, DE 19904 VEI Dover Crossroads, LLC 1500 Serpentine Road, Suite 100 Baltimore MD 21 580 North Dupont Highway Dover, DE 19901 P.O. Box 778 Dover, DE 19903

    Read the article

  • Counting total sum of each value in one column w.r.t another in Perl

    - by sfactor
    I have tab delimited data with multiple columns. I have OS names in column 31 and data bytes in columns 6 and 7. What I want to do is count the total volume of each unique OS. So, I did something in Perl like this: #!/usr/bin/perl use warnings; my @hhfilelist = glob "*.txt"; my %count = (); for my $f (@hhfilelist) { open F, $f || die "Cannot open $f: $!"; while (<F>) { chomp; my @line = split /\t/; # counting volumes in col 6 and 7 for 31 $count{$line[30]} = $line[5] + $line[6]; } close (F); } my $w = 0; foreach $w (sort keys %count) { print "$w\t$count{$w}\n"; } So, the result would be something like Windows 100000 Linux 5000 Mac OSX 15000 Android 2000 But there seems to be some error in this code because the resulting values I get aren't as expected. What am I doing wrong?

    Read the article

  • Twitter O-Auth Callback url

    - by jtymann
    I am having a problem with Twitter's oauth authentication and using a callback url. I am coding in php and using the sample code referenced by the twitter wiki, http://github.com/abraham/twitteroauth I got that code, and tried a simple test and it worked nicely. However I want to programatically specify the callback url, and the example did not support that. So I quickly modified the getRequestToken() method to take in a parameter and now it looks like this: function getRequestToken($params = array()) { $r = $this->oAuthRequest($this->requestTokenURL(), $params); $token = $this->oAuthParseResponse($r); $this->token = new OAuthConsumer($token['oauth_token'], $token['oauth_token_secret']); return $token; } and my call looks like this $tok = $to->getRequestToken(array('oauth_callback' => 'http://127.0.0.1/twitter_prompt/index.php')); This is the only change I made, and the redirect works like a charm, however I am getting an error when I then try and use my newly granted access to try and make a call. I get a "Could not authenticate you" error. Also the application never actually gets added to the users authorized connections. Now I read the specs and I thought all I had to do was specify the parameter when getting the request token. Could someone a little more seasoned in oauth and twitter possibly give me a hand? Thank You

    Read the article

  • PHP Regex to match lines with all-caps with occaisional hyphens.

    - by Yaaqov
    I'm trying to to convert an existing PHP Regular Expression match case to apply to a slightly different style of document. Here's the original style of the document: **FOODS - TYPE A** ___________________________________ **PRODUCT** 1) Mi Pueblito Queso Fresco Authentic Mexican Style Fresh Cheese; 2) La Fe String Cheese **CODE** Sell by date going back to February 1, 2009 And the successfully-running PHP Regex match code that only returns "true" if the line is surrounded by asterisks, and stores each side of the "-" as $m[1] and $m[2], respectively. if ( preg_match('#^\*\*([^-]+)(?:-(.*))?\*\*$#', $line, $m) ) { // only for **header - subheader** $m[2] is set. if ( isset($m[2]) ) { return array(TYPE_HEADER, array(trim($m[1]), trim($m[2]))); } else { return array(TYPE_KEY, array($m[1])); } } So, for line 1: $m[1] = "FOODS" AND $m[2] = "TYPE A"; Line 2 would be skipped; Line 3: $m[1] = "PRODUCT", etc. The question: How would I re-write the above regex match if the headers did not have the asterisks, but still was all-caps, and was at least 4 characters long? For example: FOODS - TYPE A ___________________________________ PRODUCT 1) Mi Pueblito Queso Fresco Authentic Mexican Style Fresh Cheese; 2) La Fe String Cheese CODE Sell by date going back to February 1, 2009 Thank you.

    Read the article

  • PHP: What is an efficient way to parse a text file containing very long lines?

    - by Shaun
    I'm working on a parser in php which is designed to extract MySQL records out of a text file. A particular line might begin with a string corresponding to which table the records (rows) need to be inserted into, followed by the records themselves. The records are delimited by a backslash and the fields (columns) are separated by commas. For the sake of simplicity, let's assume that we have a table representing people in our database, with fields being First Name, Last Name, and Occupation. Thus, one line of the file might be as follows [People] = "\Han,Solo,Smuggler\Luke,Skywalker,Jedi..." Where the ellipses (...) could be additional people. One straightforward approach might be to use fgets() to extract a line from the file, and use preg_match() to extract the table name, records, and fields from that line. However, let's suppose that we have an awful lot of Star Wars characters to track. So many, in fact, that this line ends up being 200,000+ characters/bytes long. In such a case, taking the above approach to extract the database information seems a bit inefficient. You have to first read hundreds of thousands of characters into memory, then read back over those same characters to find regex matches. Is there a way, similar to the Java String next(String pattern) method of the Scanner class constructed using a file, that allows you to match patterns in-line while scanning through the file? The idea is that you don't have to scan through the same text twice (to read it from the file into a string, and then to match patterns) or store the text redundantly in memory (in both the file line string and the matched patterns). Would this even yield a significant increase in performance? It's hard to tell exactly what PHP or Java are doing behind the scenes.

    Read the article

  • Right recursive grammar or left recursive?

    - by user2485710
    I have little to no knowledge of what I'm about to ask, so I would like a suggestion based on the level of skills required to implemented a parser for the given grammar ( since I'm a beginner in this kind of formal approach to parsers and languages ). Just by going back of a couple of years, this situation reminds me a little of Pascal grammar vs C/C++ grammar, this left vs right stuff. But I'm not going to do any of that, my purpose is to implement a simple parser for a markup language for documents like Markdown. So considering that I'm starting with a markup language in mind, I want to keep things simple, which is the easiest one to handle between this 2 options and why . Another kind of grammar could be an easier option for me ? If yes which one do you suggest ?

    Read the article

  • What's the best way to retrieve two pieces of data from an XML file?

    - by Morinar
    I've got an XML document that is in either a pre or post FO transformed state that I need to extract some information from. In the pre-case, I need to pull out two tags that represent the pageWidth and pageHeight and in the post case I need to extract the page-height and page-width parameters from a specific tag (I forget which one it is off the top of my head). What I'm looking for is an efficient/easily maintainable way to grab these two elements. I'd like to only read the document a single time fetching the two things I need. I initially started writing something that would use BufferedReader + FileReader, but then I'm doing string searching and it gets messy when the tags span multiple lines. I then looked at the DOMParser, which seems like it would be ideal, but I don't want to have to read the entire file into memory if I could help it as the files could potentially be large and the tags I'm looking for will nearly always be close to the top of the file. I then looked into SAXParser, but that seems like a big pile of complicated overkill for what I'm trying to accomplish. Anybody have any advice? Or simple implementations that would accomplish my goal? Thanks.

    Read the article

  • Is XMLReader a SAX parser, a DOM parser, or neither?

    - by Renesis
    I am testing various methods to read (possibly large, and very often) XML configuration files in PHP. No writing is ever needed. I have two successful implementations, one using SimpleXML (which I know is a DOM parser) and one using XMLReader. I know that a DOM reader must read the whole tree and therefore uses more memory. My tests reflect that. I also know that A SAX parser is an "event-based" parser that uses less memory because it reads each node from the stream without checking what is next. XMLReader also reads from a stream with the cursor providing data about the node it is currently at. So, it definitely sounds like XMLReader (http://us2.php.net/xmlreader) is not a DOM parser, but my question is, is it a SAX parser, or something else? It seems like XMLReader behaves the way a SAX parser does but does not throw the events themselves (in other words, can you construct a SAX parser with XMLReader?) If it is something else, does the classification it's in have a name?

    Read the article

  • Rails form with a better URL

    - by Sam
    Wow, switching to REST is a different paradigm for sure and is mainly a headache right now. view <% form_tag (businesses_path, :method => "get") do %> <%= select_tag :business_category_id, options_for_select(@business_categories.collect {|bc| [bc.name, bc.id ]}.insert(0, ["All Containers", 0]), which_business_category(@business_category) ), { :onchange => "this.form.submit();"} %> <% end %> controller def index @business_categories = BusinessCategory.find(:all) if params[:business_category_id].to_i != 0 @business_category = BusinessCategory.find(params[:business_category_id]) @businesses = @business_category.businesses else @businesses = Business.all end respond_to do |format| format.html # index.html.erb format.xml { render :xml => @businesses } end end routes map.resources What I want to to is get a better URL than what this form is presenting which is the following: http://localhost:3000/businesses?business_category_id=1 Without REST I would have do something like http://localhost:3000/business/view/bbq bbq as permalink or I would have done http://localhost:300/business_categories/view/bbq and get the business that are associated with the category but I don't really know the best way of doing this. So the two questions are what is the best logic of finding a business by its categories using the latter form and number two how to get that in a pretty URL all through RESTful routes in Rails.

    Read the article

< Previous Page | 126 127 128 129 130 131 132 133 134 135 136 137  | Next Page >