Search Results

Search found 29753 results on 1191 pages for 'best practices'.

  • Best way to get back to using the power of lxml after having to use a regex to find something in an

    - by PyNEwbie
    I am trying to rip some text out of a large number of HTML documents (hundreds of thousands). The documents are really forms, but they are prepared by a very large group of different organizations, so there is significant variation in how the documents are built. For example, the documents are divided into chapters, and I might want to extract the contents of Chapter 5 from every document so I can analyze it. Initially I thought this would be easy, but it turns out that the authors might use a set of non-nested tables throughout the document to hold the content, so that Chapter n is displayed using td tags inside a table. Or they might use other elements such as p tags, h tags, div tags or any other block-level element. After trying repeatedly to use lxml to identify the beginning and end of each chapter, I have determined that it is a lot cleaner to use a regular expression, because in every case, no matter what the enclosing HTML element is, the chapter label is always in the form >Chapter #. It is a little more complicated in that there might be some white space or non-breaking space represented in different ways (&nbsp; or &#160; or just spaces). Nonetheless, it was trivial to write a regular expression to identify the beginning of each section. (The beginning of one section is the end of the previous section.) But now I want to use lxml to get the text out. My thought is that I really have no choice but to walk along my string to find the close tag for the element that encloses the text I am using to find the relevant section.
    That is, here is one example where the element holding the chapter name is a div: <div style="DISPLAY: block; MARGIN-LEFT: 0pt; TEXT-INDENT: 0pt; MARGIN-RIGHT: 0pt" align="left"><font style="DISPLAY: inline; FONT-WEIGHT: bold; FONT-SIZE: 10pt; FONT-FAMILY: Times New Roman">Chapter 1.&#160;&#160;&#160;Our Beginnings.</font></div> So I am imagining that I would begin at the location where I found the match for Chapter 1 and set up a regular expression to find the next </div|</td|</p|</h1 . . . At that point I have identified the type of element holding my chapter heading, and I can use the same logic to find all of the text within that element, that is, set up a regular expression to mark from >Chapter 1.&#160;&#160;&#160;Our Beginnings.< So I have identified where Chapter 1 begins, and I can do the same for Chapter 2 (which is where Chapter 1 ends). Now I am imagining that I am going to snip the document, beginning at the opening of the element that indicates where Chapter 1 begins and ending just before the opening of the element that indicates where Chapter 2 begins. The string that I have identified will then be fed to lxml to use its power to get the content. I am going to all of this trouble because I have read over and over that you should never use a regular expression to extract content from HTML documents, and I have not hit on a way to be as accurate with lxml in identifying the starting and ending locations for the text I want to extract. For example, I can never be certain that the subtitle of Chapter 1 is Our Beginnings; it could be Our Red Canary. Let me say that I spent two solid days trying with lxml to be confident that I had the beginning and ending elements, and I could only be accurate <60% of the time, but a very short regular expression has given me better than 95% success.
    I have a tendency to make things more complicated than necessary, so I am wondering if anyone has seen or solved a similar problem and has an approach (not the details, mind you) that they would like to offer.
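
    The snipping approach described in this question could be sketched roughly as follows. This is only an illustration under stated assumptions, not the asker's actual code: the names NBSP, CHAPTER_RE, BLOCK_OPEN_RE, chapter_starts and chapter_fragment are all hypothetical, and the list of block-level tags is a guess based on the tags the question mentions. It uses only the standard re module; the resulting fragment would then be handed to lxml for text extraction.

```python
import re

# One "space-like" token: an nbsp entity, a literal non-breaking space,
# or ordinary whitespace (the variants mentioned in the question).
NBSP = r"(?:&nbsp;|&#160;|\xa0|\s)"

# A chapter label directly after the end of a tag, e.g. ">Chapter 1".
CHAPTER_RE = re.compile(r">" + NBSP + r"*Chapter" + NBSP + r"+(\d+)", re.IGNORECASE)

# Assumed set of block-level tags that may enclose a chapter heading.
BLOCK_OPEN_RE = re.compile(r"<(?:div|td|p|h[1-6])\b", re.IGNORECASE)


def chapter_starts(html):
    """Map chapter number -> offset of the block tag that opens its heading."""
    starts = {}
    for match in CHAPTER_RE.finditer(html):
        # Look at every block-level opening tag before the label and keep
        # the last one, i.e. the nearest one preceding the heading text.
        openers = list(BLOCK_OPEN_RE.finditer(html, 0, match.start() + 1))
        if openers:
            # setdefault keeps the first occurrence of each chapter number.
            starts.setdefault(int(match.group(1)), openers[-1].start())
    return starts


def chapter_fragment(html, number):
    """Slice from the start of chapter `number` to the start of the next one."""
    starts = chapter_starts(html)
    begin = starts[number]
    end = starts.get(number + 1, len(html))
    return html[begin:end]
```

    The fragment is not guaranteed to be balanced HTML (it may end inside an open table, for instance), so it relies on a forgiving parser. Keeping only the first occurrence of each chapter number is also an assumption; a table of contents that repeats the labels would need extra handling.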

  • Is Prince the best way to PDF in Ruby on Rails?

    - by Angela
    After several Google searches, it appears that the way to create PDFs in Rails from HTML and CSS (versus a new markup language) is to use Prince. With licensing at $3800 for my non-big-commercial app, I'm wondering whether this is, in fact, the consensus, or whether people have an alternative and can share the whats and hows.

  • How to best deal with photos passed to IFilter?

    - by sharptooth
    I'm implementing an IFilter for indexing image formats. One problem is photos: many users have tons of photos, photos are huge, and loading them and searching for text in them is time-consuming. Yes, sometimes people use cameras instead of scanners for digitizing documents, but the potential problems IMO far outweigh the possibility of encountering a document digitized with a camera. So my implementation will not extract text from photos at all. What should the IFilter do once it detects that a given file is a photo: indicate an error or return empty text?

  • What is the best way to store incremental downloaded data?

    - by afriza
    Inspired by Chromium's sha1 class, I am thinking of storing incrementally downloaded data using std::string:

        // pseudo-code
        char buff[BUFF_SIZE];
        std::string data;
        do {
            size = ReadInternetFileTo(buff, BUFF_SIZE);
            data.append(buff, size);
        } while (not_finished);

    Are there any foreseeable problems with this method, or is there a better way to do it?
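
    For comparison, the same append-as-you-read pattern can be sketched in Python rather than C++. This is an illustrative analogue only: download_all and read_chunk are hypothetical names, and an in-memory io.BytesIO stands in for the network source. A growable byte buffer plays the role that std::string plays in the pseudo-code above.

```python
import io


def download_all(read_chunk, chunk_size=4096):
    """Accumulate chunks from read_chunk(size) until it returns an empty result."""
    data = bytearray()              # growable buffer, extended in place
    while True:
        chunk = read_chunk(chunk_size)
        if not chunk:               # empty bytes signals end of data
            break
        data.extend(chunk)
    return bytes(data)


# Stand-in for a network stream: 10000 bytes read in 4096-byte chunks.
source = io.BytesIO(b"x" * 10000)
payload = download_all(source.read)
```

    The whole download ends up in memory here, as it does in the pseudo-code, so the pattern only suits data that is known to fit.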

  • What's the best SOAP client library for Python, and where is the documentation for it?

    - by blackrobot
    I've never used SOAP before and I'm sort of new to Python. I'm doing this to get myself acquainted with both technologies. I've installed SOAPlib and I've tried to read their Client documentation, but I don't understand it too well. Is there anything else I can look into which is more suited for being a SOAP Client library for Python? Edit: Just in case it helps, I'm using Python 2.6.

  • Best way to implement refusing a value change by the user in Swing?

    - by Michael Borgwardt
    I have a JCheckBox that should not be checked by the user when a certain other field is empty. So now I want to show an error popup and then reset the checkbox (I've considered disabling the checkbox, but the connection to the other field is non-obvious, and a tooltip text is IMO not visible enough). What's the correct way to do that in Swing? Through a PropertyVetoException? Where do I throw it and where do I catch it? My first (probably ugly) idea would be to add a ChangeListener that itself shows the popup and resets the value.

  • Which sector in the IT industry best suits my career needs?

    - by Shailesh Tainwala
    I am a student of software engineering and will be graduating in a year's time. I want to get a few years of work experience before considering further studies. I like the idea of working on projects developing end-to-end systems for medium/large enterprises in different domains. My area of special interest is AI and data mining. ERP and MIS are terms that closely resemble what I am driving at. What type of companies should I ideally be looking at?

  • How do I best run a search on Date when it is not a :has_many association?

    - by Angela
    I have a number of activities that have a calculated scheduled date. The activities, for example Email, have an email.days method, which is the number of days after Contact.start_date on which the email should be sent. This means contact.start_date + email.days yields the date on which the email is sent to the contact. I would like to use link_to around the date, so I can see all the emails and associated contacts that are scheduled for that date. However, this "date" is not an attribute or an association, so I'm not linking to a model's view. It's calculated. So: 1) What should the actual "format" of the date that gets passed in the URL be? What is the method to do the conversion consistently? 2) How do I find all instances, given that this "date" is not an actual attribute but a calculated value that changes depending on the two associated models, Contact and Email? Thanks.
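
    The two parts of this question can be illustrated with a small language-neutral sketch, written here in Python rather than Rails. It assumes an ISO 8601 string (YYYY-MM-DD) as the URL format and filters in memory; the function names and the dictionary-based contacts and emails are hypothetical stand-ins for the Contact and Email models, not an actual Rails API.

```python
from datetime import date, timedelta


def date_to_param(d):
    """Format a date for use in a URL, e.g. '2010-05-05'."""
    return d.isoformat()


def param_to_date(s):
    """Parse the URL parameter back into a date (inverse of date_to_param)."""
    return date.fromisoformat(s)


def scheduled_on(target, contacts, emails):
    """Return (contact, email) pairs whose computed send date equals target."""
    return [
        (c, e)
        for c in contacts
        for e in emails
        if c["start_date"] + timedelta(days=e["days"]) == target
    ]


# Hypothetical sample data standing in for the two models.
contacts = [
    {"name": "Ann", "start_date": date(2010, 5, 1)},
    {"name": "Bob", "start_date": date(2010, 5, 3)},
]
emails = [
    {"subject": "Welcome", "days": 2},
    {"subject": "Follow-up", "days": 4},
]

pairs = scheduled_on(param_to_date("2010-05-05"), contacts, emails)
names = [(c["name"], e["subject"]) for c, e in pairs]
```

    With this sample data, names holds ("Ann", "Follow-up") and ("Bob", "Welcome"). Checking every contact against every email is fine for a sketch; with many records the date arithmetic would more likely be pushed into the database query.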

  • What is the best practice for an MVC2 confirm password field?

    - by Andrey
    I asked a similar question recently, but having received no answers I am taking a step back with a broader approach. I am looking to create a confirm password field using ASP.NET MVC2 that works on the client. All my other client validation is done with MicrosoftMvcValidation.js by just adding the Html.EnableClientValidation(); call. Some of my considerations: should the confirm password be part of the model object? Using that approach, I have created server-side validation by creating my own model binder. Are there any projects out there that have done this?

  • What is the best way to implement an object cache with Entity Framework?

    - by Harshal
    Say I have a table of "BlogPosts" in a database and I want to cache in memory the ones that have already been retrieved, for further reads. I can just use a standard hashtable-type memory cache like System.Web.Caching.Cache. But if I then need to update a property on one of these blog posts, e.g. blogPost.Title, and update the record in the DB, I cannot do this without fetching it again from the database, because the Entity Framework context used to fetch the record when it was loaded into my cache is already disposed. How do I write code so that I can get an object from my cache, update one property and just call the SaveChanges method without incurring an extra read?

  • What is the best way to call a method right AFTER a form loads?

    - by Jordan S
    I have a C# Windows Forms application. The way I currently have it set up, when Form1_Load() runs it checks for recovered unsaved data, and if it finds some it asks the user whether they want to open that data. When the program runs it works all right, but the message box is shown right away and the main program form (Form1) does not show until after the user clicks yes or no. I would like Form1 to pop up first and then the message box prompt. To get around this problem in the past, I created a timer in my form, started the timer in the Form1_Load() method, and then performed the check and user prompt in the first timer Tick event. This technique solves the problem, but it seems like there might be a better way. Do you guys have any better ideas? Edit: I think I have also used a background worker to do something similar. It just seems kinda goofy to go through all the trouble of invoking the method back to the form thread and all that crap just to have it delayed a couple of milliseconds!

  • What is the best way to interoperably serialize a message?

    - by iwein
    I'm considering message serialization support for spring-integration. This would be useful for various wire-level transports to implement guaranteed delivery, but also to allow interoperability with other messaging systems (e.g. through AMQP). The fundamental problem that arises is that a message containing a Java object in its payload and headers should be converted to a byte[] and/or written to a stream. Java's own serialization is clearly not going to cut it because it is not interoperable. My preference would be to create an interface that allows the user to implement the needed logic for all Objects that take part in serialization. Is this a sensible idea, and what would the interface look like? Is there a standard interoperable way to serialize Objects that would make sense in this context?
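
    One possible shape for the user-implemented interface the question asks about is sketched below, in Python rather than Java. Everything here is hypothetical and is not spring-integration's API: PayloadSerializer, JsonSerializer, encode_message and decode_message are illustrative names, JSON is just one example of a language-neutral encoding, and the length-prefixed framing is an arbitrary choice for the sketch.

```python
import json
from typing import Any, Dict, Protocol, Tuple


class PayloadSerializer(Protocol):
    """User-implemented strategy for turning a payload into bytes and back."""

    def serialize(self, payload: Any) -> bytes: ...

    def deserialize(self, data: bytes) -> Any: ...


class JsonSerializer:
    """Example implementation using JSON as a language-neutral format."""

    def serialize(self, payload: Any) -> bytes:
        return json.dumps(payload, sort_keys=True).encode("utf-8")

    def deserialize(self, data: bytes) -> Any:
        return json.loads(data.decode("utf-8"))


def encode_message(headers: Dict[str, Any], payload: Any,
                   serializer: PayloadSerializer) -> bytes:
    """Frame as: 4-byte big-endian header length, JSON headers, payload bytes."""
    head = json.dumps(headers, sort_keys=True).encode("utf-8")
    return len(head).to_bytes(4, "big") + head + serializer.serialize(payload)


def decode_message(data: bytes,
                   serializer: PayloadSerializer) -> Tuple[Dict[str, Any], Any]:
    """Inverse of encode_message."""
    head_len = int.from_bytes(data[:4], "big")
    headers = json.loads(data[4:4 + head_len].decode("utf-8"))
    payload = serializer.deserialize(data[4 + head_len:])
    return headers, payload
```

    Keeping the payload strategy separate from the framing means another format could be swapped in without touching the transport code, which is the kind of flexibility the question seems to be after.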

  • What is the best solution to do Reporting on Object data for .NET?

    - by Peter Fox
    Hi, our projects use objects as the data source for reports. Our business layer returns single objects or IEnumerable. Our reports (quite complex) need to display value-type properties of the object and of its related objects. A typical case would be, from a List, to display a master report with category data, then a subreport with data for each Product inside each Category, then a subreport for each Part of each Product, and so on. Reporting from the database is not an option for us. We have tried so far:
      - Reporting Services: works, but you have to mess around with the XML definition of the report to define the datasource classes; very hard to work with if you use an object datasource; architecturally not too clean.
      - Telerik Reports: quite nice (esp. the architecture) but seems to have problems with complex reports (master/sub), does not give great paging control, and is rumored to have performance/crash problems (immature product).
    Does anyone know a good reporting solution that can be integrated into an ASP.NET application and works well with objects as datasources?

  • What's the best way to validate EntityFramework 4.0 classes?

    - by lsb
    Hi! I've done a fair amount of searching but I've yet to find an easy way to validate EntityFramework 4.0 entities passed across the wire via WCF Data Services. Basically, I want to do something on the client like:

        Proxy.MyEntities entities = new Proxy.MyEntities(
            new Uri("http://localhost:2679/Service.svc"));
        Proxy.Vendor vendor = new Proxy.Vendor();
        vendor.Code = "ABC/XYZ";
        vendor.Status = "ACTIVE";
        // I'd like to do something like the following:
        vendor.Validate();
        entities.AddToVendors(vendor);
        entities.SaveChanges();

    Any help in this regard would be greatly appreciated!

  • How to best future-proof my application that needs to connect to Outlook?

    - by Troy
    I have a contact management application written in Delphi which has a “Sync with Outlook” feature that I developed 10 years ago. Now I’m going back to add some features and fix some bugs. This sync feature uses the Outlook object model to get started, but it has an optional mode called “Use MAPI Enhancements” where it uses pure MAPI to speed up how it looks for changes, and it allows notes to be synced with RTF instead of just plain text. I'm wondering whether supporting two parallel paths of execution is a good idea or not. If I went with all MAPI, I believe I'd avoid some security prompts, and I'd avoid situations where anti-virus software has "script-blocking" features which block my app from connecting to Outlook. But I believe that, on the downside, my 32-bit app would not be able to connect with 64-bit Outlook 2010 using MAPI. And I wonder about the future of MAPI in general. If I stick with the Outlook object model, will my 32-bit app be able to connect to the Outlook object model (since it's out-of-process COM)? If so, this is a compelling reason to keep my Outlook object model execution path in place. But if not, and if my app needs to be compiled for x64, then why not just go with pure MAPI?

  • ZipArchive PHP class: would this be the best approach?

    - by SoLoGHoST
    OK, I'm wondering which versions of PHP this class is built into, and whether it is built in on all platforms (OSes). I want to search through a zip file and place files, using file_put_contents, in different file paths within the webroot. I'm familiar with how to do this with the ZipArchive class, but I'm wondering whether using this class would be a good solution that is supported on MOST, if not ALL, servers. I mean, I'd rather not use a method that requires the server to have something extra installed. I'm looking for a solution that will work on at least MOST servers without having to install the class... Thanks :) Also, I'd like to support opening tar.gz and/or .tgz files if possible. I don't think the ZipArchive class supports this, but perhaps a different built-in PHP class does?

  • What's the best Mac custom disk image creation app?

    - by Lawrence Johnston
    I'm looking for a custom disk image creation app that I can integrate into the build process for my app (which means I need to be able to run it from the command line if possible). My desired features are that it will size the image for me, let me set the location of my icons when the image is opened, set a custom background/icon, etc. Free would be nice but if there's something that does exactly what I need I'll pay for it.
