Search Results

Search found 7251 results on 291 pages for 'pdf parsing'.

Page 4/291

  • How to turn a pdf into a text searchable pdf?

    - by don.joey
    I have a number of scanned documents in PDF and I want to be able to search them. How can I do that? Essentially I have to OCR the PDF and then blend the extracted text back into a new PDF. I have unsuccessfully tried pdfocr (which gives me this issue: https://github.com/gkovacs/pdfocr/issues/7) and pdfsandwich (which the software center says is a poor package that I should not install). Is there a software package I am unaware of? Or a script that does this?

  • Convert PDF to PNG using ImageMagick

    - by StackOverflowNewbie
    Using ImageMagick, what command should I use to convert a PDF to PNG? I need the highest quality and the smallest file size. This is what I have so far (very slow, by the way):
        convert -density 300 -depth 8 -quality 85 a.pdf a.png
    Looking at what Gmail does when a user views a PDF, the quality is awesome and the file size is minimal. The DPI is just 96 (I have to set a density of 300 to get anything decent). Does anyone know how Gmail does it? Thanks.

  • Silent Printing of PDF From Within Java

    - by Paul Reiners
    We are looking into silent printing of PDF documents from within Java. The printing will be invoked from the desktop and not through a browser, so we cannot use JavaScript. PDF Renderer is an operational solution, but its rendering quality is not acceptable. iText does not seem to be pluggable with the Java print service. There are some commercial Java libraries (jPDFPrint by Qoppa, JPedal, and ICEpdf) which we have not tried out yet. Does anybody have any experience with silent PDF printing from Java?

  • Parsing a string, Grammar file.

    - by defn
    How would I separate the string below into its parts? What I need to separate is each <Word>, including the angle brackets, from the rest of the string. So in the case below I would end up with several strings:
        1. "I have to break up with you because "
        2. "<reason>"
        3. " . But let's still "
        4. "<disclaimer>"
        5. " ."
    The input string is:
        I have to break up with you because <reason> . But let's still <disclaimer> .
    Below is what I currently have (it's ugly...):
        boolean complete = false;
        int begin = 0;
        int end = 0;
        while (complete == false) {
            if (s.charAt(end) == '<') {
                stack.add(new Terminal(s.substring(begin, end)));
                begin = end;
            } else if (s.charAt(end) == '>') {
                stack.add(new NonTerminal(s.substring(begin, end)));
                begin = end;
                end++;
            } else if (end == s.length()) {
                if (isTerminal(getSubstring(s, begin, end))) {
                    stack.add(new Terminal(s.substring(begin, end)));
                } else {
                    stack.add(new NonTerminal(s.substring(begin, end)));
                }
                complete = true;
            }
            end++;
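
    One way to get the split described in this question is a regular expression with a capturing group, so the bracketed words are kept as separate pieces. The sketch below is in Python rather than the Java of the question, and the function name `split_template` is made up for illustration. It assumes placeholders never nest and always have a closing bracket.

```python
import re

# Hypothetical helper, not from the question: splits a template string into
# literal text and <placeholder> pieces, keeping the angle brackets.
PLACEHOLDER = re.compile(r"(<[^<>]+>)")

def split_template(text):
    # re.split keeps the delimiters because the pattern is a capturing group;
    # empty pieces (e.g. when the string starts with a placeholder) are dropped.
    return [piece for piece in PLACEHOLDER.split(text) if piece != ""]

if __name__ == "__main__":
    sentence = "I have to break up with you because <reason> . But let's still <disclaimer> ."
    for piece in split_template(sentence):
        kind = "non-terminal" if piece.startswith("<") else "terminal"
        print(f"{kind}: {piece!r}")
```

    Java's `String.split` does not keep delimiters in this way, so a Java version would more likely loop over `Matcher.find()` and collect the text between matches.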

  • Parsing tab delimited file with double quotes in Perl

    - by sfactor
    I have a data set that is tab delimited, with the user-agent strings in double quotes. I need to parse each of these columns, and based on the answer to my other post I used the Text::CSV module. A sample row:
        94410634 0 GET "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; Trident/4.0; GTB6.6; .NET CLR 2.0.50727; .NET CLR 3.0.04506.648; .NET CLR 3.5.21022; .NET CLR 3.0.4506.2152; .NET CLR 3.5.30729; AskTB5.5)" 1
    The code is a simple one:
        #!/usr/bin/perl
        use strict;
        use warnings;
        use Text::CSV;
        my $csv = Text::CSV->new(sep_char => "\t");
        while (<>) {
            if ($csv->parse($_)) {
                my @columns = $csv->fields();
                print "@columns\n";
            } else {
                my $err = $csv->error_input;
                print "Failed to parse line: $err";
            }
        }
    But I get the "Failed to parse line:" error when I try it on this dataset. What am I doing wrong? I need to extract the 4th column, containing the user-agent strings, for further processing.
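
    For comparison, here is the same extraction sketched with Python's standard `csv` module instead of Perl's Text::CSV. The helper name `fourth_column` and the shortened user-agent string are illustrative. It also assumes the real file separates columns with tab characters, which this listing shows as spaces.

```python
import csv
import io

def fourth_column(lines):
    # delimiter="\t" splits on tabs; quotechar='"' keeps a quoted
    # user-agent string together even if it contains tabs or spaces.
    reader = csv.reader(lines, delimiter="\t", quotechar='"')
    return [row[3] for row in reader if len(row) > 3]

if __name__ == "__main__":
    # Shortened stand-in for the row shown in the question (assumed tab-separated).
    sample = '94410634\t0\tGET\t"Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1)"\t1\n'
    for agent in fourth_column(io.StringIO(sample)):
        print(agent)
```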

  • Perl: parsing string enclosed by double quotes

    - by sfactor
    I need to parse tab/space delimited files that have a lot of columns in Perl. Some of the values are large strings enclosed in double quotes. These strings can contain any characters, such as tabs and spaces or anything else. When I try to parse the lines with the split function, it splits these strings as well. How can I make Perl understand that a string within " " is a single column entry? A simple example is:
        12 345546.67677 "Hello World!!!" -567.55656 0.5465767 "Hello_Again; "
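
    A common way to handle this is to match fields instead of splitting on separators: each field is either a double-quoted run or a run of non-whitespace characters. The sketch below shows the idea in Python (the question is about Perl, where the same pattern works in a global match). The name `split_fields` is made up, and it assumes quoted strings contain no escaped double quotes.

```python
import re

# A field is either "anything without a double quote" or a run of non-whitespace.
FIELD = re.compile(r'"([^"]*)"|(\S+)')

def split_fields(line):
    fields = []
    for match in FIELD.finditer(line):
        quoted, bare = match.group(1), match.group(2)
        # group(1) is set for quoted fields (quotes stripped), group(2) otherwise.
        fields.append(quoted if quoted is not None else bare)
    return fields

if __name__ == "__main__":
    row = '12 345546.67677 "Hello World!!!" -567.55656 0.5465767 "Hello_Again; "'
    print(split_fields(row))
```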

  • Subscription service or software to handle a Magazine's PDF

    - by Paolo
    I'm looking for installable or hosted software (a service) to handle the process of public users subscribing to the magazine and receiving the PDF automatically when an admin uploads a new issue. The system will have to: handle the money part (PayPal & Co. are OK); let users buy old issues of the magazine; warn users when a subscription is expiring, etc. PDF stamping and WordPress integration (user credential sharing, page access for subscribed goods, etc.) would be a big plus.

  • Looking for .NET library to create PDF

    - by aximili
    We are looking for a .NET PDF creator. It needs to be .NET, so we can just copy the file(s) onto the server, not having to install anything. We only need to create a PDF with some text and images and a heading, that's all. Anyone know a good one? We are happy to buy if there is a good one that is easy to use. Thanks in advance.

  • Print/save full webpage as PDF

    - by Oliver
    I need a method to be able to print/save the current full webpage as a PDF. I know it can be done if I download a PDF printer and print to that; but I need it to be done without the user having to do anything other than click a button in a webpage. I can't do it via PHP as the page is all client side content, so I'm guessing an ActiveX component? Any ideas would be greatly appreciated! Many thanks

  • Bloated PDF created by TCPDF

    - by Yogi Yang 007
    In a web app developed in PHP we are generating quotations and invoices (which are very simple, single-page documents) using the TCPDF library. The library works just great, but it seems to generate very large PDF files. For example, in our case it is generating PDF files as large as 4 MB (+/- a few KB). How can we reduce the bloat of the PDF files generated by TCPDF? Here is the code snippet that I am using:
        ob_start();
        include('quote_view_bag_pdf.php'); // This file is a valid HTML file with PHP code to insert data from the DB
        $quote = ob_get_contents(); // Capture the content of 'quote_view_bag_pdf.php' and store it in a variable
        ob_end_clean();
        // Code to generate the PDF file for this quote
        // This line is to fix a few errors in tcpdf
        $k_path_url = '';
        require_once('tcpdf/config/lang/eng.php');
        require_once('tcpdf/tcpdf.php');
        // create new PDF document
        $pdf = new TCPDF();
        // remove default header/footer
        $pdf->setPrintHeader(false);
        $pdf->setPrintFooter(false);
        // add a page
        $pdf->AddPage();
        // print html formatted text
        $pdf->writeHtml($quote, true, 0, true, 0);
        // Insert variable contents here.
        // Build output file name
        $pdf_out_file = "pdf/Quote_".$_POST['quote_id']."_.pdf";
        // Close and output PDF document
        $pdf->Output($pdf_out_file, 'F');
        $pdf->Output($pdf_out_file, 'I');
    I hope this code fragment gives some idea of what we are doing.

  • Cropping a PDF File's Margin During Printing

    - by JavaMan
    I'm using the free Acrobat Reader to print out some PDF documents that have very large top/bottom/left/right margins. I want to remove the margins (which waste too much space and make the fonts too small). I used to use Acrobat (the paid version with editing features) to crop the source PDF file manually. But since it is an old version, it does not support the new PDF format, and I don't want to upgrade for such a simple use. Is there any free way to crop or remove unwanted white margins from the printed PDF? I am thinking of printing the PDF files to a PDF printer like the Bullzip PDF Printer and enlarging the output manually so as to remove any white margin. But there does not seem to be such a feature in Bullzip PDF Printer. Is there any other virtual printer software that can be used for this purpose?

  • Populating PDF Fields using FDFACX

    - by NWilliams
    I was recently asked to perform some updates to an existing PDF document. The required changes were completed using Adobe Designer (the only tool I have available to me). These changes included alignment and new text. Note that there were fillable form fields on the forms, and they were left untouched. The saved version of the form was then put into our ASP.NET application, which pre-populates the form fields where applicable (things like name, address, etc. that we have in our database). For some reason, the new form does not populate. I've confirmed that the form fields have the correct names, and that the actual file (the PDF) that is being pre-populated has the same permissions as others that are working. There are no errors thrown, and there is no difference when stepping through with a working form and a non-working form. This is a legacy project and I have no real experience with the PDF populator it uses (FDFACX .NET?), and I can't find a lot of info on it online. Any ideas?

  • Mutating PDF editable fields programmatically

    - by Chris
    There are tons of questions and answers here about manipulating PDFs with PHP, but none of them seem to fit my requirement. I want to be able to update the content of editable fields programmatically, preferably with PHP. If it matters, the PDF files will initially be hand crafted (as a sort of 'template' file that will be copied and filled in over and over again). The list of PDF_* functions on php.net doesn't give me anything that looks (directly) promising. Is this possible with PHP? How?

  • Generate HTML To PDF Control for the .NET application

    - by Karan
    Has anyone used an open source or paid .NET control that converts HTML to a PDF file? At the moment, I am using the Winnovative converter control, but it has a performance limitation when generating a large number of pages (more than 1000) in the PDF. The limitation shows up when we use bigger images in the HTML content. For the last 4 months I've been working with the Winnovative control and have found plenty of major bugs in it. For a small application with light usage Winnovative is good, but not at the level where the application will be used by thousands of clients. Please suggest alternatives.

  • Converting MS Word Documents to PDF in ASP.NET

    - by glaxaco
    Similar questions have been asked, but nothing exactly like mine, so here goes. We have a collection of Microsoft Word documents on an ASP.NET web server with merge fields whose values are filled in as a result of user form submissions. After the field merge, the server must convert the document to PDF and stream it down to the browser. Our first inclination was to use the Visual Studio Tools for Office API; however, we ran into this warning from Microsoft: Microsoft does not currently recommend, and does not support, Automation of Microsoft Office applications from any unattended, non-interactive client application or component (including ASP, ASP.NET, DCOM, and NT Services), because Office may exhibit unstable behavior and/or deadlock when Office is run in this environment. It looks like the field manipulation can be done using the Open XML SDK, but what's the best way to convert Word 2007 documents to PDF without opening Word? The optimal solution would be low-cost, scalable, have a low memory footprint, be easy to deploy, and have a .NET API.

  • Language parsing to find important words

    - by Matt Huggins
    I'm looking for some input and theory on how to approach a lexical topic. Let's say I have a collection of strings, each of which may be just one sentence or potentially multiple sentences. I'd like to parse these strings and pull out the most important words, perhaps with a score that denotes how likely each word is to be important. Let's look at a few examples of what I mean.
    Example #1: "I really want a Keurig, but I can't afford one!" This is a very basic example, just one sentence. As a human, I can easily see that "Keurig" is the most important word here. Also, "afford" is relatively important, though it's clearly not the primary point of the sentence. The word "I" appears twice, but it is not important at all since it doesn't really tell us any information. I might expect to see a hash of words and scores something like this:
        "Keurig" => 0.9
        "afford" => 0.4
        "want" => 0.2
        "really" => 0.1
        etc...
    Example #2: "Just had one of the best swimming practices of my life. Hopefully I can maintain my times come the competition. If only I had remembered to take of my non-waterproof watch." This example has multiple sentences, so there will be more important words throughout. Without repeating the point exercise from example #1, I would probably expect to see two or three really important words come out of this: "swimming" (or "swimming practice"), "competition", and "watch" (or "waterproof watch" or "non-waterproof watch", depending on how the hyphen is handled).
    Given a couple of examples like this, how would you go about doing something similar? Are there any existing (open source) libraries or algorithms that already do this?
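
    One common starting point for this kind of scoring is to drop stopwords and weight what is left by TF-IDF, so that words appearing in fewer strings of the collection score higher. The sketch below is only an illustration of that idea. The stopword list, the tokenizer, and the names `tokenize` and `score_words` are all assumptions. With a collection this small it will not reproduce the hand-assigned scores in the question.

```python
import math
import re
from collections import Counter

# Tiny illustrative stopword list; a real one would be much longer.
STOPWORDS = {
    "i", "a", "an", "the", "of", "my", "to", "but", "can", "can't", "one",
    "had", "if", "only", "just", "come", "and", "or", "is", "it",
}

def tokenize(text):
    # Lowercase words; apostrophes and hyphens stay inside a word.
    words = re.findall(r"[a-z][a-z'-]*", text.lower())
    return [w for w in words if w not in STOPWORDS]

def score_words(documents):
    """Return one {word: score} dict per document, scaled so the top word is 1.0."""
    tokenized = [tokenize(doc) for doc in documents]
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))
    n_docs = len(documents)
    results = []
    for tokens in tokenized:
        term_freq = Counter(tokens)
        # Smoothed inverse document frequency: rarer across documents, higher weight.
        raw = {
            word: count * (math.log((1 + n_docs) / (1 + doc_freq[word])) + 1)
            for word, count in term_freq.items()
        }
        top = max(raw.values(), default=1.0)
        results.append({word: value / top for word, value in raw.items()})
    return results

if __name__ == "__main__":
    docs = [
        "I really want a Keurig, but I can't afford one!",
        "Just had one of the best swimming practices of my life. "
        "Hopefully I can maintain my times come the competition.",
    ]
    for scores in score_words(docs):
        print(sorted(scores.items(), key=lambda item: -item[1]))
```

    Rankings closer to human judgement would probably need a large background corpus for the document frequencies, and perhaps part-of-speech tagging to favour nouns.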

  • Okular can't read pdf files

    - by hoang anh Nguyen
    I recently installed Okular on my Ubuntu 14.04. The problem is that when I open PDF files, Okular gives me the error "Can not find a plugin which is able to handle the document being passed." When I run Okular from the terminal, this is the output I get:
        okular(14100)/kdeui (KIconLoader): Error: standard icon theme "oxygen" not found!
        okular(14100)/kdeui (KIconLoader): Error: standard icon theme "oxygen" not found!
        okular(14100) KPixmapSequence::Private::loadSequence: Invalid pixmap specified.
        okular(14100) KPixmapSequence::Private::loadSequence: Invalid pixmap specified.
        okular(14100) KPixmapSequence::frameSize: No frame loaded
        okular(14100) KPixmapSequence::Private::loadSequence: Invalid pixmap specified.
        okular(14100) KPixmapSequence::frameSize: No frame loaded
        okular(14100) KPixmapSequence::Private::loadSequence: Invalid pixmap specified.
        okular(14100) KPixmapSequence::frameSize: No frame loaded
        okular(14100) KPixmapSequence::Private::loadSequence: Invalid pixmap specified.
        okular(14100) KPixmapSequence::frameSize: No frame loaded
        okular(14100) KPixmapSequence::Private::loadSequence: Invalid pixmap specified.
        okular(14100) KPixmapSequence::frameSize: No frame loaded
        okular(14100) KPixmapSequence::Private::loadSequence: Invalid pixmap specified.
        okular(14100) KPixmapSequence::frameSize: No frame loaded
        okular(14100): No ksycoca4 database available!
        okular(14100)/kdecore (trader) KServiceTypeTrader::defaultOffers: KServiceTypeTrader: serviceType "okular/Generator" not found
        okular(14100)/kdecore (KConfigSkeleton) KCoreConfigSkeleton::writeConfig:
        okular(14100)/kdecore (KConfigSkeleton) KCoreConfigSkeleton::writeConfig:
        okular(14100)/kdecore (KConfigSkeleton) KCoreConfigSkeleton::writeConfig:
        okular(14100)/kdecore (KConfigSkeleton) KCoreConfigSkeleton::writeConfig:
        okular(14100)/kdecore (KConfigSkeleton) KCoreConfigSkeleton::writeConfig:
        okular(14100): No ksycoca4 database available!
        okular(14100)/kdecore (trader) mimeTypeSycocaServiceOffers: KMimeTypeTrader: mimeType "application/pdf" not found
        okular(14100): No ksycoca4 database available!
        okular(14100)/kdecore (trader): KMimeTypeTrader: couldn't find service type "okular/Generator" Please ensure that the .desktop file for it is installed; then run kbuildsycoca4.
        okular(14100)/okular (app) Okular::Document::openDocument: No plugin for mimetype '"application/pdf"'.
        okular(14100): Couldn't start knotify from knotify4.desktop: "KLauncher could not be reached via D-Bus. Error when calling start_service_by_desktop_path: The name org.kde.klauncher was not provided by any .service files "
        okular(14100)/kdeui (KNotification) KNotification::slotReceivedIdError: Error while contacting notify daemon "The name org.kde.knotify was not provided by any .service files"
        X Error: BadWindow (invalid Window parameter) 3
          Major opcode: 20 (X_GetProperty)
          Resource id: 0x2a0002e
        okular(14110) KPixmapSequence::Private::loadSequence: Invalid pixmap specified.
        okular(14110) KPixmapSequence::frameSize: No frame loaded
        okular(14110) KPixmapSequence::Private::loadSequence: Invalid pixmap specified.
        okular(14110) KPixmapSequence::frameSize: No frame loaded
        okular(14110) KPixmapSequence::Private::loadSequence: Invalid pixmap specified.
        okular(14110) KPixmapSequence::frameSize: No frame loaded
        X Error: BadWindow (invalid Window parameter) 3
          Major opcode: 20 (X_GetProperty)
          Resource id: 0x2a0001d
        X Error: BadWindow (invalid Window parameter) 3
          Major opcode: 20 (X_GetProperty)
          Resource id: 0x2a0001d
    I would much appreciate any suggestions for solving this problem. Thanks a lot :)

  • Parsing a website's source

    - by Davlog
    I want to create an application and maybe upload it to the Play Store, but I am not sure whether what my app does is legal. I am downloading a page's source from a website to get some information I need. For example, I download a page about the song "Ramble On" by Led Zeppelin and parse the page source to get the song's name, maybe a link to an image, and the lyrics. Would that be illegal, or can I display this information to my users without getting into any trouble? Also, the website says it's an "open 'wiki-style' [...].It's completely user built by people like you and used every day by fans and developers alike."

  • Parsing mathematical expressions with two values that have parentheses and minus signs

    - by user45921
    I'm trying to parse equations like these, which have only two values or the square root of a single value, from a text file:
        100+100
        -100-100
        -(100)+(-100)
        sqrt(100)
    I need to split them on the minus signs, the parentheses, the operator symbol in the middle, and the square root, and I have no idea how to start... I've got the file part and the simple calculation parts done, but I couldn't get my program to solve the equations above.
        #include <stdio.h>
        #include <string.h>
        #include <stdlib.h>
        #include <math.h>
        main(){
            FILE *fp;
            char buff[255], sym,sym2,del1,del2,del3,del4;
            double num1, num2;
            int ret;
            fp = fopen("input.txt","r");
            while(fgets(buff,sizeof(buff),fp)!=NULL){
                char *tok = buff;
                sscanf(tok,"%lf%c%lf",&num1,&sym,&num2);
                switch(sym){
                    case '+': printf("%lf\n", num1+num2); break;
                    case '-': printf("%lf\n", num1-num2); break;
                    case '*': printf("%lf\n", num1*num2); break;
                    case '/': printf("%lf\n", num1/num2); break;
                    default: printf("The input value is not correct\n"); break;
                }
            }
            fclose(fp);
        }
    That is what I have written for the other basic operations, without parentheses or a minus sign on the second value, and it works great for the simple ones. I'm using a switch statement to calculate the add, subtract, multiply and divide results, but I'm not sure how to use the sscanf function properly (if I am not already), or whether there is another way, using a function like strtok, to properly parse the parentheses and the minus signs. Any kind of help?
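
    Inputs like these are usually easier to handle with a small tokenizer and a recursive-descent parser than with a single sscanf format, because a unary minus and a parenthesis can appear anywhere a value is expected. The sketch below shows the structure in Python rather than C. The grammar and all names (`tokenize`, `Parser`, `evaluate`) are illustrative assumptions, not part of the question.

```python
import math
import re

# One token: the word "sqrt", a number, or a single operator/parenthesis.
TOKEN = re.compile(r"\s*(sqrt|\d+(?:\.\d+)?|[-+*/()])")

def tokenize(text):
    text = text.strip()
    tokens, pos = [], 0
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected input: {text[pos:]!r}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens

class Parser:
    # Grammar:  expr   := term (('+' | '-') term)*
    #           term   := factor (('*' | '/') factor)*
    #           factor := '-' factor | '+' factor | '(' expr ')'
    #                   | 'sqrt' '(' expr ')' | number
    def __init__(self, tokens):
        self.tokens = tokens
        self.index = 0

    def peek(self):
        return self.tokens[self.index] if self.index < len(self.tokens) else None

    def advance(self):
        token = self.peek()
        self.index += 1
        return token

    def expect(self, wanted):
        if self.advance() != wanted:
            raise ValueError(f"expected {wanted!r}")

    def expr(self):
        value = self.term()
        while self.peek() in ("+", "-"):
            if self.advance() == "+":
                value += self.term()
            else:
                value -= self.term()
        return value

    def term(self):
        value = self.factor()
        while self.peek() in ("*", "/"):
            if self.advance() == "*":
                value *= self.factor()
            else:
                value /= self.factor()
        return value

    def factor(self):
        token = self.advance()
        if token is None:
            raise ValueError("unexpected end of input")
        if token == "-":          # unary minus, e.g. -100 or -(100)
            return -self.factor()
        if token == "+":
            return self.factor()
        if token == "(":
            value = self.expr()
            self.expect(")")
            return value
        if token == "sqrt":
            self.expect("(")
            value = self.expr()
            self.expect(")")
            return math.sqrt(value)
        return float(token)       # raises ValueError for a stray operator

def evaluate(text):
    parser = Parser(tokenize(text))
    value = parser.expr()
    if parser.peek() is not None:
        raise ValueError("trailing input")
    return value

if __name__ == "__main__":
    for line in ["100+100", "-100-100", "-(100)+(-100)", "sqrt(100)"]:
        print(line, "=", evaluate(line))
```

    The same three-level structure (expression, term, factor) carries over to C, with one function per grammar rule and a pointer advancing through the input buffer.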

  • PDF Text Extraction Approach Using OCR

    - by Jon
    Has anybody attempted to extract text from a PDF using an OCR library and Java? What did you find to be the most reliable library for text extraction? Most of the approaches I've seen (tesseract, GOCR) are C libraries that would require some JNI code to be written. I'm familiar with pdfbox, which is now an Apache incubator project at version 0.8.x, but its text extraction isn't always accurate. I'm looking for an alternative approach that is somewhat more reliable. I've not tried Asprise JavaPDF yet (I'm in the process of trying it), but I wanted to know more about the OCR approach (if it's possible). Any help would be appreciated.

  • View a pdf with quick webview through apache proxy

    - by Musa
    I have a site (IIS) that is accessed via a proxy in Apache (on an IBM i). This site serves PDFs that have quick web view enabled. If I access a PDF directly from the IIS server, it starts to display immediately, but if I go through the proxy I have to wait until the entire PDF downloads before I can view it. In the Apache config file I use:
        ProxyPass /path/ http://xxx.xxx.xxx.xxx/
        <LocationMatch "/path/">
            Header set Cache-Control "no-cache"
        </LocationMatch>
    I tried adding SetEnv proxy-sendcl to the LocationMatch directive; this had no effect. The PDFs that display quickly make a lot of partial requests. These are the initial request and response headers:
        GET http://xxx.xxx.xxx.xxx/xxx.PDF HTTP/1.1
        Host: xxx.xxx.xxx.xxx
        Proxy-Connection: keep-alive
        Cache-Control: no-cache
        Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8
        Pragma: no-cache
        User-Agent: Mozilla/5.0 (Windows NT 6.2; rv:9.0.1) Gecko/20100101 Firefox/9.0.1
        Accept-Encoding: gzip,deflate,sdch
        Accept-Language: en-US,en;q=0.8
        Cookie: chocolatechip

        HTTP/1.1 200 OK
        Via: 1.1 xxxxxxxx
        Connection: Keep-Alive
        Proxy-Connection: Keep-Alive
        Content-Length: 15330238
        Date: Mon, 25 Aug 2014 12:48:31 GMT
        Content-Type: application/pdf
        ETag: "b6262940bbecf1:0"
        Server: Microsoft-IIS/7.5
        Last-Modified: Fri, 22 Aug 2014 13:16:14 GMT
        Accept-Ranges: bytes
        X-Powered-By: ASP.NET
    This is a partial request and response:
        GET http://xxx.xxx.xxx.xxx/xxx.PDF HTTP/1.1
        Host: xxx.xxx.xxx.xxx
        Proxy-Connection: keep-alive
        Cache-Control: no-cache
        Pragma: no-cache
        User-Agent: Mozilla/5.0 (Windows NT 6.2; rv:9.0.1) Gecko/20100101 Firefox/9.0.1
        Accept: */*
        Referer: http://xxx.xxx.xxx.xxx/xxxx.PDF
        Accept-Encoding: gzip,deflate,sdch
        Accept-Language: en-US,en;q=0.8
        Cookie: chocolatechip
        Range: bytes=0-32767

        HTTP/1.1 206 Partial Content
        Via: 1.1 xxxxxxxx
        Connection: Keep-Alive
        Proxy-Connection: Keep-Alive
        Content-Length: 32768
        Date: Mon, 25 Aug 2014 12:48:31 GMT
        Content-Range: bytes 0-32767/15330238
        Content-Type: application/pdf
        ETag: "b6262940bbecf1:0"
        Server: Microsoft-IIS/7.5
        Last-Modified: Fri, 22 Aug 2014 13:16:14 GMT
        Accept-Ranges: bytes
        X-Powered-By: ASP.NET
    These are the headers I get if I go through the proxy:
        GET /path/xxx.PDF HTTP/1.1
        Host: domain:xxxx
        Connection: keep-alive
        Cache-Control: no-cache
        Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8
        Pragma: no-cache
        User-Agent: Mozilla/5.0 (Windows NT 6.2; rv:9.0.1) Gecko/20100101 Firefox/9.0.1
        Accept-Encoding: gzip,deflate,sdch
        Accept-Language: en-US,en;q=0.8

        HTTP/1.1 200 OK
        Date: Mon, 25 Aug 2014 13:28:42 GMT
        Server: Microsoft-IIS/7.5
        Content-Type: application/pdf
        Last-Modified: Fri, 22 Aug 2014 13:16:14 GMT
        Accept-Ranges: bytes
        ETag: "b6262940bbecf1:0"-gzip
        X-Powered-By: ASP.NET
        Cache-Control: no-cache
        Expires: Thu, 24 Aug 2017 13:28:42 GMT
        Vary: Accept-Encoding
        Content-Encoding: gzip
        Keep-Alive: timeout=300, max=100
        Connection: Keep-Alive
        Transfer-Encoding: chunked
    I'm guessing it's because the proxy uses Transfer-Encoding: chunked, but I'm not sure and wasn't able to turn it off to check. Browser: Chrome 36.0.1985.143 m, using the native PDF viewer. Any help getting PDF quick web view working through the proxy would be appreciated.
