pdf scraping - Page 4 - Developer IT

How to Use Ghostscript DLL to convert PDF to PDF/A

- by imgen

How do I use the Ghostscript DLL to convert a PDF to PDF/A? I know I have to call gsapi_init_with_args, a function exported by gsdll32.dll, but how do I pass the right arguments? By the way, I'm using C#.
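
As a rough sketch of the argument list being asked about: gsapi_init_with_args takes an argc/argv pair shaped like a command line, and the first element is treated as a program name and ignored. The switches below are standard Ghostscript options for the pdfwrite device, but the exact set varies by Ghostscript version, the file names are placeholders, and a complete PDF/A setup usually also passes a customized PDFA_def.ps with an ICC profile, which is left out here. Python is used only for illustration (build_pdfa_args is a hypothetical helper); from C# the same strings would be passed as a string array.

```python
from typing import List


def build_pdfa_args(input_pdf: str, output_pdf: str) -> List[str]:
    """Build an argv-style list for Ghostscript's gsapi_init_with_args.

    Hypothetical helper for illustration; the switch set is an assumption
    and may need adjusting for a given Ghostscript version.
    """
    return [
        "gs",                             # argv[0]: program name, ignored by Ghostscript
        "-dPDFA=1",                       # request PDF/A-1 output
        "-dBATCH",                        # exit after processing the input
        "-dNOPAUSE",                      # do not prompt between pages
        "-sDEVICE=pdfwrite",              # write a PDF file
        "-sColorConversionStrategy=RGB",  # convert colors to one color space
        "-dPDFACompatibilityPolicy=1",    # drop features that PDF/A forbids
        f"-sOutputFile={output_pdf}",
        input_pdf,
    ]


args = build_pdfa_args("in.pdf", "out.pdf")
```

The length of this list is the argc value and the list itself is argv.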

Read the article

android: open a pdf from my app using the built in pdf viewer

- by mtmurdock

I want to open a PDF file from my app using Android's built-in PDF viewer app, but I don't know how to start other apps. I'm sure I have to call startActivity; I just don't know how to identify the app I'm opening or how to pass the file to that specific app. Does anyone have a clue?

Read the article

Convert Docx or Odt to Pdf

- by luxifer

Hi there,

I need to find a way to convert docx or odt files to PDF on a Linux web server. I'm not willing to install openoffice.org, for obvious reasons. I've tried Google but it failed me, so I'm here :-)

I can't imagine there's no solution to this problem other than installing a huge chunk of binaries, given that
a) there are (or at least should be) lots of packages which can read docx, or at least odt, and
b) there are just as many packages which can write PDF files.

What am I missing here? (scratching head)

Regards, luxifer

PS
Edit: I don't want to use a web service, neither free nor paid.
Edit 2: At this point it would also help to convert the docx back to doc so I could use wvpdf to generate the PDF...
Edit 3: Of course it would also help if I could do search and replace on a doc file in the first place; or xps for that matter.

Read the article

How to turn a pdf into a text searchable pdf?

- by don.joey

I have a number of scanned documents in PDF and I want to be able to search them. How can I do that? Essentially, I have to OCR the PDF and then blend the extracted text back into a new PDF. I have unsuccessfully tried:
- pdfocr (which gives me this issue: https://github.com/gkovacs/pdfocr/issues/7)
- pdfsandwich (which the software center says is a poor package that I should not install)
Is there a software package I am unaware of? Or a script that does this?

Read the article

Convert PDF to PNG using ImageMagick

- by StackOverflowNewbie

Using ImageMagick, what command should I use to convert a PDF to PNG? I need the highest quality and the smallest file size. This is what I have so far (very slow, by the way):

    convert -density 300 -depth 8 -quality 85 a.pdf a.png

Looking at what Gmail does when a user "views" a PDF, the quality is awesome and the file size is minimal. The DPI is just 96 (I have to set a density of 300 to get anything decent). Does anyone know how Gmail does it? Thanks.
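
To put the two density figures in perspective: -density sets the resolution at which the PDF is rasterized, so the pixel count grows with the square of the DPI. The sketch below is plain arithmetic and does not call ImageMagick. The page size (US Letter, 8.5 x 11 inches) is an assumption, since the question does not state it, and rendered_pixels is a hypothetical helper name.

```python
from typing import Tuple


def rendered_pixels(width_in: float, height_in: float, dpi: int) -> Tuple[int, int]:
    """Pixel dimensions of a page rasterized at the given density (DPI)."""
    return round(width_in * dpi), round(height_in * dpi)


# Assumed page size: US Letter, 8.5 x 11 inches (not stated in the question).
at_300 = rendered_pixels(8.5, 11, 300)
at_96 = rendered_pixels(8.5, 11, 96)

# How many times more pixels the 300 DPI render contains than the 96 DPI one.
pixel_ratio = (at_300[0] * at_300[1]) / (at_96[0] * at_96[1])

print(at_300, at_96, pixel_ratio)
```

Under that assumption, the 300 DPI render has roughly ten times as many pixels per page as a 96 DPI render, which is consistent with the slow conversion and large files described.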

Read the article

Silent Printing of PDF From Within Java

- by Paul Reiners

We are looking into silent printing of PDF documents from within Java. The printing will be invoked from the desktop and not through a browser, so we cannot use JavaScript. PDF Renderer is an operational solution, but its rendering quality is not acceptable. iText does not seem to be pluggable with the Java print service. There are some commercial Java libraries (jPDFPrint by Qoppa, JPedal, and ICEpdf) which we have not tried out yet. Does anybody have any experience with silent PDF printing from Java?

Read the article

Subscription service or software to handle a Magazine's PDF

- by Paolo

I'm looking for installable or hosted software (a service) to handle the process of public users subscribing to the Magazine and automatically receiving the PDF when an admin uploads a new issue. The system will have to:
- handle the money part (PayPal & Co. are OK)
- let users buy old issues of the Magazine
- warn users when a subscription is expiring, etc.
PDF stamping and WordPress integration (user credential sharing, page access to subscribed goods, etc.) will be a big plus.

Read the article

Looking for .NET library to create PDF

- by aximili

We are looking for a .NET PDF creator. It needs to be .NET, so we can just copy the file(s) onto the server, not having to install anything. We only need to create a PDF with some text and images and a heading, that's all. Anyone know a good one? We are happy to buy if there is a good one that is easy to use. Thanks in advance.

Read the article

How to embed an image in a PDF using cfpdfform

- by JGrimm

I'm dynamically generating a PDF with a few variables, but I also need to be able to embed an image in the PDF. Does anyone have any experience doing this using cfpdfform?

Read the article

Print/save full webpage as PDF

- by Oliver

I need a method to be able to print/save the current full webpage as a PDF. I know it can be done if I download a PDF printer and print to that; but I need it to be done without the user having to do anything other than click a button in a webpage. I can't do it via PHP as the page is all client side content, so I'm guessing an ActiveX component? Any ideas would be greatly appreciated! Many thanks

Read the article

Bloated PDF created by TCPDF

- by Yogi Yang 007

In a web app developed in PHP we are generating quotations and invoices (which are very simple, single-page documents) using the TCPDF library. The library works great, but it seems to generate very large PDF files. In our case it generates PDF files as large as 4 MB (+/- a few KB). How can we reduce the bloat of the PDF files generated by TCPDF? Here is the code snippet I am using:

    ob_start();
    include('quote_view_bag_pdf.php'); //This file is valid HTML file with PHP code to insert data from DB
    $quote = ob_get_contents(); //Capture the content of 'quote_view_bag_pdf.php' file and store in variable
    ob_end_clean();

    //Code to generate PDF file for this Quote
    //This line is to fix a few errors in tcpdf
    $k_path_url='';
    require_once('tcpdf/config/lang/eng.php');
    require_once('tcpdf/tcpdf.php');

    // create new PDF document
    $pdf = new TCPDF();

    // remove default header/footer
    $pdf->setPrintHeader(false);
    $pdf->setPrintFooter(false);

    // add a page
    $pdf->AddPage();

    // print html formated text
    $pdf->writeHtml($quote, true, 0, true, 0);
    //Insert Variables contents here.

    //Build Out File Name
    $pdf_out_file = "pdf/Quote_".$_POST['quote_id']."_.pdf";

    //Close and output PDF document
    $pdf->Output($pdf_out_file, 'F');
    $pdf->Output($pdf_out_file, 'I');

I hope this code fragment gives some idea.

Read the article

Cropping a PDF File's Margin During Printing

- by JavaMan

I'm using the free Acrobat Reader to print some PDF documents that have very large top, bottom, left, and right margins. I want to remove the margins, which waste too much space and make the fonts too small. I used to use Acrobat (the paid version with editing features) to crop the source PDF file manually, but it is an old version that does not support the new PDF format, and I don't want to upgrade for such a simple use. Is there any free way to crop or remove unwanted white margins from the printed PDF? I am thinking of printing the PDF files to a PDF printer such as the Bullzip PDF Printer and enlarging the output manually so as to remove any white margin, but there does not seem to be such a feature in Bullzip PDF Printer. Is there any other virtual printer software that can be used for this purpose?

Read the article

Populating PDF Fields using FDFACX

- by NWilliams

I was recently asked to perform some updates to an existing PDF document. The required changes were completed using Adobe Designer (the only tool I have available to me). These changes included alignment and new text. Note that there were fillable form fields on the forms, and they were left untouched. The saved version of the form was then put into our ASP.NET application, which pre-populates the form fields where applicable (things like name, address, etc.; things we have in our database). For some reason, the new form does not populate. I've confirmed that the form fields have the correct names, and that the actual file (the PDF) being pre-populated has the same permissions as others that are working. There are no errors thrown, and there is no difference when stepping through with a working form and a non-working form. This is a legacy project and I have no real experience with the PDF populator they are using (FDFACX .NET?), and I can't find a lot of info on it online. Any ideas?

Read the article

Creating a new Pdf by Merging Pdf documents using TCPDF [php]

- by LuRsT

How can I create a new document using other pdfs that I'm generating? I have methods to create some documents, and I want to merge them all in a big pdf, how can I do that with TCPDF? I do not want to use other libs.

Read the article

Remove or hide PDF layer using ABCPdf?

- by Junior Developer

Is it possible to remove or hide a layer from a PDF using ABCPdf or another framework?

Read the article

Mutating PDF editable fields programmatically

- by Chris

There are tons of questions and answers here about manipulating PDFs with PHP, but none of them seems to fit my requirement. I want to be able to update the content of editable fields programmatically, preferably with PHP. If it matters, the PDF files will initially be hand crafted (as a sort of 'template' file that will be copied and filled in over and over again). The list of PDF_* functions on php.net doesn't give me anything that looks (directly) promising. Is this possible with PHP? How?

Read the article

Generate HTML To PDF Control for the .NET application

- by Karan

Has anyone used an open source or paid .NET control that converts HTML to a PDF file? At the moment, I am using the Winnovative converter control, but it has a performance limitation when generating PDFs with many pages (more than 1000, say). The limitation shows up when we use bigger images in the HTML content. I've been working with the Winnovative control for the last 4 months and have found plenty of major bugs in it. Winnovative is good for a small application with light usage, but not at a level where the application will be used by thousands of clients. Please suggest alternatives.

Read the article

Converting MS Word Documents to PDF in ASP.NET

- by glaxaco

Similar questions have been asked, but nothing exactly like mine, so here goes. We have a collection of Microsoft Word documents on an ASP.NET web server with merge fields whose values are filled in as a result of user form submissions. After the field merge, the server must convert the document to PDF and stream it down to the browser. Our first inclination was to use the Visual Studio Tools for Office API; however, we ran into this warning from Microsoft: Microsoft does not currently recommend, and does not support, Automation of Microsoft Office applications from any unattended, non-interactive client application or component (including ASP, ASP.NET, DCOM, and NT Services), because Office may exhibit unstable behavior and/or deadlock when Office is run in this environment. It looks like the field manipulation can be done using the Open XML SDK, but what's the best way to convert Word 2007 documents to PDF without opening Word? The optimal solution would be low-cost, scalable, have a low memory footprint, be easy to deploy, and have a .NET API.

Read the article

Generation PDF from HTML (component for .NET)

- by Mio18

Can you please point me to an open source or reasonably priced commercial product capable of generating PDF from HTML?

Read the article

Screen-scraping of a secure page of any site on https:// with asp.net in C#

- by Ajit

I've scraped secure pages of sites served over http://, but when I try to scrape a site on https://, I always get the login page rather than the secure page. Please advise what I should do to scrape a secure page of a site on https://.

Read the article

Okular can't read pdf files

- by hoang anh Nguyen

I recently installed Okular on my Ubuntu 14.04. The problem is that when I open PDF files, Okular gives me the error "Can not find a plugin which is able to handle the document being passed." When I ran Okular from the terminal, this is the output I got:

    okular(14100)/kdeui (KIconLoader): Error: standard icon theme "oxygen" not found!
    okular(14100)/kdeui (KIconLoader): Error: standard icon theme "oxygen" not found!
    okular(14100) KPixmapSequence::Private::loadSequence: Invalid pixmap specified.
    okular(14100) KPixmapSequence::Private::loadSequence: Invalid pixmap specified.
    okular(14100) KPixmapSequence::frameSize: No frame loaded
    okular(14100) KPixmapSequence::Private::loadSequence: Invalid pixmap specified.
    okular(14100) KPixmapSequence::frameSize: No frame loaded
    okular(14100) KPixmapSequence::Private::loadSequence: Invalid pixmap specified.
    okular(14100) KPixmapSequence::frameSize: No frame loaded
    okular(14100) KPixmapSequence::Private::loadSequence: Invalid pixmap specified.
    okular(14100) KPixmapSequence::frameSize: No frame loaded
    okular(14100) KPixmapSequence::Private::loadSequence: Invalid pixmap specified.
    okular(14100) KPixmapSequence::frameSize: No frame loaded
    okular(14100) KPixmapSequence::Private::loadSequence: Invalid pixmap specified.
    okular(14100) KPixmapSequence::frameSize: No frame loaded
    okular(14100): No ksycoca4 database available!
    okular(14100)/kdecore (trader) KServiceTypeTrader::defaultOffers: KServiceTypeTrader: serviceType "okular/Generator" not found
    okular(14100)/kdecore (KConfigSkeleton) KCoreConfigSkeleton::writeConfig:
    okular(14100)/kdecore (KConfigSkeleton) KCoreConfigSkeleton::writeConfig:
    okular(14100)/kdecore (KConfigSkeleton) KCoreConfigSkeleton::writeConfig:
    okular(14100)/kdecore (KConfigSkeleton) KCoreConfigSkeleton::writeConfig:
    okular(14100)/kdecore (KConfigSkeleton) KCoreConfigSkeleton::writeConfig:
    okular(14100): No ksycoca4 database available!
    okular(14100)/kdecore (trader) mimeTypeSycocaServiceOffers: KMimeTypeTrader: mimeType "application/pdf" not found
    okular(14100): No ksycoca4 database available!
    okular(14100)/kdecore (trader): KMimeTypeTrader: couldn't find service type "okular/Generator"
    Please ensure that the .desktop file for it is installed; then run kbuildsycoca4.
    okular(14100)/okular (app) Okular::Document::openDocument: No plugin for mimetype '"application/pdf"'.
    okular(14100): Couldn't start knotify from knotify4.desktop: "KLauncher could not be reached via D-Bus. Error when calling start_service_by_desktop_path: The name org.kde.klauncher was not provided by any .service files "
    okular(14100)/kdeui (KNotification) KNotification::slotReceivedIdError: Error while contacting notify daemon "The name org.kde.knotify was not provided by any .service files"
    X Error: BadWindow (invalid Window parameter) 3
      Major opcode: 20 (X_GetProperty)
      Resource id: 0x2a0002e
    okular(14110) KPixmapSequence::Private::loadSequence: Invalid pixmap specified.
    okular(14110) KPixmapSequence::frameSize: No frame loaded
    okular(14110) KPixmapSequence::Private::loadSequence: Invalid pixmap specified.
    okular(14110) KPixmapSequence::frameSize: No frame loaded
    okular(14110) KPixmapSequence::Private::loadSequence: Invalid pixmap specified.
    okular(14110) KPixmapSequence::frameSize: No frame loaded
    X Error: BadWindow (invalid Window parameter) 3
      Major opcode: 20 (X_GetProperty)
      Resource id: 0x2a0001d
    X Error: BadWindow (invalid Window parameter) 3
      Major opcode: 20 (X_GetProperty)
      Resource id: 0x2a0001d

I would much appreciate any suggestions for solving this problem. Thanks a lot :)

Read the article

View a pdf with quick webview through apache proxy

- by Musa

I have a site (IIS) that is accessed via a proxy in Apache (on an IBM i). This site serves PDFs which have quick web view. If I access a PDF directly from the IIS server, the PDF starts to display immediately, but if I go through the proxy I have to wait until the entire PDF downloads before I can view it. In the Apache config file I use:

    ProxyPass /path/ http://xxx.xxx.xxx.xxx/
    <LocationMatch "/path/">
        Header set Cache-Control "no-cache"
    </LocationMatch>

I tried adding SetEnv proxy-sendcl to the LocationMatch directive; this had no effect. The PDFs that display quickly make a lot of partial requests. These are the initial request and response headers:

    GET http://xxx.xxx.xxx.xxx/xxx.PDF HTTP/1.1
    Host: xxx.xxx.xxx.xxx
    Proxy-Connection: keep-alive
    Cache-Control: no-cache
    Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8
    Pragma: no-cache
    User-Agent: Mozilla/5.0 (Windows NT 6.2; rv:9.0.1) Gecko/20100101 Firefox/9.0.1
    Accept-Encoding: gzip,deflate,sdch
    Accept-Language: en-US,en;q=0.8
    Cookie: chocolatechip

    HTTP/1.1 200 OK
    Via: 1.1 xxxxxxxx
    Connection: Keep-Alive
    Proxy-Connection: Keep-Alive
    Content-Length: 15330238
    Date: Mon, 25 Aug 2014 12:48:31 GMT
    Content-Type: application/pdf
    ETag: "b6262940bbecf1:0"
    Server: Microsoft-IIS/7.5
    Last-Modified: Fri, 22 Aug 2014 13:16:14 GMT
    Accept-Ranges: bytes
    X-Powered-By: ASP.NET

This is a partial request and response:

    GET http://xxx.xxx.xxx.xxx/xxx.PDF HTTP/1.1
    Host: xxx.xxx.xxx.xxx
    Proxy-Connection: keep-alive
    Cache-Control: no-cache
    Pragma: no-cache
    User-Agent: Mozilla/5.0 (Windows NT 6.2; rv:9.0.1) Gecko/20100101 Firefox/9.0.1
    Accept: */*
    Referer: http://xxx.xxx.xxx.xxx/xxxx.PDF
    Accept-Encoding: gzip,deflate,sdch
    Accept-Language: en-US,en;q=0.8
    Cookie: chocolatechip
    Range: bytes=0-32767

    HTTP/1.1 206 Partial Content
    Via: 1.1 xxxxxxxx
    Connection: Keep-Alive
    Proxy-Connection: Keep-Alive
    Content-Length: 32768
    Date: Mon, 25 Aug 2014 12:48:31 GMT
    Content-Range: bytes 0-32767/15330238
    Content-Type: application/pdf
    ETag: "b6262940bbecf1:0"
    Server: Microsoft-IIS/7.5
    Last-Modified: Fri, 22 Aug 2014 13:16:14 GMT
    Accept-Ranges: bytes
    X-Powered-By: ASP.NET

These are the headers I get if I go through the proxy:

    GET /path/xxx.PDF HTTP/1.1
    Host: domain:xxxx
    Connection: keep-alive
    Cache-Control: no-cache
    Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8
    Pragma: no-cache
    User-Agent: Mozilla/5.0 (Windows NT 6.2; rv:9.0.1) Gecko/20100101 Firefox/9.0.1
    Accept-Encoding: gzip,deflate,sdch
    Accept-Language: en-US,en;q=0.8

    HTTP/1.1 200 OK
    Date: Mon, 25 Aug 2014 13:28:42 GMT
    Server: Microsoft-IIS/7.5
    Content-Type: application/pdf
    Last-Modified: Fri, 22 Aug 2014 13:16:14 GMT
    Accept-Ranges: bytes
    ETag: "b6262940bbecf1:0"-gzip
    X-Powered-By: ASP.NET
    Cache-Control: no-cache
    Expires: Thu, 24 Aug 2017 13:28:42 GMT
    Vary: Accept-Encoding
    Content-Encoding: gzip
    Keep-Alive: timeout=300, max=100
    Connection: Keep-Alive
    Transfer-Encoding: chunked

I'm guessing it's because the proxy uses Transfer-Encoding: chunked, but I'm not sure and wasn't able to turn it off to check.

Browser: Chrome 36.0.1985.143 m, using the native PDF viewer.

Any help getting the PDF quick web view working through the proxy would be appreciated.
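
To make the partial-request headers above concrete, here is a small sketch that parses a Content-Range value and checks it against the reported Content-Length. The header values are copied from the direct (non-proxied) 206 response in the question; the function names are hypothetical, and the sketch only reads headers. It does not diagnose the proxy behaviour.

```python
import re
from typing import Tuple


def parse_content_range(value: str) -> Tuple[int, int, int]:
    """Parse 'bytes FIRST-LAST/TOTAL' into (first, last, total)."""
    match = re.fullmatch(r"bytes (\d+)-(\d+)/(\d+)", value.strip())
    if match is None:
        raise ValueError(f"unsupported Content-Range: {value!r}")
    first, last, total = (int(group) for group in match.groups())
    return first, last, total


def range_length(value: str) -> int:
    """Number of bytes covered by the range (both ends are inclusive)."""
    first, last, _total = parse_content_range(value)
    return last - first + 1


# Values taken from the 206 Partial Content response quoted in the question.
first, last, total = parse_content_range("bytes 0-32767/15330238")
chunk = range_length("bytes 0-32767/15330238")
print(first, last, total, chunk)
```

The inclusive range 0-32767 covers 32768 bytes, which matches the Content-Length of the 206 response, and the total after the slash matches the Content-Length of the full 200 response. The proxied response shown in the question carries neither header.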

Read the article

Printing on Windows 8 with PDF viewer (Adobe Reader) from network

- by Bongo

I have a problem with Adobe Reader 8, but the problem seems to be equally bad with other PDF viewers. Here is the configuration: my PDF viewer is located on network drive "Z:", which is the network address \\dgs-main\progs. I tried to start Adobe Reader from here:

    \\dgs-main\progs\Adobe\Reader 8.0\Reader\AcroRd32.exe

and open the PDF from here:

    C:\Users\ServiceDesk\AppData\Local\Temp\GeneratedPDF.pdf

The problem is as follows: if I open the PDF with a local PDF viewer, everything works fine and I can print the document. If I open the PDF with the network PDF viewer, it opens, but printing is impossible. The error message states: "Unable to start print job. Is printer available?" As mentioned above, it works with a local PDF viewer. In both cases I use the same printer. The printer is a network printer, but it fails even with a local printer. The error occurs only on Windows 8 machines; on Windows 7 it works fine. I hope somebody can tell me what the problem is. Thanks in advance and have a fine day.

Read the article

PowerPoint 2007 slides are only partially converted to PDF since SP3

- by Tim Pietzcker

EDIT: Microsoft support has confirmed that it's a bug with PowerPoint 2007 SP3. I have recently encountered a problem with the "Save as PDF/XPS" add-in for PowerPoint 2007. When I use "Save as PDF/XPS" to create a PDF version of my presentation, some slides are only partially included in the resulting PDF file. For example, this: (download the PPTX file here) is reduced to this (in Adobe Reader X or Acrobat Pro X (both 10.1.1)): (download the PDF file here) So far, I have only encountered this with slides that contain animation elements, but which part of the elements remain in the PDF version appears not to have anything to do with the order in which the animated elements appear, so that might just be a coincidence. Update: The problem persists even if I "un-animate" the slides (removing the animation but leaving the previously animated elements intact). When viewing the affected slides in Acrobat Reader, it sometimes complains about the file containing invalid elements, and that I should complain to whoever generated the PDF file... Update 2: I have just installed Office 2007 on a new Windows 7 x64 PC. With the original Office version (12.0.4518.1014 MSO 12.0.6562.5003), a correct PDF file is generated. After installation of SP3 (12.0.6606.1000 SP3 MSO 12.0.6607.1000) a corrupt PDF file is generated. Today's Microsoft Updates (to PowerPoint version 12.0.6654.5000) haven't changed anything, by the way. Update 3: I have opened a tech support incident with Microsoft. They have confirmed the "limitation", as they called it, and it is indeed limited to 2007 SP 3 only. They are going to pass it on to the developers but they can't say when or even if a fix would be forthcoming, so I guess I'll upgrade to 2010...

Read the article

methods for preventing large scale data scraping from REST api

- by Simon Kenyon Shepard

I know the immediate answer to this is going to be that there is no 100% reliable method of doing this. But I'd like to create a question that details the different possibilities, the difficulty of implementing them, and their success rates. I would like to cover everything from simple software analysis of IPs and request rates to sophisticated high-end software/hardware tools, e.g. neural networks, with the goal of predicting and preventing bogus requests and attempts to scrape the service. Many thanks.
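
As an illustration of the simple end of that spectrum (per-IP request speed analysis), here is a minimal sliding-window sketch. The class name, the thresholds, and the choice to key on the client IP are all assumptions made for the example; the question names no specific technique, and a determined scraper can rotate IPs to get around a check like this.

```python
from collections import defaultdict, deque
from typing import Deque, Dict


class SlidingWindowLimiter:
    """Allow at most max_requests per client within window_seconds."""

    def __init__(self, max_requests: int, window_seconds: float) -> None:
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        # Timestamps of recently accepted requests, per client IP.
        self._hits: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, ip: str, now: float) -> bool:
        """Return True if the request should be served, False if throttled."""
        hits = self._hits[ip]
        # Forget timestamps that have fallen out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False
        hits.append(now)
        return True


# Hypothetical limits: 2 requests per 10 seconds per IP.
limiter = SlidingWindowLimiter(max_requests=2, window_seconds=10.0)
decisions = [
    limiter.allow("1.2.3.4", 0.0),
    limiter.allow("1.2.3.4", 1.0),
    limiter.allow("1.2.3.4", 2.0),   # third request inside the window
    limiter.allow("5.6.7.8", 2.0),   # a different client is tracked separately
    limiter.allow("1.2.3.4", 10.0),  # the request at t=0 has expired by now
]
print(decisions)
```

Passing the timestamp in explicitly keeps the sketch deterministic; a real service would take it from a monotonic clock.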

Search Results

Search found 4479 results on 180 pages for 'pdf scraping'.
