pdf scraping - Page 18 - Developer IT

How to debug a corrupt pdf file?

- by Joelio

Hi, I'm generating PDF files using a Ruby library called "prawn". I have one particular file that Adobe Reader seems to consider "corrupt". It shows up fine in both Preview and Adobe Reader, but Adobe Reader gives errors. Sometimes I get: "Could not find the XObject named '%s'." Other times I get: "Could not find the XObject named 'Im4'." Then I always get: "An error exists on this page. Acrobat may not display the page correctly. Please contact the person who created the PDF document to correct the problem." Is there a way to open a PDF with some tool and have it tell you what is technically wrong with it? I'm sure I could figure it out quickly with something like this. Thanks, Joel
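
As a rough first pass at the kind of check asked about above, the sketch below tests only the outer file structure: the %PDF- header, the %%EOF marker, and whether startxref points at a cross-reference section. The function name quick_pdf_checks is made up for illustration, and this is not a validator; on its own it would not catch a missing XObject such as 'Im4', which needs a real PDF parser.

```python
import re


def quick_pdf_checks(data):
    """Return a list of structural problems found in raw PDF bytes."""
    problems = []

    # Every PDF should begin with a version header such as %PDF-1.4.
    if not data.startswith(b"%PDF-"):
        problems.append("missing %PDF- header")

    # The end-of-file marker and the startxref keyword sit near the end.
    tail = data[max(0, len(data) - 1024):]
    if b"%%EOF" not in tail:
        problems.append("missing %%EOF marker")

    matches = re.findall(rb"startxref\s+(\d+)", tail)
    if not matches:
        problems.append("missing startxref")
    else:
        offset = int(matches[-1])
        target = data[offset:offset + 32]
        # startxref should point at a classic "xref" table or at an
        # "N G obj" header that introduces a cross-reference stream.
        points_at_xref = target.startswith(b"xref") or re.match(
            rb"\d+\s+\d+\s+obj", target
        )
        if not points_at_xref:
            problems.append(
                "startxref does not point at a cross-reference section"
            )

    return problems
```

Calling quick_pdf_checks(open(path, "rb").read()) on the suspect file returns an empty list when these outer checks pass, which would suggest the problem lies deeper, for example in a page's resource dictionary.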

Display Pdf Stream on page load with Binary Write

- by Israfel

I have a PDF file being generated on the fly which I want to display inline on page load, as below:

    Response.Clear()
    Response.AppendHeader("Content-Disposition", "inline; filename=_Bulk_Print.pdf")
    Response.ContentType = "application/pdf"
    Response.BinaryWrite(docData)
    Response.End()

If I put this in a click event it works perfectly, but on page load I just get a blank aspx page, even though it steps through that code and generates docData with no problem. Does anyone know the reason for this, or a workaround? Thanks for your help.

Extract text from PDF (google app engine)

- by Miroslav Bajtoš

Is there any free Java library for extracting text from PDF that is compatible with Google App Engine? I've read about PDFJet, but it can't read PDF, can it? Is there perhaps another way to extract text from PDF? I tried http://www.pdfdownload.org/, but unfortunately they don't handle non-English characters correctly.

JAVA - Download PDF file from Webserver

- by Augusto Picciani

I need to download a PDF file from a web server to my PC and save it locally. I used HttpClient to connect to the web server and get the content body:

    HttpEntity entity = response.getEntity();
    InputStream in = entity.getContent();
    String stream = CharStreams.toString(new InputStreamReader(in));
    int size = stream.length();
    System.out.println("stringa html page LENGTH:" + stream.length());
    System.out.println(stream);
    SaveToFile(stream);

Then I save the content to a file:

    // check CRLF (I don't know if I need to do this)
    String[] fix = stream.split("\r\n");
    File file = new File("C:\\Users\\augusto\\Desktop\\progetti web\\test\\test2.pdf");
    PrintWriter out = new PrintWriter(new FileWriter(file));
    for (int i = 0; i < fix.length; i++) {
        out.print(fix[i]);
        out.print("\n");
    }
    out.close();

I also tried to save the String content to a file directly:

    OutputStream out = new FileOutputStream("pathPdfFile");
    out.write(stream.getBytes());
    out.close();

But the result is always the same: I can open the PDF file, but I see only white pages. Is the mistake in the charset encoding of the PDF content between stream and endstream? Does the content between stream and endstream need to be handled in some other way?
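
The white pages described above are consistent with the binary PDF body being decoded into a String and then re-encoded, which can alter bytes inside compressed streams. The sketch below is written in Python rather than the poster's Java, and its helper names are made up for illustration. It only contrasts a byte-for-byte copy with a round trip through text; the corresponding idea in Java would be to copy the InputStream to a FileOutputStream without going through a Reader.

```python
import io
import shutil


def copy_binary(src, dst, chunk_size=64 * 1024):
    """Copy a binary stream to another stream without decoding it."""
    shutil.copyfileobj(src, dst, chunk_size)


def roundtrip_as_text(data):
    """Decode bytes to text and encode them again, as the failing code does."""
    return data.decode("utf-8", errors="replace").encode("utf-8")


# A tiny stand-in for a PDF: the stream body holds bytes that are not valid text.
sample = b"%PDF-1.4\n1 0 obj\nstream\n\x78\x9c\xff\xfe\x00\x80\nendstream\n"

out = io.BytesIO()
copy_binary(io.BytesIO(sample), out)

intact = out.getvalue() == sample            # binary copy keeps every byte
damaged = roundtrip_as_text(sample) != sample  # text round trip changes them
```

The readable parts of the file survive the text round trip, which may be why the viewer still opens the document while the page content is lost.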

Weird problem, with ghostscript and pdf files.

- by kofucii

Hello, I'm using Ghostscript to create a PDF file from a PostScript file. My PS file doesn't have orientation instructions, so when I want to create a landscape PDF, I use Ghostscript to rotate the page. The problem is that Ghostscript rotates only the first page; when the PDF has more than one page, the others are not rotated correctly. Here is the command I'm using:

    cat $psinput | gs -sPAPERSIZE=a4 -sDEVICE=pdfwrite -sOuputFile="/tmp/pdf" \
        -dAutoRotatePages="/None" -c "<< /Orientation 3 >> setpagedevice" \
        90 rotate 0 -595 translate -dNOPAUSE -dEPSCrop -f - -c -quit

Does anybody have an idea how to correct this?

OutOfMemoryError during the pdf merge

- by Vijay

The code below merges PDF files and returns the combined PDF data. When I try to combine 100 files of approximately 500 KB each, I get an OutOfMemoryError at the line document.close();. This code runs in a web environment; is the memory available to the WebSphere server the problem? I read in an article that I should use the freeReader method, but I cannot work out how to use it in my scenario.

    protected ByteArrayOutputStream joinPDFs(List<InputStream> pdfStreams, boolean paginate) {
        Document document = new Document();
        ByteArrayOutputStream mergedPdfStream = new ByteArrayOutputStream();
        try {
            //List<InputStream> pdfs = pdfStreams;
            List<PdfReader> readers = new ArrayList<PdfReader>();
            int totalPages = 0;
            //Iterator<InputStream> iteratorPDFs = pdfs.iterator();
            Iterator<InputStream> iteratorPDFs = pdfStreams.iterator();

            // Create Readers for the pdfs.
            while (iteratorPDFs.hasNext()) {
                InputStream pdf = iteratorPDFs.next();
                if (pdf == null)
                    continue;
                PdfReader pdfReader = new PdfReader(pdf);
                readers.add(pdfReader);
                totalPages += pdfReader.getNumberOfPages();
            }

            //clear this
            pdfStreams = null;
            //WeakReference ref = new WeakReference(pdfs);
            //ref.clear();

            // Create a writer for the outputstream
            PdfWriter writer = PdfWriter.getInstance(document, mergedPdfStream);
            writer.setFullCompression();
            document.open();
            BaseFont bf = BaseFont.createFont(BaseFont.HELVETICA, BaseFont.CP1252, BaseFont.NOT_EMBEDDED);
            PdfContentByte cb = writer.getDirectContent(); // Holds the PDF data
            PdfImportedPage page;
            int currentPageNumber = 0;
            int pageOfCurrentReaderPDF = 0;
            Iterator<PdfReader> iteratorPDFReader = readers.iterator();

            // Loop through the PDF files and add to the output.
            while (iteratorPDFReader.hasNext()) {
                PdfReader pdfReader = iteratorPDFReader.next();
                // Create a new page in the target for each source page.
                while (pageOfCurrentReaderPDF < pdfReader.getNumberOfPages()) {
                    pageOfCurrentReaderPDF++;
                    document.setPageSize(pdfReader.getPageSizeWithRotation(pageOfCurrentReaderPDF));
                    document.newPage();
                    // pageOfCurrentReaderPDF++;
                    currentPageNumber++;
                    page = writer.getImportedPage(pdfReader, pageOfCurrentReaderPDF);
                    cb.addTemplate(page, 0, 0);
                    // Code for pagination.
                    if (paginate) {
                        cb.beginText();
                        cb.setFontAndSize(bf, 9);
                        cb.showTextAligned(PdfContentByte.ALIGN_CENTER, "" + currentPageNumber + " of " + totalPages, 520, 5, 0);
                        cb.endText();
                    }
                }
                pageOfCurrentReaderPDF = 0;
                System.out.println("now the size is: " + pdfReader.getFileLength());
            }
            mergedPdfStream.flush();
            document.close();
            mergedPdfStream.close();
            return mergedPdfStream;
        } catch (Exception e) {
            e.printStackTrace();
        } finally {
            if (document.isOpen())
                document.close();
            try {
                if (mergedPdfStream != null)
                    mergedPdfStream.close();
            } catch (IOException ioe) {
                ioe.printStackTrace();
            }
        }
        return mergedPdfStream;
    }

Thanks, V

Jasper report: disable pdf toolbar when generating a pdf.

- by pietervn

Hi all, is it possible to disable the PDF toolbar when generating a PDF from a Jasper Report? Thank you in advance. P

Storing and retrieving dynamically created pdf in sql

- by mwright

I have been playing around with creation of pdf documents for a project that I'm working on. I would like to store the generated pdf document in a SQL database and then later be able to retrieve this pdf as well. What are some suggestions for doing this? Can the document be stored in the database without physically creating the document on the server?
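
One way to picture storing the generated document without writing it to disk is to keep the PDF bytes in memory and bind them to a BLOB column. The sketch below uses Python's built-in sqlite3 module with an in-memory database as a stand-in for whatever SQL server the project actually uses; the table, column, and function names are invented for illustration.

```python
import sqlite3


def store_pdf(conn, name, pdf_bytes):
    """Insert the PDF bytes as a BLOB and return the new row id."""
    cursor = conn.execute(
        "INSERT INTO documents (name, content) VALUES (?, ?)",
        (name, sqlite3.Binary(pdf_bytes)),
    )
    conn.commit()
    return cursor.lastrowid


def load_pdf(conn, doc_id):
    """Return the stored PDF bytes, or None if the id is unknown."""
    row = conn.execute(
        "SELECT content FROM documents WHERE id = ?", (doc_id,)
    ).fetchone()
    return bytes(row[0]) if row else None


# The database lives in memory here; a real project would connect to its own server.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE documents (id INTEGER PRIMARY KEY, name TEXT, content BLOB)"
)

# Placeholder bytes stand in for a PDF generated in memory.
generated = b"%PDF-1.4\n...generated in memory...\n%%EOF\n"
doc_id = store_pdf(conn, "report.pdf", generated)
restored = load_pdf(conn, doc_id)
```

The bytes returned by load_pdf can be written straight to an HTTP response, so no file needs to exist on the server at either end.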

XML to PDF best approach?

- by MMAmail.com

I have some XML files that are used to generate my web pages. However, I need to let the user select a number of pages and then combine them into one PDF. This PDF needs different styling from the actual web page (the content is kept in XML files). P.S. The PDF must have a table of contents and will include images taken from the website.

have PDF form, need to port to website

- by Alex

So here is what I have: a PDF form (job application) that a client is requesting to put on their website as a form and the data gets sent to them when an applicant on the site fills the form out. My idea is as follows: dissecting the PDF, taking its fields and making the HTML form, then processing on the server side, creating the new PDF and emailing as an attachment to the client. However, something tells me that there is a better, more effective way of doing it. Is that so?

Convert PDF to Word offline?

- by Mrgreen

Is there any way to convert a PDF to Word document via code? I'm aware of several online sites that will do it however we cannot use them due to security concerns. Opening the PDF in Adobe, copying all of the text and pasting into Word will not work as all of the text ends up jumbled around the place. Is there any kind of utility that might accomplish converting PDF to Word (or rtf)?

PDF external streams in Mac OS X Preview

- by olpa

According to the specification, part of a PDF document can reside in an external file. An example for an image:

    2 0 obj
    << /Type /XObject /Subtype /Image /Width 117 /Height 117
       /BitsPerComponent 8 /Length 0 /ColorSpace /DeviceRGB
       /FFilter /DCTDecode /F (pinguine.jpg) >>
    stream
    endstream
    endobj

I found that this functionality does work in Adobe Acrobat 5.0 for Windows (sample PDF with the image). I also managed to view this file in Adobe Acrobat Reader 8.1.3 for Mac OS X after I found the setting "Allow external content". Unfortunately, it seems that non-Adobe tools ignore the external stream feature. I hope I'm wrong, so I ask: how can I enable external streams in Mac OS X? (I think that all the Mac OS X system tools use the same library, so I say "Mac OS X" instead of "Preview".) Or maybe there could be a programming hook to emulate external streams? My task is to store a big set of images (~300 MB in total) outside of a small PDF (~1 MB). At some point, I want to run the PDF through a Quartz filter and get a PDF with the images embedded. Any suggestions are welcome.
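
If the viewer ignores external streams, one possible workaround is a preprocessing step that writes the image data into the stream itself, replacing /F and /FFilter with /Filter and a real /Length. The sketch below only builds the bytes of one such image object, reusing the colour space and bit depth from the example above. The function name and parameters are hypothetical, and a complete tool would also have to rewrite the file's cross-reference table.

```python
def embedded_jpeg_object(obj_num, jpeg_bytes, width, height):
    """Build a PDF image XObject that carries the JPEG data inline."""
    # DeviceRGB and 8 bits per component are assumptions taken from the
    # example object in the post; other images may need different values.
    dictionary = (
        "<< /Type /XObject /Subtype /Image "
        "/Width {w} /Height {h} /BitsPerComponent 8 "
        "/ColorSpace /DeviceRGB /Filter /DCTDecode /Length {n} >>"
    ).format(w=width, h=height, n=len(jpeg_bytes))
    return (
        "{0} 0 obj\n".format(obj_num).encode("ascii")
        + dictionary.encode("ascii")
        + b"\nstream\n"
        + jpeg_bytes
        + b"\nendstream\nendobj\n"
    )


# Placeholder bytes stand in for the contents of a real JPEG file.
fake_jpeg = b"\xff\xd8\xff\xe0placeholder\xff\xd9"
obj = embedded_jpeg_object(2, fake_jpeg, 117, 117)
```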

Edit PDF files dynamically from Flash or Flex

- by TandemAdam

I am planning to do a CD-ROM in either Flash or Flex, possibly using the Adobe AIR runtime. This interactive CD will have a bunch of forms on it for the user to fill out. After they fill in a form, they will have the option of saving or printing a PDF that is based on their information. I am trying to find a way of editing the content of the PDF in Flash, so that when the user fills out the form, the application fills in the PDF with their details from the form fields. Is this possible? It would be great if there were some way of having template PDFs (either on the CD as their own files, or in a Flash library), so that Flash could come along and fill in the specific fields inside the PDF. Can Adobe AIR help me in any way here?

PDF Thumbnail display

- by yvartan

How can I display PDF thumbnails in an ASP page?

Are there any PDF libraries for J2ME?

- by ovolko

We need to create a basic PDF reader running on J2ME. While there are several PDF libraries for Java, I'm not sure whether they support J2ME. Does anyone know a working J2ME PDF library? If not, why is it so hard to make one?

Storing and retrieving dynamically created pdf in sql using c#

- by mwright

I have been playing around with creation of pdf documents for a project that I'm working on. I would like to store the generated pdf document in a SQL database and then later be able to retrieve this pdf as well. What are some suggestions for doing this? Can the document be stored in the database without physically creating the document on the server?

Convert PDF File to HTML in C#

- by Jepe d Hepe

I had a problem highlighting text in a PDF file embedded in a WebBrowser control, and also highlighting text using PDFLibNet.pdfwrapper, so I'm shifting to another approach where I just convert the PDF to HTML so I can manipulate the source to highlight text. How can I convert PDF files to HTML files? Is there a better way? Thanks, Jepe

PDF to LaTeX Linux

- by Mawnster

I know how to make a PDF from LaTeX. Is there a way to extract the LaTeX from a PDF I created earlier? And what if someone sends me a PDF and I like the formatting: can I extract the LaTeX from it?

detect pdf tampering

- by sean717

Hi, the web app I am currently working on generates a PDF file and sends it to the user, who will use this PDF as a certificate. My question is: how can I make sure that this PDF file cannot be tampered with by the user? Thanks,
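
A file the user holds cannot be made impossible to change, but changes can be made detectable. One minimal sketch, assuming the server keeps a secret key and records a tag for each certificate it issues, is an HMAC over the file bytes. The key and function names below are placeholders. A digital signature embedded in the PDF is the more standard route, but it needs a PDF library.

```python
import hashlib
import hmac

# Placeholder key: it must stay on the server and never be shipped to users.
SECRET_KEY = b"replace-with-a-server-side-secret"


def make_tag(pdf_bytes, key=SECRET_KEY):
    """Compute a keyed SHA-256 tag over the exact bytes of the PDF."""
    return hmac.new(key, pdf_bytes, hashlib.sha256).hexdigest()


def is_untampered(pdf_bytes, tag, key=SECRET_KEY):
    """Return True only if the bytes still match the recorded tag."""
    return hmac.compare_digest(make_tag(pdf_bytes, key), tag)


# Placeholder bytes stand in for a generated certificate.
certificate = b"%PDF-1.4\n...certificate for Alice...\n%%EOF\n"
tag = make_tag(certificate)
```

When a certificate is later presented, the server recomputes the tag from the uploaded bytes; any edit to the file, however small, makes is_untampered return False.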

Getting all pdf files from a domain (for example *.adomain.com)

- by Zack

I need to download all PDF files from a certain domain. There are about 6000 PDFs on that domain, and most of them don't have an HTML link (either the link has been removed or there never was one in the first place). I know there are about 6000 files because I'm googling: filetype:pdf site:*.adomain.com However, Google lists only the first 1000 results. I believe there are two ways to achieve this: a) Use Google. However, how can I get all 6000 results from Google? Maybe a scraper? (I tried scroogle, no luck.) b) Skip Google and search directly on the domain for PDF files. How do I do that when most of them are not linked?

How do I determine if Android can handle PDF

- by jasonshah

Hi all, I know Android cannot handle PDFs natively. However, the Nexus One (and possibly other phones) comes pre-installed with QuickOffice Viewer. How would I determine whether the user has a PDF viewer installed? Currently, the code to start the PDF download looks pretty simple:

    Intent intent = new Intent(Intent.ACTION_VIEW);
    intent.setData(Uri.parse(url));
    startActivity(intent);

After the download, the user clicks on the downloaded file to invoke the viewer. However, if there is no PDF viewer, Android reports "Cannot download. The content is not supported on the phone." I want to determine whether the user will get this message and, if so, direct them to PDF apps in the Android Market. Thanks!

Generating PDF results in single page only?

- by A T

I'm generating a PDF from an email (Zurb Ink template), but I'm always presented with a single-page PDF. Runnable test case:

    from weasyprint import HTML, CSS
    from urllib2 import urlopen

    if __name__ == '__main__':
        html = urlopen('http://zurb.com/ink/downloads/templates/basic.html').read()
        html = html.replace('<p class=\"lead\">', '{0}<p class=\"lead\">'.format(
            '<p class=\"lead\">{0}</p>'.format("foobar " * 50) * 50))
        HTML(string=html).write_pdf('foo.pdf', stylesheets=[
            CSS(string='@page { size: A4; margin: 2cm };'
                       '* { float: none !important; };'
                       '@media print { nav { display: none; } }')
        ])

How do I get a multi-page PDF?

Generate a pdf thumbnail (open source/free)

- by AndrewB

Looking at other posts on this, I could not find an adequate solution for my needs. I'm just trying to get the first page of a PDF document as a thumbnail. This is to run as a server application, so I would not want to write the PDF document out to a file and then call another application that reads the PDF to generate the image on disk. Something along these lines:

    doc = new PDFdocument("some.pdf");
    page = doc.page(1);
    Image image = page.image;

Thanks.

Opening PDF String in new window with javascript

- by DaveC

Hello, I have a formatted PDF string that looks like %PDF-1.73 0 obj<<< /Type /Group /S /Transparency /CS /DeviceRGB >> /Resources 2 0 R/Contents 4 0 R>> endobj4 0 obj<> streamx??R=o?0??+??=|vL?R???l?-???,???Ge?JK????{???Y5?????Z?k?vf?a??`G????Asf?z????`%??aI#?!;?t???GD?!???<?????B?b?? ... 00000 n 0000000703 00000 n 0000000820 00000 n 0000000926 00000 n 0000001206 00000 n 0000001649 00000 n trailer << /Size 11 /Root 10 0 R /Info 9 0 R >>startxref2015%%EOF I am trying to open up this string in a new window as a PDF file. Whenever I use window.open() and write the string to the new tab it thinks that the text should be the contents of an HTML document. I want it to recognize that this is a PDF file. Any help is much appreciated
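
The new window treats the string as HTML because nothing labels it as a PDF. One way to attach that label is a data URI with the application/pdf media type, which the script could then pass to window.open(). The sketch below builds such a URI in Python only to show the shape of the string; the helper name is made up, and some browsers restrict opening data URIs in a top-level window, so treat it as an illustration rather than a guaranteed fix.

```python
import base64


def pdf_data_uri(pdf_bytes):
    """Wrap raw PDF bytes in a data URI labelled as application/pdf."""
    encoded = base64.b64encode(pdf_bytes).decode("ascii")
    return "data:application/pdf;base64," + encoded


# Placeholder bytes stand in for the real PDF content.
sample = b"%PDF-1.7\n...binary content...\n%%EOF"
uri = pdf_data_uri(sample)
```

This only helps if the original bytes are still intact. The question marks in the dump above suggest the binary content may already have been damaged by being handled as text, in which case the PDF would need to be fetched again as binary data.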

Search Results

Search found 4479 results on 180 pages for 'pdf scraping'.
