Search Results

Search found 4479 results on 180 pages for 'pdf scraping'.

  • How to debug a corrupt pdf file?

    - by Joelio
    Hi, I'm generating PDF files using a Ruby library called "prawn". I have one particular file that seems to be considered "corrupt" by Adobe Reader. It shows up fine in both Preview and Adobe Reader, but Adobe Reader gives errors. Sometimes I get: "Could not find the XObject named '%s'." Other times I get: "Could not find the XObject named "Im4"." Then I always get: "An error exists on this page. Acrobat may not display the page correctly. Please contact the person who created the PDF document to correct the problem." Is there a way to open a PDF with some tool and have it tell you what is technically wrong with the PDF? I'm sure I could figure it out quickly with something like this. Thanks, Joel

  • JAVA - Download PDF file from Webserver

    - by Augusto Picciani
    I need to download a PDF file from a web server to my PC and save it locally. I used HttpClient to connect to the web server and get the content body:

        HttpEntity entity=response.getEntity();
        InputStream in=entity.getContent();
        String stream = CharStreams.toString(new InputStreamReader(in));
        int size=stream.length();
        System.out.println("stringa html page LENGTH:"+stream.length());
        System.out.println(stream);
        SaveToFile(stream);

    Then I save the content to a file:

        //check CRLF (i don't know if i need to do this)
        String[] fix=stream.split("\r\n");
        File file=new File("C:\\Users\\augusto\\Desktop\\progetti web\\test\\test2.pdf");
        PrintWriter out = new PrintWriter(new FileWriter(file));
        for (int i = 0; i < fix.length; i++) {
            out.print(fix[i]);
            out.print("\n");
        }
        out.close();

    I also tried saving the String content to a file directly:

        OutputStream out=new FileOutputStream("pathPdfFile");
        out.write(stream.getBytes());
        out.close();

    But the result is always the same: I can open the PDF file, but I see only white pages. Is the mistake related to the charset encoding of the PDF data between stream and endstream? Does the PDF content between stream and endstream need to be handled in some other way?
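
    Blank pages are a typical symptom of PDF bytes having passed through a text decoder: the compressed data between stream and endstream is not text and generally does not survive a charset round trip. The sketch below is in Python rather than the asker's Java, and it uses in-memory streams as stand-ins for the HTTP response body and the output file. The names `copy_binary` and `copy_via_text` are hypothetical, chosen only for this illustration.

```python
import io
import shutil

# Stand-in for a PDF body: a header line followed by non-text stream bytes.
SAMPLE = b"%PDF-1.4\n" + bytes(range(256)) + b"\r\nendstream\n%%EOF\n"


def copy_binary(source, target):
    """Copy raw bytes from source to target without any text decoding."""
    shutil.copyfileobj(source, target)


def copy_via_text(source, target):
    """Lossy path for comparison: decode to text, split lines, re-encode."""
    text = source.read().decode("utf-8", errors="replace")
    for line in text.split("\r\n"):
        target.write(line.encode("utf-8"))
        target.write(b"\n")


good, bad = io.BytesIO(), io.BytesIO()
copy_binary(io.BytesIO(SAMPLE), good)
copy_via_text(io.BytesIO(SAMPLE), bad)

print(good.getvalue() == SAMPLE)  # the binary copy is byte-identical
print(bad.getvalue() == SAMPLE)   # the text round trip is not
```

    In the Java code above, the analogous change would presumably be to write the bytes read from the InputStream straight to a FileOutputStream, with no String in between.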

  • Extract text from PDF (google app engine)

    - by Miroslav Bajtoš
    Is there any free Java library for extracting text from PDF that is compatible with Google App Engine? I've read about PDFJet, but it can't read PDFs, can it? Is there perhaps another way to extract text from a PDF? I tried http://www.pdfdownload.org/, but unfortunately it doesn't handle non-English characters correctly.

  • Display Pdf Stream on page load with Binary Write

    - by Israfel
    I have a PDF file being generated on the fly that I want to display inline on page load, as below:

        Response.Clear()
        Response.AppendHeader("Content-Disposition", "inline; filename=_Bulk_Print.pdf")
        Response.ContentType = "application/pdf"
        Response.BinaryWrite(docData)
        Response.End()

    If I put this in a click event it works perfectly, but on page load I just get a blank aspx page, even though it steps through that code and generates docData with no problem. Does anyone know the reason for this, or a workaround? Thanks for your help.

  • have PDF form, need to port to website

    - by Alex
    So here is what I have: a PDF form (a job application) that a client wants on their website as a web form, with the data sent to them whenever an applicant fills it out. My idea is as follows: dissect the PDF, take its fields and build the HTML form, then process the submission on the server side, create a new PDF, and email it to the client as an attachment. However, something tells me that there is a better, more effective way of doing it. Is that so?

  • OutOfMemoryError during the pdf merge

    - by Vijay
    The code below merges PDF files and returns the combined PDF data. When I run it to combine 100 files of roughly 500 KB each, I get an OutOfMemoryError at the line document.close();. The code runs in a web environment; is the memory available to the WebSphere server the problem? I read in an article that I should use the freeReader method, but I can't work out how to use it in my scenario.

        protected ByteArrayOutputStream joinPDFs(List<InputStream> pdfStreams, boolean paginate) {
            Document document = new Document();
            ByteArrayOutputStream mergedPdfStream = new ByteArrayOutputStream();
            try {
                //List<InputStream> pdfs = pdfStreams;
                List<PdfReader> readers = new ArrayList<PdfReader>();
                int totalPages = 0;
                //Iterator<InputStream> iteratorPDFs = pdfs.iterator();
                Iterator<InputStream> iteratorPDFs = pdfStreams.iterator();

                // Create Readers for the pdfs.
                while (iteratorPDFs.hasNext()) {
                    InputStream pdf = iteratorPDFs.next();
                    if (pdf == null)
                        continue;
                    PdfReader pdfReader = new PdfReader(pdf);
                    readers.add(pdfReader);
                    totalPages += pdfReader.getNumberOfPages();
                }

                //clear this
                pdfStreams = null;
                //WeakReference ref = new WeakReference(pdfs);
                //ref.clear();

                // Create a writer for the outputstream
                PdfWriter writer = PdfWriter.getInstance(document, mergedPdfStream);
                writer.setFullCompression();
                document.open();
                BaseFont bf = BaseFont.createFont(BaseFont.HELVETICA, BaseFont.CP1252, BaseFont.NOT_EMBEDDED);
                PdfContentByte cb = writer.getDirectContent(); // Holds the PDF data
                PdfImportedPage page;
                int currentPageNumber = 0;
                int pageOfCurrentReaderPDF = 0;
                Iterator<PdfReader> iteratorPDFReader = readers.iterator();

                // Loop through the PDF files and add to the output.
                while (iteratorPDFReader.hasNext()) {
                    PdfReader pdfReader = iteratorPDFReader.next();

                    // Create a new page in the target for each source page.
                    while (pageOfCurrentReaderPDF < pdfReader.getNumberOfPages()) {
                        pageOfCurrentReaderPDF++;
                        document.setPageSize(pdfReader.getPageSizeWithRotation(pageOfCurrentReaderPDF));
                        document.newPage();
                        // pageOfCurrentReaderPDF++;
                        currentPageNumber++;
                        page = writer.getImportedPage(pdfReader, pageOfCurrentReaderPDF);
                        cb.addTemplate(page, 0, 0);

                        // Code for pagination.
                        if (paginate) {
                            cb.beginText();
                            cb.setFontAndSize(bf, 9);
                            cb.showTextAligned(PdfContentByte.ALIGN_CENTER,
                                    "" + currentPageNumber + " of " + totalPages, 520, 5, 0);
                            cb.endText();
                        }
                    }
                    pageOfCurrentReaderPDF = 0;
                    System.out.println("now the size is: "+pdfReader.getFileLength());
                }
                mergedPdfStream.flush();
                document.close();
                mergedPdfStream.close();
                return mergedPdfStream;
            } catch (Exception e) {
                e.printStackTrace();
            } finally {
                if (document.isOpen())
                    document.close();
                try {
                    if (mergedPdfStream != null)
                        mergedPdfStream.close();
                } catch (IOException ioe) {
                    ioe.printStackTrace();
                }
            }
            return mergedPdfStream;
        }

    Thanks, V

  • XML to PDF best approach?

    - by MMAmail.com
    I have some XML files that are used to generate my web pages. I need to let the user select a number of pages and then combine them into one PDF. This PDF needs different styling from the actual web page (the content is kept in XML files). P.S. The PDF must have a table of contents and will include images taken from the website.

  • Storing and retrieving dynamically created pdf in sql

    - by mwright
    I have been playing around with creation of pdf documents for a project that I'm working on. I would like to store the generated pdf document in a SQL database and then later be able to retrieve this pdf as well. What are some suggestions for doing this? Can the document be stored in the database without physically creating the document on the server?
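
    One way to read the question is that the generated bytes can go straight into a binary column without ever touching the file system. The sketch below illustrates that idea in Python, with SQLite standing in for the unspecified SQL database. The table name `documents`, its columns, and the helper names `store_pdf` and `load_pdf` are assumptions made for this example; other databases would use their own binary column type.

```python
import sqlite3


def store_pdf(conn, name, data):
    """Insert PDF bytes as a BLOB and return the new row id."""
    cur = conn.execute(
        "INSERT INTO documents (name, content) VALUES (?, ?)", (name, data)
    )
    conn.commit()
    return cur.lastrowid


def load_pdf(conn, doc_id):
    """Return the stored PDF bytes, or None if the id is unknown."""
    row = conn.execute(
        "SELECT content FROM documents WHERE id = ?", (doc_id,)
    ).fetchone()
    return None if row is None else bytes(row[0])


# In-memory database so the example leaves nothing on disk.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE documents (id INTEGER PRIMARY KEY, name TEXT, content BLOB)"
)

pdf_bytes = b"%PDF-1.4\n...generated in memory...\n%%EOF\n"  # placeholder bytes
doc_id = store_pdf(conn, "report.pdf", pdf_bytes)
restored = load_pdf(conn, doc_id)
print(restored == pdf_bytes)
```

    The parameterized query passes the bytes as a bound value, so no escaping or text encoding of the PDF is involved.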

  • Weird problem, with ghostscript and pdf files.

    - by kofucii
    Hello, I'm using Ghostscript to create a PDF file from a PostScript file. My PS file doesn't have orientation instructions, so when I want to create a landscape PDF file, I use Ghostscript to rotate the page. The problem is that Ghostscript rotates only the first page; when my PDF file has more than one page, the others are not rotated correctly. Here is the command I'm using:

        cat $psinput | gs -sPAPERSIZE=a4 -sDEVICE=pdfwrite -sOutputFile="/tmp/pdf" \
            -dAutoRotatePages="/None" -c "<< /Orientation 3 >> setpagedevice" \
            90 rotate 0 -595 translate -dNOPAUSE -dEPSCrop -f - -c -quit

    Does anybody have an idea how to correct this?

  • pdf external streams in Mac OS X Preview

    - by olpa
    According to the specification, a part of a PDF document can reside in an external file. An example for an image:

        2 0 obj
        << /Type /XObject /Subtype /Image /Width 117 /Height 117
           /BitsPerComponent 8 /Length 0 /ColorSpace /DeviceRGB
           /FFilter /DCTDecode /F (pinguine.jpg) >>
        stream
        endstream
        endobj

    I found that this functionality does work in Adobe Acrobat 5.0 for Windows (sample PDF with the image), and I also managed to view this file in Adobe Acrobat Reader 8.1.3 for Mac OS X after I found the setting "Allow external content". Unfortunately, it seems that non-Adobe tools ignore the external stream feature. I hope I'm wrong, so I'm asking: how can I enable external streams in Mac OS X? (I think that all the Mac OS X system tools use the same library, so I say "Mac OS X" instead of "Preview".) Or maybe there is a programming hook to emulate external streams? My task is to store a big set of images (~300 MB in total) outside of a small PDF (~1 MB). At some point, I want to pass the PDF through a Quartz filter and get a PDF with the images embedded. Any suggestions are welcome.

  • Convert PDF to Word offline?

    - by Mrgreen
    Is there any way to convert a PDF to a Word document via code? I'm aware of several online sites that will do it; however, we cannot use them due to security concerns. Opening the PDF in Adobe, copying all of the text, and pasting it into Word will not work, as the text ends up jumbled all over the place. Is there any kind of utility that can convert PDF to Word (or RTF)?

  • Edit PDF files dynamically from Flash or Flex

    - by TandemAdam
    I am planning to do a CD-ROM in either Flash or Flex, possibly using the Adobe AIR runtime. This interactive CD will have a bunch of forms on it for the user to fill out. After they fill in a form, they will have the option of saving or printing a PDF that is based on their information. I am trying to find a way of editing the content of the PDF in Flash, so that when the user fills out the form, the application fills in the PDF with their details from the form fields. Is this possible? It would be great if there was some way of having template PDFs (either on the CD as their own files, or in a Flash library), so that Flash could come along and fill in the specific fields inside the PDF. Can Adobe AIR help me in any way here?

  • Convert PDF File to HTML in C#

    - by Jepe d Hepe
    I had a problem highlighting text in a PDF file embedded in a WebBrowser control, and also highlighting text using PDFLibNet.pdfwrapper, so I'm switching to another approach where I'll just convert the PDF to HTML so I can manipulate the source code to highlight text. How can I convert PDF files to HTML files? Is there a better way? Thanks, Jepe

  • Storing and retrieving dynamically created pdf in sql using c#

    - by mwright
    I have been playing around with creation of pdf documents for a project that I'm working on. I would like to store the generated pdf document in a SQL database and then later be able to retrieve this pdf as well. What are some suggestions for doing this? Can the document be stored in the database without physically creating the document on the server?

  • PDF to LaTeX Linux

    - by Mawnster
    I know how to make a PDF from LaTeX. Is there a way to extract the LaTeX from a PDF I created earlier? And what if someone sends me a PDF and I like the formatting: can I extract the LaTeX from it?

  • detect pdf tampering

    - by sean717
    Hi, the web app I am currently working on generates a PDF file and sends it to the user, who will use this PDF as a certificate. My question is: how can I make sure that this PDF file cannot be tampered with by the user? Thanks,
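
    A file that has been handed to a user cannot be made impossible to modify, but modification can be made detectable. The sketch below shows one possible approach in Python, assuming the issuing server is also the party that later verifies the certificate: it computes a keyed hash (HMAC-SHA256) over the PDF bytes and checks it on return. The key value and the helper names `make_tag` and `is_untampered` are hypothetical. Verification by third parties would normally call for a digital signature embedded in the PDF, which this sketch does not cover.

```python
import hashlib
import hmac

SECRET_KEY = b"replace-with-a-server-side-secret"  # hypothetical key


def make_tag(pdf_bytes, key=SECRET_KEY):
    """Return a hex HMAC-SHA256 tag for the PDF bytes."""
    return hmac.new(key, pdf_bytes, hashlib.sha256).hexdigest()


def is_untampered(pdf_bytes, tag, key=SECRET_KEY):
    """Check the bytes against a previously issued tag in constant time."""
    return hmac.compare_digest(make_tag(pdf_bytes, key), tag)


certificate = b"%PDF-1.4\nCertificate for Alice\n%%EOF\n"  # placeholder bytes
tag = make_tag(certificate)  # stored server-side when the PDF is issued
modified = certificate.replace(b"Alice", b"Mallory")

print(is_untampered(certificate, tag))
print(is_untampered(modified, tag))
```

    Because the tag depends on the secret key, a user who edits the file cannot produce a matching tag without access to the server.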

  • Convert PDF File to HTML in C#

    - by Jepe d Hepe
    I had a problem highlighting text in a PDF file embedded in a WebBrowser control, and also highlighting text using PDFLibNet.pdfwrapper, so I'm switching to another approach where I'll just convert the PDF to HTML so I can manipulate the source code to highlight text. How can I convert PDF files to HTML files? Is there a better way? Thanks, Jepe

  • Getting all pdf files from a domain (for example *.adomain.com)

    - by Zack
    I need to download all PDF files from a certain domain. There are about 6000 PDFs on that domain, and most of them don't have an HTML link (either the link has been removed or there never was one). I know there are about 6000 files because I'm googling:

        filetype:pdf site:*.adomain.com

    However, Google lists only the first 1000 results. I believe there are two ways to achieve this:
    a) Use Google. However, how can I get all 6000 results from Google? Maybe a scraper? (I tried Scroogle, no luck.)
    b) Skip Google and search the domain directly for PDF files. How do I do that when most of them are not linked?

  • How do I determine if Android can handle PDF

    - by jasonshah
    Hi all, I know Android cannot handle PDFs natively. However, the Nexus One (and possibly other phones) comes pre-installed with QuickOffice Viewer. How would I determine whether the user has a PDF viewer installed? Currently, the code to start the PDF download looks pretty simple:

        Intent intent = new Intent(Intent.ACTION_VIEW);
        intent.setData(Uri.parse(url));
        startActivity(intent);

    After the download, the user clicks on the downloaded file to invoke the viewer. However, if there is no PDF viewer, Android reports "Cannot download. The content is not supported on the phone." I want to determine whether the user will get this message and, if so, direct them to PDF apps in the Android Market. Thanks!

  • Generating PDF results in single page only?

    - by A T
    I'm generating a PDF from an email (Zurb Ink templated), but I always end up with a single-page PDF. Runnable test case:

        from weasyprint import HTML, CSS
        from urllib2 import urlopen

        if __name__ == '__main__':
            html = urlopen('http://zurb.com/ink/downloads/templates/basic.html').read()
            html = html.replace('<p class=\"lead\">', '{0}<p class=\"lead\">'.format(
                '<p class=\"lead\">{0}</p>'.format("foobar " * 50) * 50))
            HTML(string=html).write_pdf('foo.pdf', stylesheets=[
                CSS(string='@page { size: A4; margin: 2cm };'
                           '* { float: none !important; };'
                           '@media print { nav { display: none; } }')
            ])

    How do I get a multi-page PDF?

  • Generate a pdf thumbnail (open source/free)

    - by AndrewB
    Looking at other posts on this, I could not find an adequate solution for my needs. I'm just trying to get the first page of a PDF document as a thumbnail. This is to run as a server application, so I would not want to write a PDF document out to a file and then call a third application that reads the PDF to generate the image on disk. Something like:

        doc = new PDFdocument("some.pdf");
        page = doc.page(1);
        Image image = page.image;

    Thanks.

  • Writing PDF reader Library

    - by Stefano
    I have searched for a PDF reader library that is licensed under the LGPL or the like but could not find one; I found only GPL ones. Now I need help writing my own library to read a PDF file and display it in my app. I have downloaded the PDF 1.7 specification from Adobe and have been searching for a beginner tutorial, but I have yet to find one. Is there a beginner tutorial for writing my own reader library (reader only)? Thanks
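
    As a possible first step for such a library: a PDF reader usually begins at the end of the file, where the startxref keyword gives the byte offset of the cross-reference table that locates every object. The Python sketch below reads the header version and that offset from a tiny hand-written file. The function names `pdf_version` and `startxref_offset` are made up for this illustration, and the sketch ignores cross-reference streams, incremental updates, and damaged files.

```python
import re


def pdf_version(data):
    """Return the version string from the '%PDF-x.y' header, or None."""
    match = re.match(rb"%PDF-(\d+\.\d+)", data)
    return match.group(1).decode("ascii") if match else None


def startxref_offset(data):
    """Return the offset given after the last 'startxref' keyword, or None."""
    matches = re.findall(rb"startxref\s+(\d+)\s+%%EOF", data)
    return int(matches[-1]) if matches else None


# Tiny hand-written file; the xref offset is computed rather than hard-coded.
body = b"%PDF-1.7\n1 0 obj\n<< /Type /Catalog >>\nendobj\n"
xref_at = len(body)
sample = (
    body
    + b"xref\n0 2\n0000000000 65535 f \n0000000009 00000 n \n"
    + b"trailer\n<< /Size 2 /Root 1 0 R >>\n"
    + b"startxref\n" + str(xref_at).encode("ascii") + b"\n%%EOF\n"
)

print(pdf_version(sample))
offset = startxref_offset(sample)
print(sample[offset:offset + 4])  # the bytes found at the reported offset
```

    From there, a reader would parse the table entries to find each object's offset, then follow /Root in the trailer to the page tree.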
