Search Results

Search found 4479 results on 180 pages for 'pdf scraping'.

Page 34/180

Screen scraping C application without using OCR or DOM?

- by Mrgreen

We have a legacy system that is essentially a glorified telnet interface. We cannot use an alternative telnet client program to connect to the system, since there are special features built into the client software they have provided us. I want to be able to screen scrape from this program; however, that's proving very difficult. I have tried using WindowSpy and Spy++ to check the window text, and it comes up blank. It's a custom C program written by the vendor (they have even disabled selecting text). I'm really looking for a free option, ideally something I could use in conjunction with a scripting language. It seems the only ways to grab the text are directly from the Windows GDI or from memory, but that seems a little extreme. Can anyone recommend any software/DLLs that might be able to accomplish this? I'd be extremely appreciative.

What's the best way to write a maintainable web scraping app?

- by Benj

I wrote a Perl script a while ago which logged into my online banking and emailed me my balance and a mini-statement every day. I found it very useful for keeping track of my finances. The only problem is that I wrote it using just Perl and curl, and it was quite complicated and hard to maintain. After a few instances of my bank changing their web page, I got fed up with debugging it to keep it up to date. So what's the best way to write such a program so that it's easy to maintain? I'd like to write a nice, well-engineered version in either Perl or Java which will be easy to update when the bank inevitably fiddles with their website.

How to shift PDF pages using Perl (CAM::PDF, PDF::API2)?

- by est

Hi all, I have a PDF document whose pages I need to shift several inches to the right, i.e. like putting a margin on the left-hand side of the page. Can either CAM::PDF or PDF::API2 do it? Or does anyone have experience with this? Thanks.
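
Whichever library ends up doing the work, the underlying PDF operation is small: wrap each page's existing content in a translation (the `cm` operator with matrix `1 0 0 1 tx 0`) and widen the page so the moved content isn't cropped. Below is a minimal sketch of just that arithmetic, assuming PDF's default of 72 points per inch and a page whose /MediaBox is known. It is written in Java for illustration; the class and method names are hypothetical, and it does not show the CAM::PDF or PDF::API2 calls that would write these strings back into the file.

```java
import java.util.Locale;

/** Hypothetical helper: the arithmetic for shifting page content right by some inches. */
final class PageShift {
    /** PDF user space defaults to 72 units (points) per inch. */
    static final double POINTS_PER_INCH = 72.0;

    /** Operators placed before the page's existing content: save state, then translate by (tx, 0). */
    static String contentPrefix(double inches) {
        double tx = inches * POINTS_PER_INCH;
        return String.format(Locale.ROOT, "q 1 0 0 1 %.2f 0 cm\n", tx);
    }

    /** Operator placed after the existing content: restore the saved graphics state. */
    static String contentSuffix() {
        return "\nQ";
    }

    /**
     * Returns a copy of the MediaBox [llx, lly, urx, ury] widened on the right by the shift,
     * so nothing is cropped and the extra space shows up as a left margin.
     */
    static double[] widenedMediaBox(double[] mediaBox, double inches) {
        if (mediaBox.length != 4) {
            throw new IllegalArgumentException("a MediaBox has exactly 4 numbers");
        }
        double[] widened = mediaBox.clone();
        widened[2] += inches * POINTS_PER_INCH;
        return widened;
    }
}
```

For a US Letter page ([0 0 612 792]) and a 3-inch shift, the prefix translates by 216 points and the MediaBox becomes [0 0 828 792]. If a page also carries a CropBox, it would presumably need the same widening, since that box controls what viewers display.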

Is there a free PDF printer / distiller that creates signable documents?

- by Coderer

I've used various methods (mentioned elsewhere on this site) to create PDFs, using a printer driver or converting from PostScript, etc. The common problem is that if I open any of the output files in the newer versions of Adobe Reader, there's a "Place Signature" option, but it's greyed out or gives an error message saying the feature has been disabled for this document. As far as I can tell, there's an option set somewhere in the document metadata that tells Reader whether or not to allow the user to sign this document. None of the free/open-source tools that have been linked to in other SU posts have had this listed as an option (though to be fair, I haven't actually downloaded and tried all of them). Is there a tool that does this? Can I just poke a bit with a hex editor somewhere to turn on this functionality? I can sometimes get access to Acrobat Professional to turn on this option, but doing it for every desired case would be more work than I care to do.

The current workaround for single-page documents is:

1. Print the document to PDF (possibly via PostScript).
2. Open a single-page blank PDF with the "signable" bit turned on in Reader.
3. Create a custom "stamp" using the Reader markup tools, by importing the printed-to document.
4. "Stamp" an image of the printed document on the blank page, hoping to get it centered about right.
5. Place a signature over the document-but-not-really you just stamped.

This obviously does not scale well at all. It would be much better if I could:

1. Print the document to PDF.
2. Drag the document to a simple shortcut / tool / whatever.
3. Open the document in Reader.
4. Place a signature in the document.

ETA: Sorry, maybe I should have been clearer: I'm talking about the certificate-based digital signing available in Adobe Reader, not adding a virtual ink signature. Also, any solution really would have to be available offline.

How to detect if a file is PDF or TIFF?

- by eviljack

Please bear with me, as I've been thrown into the middle of this project without knowing all the background. If you've got WTF questions, trust me, I have them too. Here is the scenario: I've got a bunch of files residing on an IIS server. They have no file extension on them, just naked files with names like "asda-2342-sd3rs-asd24-ut57" and so on. Nothing intuitive. The problem is that I need to serve up these files on an ASP.NET (2.0) page and display the TIFF files as TIFF and the PDF files as PDF. Unfortunately, I don't know which is which, and I need to be able to display each one appropriately in its respective format. For example, let's say there are 2 files I need to display, one TIFF and one PDF. The page should show a TIFF image, and perhaps a link that opens the PDF in a new tab/window. The problem: as these files are all extension-less, I had to force IIS to just serve everything up as TIFF. But if I do this, the PDF files won't display. I could change IIS to force the MIME type to be PDF for unknown file extensions, but then I'd have the reverse problem. http://support.microsoft.com/kb/326965 Is this problem easier than I think, or is it as nasty as I am expecting?
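
Since the files have no extensions, one common approach is to look at their first few bytes ("magic numbers"): PDF files start with the ASCII signature `%PDF-`, and TIFF files start with `II` followed by 42 as a little-endian 16-bit integer, or `MM` followed by 42 big-endian. The question's stack is ASP.NET; the minimal sketch below is in Java purely for illustration (class and method names are hypothetical), and the same byte checks port directly to C#. It assumes the signature sits at offset 0, which holds for well-formed files; Acrobat also tolerates a PDF header that appears a little later (within the first 1024 bytes), which this sketch does not handle.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

/** Hypothetical helper: guesses a file's type from its leading bytes instead of its extension. */
final class FileSniffer {
    private static final byte[] PDF_MAGIC = {'%', 'P', 'D', 'F', '-'};
    private static final byte[] TIFF_LITTLE_ENDIAN = {'I', 'I', 0x2A, 0x00};
    private static final byte[] TIFF_BIG_ENDIAN = {'M', 'M', 0x00, 0x2A};

    /** Returns "application/pdf", "image/tiff", or null if the header matches neither. */
    static String sniffMimeType(byte[] head) {
        if (startsWith(head, PDF_MAGIC)) {
            return "application/pdf";
        }
        if (startsWith(head, TIFF_LITTLE_ENDIAN) || startsWith(head, TIFF_BIG_ENDIAN)) {
            return "image/tiff";
        }
        return null;
    }

    /** Reads at most the first 8 bytes of the file and sniffs them (requires Java 11+). */
    static String sniffMimeType(Path file) throws IOException {
        try (InputStream in = Files.newInputStream(file)) {
            return sniffMimeType(in.readNBytes(8));
        }
    }

    private static boolean startsWith(byte[] data, byte[] prefix) {
        return data.length >= prefix.length
                && Arrays.equals(Arrays.copyOf(data, prefix.length), prefix);
    }
}
```

The page, or a small handler that streams the files, could then set the response's Content-Type from the returned value instead of relying on a single IIS MIME mapping for extension-less files.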

Debugging iFilter plug-in (PDF indexing)

- by Trevor Sullivan

I have the official Adobe x64 iFilter PDF plug-in and the Foxit Software iFilter PDF plug-in installed, and neither one seems to let me index the contents of PDF files. So far, I've:

- Added my data folder to the Indexing service configuration
- Ensured that PDF files are configured to index "file properties and contents"
- Rebuilt the index from scratch

But when I search, I can only search for PDF file names, not their contents. Any ideas on how to debug this issue?

iText can't keep rows together: second row spans multiple pages but won't stick with first row

- by J2SE31

I am having trouble keeping the first and second rows of my main PdfPTable together using iText. My first row consists of a PdfPTable with some basic search criteria. My second row consists of a PdfPTable that contains all of the tabulated results. Every time the tabulated results become too big and span multiple pages, they are kicked to the second page automatically rather than showing up directly below the search criteria and then paging to the next page. How can I avoid this problem? I have tried using setSplitRows(false), but I simply get a blank document (see commented lines 117 and 170). How can I keep my tabulated data (second row) on the first page? An example of my code is shown below (you should be able to just copy/paste):

    public class TestHelper {
        private TestEventHelper helper;

        public TestHelper() {
            super();
            helper = new TestEventHelper();
        }

        public TestEventHelper getHelper() {
            return helper;
        }

        public void setHelper(TestEventHelper helper) {
            this.helper = helper;
        }

        public static void main(String[] args) {
            TestHelper test = new TestHelper();
            TestEventHelper helper = test.getHelper();
            FileOutputStream file = null;
            Document document = null;
            PdfWriter writer = null;
            try {
                file = new FileOutputStream(new File("C://Documents and Settings//All Users//Desktop//pdffile2.pdf"));
                document = new Document(PageSize.A4.rotate(), 36, 36, 36, 36);
                writer = PdfWriter.getInstance(document, file);
                // writer.setPageEvent(templateHelper);
                writer.setPdfVersion(PdfWriter.PDF_VERSION_1_7);
                writer.setUserunit(1f);
                document.open();
                List<Element> pages = null;
                try {
                    pages = helper.createTemplate();
                } catch (Exception e) {
                    e.printStackTrace();
                }
                Iterator<Element> iterator = pages.iterator();
                while (iterator.hasNext()) {
                    Element element = iterator.next();
                    if (element instanceof Phrase) {
                        document.newPage();
                    } else {
                        document.add(element);
                    }
                }
            } catch (Exception de) {
                de.printStackTrace();
                // log.debug("Exception " + de + " " + de.getMessage());
            } finally {
                if (document != null) {
                    document.close();
                }
                if (writer != null) {
                    writer.close();
                }
            }
            System.out.println("Done!");
        }

        private class TestEventHelper extends PdfPageEventHelper {
            // The PdfTemplate that contains the total number of pages.
            protected PdfTemplate total;
            protected BaseFont helv;
            private static final float SMALL_MARGIN = 20f;
            private static final float MARGIN = 36f;
            private final Font font = new Font(Font.HELVETICA, 12, Font.BOLD);
            private final Font font2 = new Font(Font.HELVETICA, 10, Font.BOLD);
            private final Font smallFont = new Font(Font.HELVETICA, 10, Font.NORMAL);
            private String[] datatableHeaderFields = new String[]{"Header1", "Header2", "Header3", "Header4", "Header5", "Header6", "Header7", "Header8", "Header9"};

            public TestEventHelper() {
                super();
            }

            public List<Element> createTemplate() throws Exception {
                List<Element> elementList = new ArrayList<Element>();
                float[] tableWidths = new float[]{1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.25f, 1.25f, 1.25f, 1.25f};
                // logger.debug("entering create reports template...");
                PdfPTable splitTable = new PdfPTable(1);
                splitTable.setSplitRows(false);
                splitTable.setWidthPercentage(100f);
                PdfPTable pageTable = new PdfPTable(1);
                pageTable.setKeepTogether(true);
                pageTable.setWidthPercentage(100f);
                PdfPTable searchTable = generateSearchFields();
                if (searchTable != null) {
                    searchTable.setSpacingAfter(25f);
                }
                PdfPTable outlineTable = new PdfPTable(1);
                outlineTable.setKeepTogether(true);
                outlineTable.setWidthPercentage(100f);
                PdfPTable datatable = new PdfPTable(datatableHeaderFields.length);
                datatable.setKeepTogether(false);
                datatable.setWidths(tableWidths);
                generateDatatableHeader(datatable);
                for (int i = 0; i < 100; i++) {
                    addCell(datatable, String.valueOf(i), 1, Rectangle.NO_BORDER, Element.ALIGN_CENTER, smallFont, true);
                    addCell(datatable, String.valueOf(i+1), 1, Rectangle.NO_BORDER, Element.ALIGN_CENTER, smallFont, true);
                    addCell(datatable, String.valueOf(i+2), 1, Rectangle.NO_BORDER, Element.ALIGN_CENTER, smallFont, true);
                    addCell(datatable, String.valueOf(i+3), 1, Rectangle.NO_BORDER, Element.ALIGN_CENTER, smallFont, true);
                    addCell(datatable, String.valueOf(i+4), 1, Rectangle.NO_BORDER, Element.ALIGN_CENTER, smallFont, true);
                    addCell(datatable, String.valueOf(i+5), 1, Rectangle.NO_BORDER, Element.ALIGN_CENTER, smallFont, true);
                    addCell(datatable, String.valueOf(i+6), 1, Rectangle.NO_BORDER, Element.ALIGN_CENTER, smallFont, true);
                    addCell(datatable, String.valueOf(i+7), 1, Rectangle.NO_BORDER, Element.ALIGN_CENTER, smallFont, true);
                    addCell(datatable, String.valueOf(i+8), 1, Rectangle.NO_BORDER, Element.ALIGN_RIGHT, smallFont, true);
                }
                PdfPCell dataCell = new PdfPCell(datatable);
                dataCell.setBorder(Rectangle.BOX);
                outlineTable.addCell(dataCell);
                PdfPCell searchCell = new PdfPCell(searchTable);
                searchCell.setVerticalAlignment(Element.ALIGN_TOP);
                PdfPCell outlineCell = new PdfPCell(outlineTable);
                outlineCell.setVerticalAlignment(Element.ALIGN_TOP);
                addCell(pageTable, searchCell, 1, Rectangle.NO_BORDER, Element.ALIGN_LEFT, null, null);
                addCell(pageTable, outlineCell, 1, Rectangle.NO_BORDER, Element.ALIGN_CENTER, null, null);
                PdfPCell pageCell = new PdfPCell(pageTable);
                pageCell.setVerticalAlignment(Element.ALIGN_TOP);
                addCell(splitTable, pageCell, 1, Rectangle.NO_BORDER, Element.ALIGN_CENTER, null, null);
                elementList.add(pageTable);
                // elementList.add(splitTable);
                return elementList;
            }

            public void onOpenDocument(PdfWriter writer, Document document) {
                total = writer.getDirectContent().createTemplate(100, 100);
                total.setBoundingBox(new Rectangle(-20, -20, 100, 100));
                try {
                    helv = BaseFont.createFont(BaseFont.HELVETICA, BaseFont.WINANSI, BaseFont.NOT_EMBEDDED);
                } catch (Exception e) {
                    throw new ExceptionConverter(e);
                }
            }

            public void onEndPage(PdfWriter writer, Document document) {
                //TODO
            }

            public void onCloseDocument(PdfWriter writer, Document document) {
                total.beginText();
                total.setFontAndSize(helv, 10);
                total.setTextMatrix(0, 0);
                total.showText(String.valueOf(writer.getPageNumber() - 1));
                total.endText();
            }

            private PdfPTable generateSearchFields() {
                PdfPTable searchTable = new PdfPTable(2);
                for (int i = 0; i < 6; i++) {
                    addCell(searchTable, "Search Key" + i, 1, Rectangle.NO_BORDER, Element.ALIGN_RIGHT, font2, MARGIN, true);
                    addCell(searchTable, "Search Value" + i, 1, Rectangle.NO_BORDER, Element.ALIGN_LEFT, smallFont, null, true);
                }
                return searchTable;
            }

            private void generateDatatableHeader(PdfPTable datatable) {
                if (datatableHeaderFields != null && datatableHeaderFields.length != 0) {
                    for (int i = 0; i < datatableHeaderFields.length; i++) {
                        addCell(datatable, datatableHeaderFields[i], 1, Rectangle.BOX, Element.ALIGN_CENTER, font2);
                    }
                }
            }

            private PdfPCell addCell(PdfPTable table, String cellContent, int colspan, int cellBorder, int horizontalAlignment, Font font) {
                return addCell(table, cellContent, colspan, cellBorder, horizontalAlignment, font, null, null);
            }

            private PdfPCell addCell(PdfPTable table, String cellContent, int colspan, int cellBorder, int horizontalAlignment, Font font, Boolean noWrap) {
                return addCell(table, cellContent, colspan, cellBorder, horizontalAlignment, font, null, noWrap);
            }

            private PdfPCell addCell(PdfPTable table, String cellContent, Integer colspan, Integer cellBorder, Integer horizontalAlignment, Font font, Float paddingLeft, Boolean noWrap) {
                PdfPCell cell = new PdfPCell(new Phrase(cellContent, font));
                return addCell(table, cell, colspan, cellBorder, horizontalAlignment, paddingLeft, noWrap);
            }

            private PdfPCell addCell(PdfPTable table, PdfPCell cell, int colspan, int cellBorder, int horizontalAlignment, Float paddingLeft, Boolean noWrap) {
                cell.setColspan(colspan);
                cell.setBorder(cellBorder);
                cell.setHorizontalAlignment(horizontalAlignment);
                if (paddingLeft != null) {
                    cell.setPaddingLeft(paddingLeft);
                }
                if (noWrap != null) {
                    cell.setNoWrap(noWrap);
                }
                table.addCell(cell);
                return cell;
            }
        }
    }

Best option for PDF viewer embedded in web app

- by RationalGeek

I have a web app that needs to be able to display a PDF. It needs to allow the user to page through the PDF, and my application needs to know which page is currently being viewed, because other aspects of the web app will change based on the current page. Ideally it would not depend on the client having Adobe Reader, but I could probably support that dependency. What are my best options for this? My application stack consists of ASP.NET 4, optionally with Silverlight 5. I could also use something client-side based on JavaScript/HTML, if such a thing exists. I found ComponentOne's offering for this, and it seems like the leading candidate at this point, but I want to know if there are other options I should consider. Edit: Per Fosco's comment, converting the PDF to another format (such as HTML) might be an option, as long as I could tie parts of the converted document back to the original PDF page numbers. Another note: this has to run entirely on our servers. It would not be acceptable to use a third-party service to view the PDFs.

How can I instruct Nautilus to pre-generate thumbnails?

- by Glutanimate

I have a large library of PDF documents (papers, lectures, handouts) that I want to be able to navigate through quickly. For that I need thumbnails. At the same time, however, I see that the ~/.thumbnails folder is piling up with thumbs I don't really need. Deleting the thumbnail junk without removing the important thumbs is impossible: if I were to delete them all, I'd have to go to each and every folder with important PDF documents and let the thumbnail cache regenerate. I would love to be able to automate this process. Is there any way I can tell Nautilus to pre-cache the thumbs for a given set of directories? Note: I did find a set of bash scripts that appear to do this for pictures and videos, but not for any other documents. Maybe someone more experienced with scripting could adjust these for PDF documents, or at least point me in the right direction on what I'd have to modify to make them work with PDF documents as well.
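
On the cleanup half of the problem: under the freedesktop.org Thumbnail Managing Standard, which Nautilus follows, a thumbnail's file name is the MD5 hash (lowercase hex) of the original file's `file://` URI plus `.png`, stored in a size subdirectory such as `normal/` under `~/.thumbnails` (newer systems use `~/.cache/thumbnails`). That makes it possible to work out which cached thumbnails belong to the important PDFs and keep only those. Below is a minimal Java sketch with hypothetical class and method names; it assumes Java's `Path.toUri()` escaping matches the URI GLib builds, which should hold for ordinary names (including spaces) but hasn't been verified for every character.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Hypothetical helper: where a freedesktop-style thumbnail cache keeps a file's thumbnail. */
final class ThumbnailPaths {
    /** Lowercase hex MD5 of the file's absolute file:// URI, plus ".png". */
    static String thumbnailName(Path file) {
        String uri = file.toAbsolutePath().normalize().toUri().toASCIIString();
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            byte[] digest = md5.digest(uri.getBytes(StandardCharsets.US_ASCII));
            // Zero-padded to 32 hex digits so leading zero bytes are not lost.
            return String.format("%032x", new BigInteger(1, digest)) + ".png";
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("every Java platform must provide MD5", e);
        }
    }

    /** thumbnailRoot is e.g. ~/.thumbnails; its "normal" subdirectory holds 128x128 thumbnails. */
    static Path normalThumbnail(Path thumbnailRoot, Path file) {
        return thumbnailRoot.resolve("normal").resolve(thumbnailName(file));
    }
}
```

Computing these names for every PDF in the important folders gives a keep-list; anything else in `normal/` could then be deleted without losing the thumbnails that matter.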

Any way to loop through FPDF code with proper XY coordinates?

- by JM4

At the end of a form collection, I provide the consumer a printable PDF with the information they just entered. I already run through a loop to store the variables themselves, but am wondering if it is at all possible to build a loop that builds on itself for FPDF. The catch is this: each new variable (#1, #2, #3) will change location by a fixed amount of space. For example, I print Member #1's first name at coordinate (95, 101), Member #2's first name at coordinate (95, 110.5), and so on. Each known variable's Y will be 9.5mm greater than the previous entry's (therefore Member #9's Y will be 28.5mm greater than Member #6's). My sample code for the FPDF itself is:

    $pdf->SetFont('Arial','', 7);
    $pdf->SetXY(8,76.5);
    $pdf->Cell(20,0,$f1name);
    $pdf->SetFont('Arial','', 5);
    $pdf->SetXY(50.5,76.5);
    $pdf->Cell(20,0,$f1address);
    $pdf->SetFont('Arial','', 7);
    $pdf->SetXY(95.7,76.5);
    $pdf->Cell(20,0,$f1city);
    $pdf->SetXY(129.5,76.5);
    $pdf->Cell(20,0,$f1state);
    $pdf->SetXY(139.1,76.5);
    $pdf->Cell(20,0,$f1zip);
    $pdf->SetXY(151,76.5);
    $pdf->Cell(20,0,$f1dob);
    $pdf->SetXY(168,76.5);
    $pdf->Cell(20,0,$f1ssn);
    $pdf->SetXY(186,76.5);
    $pdf->Cell(20,0,$f1phone);
    $pdf->SetXY(55,81.1);
    $pdf->Cell(20,0,$f1email);
    $pdf->SetXY(129,81.1);
    $pdf->Cell(20,0,$f1fednum);

Ideally, all Y values for the $f2 variables would be 9.5mm greater than the corresponding $f1 Y values.
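
The offset rule described above reduces to one formula: Member n's Y is Member #1's Y plus (n - 1) × 9.5 mm, with X unchanged. Below is a minimal sketch of that arithmetic, written in Java for illustration; the field keys and class name are hypothetical, and the coordinates are the $f1 values from the FPDF calls above. In the PHP, the same formula inside a loop over members would feed each `$pdf->SetXY()` call.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper: derives every member's field positions from Member #1's layout. */
final class MemberLayout {
    /** Vertical step between consecutive members, in mm (FPDF's default unit). */
    static final double ROW_STEP_MM = 9.5;

    /** Member #1's {x, y} per field, copied from the $f1 SetXY() calls. */
    static final Map<String, double[]> MEMBER1;
    static {
        Map<String, double[]> m = new LinkedHashMap<>();
        m.put("name", new double[] {8, 76.5});
        m.put("address", new double[] {50.5, 76.5});
        m.put("city", new double[] {95.7, 76.5});
        m.put("state", new double[] {129.5, 76.5});
        m.put("zip", new double[] {139.1, 76.5});
        m.put("dob", new double[] {151, 76.5});
        m.put("ssn", new double[] {168, 76.5});
        m.put("phone", new double[] {186, 76.5});
        m.put("email", new double[] {55, 81.1});
        m.put("fednum", new double[] {129, 81.1});
        MEMBER1 = Collections.unmodifiableMap(m);
    }

    /** Returns {x, y} for a field of member n (1-based): same x, y moved down (n - 1) steps. */
    static double[] position(String field, int member) {
        double[] base = MEMBER1.get(field);
        if (base == null) {
            throw new IllegalArgumentException("unknown field: " + field);
        }
        if (member < 1) {
            throw new IllegalArgumentException("members are numbered from 1");
        }
        return new double[] {base[0], base[1] + (member - 1) * ROW_STEP_MM};
    }
}
```

Keeping Member #1's layout in one table means a form redesign only changes that table, not every SetXY() call.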

Official Ubuntu 10.10 Manual Now Available [Free PDF Download]

- by Asian Angel

Do you know someone who is still learning about Ubuntu or is considering trying it out for the first time? Then here is the perfect book to help get them on their way. The Ubuntu Manual Team has recently completed and made available for download their comprehensive 158-page guide to the Ubuntu 10.10 release. If you would like to purchase a regular print copy of the manual, click on the left side of the screen (Star button). For the free PDF version, use the right side of the screen (Download Now button). Download the Getting Started With Ubuntu 10.10 PDF Manual [via Softpedia]. Bonus: You can also download PDF copies of the manual for Ubuntu 10.04 (First and Second Editions) on the alternate downloads page (Ubuntu Manual Project Alternate Downloads)!

How can you invert the colors of a PDF?

- by legr3c

I need to invert all the colors of a PDF document (background, text, graphics, and images). I want the change to persist in the file, so the inverted viewing options that some viewers offer won't help. Rasterizing the document and using image manipulation software is also not an option. I read somewhere that this can be done with the Enfocus PitStop plugin for Acrobat; however, I didn't see a corresponding command anywhere. Am I missing something? Then I read that the ARTS PDF Crackerjack plugin for Acrobat offers negative printing, so I tried that too. The option is there, but it simply doesn't work. I have been searching for a very long time for a way to do this. It seems like a common enough task, but I just can't find out how to do it. Are there perhaps any virtual printer drivers or something of the sort that support negative printing? Can anyone help?

How to print document to PDF on Windows 7?

- by lamcro

Is there any open-source software that allows me to print directly to PDF from Windows 7?

Is there a way to bundle PDF files into a Kindle-friendly file?

- by Maciej Swic

I'm downloading PDF approach plates from Navigraph, and I have a folder per airport with files named after their corresponding approach/departure etc. Now I'd like to take such a folder with a bunch of PDF files, automatically generate an index, and combine them into a single .mobi file that I can send to my Kindle. The index can be very simple and consist of the file name (without the extension). Tapping an index item should jump to the correct page for that chart. I know there is a host of apps that combine comic book JPEGs into ebooks, but is there anything that does the above, please?
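
Whatever tool ends up producing the .mobi, the index described here is easy to compute once the charts are concatenated in a fixed order: each title is the file name without its extension, and its target page is 1 plus the total page count of all earlier charts. Below is a minimal Java sketch, assuming the page count of each PDF is already known (obtaining it needs a PDF library, not shown); class and field names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Hypothetical helper: index titles and start pages for charts concatenated in list order. */
final class ChartIndex {
    /** One chart: its file name and page count (the count has to come from a PDF library). */
    static final class Chart {
        final String fileName;
        final int pageCount;

        Chart(String fileName, int pageCount) {
            this.fileName = fileName;
            this.pageCount = pageCount;
        }
    }

    /** Maps each title (file name without ".pdf") to the 1-based page where that chart starts. */
    static Map<String, Integer> build(List<Chart> charts) {
        Map<String, Integer> index = new LinkedHashMap<>(); // keeps the charts' order
        int nextPage = 1;
        for (Chart chart : charts) {
            String name = chart.fileName;
            String title = name.toLowerCase(Locale.ROOT).endsWith(".pdf")
                    ? name.substring(0, name.length() - ".pdf".length())
                    : name;
            index.put(title, nextPage);
            nextPage += chart.pageCount;
        }
        return index;
    }
}
```

For example, charts of 1, 3 and 2 pages start on pages 1, 2 and 5 of the combined document.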

How can one convert a Word form to a PDF form while preserving fields?

- by Ben Collins

I have a Word source document which I'm using to create a PDF form. The first go-round, everything is fine because I can let Acrobat Pro auto-create all the fields. That feature is actually pretty awesome. However, after spending a bunch of time adjusting field sizes and alignments and formats and so on, I want to edit the source document, and now I'm faced with the prospect of doing all that over again. Isn't there some way to add the fields in the source document using the Developer ribbon and have those fields be preserved in the conversion to PDF? If not, what other ways are there to avoid this kind of redundant effort?

Annotating a PDF file from my Dropbox on an iPad and keeping the Dropbox copy as the latest version

- by Farshid

I have a folder in my Dropbox where I keep my ebooks. I want to find an iPad app that can do the following:

- Let me open a PDF file from my Dropbox
- Let me annotate that file
- Apply the annotations to the Dropbox version of my file, instead of creating a local copy whose changes do not affect the Dropbox version

On my PC, when I open a PDF file from my Dropbox and make some highlights, pressing the Save button in Acrobat Reader instantly updates the Dropbox version, and whenever I open my Dropbox folder I have the latest version of the file. I need similar functionality on my iPad. What iPad app do you recommend for this?

How to make PDF pages with U3D images print properly?

- by David Thornley

I'm creating some PDF files that have three-dimensional images (U3D) in them. When I bring them up in Acrobat, everything is fine until I try to print the file. If I've viewed a page, it prints fine, showing the U3D image as I last saw it. If I haven't viewed a page yet, the printout is blank. (I can demonstrate this right from the Print dialog, by previewing pages.) The only reference I've seen to printing U3D is in the PDF Standard, which says, basically, that this should work. It recommends "PV" or "PO" for the A key in the 3D activation dictionary, and I'm using libharu which uses "PV".

From a big PDF file how to convert only particular pages to HTML using Adobe Acrobat?

- by Jitendra vyas

From a big PDF file how to convert only particular pages to HTML using Adobe Acrobat?

How to copy text out of a PDF without losing formatting?

- by Colen

When I copy text out of a PDF file and into a text editor, it ends up mangled in a variety of ways: formatting such as bold and italics is lost; soft line breaks within a paragraph of text are converted to hard line breaks; hyphens used to break a word over two lines are preserved even when they shouldn't be; and single and double quotes are replaced with ? signs. Ideally, I'd like to be able to copy text from a PDF and have formatting converted to HTML tags, "smart quotes" converted to " and ', and line breaks handled properly. Is there any way to do this?
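
Part of this can be handled by post-processing the copied text. Below is a minimal Java sketch (the class name is hypothetical), assuming the clipboard text still contains the Unicode curly quotes U+2018, U+2019, U+201C and U+201D; if they have already turned into literal ? characters, that information is gone. It straightens the quotes, rejoins words hyphenated at a line end, turns single line breaks into spaces, and keeps blank lines as paragraph breaks. It does not recover bold or italics, which plain-text copying doesn't carry, and rejoining also drops the hyphen from a genuine compound that happened to break at a line end (e.g. "well-known").

```java
import java.util.regex.Pattern;

/** Hypothetical helper: tidies text pasted from a PDF viewer. */
final class PdfTextCleanup {
    // A letter, a hyphen at the end of a line, then a letter on the next line.
    private static final Pattern HYPHENATED_BREAK = Pattern.compile("(\\p{L})-\\n(\\p{L})");
    // A newline neither preceded nor followed by another newline, i.e. not part of a blank line.
    private static final Pattern SOFT_BREAK = Pattern.compile("(?<!\\n)\\n(?!\\n)");

    static String clean(String copied) {
        String text = copied.replace("\r\n", "\n");
        // Curly ("smart") quotes to straight ASCII quotes.
        text = text.replace('\u2018', '\'')
                   .replace('\u2019', '\'')
                   .replace('\u201C', '"')
                   .replace('\u201D', '"');
        // Rejoin words split across lines, dropping the hyphen.
        text = HYPHENATED_BREAK.matcher(text).replaceAll("$1$2");
        // Single line breaks inside a paragraph become spaces; blank lines stay paragraph breaks.
        text = SOFT_BREAK.matcher(text).replaceAll(" ");
        return text;
    }
}
```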

Is it possible to edit a PDF file directly?

- by rossmcm

I have a PDF file that is produced as part of a help file compilation. There is always late-breaking stuff that goes into a text file (e.g. "What's new in this version" type of stuff), and while Help and Manual allows you to include content from a text file, it only works for the CHM output and not for the PDF. I'm wondering if I can do it by generating a unique placeholder string instead, and then using some tool (I may need to write one) to search and replace that unique string with the contents of the late-breaking info text file. Is this feasible? Or will it break some sort of internal structure?
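
The main structural risk: a PDF's cross-reference table records the byte offset of every object, and each stream declares its /Length, so a replacement that changes the byte count invalidates those unless they are rewritten. A naive patch is therefore only safe when it keeps the length identical. Below is a minimal Java sketch of that constraint (class name hypothetical), assuming an ASCII placeholder that appears verbatim in the raw file. In practice page content streams are often Flate-compressed and text can be split across several string operands, in which case a raw search finds nothing and the file would have to be decompressed or regenerated first. The replacement should also avoid `(`, `)` and `\`, which need escaping inside PDF string literals, and characters missing from an embedded font subset won't render.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical helper: overwrites an ASCII placeholder in raw PDF bytes without changing the length. */
final class SameLengthPatch {
    /**
     * Replaces every occurrence of placeholder with replacement, right-padded with spaces to the
     * same number of bytes. Returns how many occurrences were patched (0 if none were found).
     */
    static int patch(byte[] pdf, String placeholder, String replacement) {
        byte[] needle = placeholder.getBytes(StandardCharsets.US_ASCII);
        byte[] text = replacement.getBytes(StandardCharsets.ISO_8859_1);
        if (needle.length == 0) {
            throw new IllegalArgumentException("placeholder must not be empty");
        }
        if (text.length > needle.length) {
            throw new IllegalArgumentException("replacement is longer than the placeholder");
        }
        // Same-size patch: the replacement bytes followed by spaces.
        byte[] patchBytes = new byte[needle.length];
        Arrays.fill(patchBytes, (byte) ' ');
        System.arraycopy(text, 0, patchBytes, 0, text.length);

        int count = 0;
        int i = 0;
        while (i + needle.length <= pdf.length) {
            if (matchesAt(pdf, i, needle)) {
                System.arraycopy(patchBytes, 0, pdf, i, patchBytes.length);
                count++;
                i += needle.length; // continue after the patched region
            } else {
                i++;
            }
        }
        return count;
    }

    private static boolean matchesAt(byte[] data, int offset, byte[] needle) {
        for (int k = 0; k < needle.length; k++) {
            if (data[offset + k] != needle[k]) {
                return false;
            }
        }
        return true;
    }
}
```

Reading the file with `Files.readAllBytes` and writing the patched array back with `Files.write` keeps every offset intact; a return value of 0 is the hint that the text lives in a compressed stream.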

How to know which fonts are being used in a PDF?

- by Jitendra vyas

How to know which fonts are being used in a PDF?

How to maintain original figure numbers in a PDF document saved from a Word 2007 file?

- by S_H

I have a Word 2007 document in which the figure numbers and List of Figures are correct, i.e. exactly as I want. I generate a PDF (Adobe Reader X, version 10.1.2) from the Word 2007 document using the Save As option. The List of Figures in the PDF comes out exactly as in the Word document; however, when I click on a figure number, the figure itself shows a different number from the one in the List of Figures. For example, the List of Figures shows 4-7, exactly as I want, but the corresponding figure on page 61 shows 4-21. It becomes 4-21 instead of 4-7 because the numbering counts the 20 figures that precede it, starting from Chapter 1. However, I want the figure numbering to restart in each chapter, i.e. figures in Chapter 4 should start from 4-1, so this figure should be 4-7. How can I correct this? Thanks.

iText PdfPTables, document.add with writeSelectedRows

- by J2SE31

I am currently modifying an existing system that generates reports with iText and Java. The report template is as follows:

- Header1 (PdfPTable)
- Header2 (PdfPTable)
- Body (PdfPTable)

I am currently using writeSelectedRows to display Header1 and Header2, but document.add is used to display the Body. The problem is that the system is set up to write the headers AFTER the body has already been displayed on the screen, so my headers end up on top of my body content. My question is: how do I add my body table (using document.add) and have it display about halfway down the page (or at any predetermined vertical position)? This way I would have sufficient room to display my headers above the body table. Note: I believe the body table uses document.add to get automatic paging when the body content is too large.

wkhtmltopdf - cannot convert local file

- by user522962

I just downloaded version 0.10.0 for openSUSE 11.3. I can convert a web page (e.g. www.google.com) using it, but cannot convert a local file. I've granted all permissions on the file (and I've even tried running under sudo, to no avail). This is the error: "Loading pages (1/6) Error: Failed loading page file:///file.html". The file exists, but wkhtmltopdf refuses to load it. I even tried version 0.9.9, with the same result. What am I missing?
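
One thing worth checking: a `file:///` URL is absolute, so `file:///file.html` names `/file.html` at the root of the filesystem, not `file.html` in the current directory. If that's what is being passed, giving wkhtmltopdf the full path, or a URL built from it, may be all that's needed. Below is a minimal Java sketch of building such a URL from a relative path (class name hypothetical); if the real path was already absolute and only shortened for the question, this isn't the cause.

```java
import java.net.URI;
import java.nio.file.Paths;

/** Hypothetical helper: turns a possibly relative local path into an absolute file:// URI. */
final class LocalFileUrl {
    /** Resolves the path against the current working directory and removes "." / ".." segments. */
    static URI toFileUri(String path) {
        return Paths.get(path).toAbsolutePath().normalize().toUri();
    }
}
```

Run from `/home/me/docs`, `toFileUri("file.html")` yields `file:///home/me/docs/file.html`, which points at the file actually being converted.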

TCPDF: grey background for a few cells?

- by Pawel Mysior

I'm using TCPDF in CakePHP and trying to give a few cells a grey background. Here's the idea: the grey area would somehow have to be defined outside of the cells containing text. Any ideas? Paul

