pdf scraping - Page 26 - Developer IT

Looking for a recommendation of a good tutorial on best practices for a web scraping project?

- by in bruges

I need to do a fairly extensive project involving web scraping and am considering using Hpricot or Beautiful Soup (i.e. Ruby or Python). Has anyone come across a tutorial that they thought was particularly good on this subject that would help me start the project off on the right foot?

Read the article

Which type of file is easiest and most efficient to parse? (HTML, PDF, CSV, text)

- by Harikrishna

I want to parse HTML, PDF, CSV and plain-text files, and I would like to do it through common parsing code if possible. Which of these file types is the easiest and most efficient to parse?

My idea is to pick the easiest format and convert everything else to it. For example, if HTML is the easiest, I will write the parsing code for HTML and try to convert PDF, CSV and text files to HTML, so that the same code handles all four types. If PDF is the easiest instead, I will convert HTML, CSV and text files to PDF and write the parsing code for PDF.

So my questions are:

(1) Which type of file is easiest and most efficient to parse (PDF, CSV, HTML, text)?
(2) Is it possible to convert these files (PDF, text, HTML, CSV) to each other? For example, if HTML parsing is the easiest: PDF to HTML, text to HTML and CSV to HTML.
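An alternative to converting every file into one format is to keep one small parser per format behind a common entry point. The following is only a minimal sketch of that idea using the Python standard library; the function names and the "list of rows" result shape are assumptions made for illustration, and PDF is left out because the standard library has no PDF reader.

```python
import csv
import io
import os
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects the non-empty text chunks found between HTML tags."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        if data.strip():
            self.chunks.append(data.strip())


def parse_text(raw):
    # One row per non-empty line, split on whitespace.
    return [line.split() for line in raw.splitlines() if line.strip()]


def parse_csv(raw):
    # One row per CSV record.
    return [row for row in csv.reader(io.StringIO(raw))]


def parse_html(raw):
    # One single-cell row per text chunk in the markup.
    extractor = _TextExtractor()
    extractor.feed(raw)
    return [[chunk] for chunk in extractor.chunks]


# Hypothetical registry: file extension -> parser function.
PARSERS = {".txt": parse_text, ".csv": parse_csv, ".html": parse_html}


def parse(filename, raw):
    """Dispatch to the parser registered for the file's extension."""
    ext = os.path.splitext(filename)[1].lower()
    if ext not in PARSERS:
        raise ValueError("no parser registered for " + repr(ext))
    return PARSERS[ext](raw)
```

With this layout the calling code only ever sees rows, and supporting PDF would mean registering one more function (backed by a third-party PDF library) rather than converting files between formats.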

Read the article

FPDI - select which PDFs to show

- by NORM

Is there a way to offer an option to select which PDFs to show with the FPDI function? This is the regular code:

    $pdf->AddPage();
    // set the source file
    $pdf->setSourceFile('h.pdf');
    // import page 1
    $tplIdx = $pdf->importPage(1);
    // use the imported page and place it at point 10,10 with a width of 100 mm
    $pdf->useTemplate($tplIdx, 0, 0, 0);

Is there a way to make $pdf->setSourceFile('h.pdf'); an option for users who visit the website? For example, have $pdf->setSourceFile('h.pdf'); and $pdf->setSourceFile('g.pdf'); and then let the visitor select which one to include in the PDF via FPDI. I would prefer something like an input. Any ideas, or something similar? Help is very much appreciated! :D

Read the article

PDFtk Password Protection Help

- by Dave W.

I am using Ubuntu 11.10 and am looking for a solution to password protect a bunch of PDF files in a directory in batch. I came across PDFtk and it looks like it might do what I need, but I've reviewed the command line PDFtk examples and can't figure out if there is a way to do it in batch without having to individually specify the output file name for every file. I'm hoping a command-line guru can take a look at the PDFtk syntax and tell me if there is some trick / command that will allow me to password protect a directory of PDF files (e.g., *.pdf) and overwrite the existing files using the same name, or consistently rename the individual output files without having to specify each output name individually. Here's a link to the PDFtk command line examples page: http://www.pdflabs.com/tools/pdftk-the-pdf-toolkit/ Thanks for your help.

I think I've answered my own question. Here's a bash script that appears to do the trick. I'd welcome help evaluating why the code I've commented out doesn't work...

    #!/bin/bash
    # Created by Dave, 2012-02-23
    # This script uses PDFtk to password protect every PDF file
    # in the directory specified. The script creates a directory named "protected_[DATE]"
    # to hold the password protected version of the files.
    #
    # I'm using the "user_pw" parameter,
    # which means no one will be able to open or view the file without
    # the password.
    #
    # PDFtk must be installed for this script to work.
    #
    # Usage: ./protect_with_pdftk.bsh [FILE(S)]
    # [FILE(S)] can use wildcard expansion (e.g., *.pdf)

    # This part isn't working.... ignore. The goal is to avoid errors if the
    # directory to be created already exists by only attempting to create
    # it if it doesn't exists
    #
    #TARGET_DIR="protected_$(date +%F)"
    #if [ -d "$TARGET_DIR" ]
    #then
    #echo
    # echo "$TARGET_DIR directory exists!"
    #else
    #echo
    # echo "$TARGET_DIR directory does not exist!"
    #fi
    #
    mkdir protected_$(date +%F)
    for i in *pdf ; do pdftk "$i" output "./protected_$(date +%F)/$i" user_pw [PASSWORD]; done
    echo "Complete. Output is in the directory: ./protected_$(date +%F)"
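For comparison, the same batch idea can be sketched in Python. This is only an illustration, not a replacement for the script above: the function name is hypothetical, and the only PDFtk syntax it relies on is the `pdftk IN output OUT user_pw PASSWORD` form already used in the post. The function just builds the argument lists, so nothing is executed until the caller decides to run them.

```python
from pathlib import Path


def pdftk_protect_commands(pdf_paths, out_dir, password):
    """Return one pdftk argument list per input PDF (nothing is run here)."""
    out_dir = Path(out_dir)
    commands = []
    for pdf in map(Path, pdf_paths):
        # Keep the original file name, but place the copy in out_dir.
        target = out_dir / pdf.name
        commands.append(
            ["pdftk", str(pdf), "output", str(target), "user_pw", password]
        )
    return commands
```

To actually run it, one would first create the output directory with `Path(out_dir).mkdir(exist_ok=True)`, which covers the "only create it if it doesn't exist" goal, and then pass each list to `subprocess.run(cmd, check=True)`.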

Read the article

PDF rendering crashes app Core Graphics

- by Felixyz

EDIT: The memory leaks turned out to be unrelated to the crashes. Leaks are fixed but crashes remain, still mysterious.

My (iPhone) app does lots of PDF loading and rendering, some of it threaded. Sometimes, seemingly always after I flush a page cache following a memory warning, the app crashes with a bad access when trying to draw a PDF page stored in an NSData object. Here is one example trace:

    #0 0x3016d564 in CGPDFResourcesGetResource ()
    #1 0x3016d58a in CGPDFResourcesGetResource ()
    #2 0x3016d94e in CGPDFResourcesGetExtGState ()
    #3 0x3015fac4 in CGPDFContentStreamGetExtGState ()
    #4 0x301629a8 in op_gs ()
    #5 0x3016df12 in handle_xname ()
    #6 0x3016dd9e in read_objects ()
    #7 0x3016de6c in CGPDFScannerScan ()
    #8 0x30161e34 in CGPDFDrawingContextDraw ()
    #9 0x3016a9dc in CGContextDrawPDFPage ()

But sometimes I get this instead:

    Program received signal: “EXC_BAD_ACCESS”.
    (gdb) bt
    #0 0x335625fa in objc_msgSend ()
    #1 0x32c04eba in CFDictionaryGetValue ()
    #2 0x3016d500 in get_value ()
    #3 0x3016d5d6 in CGPDFResourcesGetFont ()
    #4 0x3015fbb4 in CGPDFContentStreamGetFont ()
    #5 0x30163480 in op_Tf ()
    #6 0x3016df12 in handle_xname ()
    #7 0x3016dd9e in read_objects ()
    #8 0x3016de6c in CGPDFScannerScan ()
    #9 0x30161e34 in CGPDFDrawingContextDraw ()
    #10 0x3016a9dc in CGContextDrawPDFPage ()

Is this an indication that I've mistakenly deallocated an object? It's hard for me to decode what's happening here. This is how I create and retain the various objects involved:

    // Some data was just loaded from the network and is pointed to by "data"
    self.pdfData = data;
    _dataProviderRef = CGDataProviderCreateWithData( NULL, [_pdfData bytes], [_pdfData length], NULL );
    _documentRef = CGPDFDocumentCreateWithProvider(_dataProviderRef);
    _pageRef = CGPDFDocumentGetPage(_documentRef, 1);
    CGPDFPageRetain(_pageRef);
    _pdfFrame = CGPDFPageGetBoxRect(_pageRef, kCGPDFArtBox);

So the NSData object is retained, and I explicitly retain the page reference. The data provider and the document are already retained by the create-functions. And here is my dealloc method:

    -(void)dealloc
    {
        if (_pageRef) CGPDFPageRelease(_pageRef);
        if (_documentRef) CGPDFDocumentRelease(_documentRef);
        if (_dataProviderRef) CGDataProviderRelease(_dataProviderRef);
        self.pdfData = nil;
        [super dealloc];
    }

Am I doing anything wrong? Even an assurance that I'm not, with explanation, would be a help.

Read the article

Using Ghostscript in a Webapplication (PDF Thumbnails)

- by cpt.oneeye

Hello, I am using the ghostscriptsharp wrapper for C# and Ghostscript. I want to generate thumbnails from PDF files. Further information on the sample code is given here. There are different methods imported from the Ghostscript C DLL "gsdll32.dll":

    [DllImport("gsdll32.dll", EntryPoint = "gsapi_new_instance")]
    private static extern int CreateAPIInstance(out IntPtr pinstance, IntPtr caller_handle);

    [DllImport("gsdll32.dll", EntryPoint = "gsapi_init_with_args")]
    private static extern int InitAPI(IntPtr instance, int argc, IntPtr argv);

    //...and so on

I am using the GhostscriptWrapper to generate the thumbnails in a web application (.NET 2.0). This class uses the methods imported above.

    protected void Page_Load(object sender, EventArgs e)
    {
        GhostscriptWrapper.GeneratePageThumb("c:\\sample.pdf", "c:\\sample.jpg", 1, 100, 100);
    }

When I debug the web application in Visual Studio 2008 by hitting "F5", it works fine (a new instance of the web server is started). When I create a Windows Forms application it works too. The thumbnails get generated. When I access the application with the web browser directly (http://localhoast/mywebappliation/..) it doesn't work. No thumbnails are generated, but no exception is thrown either. I placed the gsdll32.dll in the system32 folder of Windows XP. The Ghostscript runtime is installed too. I have given full access in the IIS web project (.NET 2.0). Does anybody know why I can't access Ghostscript from my web application? Are there any security issues with accessing DLL files on the IIS server?

Greetings, Klaus

Read the article

PDF files created on iPad don't display correctly on Windows

- by user286028

My iPhone app creates PDF files (in Arial font). The plain iPhone 3.1.x version works great (other than the known issue that PDFs created on the iPhone can't be viewed correctly in Google Docs or on the BlackBerry). As I am updating my project for OS 3.2 and the iPad, it works just the same, and the PDFs still look great on the iPhone, iPad and Mac OS (Preview app). But now on Windows (Vista), Acrobat 9.3.1 says "Cannot extract the embedded font 'XYZABC+ArialMT'. Some characters may not display or print correctly". And in fact Acrobat then uses some generic font instead of Arial (or whatever other font I try). Quartz 3.2 seems to generate these "random" embedded font names each time it creates a PDF (the XYZABC part changes each time). I can't tell whether the problem is just the somewhat strange "temporary" embedded font name with the plus sign, or the way Quartz 3.2 is embedding fonts. I have tried my existing code (using CGPDFContext* functions), and also the newly supported UIGraphics* functions, with the same results. Has anyone else tried creating PDFs on the iPad yet and gotten them to display correctly on Windows?

Read the article

Increase PDF rendering performance on iPhone

- by burki

Hi! I have a UITableView, and every cell displays a UIImage created from a PDF. But the performance is very bad. Here's the code I use to generate the UIImage from the PDF.

Creating the CGPDFDocumentRef and UIImageView (in the cellForRowAtIndexPath method):

    ...
    CFURLRef pdfURL = CFBundleCopyResourceURL(CFBundleGetMainBundle(), (CFStringRef)formula.icon, NULL, NULL);
    CGPDFDocumentRef documentRef = CGPDFDocumentCreateWithURL((CFURLRef)pdfURL);
    CFRelease(pdfURL);
    UIImageView *imageView = [[UIImageView alloc] initWithImage:[self imageFromPDFWithDocumentRef:documentRef]];
    ...

Generating the UIImage:

    - (UIImage *)imageFromPDFWithDocumentRef:(CGPDFDocumentRef)documentRef
    {
        CGPDFPageRef pageRef = CGPDFDocumentGetPage(documentRef, 1);
        CGRect pageRect = CGPDFPageGetBoxRect(pageRef, kCGPDFCropBox);
        UIGraphicsBeginImageContext(pageRect.size);
        CGContextRef context = UIGraphicsGetCurrentContext();
        CGContextTranslateCTM(context, CGRectGetMinX(pageRect), CGRectGetMaxY(pageRect));
        CGContextScaleCTM(context, 1, -1);
        CGContextTranslateCTM(context, -(pageRect.origin.x), -(pageRect.origin.y));
        CGContextDrawPDFPage(context, pageRef);
        UIImage *finalImage = UIGraphicsGetImageFromCurrentImageContext();
        UIGraphicsEndImageContext();
        return finalImage;
    }

What can I do to increase the speed and keep the memory use low?

Read the article

Saving "heavy" figure to PDF in MATLAB - rendering problem

- by yuk

I generate a figure in MATLAB with a large number of points (100000+) and want to save it to a PDF file. With the zbuffer or painters renderer I get a very large file (over 4 MB) that is slow to open, because all points are in vector format. The OpenGL renderer rasterizes the figure in the PDF, which is OK for the plot but not good for text labels. The file size is then about 150 KB. Try this simplified code, for example:

    x=linspace(1,10,100000);
    y=sin(x)+randn(size(x));
    plot(x,y,'.')
    set(gcf,'Renderer','zbuffer')
    print -dpdf -r300 testpdf_zb
    set(gcf,'Renderer','painters')
    print -dpdf -r300 testpdf_pa
    set(gcf,'Renderer','opengl')
    print -dpdf -r300 testpdf_op

The actual figure is much more complex, with several axes and different types of plots. Is there a way to rasterize the figure but keep text labels as vectors?

Another problem with OpenGL is that it does not work in terminal mode (-nosplash -nodesktop) under Mac OS X. It looks like OpenGL is not supported there. I have to use terminal mode for automation. The MATLAB version I run is 2007b, on Mac OS X Server 10.4.

Read the article

Inverted colours in tiff to PDF conversion

- by spiderdijon

I'm sure I'm making some kind of silly mistake here, but when converting a TIFF file to PDF, the colours become reversed. I can't figure out why. Here's my code:

    PdfWriter writer = PdfWriter.GetInstance(document, new FileStream("Image.pdf", FileMode.Create));
    System.Drawing.Bitmap bm = new System.Drawing.Bitmap(@"C:\Temp\338814-00.tif");
    int total = bm.GetFrameCount(FrameDimension.Page);
    document.Open();
    PdfContentByte cb = writer.DirectContent;
    for (int k = 0; k < total; ++k)
    {
        bm.SelectActiveFrame(FrameDimension.Page, k);
        MemoryStream ms = new MemoryStream();
        bm.Save(ms, ImageFormat.Tiff);
        Image img = Image.GetInstance(ms.ToArray());
        img.ScalePercent(72f / (float)img.DpiX * 100);
        img.SetAbsolutePosition(0, 0);
        cb.AddImage(img);
        document.NewPage();
    }
    document.Close();

Thanks.

Read the article

pdfmark for docinfo metadata in pdf is not accepting accented characters in Keywords or Subject

- by rpilkey

I am inserting metadata into PostScript files with a program, to be distilled to PDF with Adobe Distiller. I am using this code that I grabbed from Thomas Merz's "Web Publishing with Acrobat-PDF":

    /pdfmark where {pop} {userdict /pdfmark /cleartomark load put} ifelse
    [ /Title (mot accenté)
      /Author (mot accenté)
      /Subject (mot accenté)
      /Keywords (mot accenté)
      /DOCINFO pdfmark

When you look at the metadata, the accented characters turn into "?" in the Subject and Keywords fields, but not in the Title and Author fields. The characters are the same (ASCII 233). I tried replacing them with octal encoding (\351), which came out the same (Title and Author okay, Subject and Keywords messed up). The file encoding is latin-1 with Unix line endings.

I found a mention on the Adobe forums, but the answer didn't make sense to me: http://forums.adobe.com/message/1165593

I changed the encoding to UTF-8 and inserted the characters binarily (in Vim: <Ctrl-v>u00e9), with no change. I tried inserting the BOM in a few places, and it didn't work. This is with Acrobat Pro 9. I didn't notice this problem with Acrobat Pro 7.

Does anybody know of a workaround to get the accented characters into ALL the metadata fields when modifying a PostScript file, or can you tell me if I'm doing it wrong? It seems weird that different fields would not accept the same bytes.
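One encoding the post does not mention trying is the PDF text-string form that uses UTF-16BE with a leading byte order mark (FE FF), written as a PostScript hex string between angle brackets. The sketch below shows how a generating program could produce such strings. The function names are hypothetical, and whether this avoids the Subject/Keywords problem described for Distiller 9 is an assumption that would need testing.

```python
def pdfmark_hex_string(text):
    """Encode text as a PostScript hex string: BOM + UTF-16BE, e.g. <FEFF00E9>."""
    data = b"\xfe\xff" + text.encode("utf-16-be")
    return "<" + data.hex().upper() + ">"


def docinfo_pdfmark(**fields):
    """Build a one-line DOCINFO pdfmark from keyword arguments such as Title=..."""
    parts = ["/{} {}".format(key, pdfmark_hex_string(value))
             for key, value in fields.items()]
    return "[ " + " ".join(parts) + " /DOCINFO pdfmark"
```

Because the resulting line contains only ASCII characters, it is independent of whether the PostScript file itself is saved as latin-1 or UTF-8.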

Read the article

Creating ODT and PDF files as end result

- by Bill Zimmerman

Hello, I've been working on an app to create various document formats for a while now, and I've had limited success. Ideally, I'd like to dynamically create a fairly simple ODT/PDF/DOC file. I've been focusing my efforts on ODT, because it is editable, and open enough that there are several tools which will convert it to any of the other formats I need. The problem is that the ODT XML files are NOT simple, and there aren't any good-quality APIs I could find (especially in Python). So far, I've had the most success creating a template ODT file and then manipulating the DOM in Python as needed. This is OK generally, but it is quickly becoming inadequate and requires too much tweaking every single time I need to alter one of the templates.

The requirements are:

1) Produce a simple document that will have lists, paragraphs, and the ability to draw simple graphics on the page (boxes, circles, etc.)
2) The ability to specify page size, and the different formats should generally print the exact same output when sent to a printer

My questions:

1) Are there any other ways I can produce ODT/PDF/DOC files?
2) Would LaTeX be acceptable? I've never really used it; does anyone have experience converting LaTeX files into other formats?
3) Would it be possible to use HTML? There are a lot of converters online. Technically you can specify dimensions in mm/cm, etc., but I am worried that the printed output will differ between browsers/converters.

Any other ideas?

Read the article

Stream PDF to another local App

- by Nathan

Hi, I'm currently trying to optimize a small Firefox extension that will grab a PDF off the current document and send it to a port that another local application is listening on. Right now it uses a terrifying hack job built on a cache viewer. The way I'm getting it is by loading the cache, searching through it using the current URL, grabbing the file and saving it to a temp directory. Then I stream the file in, delete the temp file, and send it through the socket.

For my new design, ideally I'd want to build it from scratch, cut out saving it to the local machine at all, and just stream it through the socket. I've been looking at doing something like:

    //check page to ensure it's a pdf
    //init in/out streams
    //stream through sock
    //flush

This would be vastly superior to the 400-line hacked-up mess I have now, but I'm new to building FF extensions, and after reading a lot about URIs and file streaming and such I'm probably more confused than when I started trying to fix this three hours ago. I'm okay with sending things through sockets and whatnot, I understand that; I'm mainly confused about which of the multitude of interfaces I want to use. Gah! Thanks! Also, long time reader, first time poster!
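The four steps in that outline (check, open streams, stream through the socket, flush) can be sketched independently of Firefox. The following Python is only an illustration of the flow, assuming the PDF is available as a binary stream and the local application is already connected; a real extension would use the browser's own stream and socket interfaces instead, and the function name and chunk size here are hypothetical.

```python
import io
import socket

CHUNK_SIZE = 64 * 1024  # bytes per read; an arbitrary choice for this sketch


def stream_pdf(source, sock, chunk_size=CHUNK_SIZE):
    """Copy a PDF from a binary file-like object to a connected socket.

    Returns the number of bytes sent. Raises ValueError if the data does
    not start with the "%PDF-" header, so non-PDF pages are never sent.
    """
    first = source.read(chunk_size)
    if not first.startswith(b"%PDF-"):
        raise ValueError("source does not look like a PDF")
    sent = 0
    chunk = first
    while chunk:
        # sendall blocks until the whole chunk is handed to the OS,
        # so no temporary file is needed at any point.
        sock.sendall(chunk)
        sent += len(chunk)
        chunk = source.read(chunk_size)
    return sent
```

Only one chunk is held in memory at a time, which is the main difference from the save-to-temp-file approach described above.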

Read the article

PDF (VisPDF component) Problem with DecimalSeparator in Delphi/C++Builder2009

- by Katsumi

Hello. I use the VisPDF component with Delphi/C++Builder 2009 and show text with:

    ShowMessage(FloatToStrF(1.23, ffFixed, 6, 2)); // Output: 1,23 (right!)
    UnicodeString Text = "Hello world!";
    VPDF->CurrentPage->UnicodeTextOutStr( x, y, 0, Text);
    ShowMessage(FloatToStrF(1.23, ffFixed, 6, 2)); // Output: 1.23 (wrong!)

After UnicodeTextOutStr() the DecimalSeparator is changed. I looked in the VisPDF source and found this:

    Abscissa := Angle * Pi / 180;
    X := XProjection(X) + StrHeight * sin(Abscissa);
    Y := (YProjection(Y)) - StrHeight * cos(Abscissa);
    MtxA := cos(Abscissa);
    MtxB := sin(Abscissa);
    SetTextMatrix(MtxA, MtxB, -MtxB, MtxA, X, Y);

The bug shows up with SetTextMatrix(). If I comment out this line, DecimalSeparator stays right, but there is no text in my PDF.

    procedure TVPDFPage.SetTextMatrix(a, b, c, d, x, y: Single);
    var
      S: AnsiString;
    begin
      S := _CutFloat(a) + ' ' + _CutFloat(b) + ' ' + _CutFloat(c) + ' ' +
           _CutFloat(d) + ' ' + _CutFloat(x) + ' ' + _CutFloat(y) + ' Tm';
      SaveToPageStream(S);
    end;

    procedure TVPDFPage.SaveToPageStream(ValStr: AnsiString);
    begin
      PageContent.Add(string(ValStr)); // PageContent: TStringList;
    end;

I don't understand this function. Can somebody help? VisPDF does not use any DLL or other software to create PDF files. Using VisPDF is very easy and it has good examples.

Read the article

.NET, ASHX, "Server cannot append header after HTTP headers have been sent" after sending PDF Stream

- by Inturbidus

I visit my ASHX file, and it outputs a PDF perfectly. If I visit the very same ASHX with a different query string (I append DateTime.Now.Ticks to the end on each visit), I get this error:

    Server cannot append header after HTTP headers have been sent.

My code is below:

    copy.CloseStream = false;
    document.Close();
    var r = context.Response;
    r.ExpiresAbsolute = DateTime.Now;
    r.BufferOutput = true;
    r.ContentType = "application/pdf";
    r.AppendHeader("Content-Type", r.ContentType);
    r.AppendHeader("Content-disposition", "inline; filename=" + context.Server.UrlEncode(formType.File_Name));
    r.BinaryWrite(copyStream.ToArray());
    r.StatusCode = 200;
    r.End();
    originalReader.Close();
    copy.CloseStream = true;
    copy.Close();

There is no other place in this code where headers are sent. You are seeing the entire interaction with the Response object. I've tried to use r.Flush(); and r.End();. I've also tried not sending the headers if they are already there, but this causes other issues.

Read the article

Screen scraping software that will traverse pages

- by nilbus

We're creating a mashup site that pulls information from many sources all over the web. Many of these sites don't provide RSS feeds or APIs to access the information they provide. This leaves us with screen scraping as our method for collecting the data. There are many scripting tools out there, written in different scripting languages, that require you to write scraping scripts in the language the scraper was written in. Scrapy, scrAPI, and scrubyt are a few written in Ruby and Python. There are other web-based tools I've seen, like Dapper, that create XML or RSS feeds based on a webpage. Dapper has a beautiful web-based interface that requires no scripting skills to use. It would be a great tool if it were able to traverse multiple pages to gather data from hundreds of pages of results. We need something that will scrape information from paginated web sites, much like scrubyt, but with a user interface that a non-programmer could use. We'll script up our own solution if we need to, probably using scrubyt, but if there's a better solution out there, we want to use it. Does anything like this exist?
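For the "script up our own solution" fallback, the page-traversal part is small. Below is a minimal Python sketch using only the standard library; it assumes that each results page links to the next one with an `<a rel="next">` element, which will not hold for every site, and the `fetch` callable (URL in, HTML text out) is a hypothetical hook that the caller supplies.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class NextLinkParser(HTMLParser):
    """Remembers the href of the first <a rel="next"> link on a page."""

    def __init__(self):
        super().__init__()
        self.next_href = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        rel_values = (attrs.get("rel") or "").split()
        if tag == "a" and self.next_href is None and "next" in rel_values:
            self.next_href = attrs.get("href")


def crawl_pages(start_url, fetch, max_pages=100):
    """Yield (url, html) for each page, following rel="next" links."""
    url, seen = start_url, set()
    # Stop on a missing link, a repeated URL, or the page limit.
    while url and url not in seen and len(seen) < max_pages:
        seen.add(url)
        html = fetch(url)
        yield url, html
        parser = NextLinkParser()
        parser.feed(html)
        # Resolve relative links against the current page's URL.
        url = urljoin(url, parser.next_href) if parser.next_href else None
```

The extraction of the actual data from each page is left to the caller, since that part differs per site.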

Read the article

How to print multiple Excel sheets into a single PDF file?

- by Anriëtte Combrink

I am trying to print multiple sheets from the same Excel workbook into ONE PDF file, but it frequently prints them separately or prints only the first sheet. I selected all the sheets and gave them the same page setup. I am working on Tiger, and from the Print dialogue I click the bottom left-hand button, "Save PDF", and from there I choose "Save PDF-X". Does anyone have another solution for me?

Read the article

How can I export PDF from InDesign so that transparency renders properly on all platforms?

- by strangeronyourtrain

Gradients, including drop shadows, all show up as solid blocks when I view my document on an Android phone. I tried different PDF compression and compatibility settings in an attempt to flatten and rasterize all the graphics, but it's clearly not working, as the Android viewer still identifies the outlines of transparent shapes instead of the blended pixels. Is there any way to truly flatten these PDF graphics, so that it doesn't matter whether a PDF viewer supports transparency, while keeping the text as text?

Read the article

How can I convert an OpenOffice document to PDF from the Linux command line?

- by Norman Ramsey

I have students who, when asked for PDF, sometimes hand me an OpenOffice document or spreadsheet. file(1) can identify these documents, but I've been unable to discover how to convert them to PDF using the command line. (The man page for ooffice(1) lists an option to print a document but not to convert to PDF.) Google is unhelpful, except for giving me the uneasy feeling that this can't be done without a nifty script in a language I don't know against an API whose documentation I can't find. Can anyone help me solve the problem of converting an OpenDocument to PDF using only the Unix command line?

Read the article

How to print a rendered website to pdf or vector graphics?

- by Lo Sauer

This is a crucial question to many. Searching the web, I have found several command line tools that allow you to convert an HTML document to a PDF document; however, they all seem to use their own, rather incomplete rendering engine, resulting in poor quality. How can you print the rendered output of a modern web browser to PDF (and/or SVG) whilst retaining as much vector graphics as possible? There is a solution called webkit-pdf (which renders everything to bitmap graphics). I am looking for options, alternatives, suggestions, perhaps even a printer driver or web services? Thanks

Read the article

GhostScript noob help - Breaking a multipage PDF file into many single page PS or EPS files.

- by godzilla_g

Hi, I'm trying to do the following with Ghostscript: turn one multipage PDF file (about 3,000 pages, 200 MB) into one file per page, and convert each page/file to EPS or PS (PostScript preferred).

Example: hello.pdf (10 pages) would produce:

    hello1.ps (page 1 out of 10)
    hello2.ps
    hello3.ps
    ...
    hello10.ps

How can I do this? I've been trying for 4 days and can't figure it out. I have a script I've tried (it won't work). Note: Windows 7 user here.

    gs -sDEVICE=epswrite -o documentname-%.eps documentname.pdf

I also don't know how to navigate to the directory where my file resides. If you can, please show me how. A big thank you.
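One detail that may matter in the command above: Ghostscript's per-page output placeholder is written `%d` (or a padded form such as `%03d`), not a bare `%`. The sketch below builds such a command as an argument list. The function name and defaults are assumptions for illustration; `ps2write` is used as the PostScript output device, and on Windows the console executable is typically named `gswin32c` or `gswin64c` rather than `gs`.

```python
import re


def gs_split_command(pdf_path, output_pattern, device="ps2write",
                     gs_executable="gs"):
    """Return a Ghostscript argument list that writes one file per page.

    output_pattern must contain a %d-style placeholder (e.g. "hello%d.ps"),
    which Ghostscript replaces with the page number.
    """
    if not re.search(r"%\d*d", output_pattern):
        raise ValueError("output_pattern needs a %d page-number placeholder")
    return [
        gs_executable,
        "-dNOPAUSE",   # do not wait for a key press after each page
        "-dBATCH",     # exit when the input has been processed
        "-sDEVICE=" + device,
        "-sOutputFile=" + output_pattern,
        pdf_path,
    ]
```

Passing the list to `subprocess.run(cmd, check=True, cwd=folder)` would also sidestep the directory question, since `cwd` sets the working folder for the command. If the command is typed into a Windows batch file instead, the percent sign has to be doubled (`%%d`).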

Read the article

How to change internal page numbers in the meta data of a PDF?

- by YGA

I have a pdf document I created through non-Acrobat means (printing to pdf, then merging a bunch of pdfs), but I'd like to manually change the page numbers (i.e. the first several pages are simply title pages, the page that is labeled "page 1" is really the 7th sheet of the pdf). What's the simplest (and ideally, free) way to do this? To be clear, I am not trying to change the numbers on the pages themselves, but the page numbers in the "metadata" that the pdf stores (the pages themselves are already numbered correctly; I just want "go to page 1" to go to the page labeled 1, which could be sheet 7). For what it's worth, I'm on Windows, though I have access to Macs as well.

Read the article

What is the simplest (free) way to change page numbers on a pdf?

- by YGA

Hi Folks, I have a pdf document I created through non-Acrobat means (printing to pdf, then merging a bunch of pdfs), but I'd like to manually change the page numbers (i.e. the first several pages are simply title pages, "page 1" is really the 7th page of the pdf). What's the simplest (and ideally, free) way to do this? For what it's worth, I'm on Windows, though I have access to Macs as well. Thanks, /YGA

Read the article

PDF to HTML - batch converter - most reliable and accurate free AND paid for software?

- by Rob

I'm looking for either a free or paid-for (about $50/£40) BATCH PDF to HTML converter to convert several PDF files at once. It needs to be able to handle vector and bitmap images within the file, outputting both as JPEGs referenced by the HTML pages. I've tried the paid-for PDF to HTML tool from iorigsoft: it seems to hang or just go idle, and the files it does convert have broken links, because the wrong name is used for the constituent chapters in the HTML. I also tried the application from intrapdf.com, but it consistently crashes near the beginning of the conversion. I looked at open-source tools, but they look equally flaky or use old PDF versions. I need it on Windows 7 32-bit Home. Thoughts?
