pdf scraping - Page 2 - Developer IT

PHP to Mesh 2 PDF'S Together or not?

- by Justin

I have an already pre-designed PDF, and I would like to fill the PDF with some database information. So I'm curious if I should save the PDF into JPG's and render them out with the data on top of the image and re-create a PDF. Or is there a way to use the PDF already, and print data into the PDF that is already made? I am trying to figure out the best solution to generating this type of PDF. All thoughts would be greatly appreciated!

Read the article

Screen Scraping Twitter

- by BRADINO

I got an email today asking for help to scrape Twitter. In particular, to be able to login. So I am going to show everyone, NOT to encourage anyone to violate Twitters terms of use but as an educational blog post about how PHP and cURL can be used to post variables and store cookies. Again, I am using the cScrape class I wrote, which you can download. Step 1 First go to twitter.com and look at the source code of the login to get the form field names and the form post location. You will see that the form posts to https://twitter.com/session and the username and password fields are session[username_or_email] and session[password] respectively. Step 2 Now you are ready to login. So using the fetch function in the Scrape class you create an associative array to contain the form values you want to post. The other thing you will need to do is uncomment the lines for CURLOPT_COOKIEFILE and CURLOPT_COOKIEJAR. Cookies will be required to stay logged in and scrape around. The paths to the cookie files need to be writable by your app. Also you will need to uncomment the line about CURLOPT_FOLLOWLOCATION. $data = array('session[username_or_email]' => "bradino", 'session[password]' => "secret"); $scrape->fetch('https://twitter.com/sessions',$data); Step 1.5 Oops that didn't work. All I got back was 403 Forbidden: The server understood the request, but is refusing to fulfill it. Ahhh I see another variable called authenticity_token I bet Twitter was looking for that. So let's back up and first hit twitter.com to get the authenticity_token variable, and then make the login post request with that variable included in our array of parameters. $scrape->fetch('https://twitter.com'); $data = array('session[username_or_email]' => "bradino", 'session[password]' => "secret"); $data['authenticity_token'] = $scrape->fetchBetween('name="authenticity_token" type="hidden" value="','"',$scrape->result); $scrape->fetch('https://twitter.com/sessions',$data); echo $scrape->result; So that's basically it. Now you are logged in and can scrape around and request other pages as you normally would. Sorry it wasn't a longer post. I really do enjoy this kind of stuff so if anyone has a request, hit me up. Errors? 1) Make sure that you are properly parsing the token variable 2) Make sure that you uncommented the lines about CURLOPT_COOKIEFILE and CURLOPT_COOKIEJAR, those options need to be enabled and be sure the path set is writable by your application 3) Make sure that the path to the cookie file is writable and that it is getting data written to it 4) If you get a message about being redirected you need to uncomment the line about CURLOPT_FOLLOWLOCATION, that option needs to be enabled true

Read the article

Improving performance for web scraping code

- by Pankaj Upadhyay

I have a website in which the code scrapes other websites for getting the accurate data. While the code works good but there a decent lag in performance because the code firsts downloads the html stream from various sites(some times 9 websites), extracts the relative part and then renders the html page. What should I do to get an optimal performance. Should I change from shared hosting (godaddy) to my own server or it has nothing to do with my hosting and I need to make changes to my code?

Read the article

How to embed Arial in PDF when PDF has Helvetica?

- by Brooks Moses

So, I've got a PDF file that's generated by a program that uses the Base 14 fonts, so that it contains "Helvetica" and "Times Roman". When I look at that in my copy of Acrobat 7.0 on Windows (for example), it shows these with Arial and Times New Roman. I'm fine with that. The issue is that I'd like to publish this PDF file on lulu.com, and they want all fonts embedded. Including the Base 14. I don't have a copy of Helvetica, so what seems the natural thing to do is substitute Arial for Helvetica and embed Arial. How can I do that? I tried using the Print feature in Acrobat (note: this is the full version, not Reader) to print to a PDF file using Adobe's "Print to PDF" printer driver, and selected the "Embed All Fonts" option in the print settings. This worked for the fonts that I had actual copies of, but instead of "printing" Arial for Helvetica -- which it would do if printing to a real printer -- it leaves all the Helvetica as Helvetica and doesn't embed it. Any suggestions for alternate ways to do this? What I really want is just a copy of my PDF file with ALL fonts embedded, and I'm quite happy if doing that means making one of the usual substitutions for the "Helvetica" that's in it. I'd be happiest if I can do that within Acrobat or other software that I have (pdftex, maybe?), but I'm willing to install another free utility if I need to.

Read the article

PDF with 200 DPI JPEGs look blurry in viewer

- by queueoverflow

I often scan images, and I usually do that as 200 DPI JPEG files. If I look at them in Gwenview (Linux image viewer) the text looks readable, and on 100% it looks perfectly sharp. When I convert them into a PDF, using convert from the Imagick package and open that in Okular or Evince (Linux document viewer) the text looks blurry. I wrote a little script that creates a HTML page with all the images and opens it in the browser. Firefox scales the images a little sharper than the PDF viewers. Firefox JPEGs: Evince PDF Is there any way to get the PDF to scale the images sharply?

Read the article

Convert office document to PDF with horizontal page in the middle

- by ianix

I'm using CutePDF to convert a Word Document to PDF, but the document has one page in the middle with a horizontal page layout and the PDF doesn't get the horizontal page. Is there a way / tool I can use so I can get the PDF as I'd want?

Read the article

PHP FPDF PDF Page Break Question

- by Michael

I am using PHP and FPDF to generate a PDF with a list of items. My problem is if the item list goes on to a second or third page, I want to keep the Item Name, Quantity and Description together. Right now, it will go to a second page, but it may split up all of the details for a particular item. PLEASE HELP! <?php require_once('auth.php'); require_once('config.php'); require_once('connect.php'); $sqlitems="SELECT * FROM $tbl_items WHERE username = '" . $_SESSION['SESS_LOGIN'] . "'"; $resultitems=mysql_query($sqlitems); require_once('pdf/fpdf.php'); require_once('pdf/fpdi.php'); $pdf =& new FPDI(); $pdf->AddPage('P', 'Letter'); $pdf->setSourceFile('pdf/files/healthform/meds.pdf'); $tplIdx = $pdf->importPage(1); $pdf->useTemplate($tplIdx); $pdf->SetAutoPageBreak(on, 30); $pdf->SetTextColor(0,0,0); $pdf->Ln(10); while($rowsitems=mysql_fetch_array($resultitems)){ $pdf->SetFont('Arial','B',10); $pdf->Cell(50,4,'Item Name:',0,0,'L'); $pdf->SetFont(''); $pdf->Cell(100,4,$rowsitems['itemname'],0,0,'L'); $pdf->SetFont('Arial','B',10); $pdf->Cell(50,4,'Quantity:',0,0,'L'); $pdf->SetFont(''); $pdf->Cell(140,4,$rowsitems['itemqty'],0,1,'L'); $pdf->SetFont('Arial','B'); $pdf->Cell(50,4,'Description:',0,0,'L'); $pdf->SetFont(''); $pdf->Cell(140,4,$rowsitems['itemdesc'],0,1,'L'); } $pdf->Output('Items.pdf', 'I'); ?>

Read the article

convert scanned images pdf file to searchable pdf file

- by yuval

I have a pdf of a scanned book. I'm looking for a free software that will perform ocr and then provide an option to save it as pdf/doc. Is there one? Thanks.

Read the article

What freely available software is equipped for editing PDF Bookmarks?

- by Brenton Horne

I know of PDFMod, which I rather like except it has one flaw, I can't seem to be able to add bookmarks before pre-existing bookmarks (see the attached image if this is unclear). I've looked at this question: - Which programs can I use to edit PDF files? That question does not deal specifically with bookmark editing, neither does it specify which software is freely available and which isn't. I should add that I'm incompetent when it comes to installing software so any software that's not available in the Software Centre requires you to detail exactly how I'm meant to install it. Plus, I would specifically like a software that can handle large (like 80 MB) PDF files without lagging like crazy or closing down.

Read the article

pdf creation software, for docx and odt

- by oxinabox.ucc.asn.au

Ok, I have a fairly large collection of docx and odt files. Minutes from meetings etc. Now I want to convert them to pdfs for distrobution. and also into one combined pdf. At the momement I'm using Adobe Acrobat 8 (Pro iirc). and on another machine I'm using foxit pdf printer. To do this I have to print them each individually to pdfs. and then I can combine them with Acrobat, cos acrobat doesn't support conversion stright from docx or odt to pdf - only via printing. Now this is annoying if you have to do it on a regular basis, since i don't keep the pdfs around (I have the orignals source controlled :-D) cos they go out of date pretty quick as I often have to go back and modify old versions (like ridiculously often). e Eg When I find out I've got something in the minutes wrong or I want to add more context for clarifaction. Anyone got a better solution?

Read the article

How to convert pdf to utf-8

- by Apple

I am trying to upload a pdf file using webservice api. But this api doesnot work for pdf file. it works fine for text file.when i try to upload a pdf file it give error as Client-SOAP-ERROR: Encoding: string '%PDF-1.4 %\xc7...' is not a valid utf-8 string So can we convert this pdf file into utf8 string. i am using php as a scripting language.

Read the article

PDF thumbnails don't work

- by Sergey

For a reason I don't know my ubuntu doesn't show thumbnails for pdf. in gconf-editor -- /desktop/gnome/thumbnailers: application@pdf/command = "evince-thumbnailer -s %s %u %o" application@pdf/enable = "True" from console evince-thumbnailer file.pdf out.png works perfectly What can be wrong?? when I installed gnome3 thumbnails worked but everything else didnt. when returned to gnome2.3 - no thumbnails.

Read the article

How to edit pdf metadata from command line?

- by bdr529

I need a command line tool for editing metadata of pdf-files. I'm using a Aiptek MyNote Premium tablet for writing my notes and minutes on this device, import them later and convert them to pdf automatically with a simple script using inkscape and ghostscript. Is there any command line tool to add some categories to the pdf's metadata, so i can find the pdf later (e.g. with gnome-do) by categories?

Read the article

Convert Pdf to exe to raise security

- by kamiar3001

hi folks I have some pdfs I want convert all to exe files and put them into folder on my cd and my customers run exe files instead of pdf and for security reason. so I have tried pdf2exe tools but I need something totally free. pdf2exe is evaluation version and there are some limitation. Please tell me something free, I don't need complicated one I just need encapsulating into exe file. In fact I don't want someone be able to save pdf document and I want to have Independent viewer on my cd.

Read the article

Ubuntu: Is there a good tabbed PDF viewer?

- by Frank

Is there a good non-bloated PDF viewer for (Ubuntu) Linux that supports tabs? I don't want to use Acrobat Reader because it is slow and takes much memory, and my computer isn't the fastest. I know the alternative readers evince and foxit, but they don't support opening different PDF files in tabs. (foxit has that feature on Windows, but the Linux version 1.1, which I just tried, doesn't have it.) For evince, I know many people would like this functionality, but they get ridiculed by Ubuntu people (see here), who say that tabs are the task of a window manager. If that is the case, how can I put all evince windows into one in GNOME?

Read the article

reading a pdf, adding text/images to it and write the modified pdf in rails

- by hurikhan77

I want to use PDF::writer to put some dates on a calendar sheet. The calendar sheet itself is prepared as a prerendered template in PDF format. How could I read this PDF as template to write text on it? Is there an alternative? Fiddling with an HTML-to-PDF converter (like HTMLDoc) is no option.

Read the article

How to edit pdf metadata from command line?

- by bdr529

I need a command line tool for editing metadata of pdf-files. I'm using a Aiptek MyNote Premium tablet for writing my notes and minutes on this device, import them later and convert them to pdf automatically with a simple script using inkscape and ghostscript. Is there any command line tool to add some categories to the pdf's metadata, so i can find the pdf later (e.g. with gnome-do) by categories? Update: I tried the solution with pdftk and it works, but it seems that gnome-do doesn't take care of pdf-metadata. Is there a way to get gnome-do to do that?

Read the article

XML to DOC to PDF

- by Max

what is the easiest(and fastest) way to perform this kind of transformation: "Data in XML" to "Some MS Word 2003 Supported format" to PDF using Java? My first guess was to fill the template with XML data (using Placeholders for example) and then save it and convert it to PDF. But I can't just put placeholders to DOC files, and I can't convert from some other Word formats to PDF... My primary task is to convert XML Data to PDF allowing users to change the PDF on-demand. The best way to change the PDF on-demand seems to give user some kind of MS Word readable document, and then convert it back. There are 2 main problems with this task: 1) I can't use OpenOffice for conversion. 2) System should be able to convert ~1 page of table-based document per 1 second on 2Ghz Core. 3) RTF does not provide enough styling, so some more complex format should be used. Thanks in advance.

Read the article

Assistance using respond_to to find the right actions to render PDF in ruby on rails

- by Angela

Hi, I am trying out Prince with the Princely plugin, which is supposed to format templates that have the .pdf into a PDF generator. Here is my controller: class TodoController < ApplicationController def show_date @date = Date.today @campaigns = Campaign.all @contacts = Contact.all @contacts.each do |contact| end respond_to do |format| format.html format.pdf do render :pdf => "filename", :stylesheets => ["application", "prince"], :layout => "pdf" end end end end I changed the routes.db to include the following: map.connect ':controller/:action.:format' map.todo "todo/today", :controller => "todo", :action => "show_date" My expected behavior is when I enter todo/today.pdf, it tries to execute show_date, but renders according to the princely plugin. Right now, it says cannot find action. What do I need to do to fix this?

Read the article

failed to find PDF header: '%PDF' not found in xCode

- by Alexander

I'm trying to create a PDF Object from binary XString in xCode. (OData from SAP, utf-8) Here is the coding: const char* buf = [temp1 UTF8String]; pdffile = [NSData dataWithBytes:buf length:length1]; [webDisplay loadData:self.pdffile MIMEType:@"application/pdf" textEncodingName:@"utf-8" baseURL:nil]; self.webDisplay.scalesPageToFit = YES; temp1 is a XString length1 is the length of PDF file in bytes. I get following error message: failed to find PDF header: '%PDF' not found Some ideas? Thanks!

Read the article

Converting .doc files to .pdf

- by ngn

Anybody aware of a piece of software which could do MS Office .doc to .pdf conversion for me? I already tried OpenOffice but it appeared to be rather slow and resource-hungry for large documents.

Read the article

Print SSRS Report / PDF automatically from SQL Server agent or Windows Service

- by Jeremy Ramos

Originally posted on: http://geekswithblogs.net/JeremyRamos/archive/2013/10/22/print-ssrs-report--pdf-from-sql-server-agent-or.aspxI have turned the Web upside-down to find a solution to this considering the least components and least maintenance as possible to achieve automated printing of an SSRS report. This is for the reason that we do not have a full software development team to maintain an app and we have to minimize the support overhead for the support team.Here is my setup:SQL Server 2008 R2 in Windows Server 2008 R2PDF format reports generated by SSRS Reports subscriptions to a Windows File ShareNetwork printerColoured reports with logo and brandingI have found and tested the following solutions to no avail:ProsConsCalling Adobe Acrobat Reader exe: "C:\Program Files (x86)\Adobe\Reader 11.0\Reader\acroRd32.exe" /n /s /o /h /t "C:\temp\print.pdf" \\printserver\printername"Very simple optionAdobe Acrobat reader requires to launch the GUI to send a job to a printer. Hence, this option cannot be used when printing from a service.Calling Adobe Acrobat Reader exe as a process from a .NET console appA bit harder than above, but still a simple solutionSame as cons abovePowershell script(Start-Process -FilePath "C:\temp\print.pdf" -Verb Print)Very simple optionUses default PDF client in quiet mode to Print, but also requires an active session. Foxit ReaderVery simple optionRequires GUI same as Adobe Acrobat Reader Using the Reporting Services Web service to run and stream the report to an image object and then passed to the printerQuite complexThis is what we're trying to avoid After pulling my hair out for two days, testing and evaluating the above solutions, I ended up learning more about printers (more than ever in my entire life) and how printer drivers work with PostScripts. I then bumped on to a PostScript interpreter called GhostScript (http://www.ghostscript.com/) and then the solution starts to get clearer and clearer.I managed to achieve a solution (maybe not be the simplest but efficient enough to achieve the least-maintenance-least-components goal) in 3-simple steps:Install GhostScript (http://www.ghostscript.com/download/) - this is an open-source PostScript and PDF interpreter. Printing directly using GhostScript only produces grayscale prints using the laserjet generic driver unless you save as BMP image and then interpret the colours using the imageInstall GSView (http://pages.cs.wisc.edu/~ghost/gsview/)- this is a GhostScript add-on to make it easier to directly print to a Windows printer. GSPrint automates the above PDF -> BMP -> Printer Driver.Run the GSPrint command from SQL Server agent or Windows Service:"C:\Program Files\Ghostgum\gsview\gsprint.exe" -color -landscape -all -printer "printername" "C:\temp\print.pdf"Command line options are here: http://pages.cs.wisc.edu/~ghost/gsview/gsprint.htmAnother lesson learned is, since you are calling the script from the Service Account, it will not necessarily have the Printer mapped in its Windows profile (if it even has one). The workaround to this is by adding a local printer as you normally would and then map this printer to the network printer. Note that you may need to install the Printer Driver locally in the server.So, that's it! There are many ways to achieve a solution. The key thing is how you provide the smartest solution!

Read the article

i had problem in adding the additional content in my pdf

- by Ayyappan.Anbalagan

I am converting my data set into a pdf document.My data set contains the product bill details.So,at the top of the pdf i need to added some more content like "my company name & address customer name, date of bill,bill no" Below code i am using to convert into pdf. public static void Exportdata(DataTable dataTable, HttpResponse Response, int val) { //String filename = String.Concat(name, "-", DateTime.Today.Day.ToString(), "/", DateTime.Today.Month.ToString(), "/", DateTime.Today.Year.ToString(), ".pdf"); Document pdfDoc = new Document(PageSize.A4, 30, 30, 40, 25); System.IO.MemoryStream mStream = new System.IO.MemoryStream(); PdfWriter writer = PdfWriter.GetInstance(pdfDoc, mStream); //int cols = 0; //int rows = 0; int cols = dataTable.Columns.Count; int rows = dataTable.Rows.Count; pdfDoc.Open(); iTextSharp.text.Table pdfTable = new iTextSharp.text.Table(cols, rows); pdfTable.BorderWidth = 1; pdfTable.Width = 100; pdfTable.Padding = 1; pdfTable.Spacing = 1; //creating table headers for (int i = 0; i < cols; i++) { Cell cellCols = new Cell(); Font ColFont = FontFactory.GetFont(FontFactory.HELVETICA, 8, Font.BOLD); Chunk chunkCols = new Chunk(dataTable.Columns[i].ColumnName, ColFont); cellCols.Add(chunkCols); pdfTable.AddCell(cellCols); } //creating table data (actual result) for (int k = 0; k < rows; k++) { for (int j = 0; j < cols; j++) { Cell cellRows = new Cell(); Font RowFont = FontFactory.GetFont(FontFactory.HELVETICA, 6); Chunk chunkRows = new Chunk(dataTable.Rows[k][j].ToString(), RowFont); cellRows.Add(chunkRows); pdfTable.AddCell(cellRows); } } pdfDoc.Add(pdfTable); pdfDoc.Close(); Response.ContentType = "application/octet-stream"; if (val == 1) { Response.AddHeader("Content-Disposition", "attachment; filename=Users.pdf"); } else if (val == 2) { Response.AddHeader("Content-Disposition", "attachment; filename=Customers.pdf"); } else if (val == 3) { Response.AddHeader("Content-Disposition", "attachment; filename=Materials.pdf"); } else { Response.AddHeader("Content-Disposition", "attachment; filename=Reports.pdf"); } Response.Clear(); Response.BinaryWrite(mStream.ToArray()); //Response.Write(mStream.ToString()); HttpContext.Current.ApplicationInstance.CompleteRequest(); Response.End(); }

Read the article

Merge PDF's with PDFTK with Bookmarks?

- by Jason

Using pdftk to merge multiple pdf's is working well. However, any easy way to make a bookmark for each pdf merged? I don't see anything on the pdftk docs regarding this so I don't think it's possible with pdftk. All of our files merged will be 1 page, so wondering if there's any other utility that can add in bookmarks afterwards? Or another linux based pdf utility that will allow to merge while specifying a bookmark for each individual pdf.

Read the article

i had problem in adding the additional content in my pdf...using asp.net c#

- by Ayyappan.Anbalagan

I am converting my data set into a pdf document.My data set contains the product bill details.So,at the top of the pdf i need to added some more content like "my company name & address customer name, date of bill,bill no" Below code i am using to convert into pdf. public static void Exportdata(DataTable dataTable, HttpResponse Response, int val) { //String filename = String.Concat(name, "-", DateTime.Today.Day.ToString(), "/", DateTime.Today.Month.ToString(), "/", DateTime.Today.Year.ToString(), ".pdf"); Document pdfDoc = new Document(PageSize.A4, 30, 30, 40, 25); System.IO.MemoryStream mStream = new System.IO.MemoryStream(); PdfWriter writer = PdfWriter.GetInstance(pdfDoc, mStream); //int cols = 0; //int rows = 0; int cols = dataTable.Columns.Count; int rows = dataTable.Rows.Count; pdfDoc.Open(); iTextSharp.text.Table pdfTable = new iTextSharp.text.Table(cols, rows); pdfTable.BorderWidth = 1; pdfTable.Width = 100; pdfTable.Padding = 1; pdfTable.Spacing = 1; //creating table headers for (int i = 0; i < cols; i++) { Cell cellCols = new Cell(); Font ColFont = FontFactory.GetFont(FontFactory.HELVETICA, 8, Font.BOLD); Chunk chunkCols = new Chunk(dataTable.Columns[i].ColumnName, ColFont); cellCols.Add(chunkCols); pdfTable.AddCell(cellCols); } //creating table data (actual result) for (int k = 0; k < rows; k++) { for (int j = 0; j < cols; j++) { Cell cellRows = new Cell(); Font RowFont = FontFactory.GetFont(FontFactory.HELVETICA, 6); Chunk chunkRows = new Chunk(dataTable.Rows[k][j].ToString(), RowFont); cellRows.Add(chunkRows); pdfTable.AddCell(cellRows); } } pdfDoc.Add(pdfTable); pdfDoc.Close(); Response.ContentType = "application/octet-stream"; if (val == 1) { Response.AddHeader("Content-Disposition", "attachment; filename=Users.pdf"); } else if (val == 2) { Response.AddHeader("Content-Disposition", "attachment; filename=Customers.pdf"); } else if (val == 3) { Response.AddHeader("Content-Disposition", "attachment; filename=Materials.pdf"); } else { Response.AddHeader("Content-Disposition", "attachment; filename=Reports.pdf"); } Response.Clear(); Response.BinaryWrite(mStream.ToArray()); //Response.Write(mStream.ToString()); HttpContext.Current.ApplicationInstance.CompleteRequest(); Response.End(); }

Search Results

Search found 4479 results on 180 pages for 'pdf scraping'.

Page 2/180 | < Previous Page | 1 2 3 4 5 6 7 8 9 10 11 12 | Next Page >

- by Justin

- by BRADINO

- by Pankaj Upadhyay

- by Brooks Moses

- by queueoverflow

- by ianix

- by Michael

- by yuval

- by Brenton Horne

- by oxinabox.ucc.asn.au

- by Apple

- by Sergey

- by bdr529

- by kamiar3001

- by Frank

- by hurikhan77

- by bdr529

- by Max

- by Angela

- by Alexander

- by ngn

- by Jeremy Ramos

- by Ayyappan.Anbalagan

- by Jason

- by Ayyappan.Anbalagan

< Previous Page | 1 2 3 4 5 6 7 8 9 10 11 12 | Next Page >