Search Results

Search found 4479 results on 180 pages for 'pdf scraping'.

201001 acta.pdf as text (full text of leaked copy)

swpat.org: "201001_acta.pdf is a leaked copy of the full 'January 18th 2010 consolidated text' of the Anti-Counterfeiting Trade Agreement (ACTA). Below is a full transcript."

Read the article
Adobe PDF Apps Susceptible to New Attack Threat

Adobe Acrobat and Reader PDF applications are vulnerable to a new attack that relies on social engineering to trick users into clicking on something they shouldn't.

Read the article
Adobe Issues Warning on PDF Security Risk

Security researchers identify the threat of a social engineering scheme that could trick users into launching actions from PDF files that could trigger arbitrary code execution.

Read the article
How to set PDF paragraph or font line-height with iTextSharp?

- by FreshCode

How can I change the line-height of a PDF font or paragraph using iTextSharp?

Read the article
Dynamically creating a member ID card as pdf using PHP?

- by aefxx

I need to code a PHP script that would let me generate a pdf file which displays a member ID card (something like a credit card used to identify oneself) at a certain resolution. Let me explain: I do have the basic blueprint of the card in png file format. The script needs to drop in a member's name and birthday along with a serial. So far, no problem - there are plenty of good working PHP libraries out there. My problem is to ensure that the resulting pdf (the generated image of the card, to be precise) meets a certain resolution (preferably 300dpi), so that printing it would look right. Any ideas? EDIT I solved it using the TCPDF library which lets you scale images at a certain resolution. Get it here: http://www.tecnick.com/public/code/cp_dpage.php?aiocp_dp=tcpdf
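
The resolution requirement comes down to arithmetic: pixels = physical size in inches × dpi. Below is a minimal sketch in Python (not PHP, and not TCPDF's API). It assumes the card uses the ISO/IEC 7810 ID-1 credit-card size of 85.60 mm × 53.98 mm, which the question does not state; the function name is made up for illustration.

```python
MM_PER_INCH = 25.4

def pixels_for_print(width_mm, height_mm, dpi=300):
    """Return the (width, height) in pixels needed to print at the given dpi."""
    width_px = round(width_mm / MM_PER_INCH * dpi)
    height_px = round(height_mm / MM_PER_INCH * dpi)
    return width_px, height_px

# Assumed card size: ISO/IEC 7810 ID-1 (credit-card format), not from the question.
card_px = pixels_for_print(85.60, 53.98)
print(card_px)
```

Under that assumption, the PNG blueprint would need to be at least this many pixels on each side for the placed image to reach 300 dpi at its printed size.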

Read the article
How to deal with PDF annotation on iPad in Objective-C?

- by Sarah

Hello, I know this may sound like a silly question, but I am really confused. I am going to work on an application that needs operations such as PDF loading, annotation, scrolling, zooming, and similar functions. I am unsure which template I should use: I went through the Quartz 2D Programming Guide and could not tell whether I would be able to implement the functions above by following it, since it displays the PDF page on the whole screen. Or is there another way? Can I use UIWebView for the functions I listed above? I'd be grateful for any help. Thank you.

Read the article
Is it not possible to print a pdf from a hyperlink?

- by andrew

I have looked for weeks and I keep hitting dead ends. I know you can create a text or image link and tell it to "print page" in a browser. But so far, I can't get it to print a document, specifically a PDF. I would like the print dialog to appear after the link is clicked and then have the linked PDF printed. Why does this seem to be such an impossible feat? I have seen it work in a Flash movie, but since I cannot access the native file I cannot see how it was done. Any advice? Thanks.

Read the article
How to generate a PDF of dynamic HTML content?

- by chris Frisina

I am trying to allow users to generate content dynamically, have that information placed in a <div>, and then allow that specific <div> to be exported to a PDF. I have Joomla up and running locally (with the appropriate MySQL and ANT) with the Web2PDF extension, but how would I get those running on my domain (hosted by DreamHost)? Are there any other approaches you might recommend? The content is generated by JS and jQuery, and formatted with CSS and HTML. Other considerations: Web2PDF generates a PDF of the entire content (pulling the entire page's HTML, not just the specific <div>).

Read the article
How to know if a PDF contains only images or has been OCR scanned for searching?

- by Bratch

I have a bunch of PDF files that came from scanned documents. The files contain a mix of images and text. Some were scanned as images with no OCR, so each PDF page is one large image, even where the whole page is entirely text. Others were scanned with OCR and contain images and searchable text where text is present. In many cases even words in the images were made searchable. I want to make an automated process to recognize the text in all of the scanned documents using OCR, with Acrobat 8 Pro, but I don't want to re-OCR the files that have already been through the OCR process in the past. Does anyone know if there is a way to tell which ones contain only images, and which ones already contain searchable text? I'm planning on doing this in C# or VB.NET but I don't think being able to tell the two kinds of files apart is language dependent.

Read the article
Why do some PDFs lag in Adobe Acrobat?

- by Coldblackice

I have a handful of PDFs open. One of them in particular is extremely laggy, almost to the point of being unreadable. When I scroll through its pages, it's almost like an extreme version of v-sync being turned off. Very choppy. Overall system resources are plentiful, and all of the other PDFs cruise up and down with no stuttering or problems. I've tried closing and reopening the problem PDF to no avail. It's a small PDF, only 3MB in size, with no graphics (only programming code snippets). Surely, it must be some type of problem with the specific PDF (I'll try opening it in another PDF-viewing program, rather than Acrobat X). Possible corruption? Could there be some type of GPU/hardware-acceleration intervening going on? I've never heard of such with PDF-viewing.

Read the article
How to Split a Big PostScript file (3000 pages) into one individual file per page (using Windows 7)?

- by Pablo

Hi, I'm having trouble doing the following: I have a big PDF file that I converted to PostScript (for commercial printing). The resulting file is too big to be processed by the printer (machine). I've been trying to find a way to either: (1) convert the original multi-page PDF file into many PostScript files (one PostScript file per page of the original PDF), or (2) convert from PDF to PS (or even EPS), which I managed to do, and then split the PS file into a collection of smaller files. I've tried using Ghostscript, but it is all gibberish to me. Thanks. PS. If you have a good GS tutorial (for dummies?), please share the link.

Read the article
Web scraping with Python

- by Jack

I'm currently trying to scrape a website that has fairly poorly formatted HTML (often missing closing tags, no use of classes or ids so it's incredibly difficult to go straight to the element you want, etc.). I've been using BeautifulSoup with some success so far, but every once in a while (though quite rarely) I run into a page where BeautifulSoup creates the HTML tree a bit differently from (for example) Firefox or WebKit. While this is understandable, as the formatting of the HTML leaves this ambiguous, if I were able to get the same parse tree as Firefox or WebKit produces I would be able to parse things much more easily. The problems are usually something like the site opening a <b> tag twice: when BeautifulSoup sees the second <b> tag, it immediately closes the first, while Firefox and WebKit nest the <b> tags. Is there a web scraping library for Python (or even any other language; I'm getting desperate) that can reproduce the parse tree generated by Firefox or WebKit (or at least get closer than BeautifulSoup in cases of ambiguity)?

Read the article
A question on webpage data scraping using Java

- by Gemma

Hi there. I am trying to implement a simple HTML webpage scraper using Java, and I have a small problem. Suppose I have the following HTML fragment. <div id="sr-h-left" class="sr-comp"> <a class="link-gray-underline" id="compare_header" rel="nofollow" href="javascript:i18nCompareProd('/serv/main/buyer/ProductCompare.jsp?nxtg=41980a1c051f-0942A6ADCF43B802'); " Compare Showing 1 - 30 of 1,439 matches, The data I am interested in is the integer 1,439 shown at the bottom. I am wondering how I can get that integer out of the HTML. I am considering using a regular expression with java.util.regex.Pattern to get the data out, but I am still not very clear about the process. I would be grateful if you could give me some hints or ideas on this data scraping. Thanks a lot.
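
The regular-expression idea in the question can be sketched as follows. This is a minimal Python illustration rather than Java; the same pattern should also be accepted by java.util.regex.Pattern. The function name and the assumption that the count always appears as "of N matches" are mine, not the asker's.

```python
import re

# Captures the number in "... of 1,439 matches", allowing thousands separators.
TOTAL_RE = re.compile(r"of\s+(\d[\d,]*)\s+matches")

def total_matches(html):
    """Return the total match count as an int, or None if the text is absent."""
    found = TOTAL_RE.search(html)
    if found is None:
        return None
    # Drop the thousands separators before converting to an integer.
    return int(found.group(1).replace(",", ""))

fragment = "Compare Showing 1 - 30 of 1,439 matches,"
total = total_matches(fragment)
print(total)
```

Anchoring on the surrounding words ("of" and "matches") rather than on the digits alone keeps the pattern from picking up the "1 - 30" range that precedes the total.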

Read the article
Scraping paginated items from a website using scrapy

- by Mridang Agarwalla

I'm using Scrapy to scrape items from a site, and I'm unable to implement the following scraping pattern. The site I'm trying to scrape is a forum and I scrape the site once a day. Each page has a table containing posts. New posts are added to the top of the table and, as more and more posts are posted to the site, the older posts move further back through the pages due to pagination. This is a very simple scenario and we will assume that the order of the posts never changes. I would like to scrape this site and collect all the "new" records until the last scraped post from yesterday is encountered. I have configured my spider to paginate endlessly; when it encounters yesterday's last scraped post, it should stop. How can I implement this? (My Scrapy installation works with my Django installation using django-dynamic-scraper.)
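
The stopping rule described above can be sketched independently of Scrapy. This is plain Python with made-up data, not Scrapy's API: it assumes each post carries a stable "id" and that pages arrive newest-first, as the question says. In a spider, the same idea would amount to not requesting the next page once the marker post has been seen.

```python
def new_posts(pages, last_seen_id):
    """Yield posts newest-first until the previously scraped post is reached."""
    for page in pages:
        for post in page:
            if post["id"] == last_seen_id:
                # Everything from here on was already scraped yesterday.
                return
            yield post

# Hypothetical data: each inner list is one forum page, newest posts first.
pages = [
    [{"id": 105}, {"id": 104}],
    [{"id": 103}, {"id": 102}],
    [{"id": 101}, {"id": 100}],
]
fresh = [post["id"] for post in new_posts(iter(pages), last_seen_id=102)]
print(fresh)
```

Because the generator returns as soon as the marker is found, later pages are never pulled from the iterator, which is the behaviour the "paginate endlessly, then stop" requirement asks for.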

Read the article
Document conversion and viewing, what are the cutting edge solutions?

- by DigitalLawyer

Goal: building a web application where a user can:
- Upload a document (doc, docx, pdf, additional office formats a +)
- View that document in a browser, preferably in html
- Download the document (in doc, pdf, additional open formats a +)

Current solution:
- Ruby on Rails application on Rackspace
- Users can upload doc and pdf files (AWS)
- Files can be downloaded in the format in which they were uploaded
- Thumbnail generation ([doc, pdf] - pdf - png) is done through AbiWord. Certain doc files do not convert well.
- Documents can be viewed in the embedded Google Docs viewer (https://docs.google.com/viewer). Certain doc files cannot be displayed. Little flexibility.

Potential improvements:
- Document viewing in pdf through pdf.js
- Viewing in html (+ annotation) through Crocodoc

I'd be glad to hear other users' experiences, and will add good recommendations to this list.

Read the article
How to Export Crystal Reports to Excel, Word, Rich Text, PDF, or HTML

- by jaullo

When we work with reports, we always need export functionality. In Crystal Reports for ASP.NET, this task is extremely simple. However, the biggest question that always comes up is how to do it from the code-behind. To access the Crystal libraries and their components, we must first import the namespaces:

    Imports CrystalDecisions.CrystalReports.Engine
    Imports CrystalDecisions.Shared

CrystalDecisions.CrystalReports.Engine lets us work with our ReportDocument, and CrystalDecisions.Shared is what we use for the export. So let's see how we can export our report without having to send it to the printer. Remember that Crystal Reports already has an option to export to PDF by default; however, we have to go through it as if we were going to print, and that is what we will avoid here.

We place a button on our ASP page:

    <asp:Button ID="btntopdf" runat="server" Text="Exportar a PDF" />

And in our button we run the following routine:

    Protected Sub btntopdf_Click(ByVal sender As Object, ByVal e As System.EventArgs) Handles btntopdf.Click
        'Load the report, binding it to the data source.
        LoadReport()
        'Later we will see that these lines can be omitted.
        Response.Buffer = False
        Response.Clear() 'ClearContent, ClearHeaders
        reporteDoc.ExportToHttpResponse(ExportFormatType.PortableDocFormat, Response, True, "NombreArchivo")
    End Sub

LoadReport is in charge of filling our report from the data source. That was the first way to export our Crystal report, but it is not the only one, so let's look at another way, in which we use the ExportToHttpResponse method. For this method the code in our button changes somewhat, but before that, let's review the parameters used.

Our first parameter, FormatType, is a value of type ExportFormatType, which can be any of the values listed below:

CrystalReport: the export format is Crystal Report.
Excel: the export format is Excel.
ExcelRecord: the export format is Excel Record.
NoFormat: no export format has been specified.
PortableDocFormat: the export format is PDF.

I won't list them all, since I imagine you already get the idea of each format; the ones listed above are the most important. Our second parameter, the Response object, lets us attach the file. And finally, our third parameter determines whether the file should be sent as an attachment or not. If we set it to True, we send the file as an attachment, which means we no longer need the following lines of code:

    Response.Buffer = False
    Response.Clear()

With that done, we can send the file directly to the client. Now let's see how much our code has shrunk. Only two lines of code remain in our button:

    'Load the report, binding it to the data source.
    LoadReport()
    reporteDoc.ExportToHttpResponse(ExportFormatType.PortableDocFormat, Response, True, "NombreArchivo")

To finish, I'll just say that I hope this helps and, of course, that it makes life easier when using Crystal Reports.

Read the article
Purpose of Adobe PDF Link Helper

- by user770750

I have an idea of what this browser add-on does: Adobe PDF Browser Control (AcroPDF.dll). Apparently, if I disable this one, PDFs embedded in a page with the embed or object tag fail to function properly. So it's pretty clear what its function is. However, I can't find accurate documentation anywhere on what the add-on below does: Adobe PDF Link Helper (AcroIEHelperShim.dll). IE9 (with Reader X) seems to work flawlessly with it disabled. PDFs still open within the browser. Only if I uncheck "Display PDF in Browser" in Reader's preferences does that cease. I played around on an XP VM with IE7 and Reader X and noticed no issues when it was disabled. Does anyone know the purpose of this add-on? At one time I believed it was necessary for the "within browser" functionality to work, though that was never verified. Did something change?

Read the article
Can PaperPort be used to convert a non-OCR PDF to an OCR PDF?

- by Senseful

My scanner came with the following software: ScanSoft PageViewer, ScanSoft PaperPort, and ScanDirect. I believe it also comes with a basic version of OmniPage. I'm not sure which of these programs is the one that actually performs the OCR. When I scan a document, it can perform OCR on it and convert it to a searchable PDF. Is there any way I can take an existing image or PDF file and run the same OCR engine on it in order to create a searchable PDF?

Read the article
Problems in "Save as PDF" plugin with Arabic numbers

- by Mohamed Mohsen

I use the "Save as PDF" plugin with Microsoft Word 2007 to generate a PDF document from a DOCX document. It works great except that the Arabic numbers in the Word file are converted to English numbers in the PDF document. Here are two links to screenshots showing the problem: http://img27.imageshack.us/img27/2893/englishpdf.jpg http://img4.imageshack.us/img4/1857/arabicword.jpg The first image is the generated PDF file with the English numbers highlighted. The second image is the original Word file with the Arabic numbers highlighted. Thanks in advance.

Read the article
Losing Hyperlinks when converting PPT to PDF using Adobe Acrobat Pro

- by Houda

When I use Adobe Acrobat Pro 9.0 to create a PDF from a PPT file, the hyperlinks in the generated PDF don't work (unless I have http://... in the hyperlink text)! I have checked the settings and "Add links to Adobe PDF file" is selected. Any idea why it is not working and how I can get it to work?

Read the article
Printer "ripping" forever (network printer)

- by Julien Gorenflot

Since I installed Ubuntu 11.10, printing has been a disaster. I did not have the problem with Lucid Lynx (Ubuntu 10.04), but maybe that is just because someone else had installed it for me and possibly configured it better. When I print a PDF, even one of only 2 pages, my printer (SHARP MX 2300N) stays in "rippen" for hours. "Rippen" is a German word that I am not really sure how to translate; Google Translate says the English equivalent is "Rib". Eventually, sometimes, the pages finally get printed. But in the meantime my whole floor is very angry because they also need the printer. Additionally, I don't always have the whole day to wait for my pages. I remember that when printing I used to be asked whether I wanted to reduce transparency effects, which no longer seems to happen since I installed Ubuntu 11.10. Is there any connection? I'm not sure, because I don't think that was for PDF files.

Read the article
Pros and cons of creating a print-friendly page to remove the use of PDFs?

- by Phil

The company I work for has a one-page invoice that uses the TCPDF library. They wanted some design changes that I found incredibly difficult to set up in PDF format. Using HTML/CSS I could easily create the page and have it print very nicely, but I have a feeling that I am overlooking something. What are the pros and cons of setting up a page just for printing? What are the pros and cons of putting out a PDF? I could also inline the CSS so that if they wanted to download the page and open it, they could.

Read the article
Is it bad practice to create a print-friendly page to remove the use of PDFs?

- by Phil

The company I work for has a one-page invoice that uses the TCPDF library. They wanted some design changes that I found incredibly difficult to set up in PDF format. Using HTML/CSS I could easily create the page and have it print very nicely, but I have a feeling that I am overlooking something. Is it good practice to set up a page just for printing? And if not, is it at least better than putting out an ugly PDF? I could also inline the CSS so that if they wanted to download the page and open it, they could.

Read the article