pdf scraping - Page 30 - Developer IT

Export pdf table to excel

- by Nagu

Hi, does anyone know how to export a PDF table to Excel using C# and ASP.NET? If so, could you give me a sample code snippet? Thanks in advance, Nagu

Use text from XML document to create PDF?

- by MMAmail.com

I have some XML files that contain text. How can I extract this text from the XML documents (which libraries, etc.)? And how can I use this text to create a PDF document? I am working in a PHP environment, but if that is not a suitable language, I could change.
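The extraction half of this question can be sketched in a few lines. This is a minimal illustration in Python rather than PHP, assuming the text lives in element bodies; the `extract_text` helper and the sample document are made up for the example, and the PDF-writing half is left out because it depends on which PDF library is chosen.

```python
import xml.etree.ElementTree as ET


def extract_text(xml_source: str) -> list:
    """Return the non-empty text fragments of an XML document, in document order."""
    root = ET.fromstring(xml_source)
    # itertext() walks every element and yields its text and tail strings.
    return [chunk.strip() for chunk in root.itertext() if chunk.strip()]


# Hypothetical sample document, not taken from the question.
SAMPLE = "<doc><title>Report</title><p>First paragraph.</p><p>Second one.</p></doc>"

if __name__ == "__main__":
    for fragment in extract_text(SAMPLE):
        print(fragment)
```

Each returned fragment could then be passed to whatever "add a paragraph" call the chosen PDF library provides.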

????????s in my generated PDF

- by gAMBOOKa

I'm getting ???????? characters in my PDF. I've stripped out \r\n, \r, \n and \t, trimmed everything, decoded HTML entities and stripped tags. Nothing helps. The data is coming from a MySQL database. Any help would be appreciated.
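One common way to end up with runs of question marks is a lossy character-set conversion somewhere between the database and the PDF writer. That is only a guess here, since the question names neither the PDF library nor the column charset. The Python sketch below shows the effect and a way to find the offending characters; `to_single_byte` and `unmappable_chars` are hypothetical helper names.

```python
def to_single_byte(text: str, encoding: str = "latin-1") -> bytes:
    """Encode text the way a lossy single-byte pipeline would.

    Characters the target encoding cannot represent are replaced by '?'.
    """
    return text.encode(encoding, errors="replace")


def unmappable_chars(text: str, encoding: str = "latin-1") -> list:
    """List the characters that cannot be represented in the target encoding."""
    bad = []
    for ch in text:
        try:
            ch.encode(encoding)
        except UnicodeEncodeError:
            bad.append(ch)
    return bad


if __name__ == "__main__":
    sample = "Zürich 東京 €5"
    print(to_single_byte(sample))    # the non-Latin-1 characters come out as '?'
    print(unmappable_chars(sample))  # the characters responsible
```

If `unmappable_chars` reports anything for the real data, the fix usually lies in the connection charset or the font and encoding handed to the PDF library, not in stripping more whitespace.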

Web scraping: how to get scraper implementation from text link?

- by isme

I'm building a Java web media-scraping application for extracting content from a variety of popular websites: YouTube, Facebook, RapidShare, and so on. The application will include a search capability to find content URLs, but should also allow the user to paste a URL into the application if they already know where the media is. Youtube Downloader already does this for a variety of video sites. When the program is supplied with a URL, it decides which kind of scraper to use to get the content; for example, a YouTube watch link returns a YoutubeScraper, a Facebook fan page link returns a FacebookScraper, and so on. Should I use the factory pattern to do this? My idea is that the factory has one public method. It takes a String argument representing a link, and returns a suitable implementation of the Scraper interface. I guess the factory would hold a list of Scraper implementations, and would match the link against each Scraper until it finds a suitable one. If there is no suitable one, it throws an exception instead.
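The design described above can be sketched roughly as follows. This is an illustration in Python rather than Java, and everything except the `YoutubeScraper` and `FacebookScraper` names is an assumption made for the example: the `can_handle` method, the `for_url` factory method, the `NoScraperError` exception and the URL-matching rules.

```python
from urllib.parse import urlparse


def _host_is(url: str, domain: str) -> bool:
    """True if the URL's host is the domain itself or one of its subdomains."""
    host = urlparse(url).hostname or ""
    return host == domain or host.endswith("." + domain)


class Scraper:
    """Interface: each implementation reports whether it can handle a URL."""

    @classmethod
    def can_handle(cls, url: str) -> bool:
        raise NotImplementedError


class YoutubeScraper(Scraper):
    @classmethod
    def can_handle(cls, url: str) -> bool:
        return _host_is(url, "youtube.com") and urlparse(url).path == "/watch"


class FacebookScraper(Scraper):
    @classmethod
    def can_handle(cls, url: str) -> bool:
        return _host_is(url, "facebook.com")


class NoScraperError(Exception):
    """Raised when no registered scraper accepts the URL."""


class ScraperFactory:
    def __init__(self, scraper_classes):
        # Order matters: the first class that accepts the URL wins.
        self._scraper_classes = list(scraper_classes)

    def for_url(self, url: str) -> Scraper:
        for scraper_cls in self._scraper_classes:
            if scraper_cls.can_handle(url):
                return scraper_cls()
        raise NoScraperError("no scraper registered for " + url)
```

Keeping the matching logic inside each scraper means a new site only needs a new class added to the list passed to the factory.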

How to use regular expressions to pull a substring? (screen scraping)

- by Diego

Hey guys, I'm really trying to understand regular expressions while scraping a site. I've been using them in my code enough to pull the following, but I'm stuck here. I need to quickly grab this:

http://www.example.com/online/store/TitleDetail?detail&sku=123456789

from this:

('<a href="javascript:if(handleDoubleClick(this.id)){window.location=\'http://www.example.com/online/store/TitleDetail?detail&sku=123456789\';}" id="getTitleDetails_123456789">\r\n\t\t\t \tcheck store inventory\r\n\t\t\t </a>', 1)

This is where I got confused. Any ideas?
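One way to pull that URL out is to anchor the pattern on `window.location=` and capture everything up to the closing single quote. The sketch below assumes Python (the tuple-style output in the question suggests it, but the question does not say) and that the backslashes before the quotes are just string escaping; `extract_url` is a hypothetical helper name.

```python
import re

# Capture whatever sits between the single quotes after "window.location=".
LOCATION = re.compile(r"window\.location='([^']+)'")


def extract_url(html: str):
    """Return the URL assigned to window.location, or None if there is none."""
    match = LOCATION.search(html)
    return match.group(1) if match else None


# The anchor tag from the question, written as a Python string.
SAMPLE = (
    '<a href="javascript:if(handleDoubleClick(this.id)){window.location='
    "'http://www.example.com/online/store/TitleDetail?detail&sku=123456789';}\" "
    'id="getTitleDetails_123456789">\r\n\t\t\t \tcheck store inventory\r\n\t\t\t </a>'
)

if __name__ == "__main__":
    print(extract_url(SAMPLE))
```

The character class `[^']+` stops at the first closing quote, which avoids the over-matching a greedy `.*` would cause.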

Adding barcodes to pdfs

- by calccrypto

Is there any way to do something like a mail merge, where the data (9-15 chars long) is converted to a barcode? I'm trying to use OpenOffice's Code128 extension for Calc, but for some reason, every 10 strings, the barcode goes crazy, and the ASCII tells me to register at the site where the extension came from, which I don't want to do. I also found one for OODraw, but that requires the values to be input manually. Since I'm not familiar with macros, I can't write something that will do it automatically.

What I'm trying to do is:

- take an old PDF (only 1 page)
- convert it to Word or a picture or something
- add a function/macro/whatever to show a barcode (whether or not the barcode shows in this file, I don't care), given a string from Excel data
- reconvert to separate PDFs

or some other way that adds barcodes to PDFs. All the other free programs I have found do not do this nicely, and since I'm not really a PDF person, I'm not going to buy random programs. I just need this done for one large batch of data.

Getting the height of text to know fill height as in TCPDF

- by sami

I'm trying to go through the code of TCPDF to understand how it calculates the height of the text to be rendered, but it's too much for me to handle without asking. What I want to know: in the PDF from example 5 (http://www.tcpdf.org/examples/example_005.pdf), the cell is given a yellow background. I'm guessing that at the basic level, it first draws a box with this fill color, then adds the text. So what method is it calling to get the height of the text, so that it knows the height of the box to fill? Does anyone know how to trace this? Doing it by hand and looking through the code isn't working at all for me.

Why do this PDF's fonts appear unreadable on my machine?

- by Matthew

I'm trying to read The Art of Assembly Language as per this answer on Stack Overflow. When I open it on my Ubuntu 12.04 box, it looks like this: (screenshot) I haven't tested it on another machine, but this can't be intentional. What is going on, and how can I fix it? Edit: The above screenshot is from Chrome. It looks like this in Evince: (screenshot) Still squished and hardly readable, but better. Is there anything I can do to fix it?

How to render Asian characters in a PDF using xhtmlrenderer

- by Mark Derricutt

I was wondering what steps are needed to render Asian characters using the Java-based xhtmlrenderer (Flying Saucer) library. I want to render the following: <html> <body>????????</body> </html> Without any font settings being added to the HTML this renders fine in normal browsers, but I can't find any way to render this to PDF using the iTextRenderer portion of xhtmlrenderer. After following various threads on the mailing list, I see lots of posts talking about adding .TTF files from the c:\windows\fonts directory, and I have modified the examples to run on Linux ( https://gist.github.com/643173745182c9becc57 ), which shows me various fonts being displayed, but I don't see any Asian glyphs. Does anyone have any decent pointers, or clean solutions to this problem? Or am I looking at the wrong problem, with a really simple solution elsewhere?

PDF Text Extraction Approach Using OCR

- by Jon

Has anybody attempted to extract text from a PDF using an OCR library and Java? What did you find to be the most reliable library for text extraction? Most of the approaches I've seen (tesseract, GOCR) are C libraries that would require some JNI code to be written. I'm familiar with pdfbox, which is now an Apache incubator project at version 0.8.x, but its text extraction isn't always accurate. I'm looking for an alternative approach that is somewhat more reliable. I've not tried Asprise JavaPDF yet (I'm in the process of trying it), but I wanted to know more about the OCR approach, if it's possible. Any help would be appreciated.

LocalReport (WebForms) and Partial Trust, for PDF Generation

- by Peet Brits

My goal is to generate a PDF for display in a web page, either as aspx or with a generic handler. (This will link from a Silverlight page, but this is irrelevant to the problem.) The problem is that LocalReport (Microsoft.Reporting.WebForms; Microsoft.ReportViewer.WebForms.dll) requires full trust, and our hosting server does not allow full trust. I am aware that ReportViewer has a remote mode that will allow it to run with partial trust, but for that I need a report server url which should probably have full trust as well, which does not solve anything. So how do I generate PDFs from WebForms in a partially trusted environment?

WinForm PDF viewer with programmatic search functionality

- by Anthony Shaw

We are currently using PDFTron's PDFView WinForm control to view PDFs in a WinForm application. I can load the file and search the file to highlight a certain text value. We have run into an issue where our trial license period has expired, and we do not wish to shell out the $900/seat licensing on 15-20 computers. Does anybody know if the ActiveX Adobe Reader control can support the searching functionality programmatically? I've tried that and the FoxIt Reader OCX control, and neither seems to have this feature exposed (unless it's hidden really well). Does anybody have any suggestions for other freeware/open source viewers, or less expensive viewers? TIA!

Exception on downloading PDF file in ASP.NET

- by Sauron

I am downloading a PDF file created by Crystal Reports, and I download it like this:

    ReportDocument repDoc = (ReportDocument)System.Web.HttpContext.Current.Session["StudyReportCrystalDocument"];

    // Stop buffering the response
    Response.Buffer = false;

    // Clear the response content and headers
    Response.ClearContent();
    Response.ClearHeaders();

    try
    {
        repDoc.ExportToHttpResponse(CrystalDecisions.Shared.ExportFormatType.PortableDocFormat, Response, true, "StudyReport");
    }
    catch (Exception ex)
    {
    }

Even though it is working, I get an exception:

    base {System.SystemException} = {Unable to evaluate expression because the code is optimized or a native frame is on top of the call stack.}

Can anyone explain the reason for this and how to avoid the exception?

How to generate pdf files _with_ utf-8 multibyte characters using Zend Framework

- by Sejanus

Hello, I've got a "little" problem with the Zend Framework Zend_Pdf class. Multibyte characters are stripped from generated PDF files. E.g. when I write aabccdee it becomes abcd, with the Lithuanian letters stripped. I'm not sure whether it's specifically a Zend_Pdf problem or PHP in general. The source text is encoded in UTF-8, as is the PHP source file which does the job. Thank you in advance for your help ;) P.S. I run Zend Framework v. 1.6 and I use the FONT_TIMES_BOLD font. FONT_TIMES_ROMAN does work.

Link to pdf within app

- by Thijs

I've got an iOS app that at one point opens a link to a website in a webview. These links are kept in a plist file (so it is easy to maintain them as the app evolves). What I want to do next is to also link to PDFs (or any picture or text file format, or even an HTML format; this is flexible) that are kept within the app. And I would like to do this as much as possible from within the existing app structure. So, is it possible to create a link that can be put in the plist as a web link, but instead opens a file on the device itself (possibly in the webview)? And how would I go about that? Any ideas? Thanks in advance for your help!

Does anyone know of a way to easily convert a PDF to a docx format programmatically

- by Rob

We have a couple of 3rd party systems that give us PDFs. We would like to convert those PDFs for display on the web without using an Adobe product. Ideally we would like to use Silverlight to render the PDFs, but we are having trouble converting from PDF to XAML, or using the docx format as a middle man. There are lots of libraries that produce PDFs, but that is not what we need. If there is a library out there that does this, a .NET lib would be preferable, but we can also run the conversion from the command line if that is an option.

Absolute link to PDF - Executable File?

- by tony noriega

We have a mass emailing tool (.NET-based) that we developed in house. It has an HTML editor and sends in HTML and text formats. Within the body we have an absolute URL path to a PDF on our server. Some of our subscribers say that when they click on the link, they get a message box saying that the file is an executable file and asking whether they should run it or not. Why would that happen, and only to a certain group?

Translate GPS coordinates to location on PDF Map

- by christo16

Hi everyone, I'd like to know (from a high-level view) what would be required to take a PDF floor plan of a building and determine where exactly you are on that floor plan using GPS coordinates. In addition to location, the user would be presented with "turn by turn" directions to another point on the map, navigating down hallways, between cubicles, etc. Use case: an iPhone app that determines a user's location and guides them to a conference room or a person's office in the building. I realize that this is by no means trivial, but any help is appreciated. Thanks!
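The coordinate-mapping part of this can be sketched as a simple linear interpolation between two surveyed reference points. This is only a rough illustration under stated assumptions: the floor plan is drawn north-up, the building is small enough that latitude and longitude can be treated as a flat grid, and the `Anchor` type, the `gps_to_page` function and the sample coordinates are all invented for the example. Routing between points is a separate problem and is not covered.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Anchor:
    """A point whose GPS position and page position are both known."""
    lat: float
    lon: float
    x: float  # horizontal page coordinate, in PDF points
    y: float  # vertical page coordinate, in PDF points


def gps_to_page(lat: float, lon: float, a: Anchor, b: Anchor):
    """Map a GPS fix to page coordinates by interpolating between two anchors.

    Assumes a north-up plan, so longitude drives x and latitude drives y.
    """
    if a.lat == b.lat or a.lon == b.lon:
        raise ValueError("anchors must differ in both latitude and longitude")
    x = a.x + (lon - a.lon) / (b.lon - a.lon) * (b.x - a.x)
    y = a.y + (lat - a.lat) / (b.lat - a.lat) * (b.y - a.y)
    return x, y


# Hypothetical anchors: south-west and north-east corners of a US Letter page.
SOUTH_WEST = Anchor(lat=40.0, lon=-75.0, x=0.0, y=0.0)
NORTH_EAST = Anchor(lat=40.001, lon=-74.999, x=612.0, y=792.0)

if __name__ == "__main__":
    print(gps_to_page(40.0005, -74.9995, SOUTH_WEST, NORTH_EAST))
```

A rotated plan would need a full affine transform fitted from at least three anchors instead of this two-point interpolation.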

Problem when using \LaTeX \includegraphics with some PDF files

- by brandstaetter

I noticed some strange effects when including existing PDF graphics in my LaTeX documents: most files work flawlessly, but some PDFs that were created on a different machine (or taken from the web) cause the whole page on which they are embedded to become ever-so-slightly distorted. I only notice the difference in a side-by-side comparison, but once you see it, it's obvious. The text layout seems slightly broken, and when you zoom in you can see it better. I will try to make some screenshots to further elaborate, but in the meantime: has anyone seen this before, and how can I get rid of these distortions?

How to compare 2 pdf files using command line or any tool

- by Darzen

I have to compare 2 PDF files and check whether they are the same or different. I have tried my luck with WinMerge and Beyond Compare, but they don't solve my issue. The issue is that the 2 files are exactly the same except for the time at which they were saved. I have a piece of code that embeds the time of saving into the file, but I am not supposed to make any modifications to that code. Hence the tools mentioned above say that the files don't match, when there is just one difference: the timestamp. Can anyone suggest a way to handle this? Thanks.
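One possible approach is to blank out the date entries before comparing the bytes. The Python sketch below is built on an assumption the question does not confirm: that the save time is stored as plain `/CreationDate` or `/ModDate` entries of the form `(D:...)`. It will not help if the timestamp lives in a compressed stream or in XMP metadata, or if the files also differ in their trailer `/ID`. The function names are made up for the example.

```python
import re

# Assumed layout: "/CreationDate (D:20240101120000Z)" and "/ModDate (D:...)".
DATE_ENTRY = re.compile(rb"/(CreationDate|ModDate)\s*\(D:[^)]*\)")


def normalize(pdf_bytes: bytes) -> bytes:
    """Replace the value of every date entry with an empty string."""
    return DATE_ENTRY.sub(rb"/\1 ()", pdf_bytes)


def same_except_timestamps(first: bytes, second: bytes) -> bool:
    """Compare two PDFs byte for byte after blanking their date entries."""
    return normalize(first) == normalize(second)


def files_match(path_a: str, path_b: str) -> bool:
    """Convenience wrapper that reads both files from disk."""
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        return same_except_timestamps(fa.read(), fb.read())
```

Because it only reads the files, this leaves the code that embeds the timestamp untouched.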

How to limit the number of records per page in a PDF

- by udaya

Hi, I am exporting data from a PHP page to Word, and I get 'n' records on each page. How do I set the maximum number of records that a page can contain? I want only 20 records on a single page. This is the code I use to export the data to PDF. The table for the PDF document is generated in mysql_table.php.

    <?php
    require('mysql_table.php');

    class PDF extends PDF_MySQL_Table
    {
        function Header()
        {
            //Title
            $this->SetFont('Arial','',18);
            $this->Cell(0,6,'Country details',0,1,'C');
            $this->Ln(10);
            parent::Header();
        }
    }

    //Connect to database
    mysql_connect('localhost','root','');
    mysql_select_db('cms');

    $pdf=new PDF();
    $pdf->AddPage();
    //First table: put all columns automatically
    $pdf->Table("SELECT (SELECT COUNT(*) FROM tblentercountry t2
                         WHERE t2.dbName <= t1.dbName and dbIsDelete='0') AS SLNO,
                        dbName as Namee, t3.dbCountry as Country,
                        t4.dbState as State, t5.dbTown as Town
                 FROM tblentercountry t1
                 join tablecountry as t3, tablestate as t4, tabletown as t5
                 where t1.dbIsDelete='0'
                   and t1.dbCountryId=t3.dbCountryId
                   and t1.dbStateId=t4.dbStateId
                   and t1.dbTownId=t5.dbTownId
                 order by dbName");
    $pdf->AddPage();
    //Second table: specify 3 columns
    $pdf->AddCol('rank',20,'','C');
    $pdf->AddCol('name',20,'tablecountry');
    $pdf->AddCol('pop',20,'Pop (2001)','R');
    $prop=array('HeaderColor'=>array(255,150,100),
                'color1'=>array(210,245,255),
                'color2'=>array(255,255,210),
                'padding'=>2);
    //$pdf->Table('select dbCountry,dbCountryId from tablecountry limit 0,10',$prop);
    $pdf->Output();
    ?>

How can I limit the number of records on a page?
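The underlying idea, independent of the PDF library, is to split the result set into fixed-size chunks and start a new page for each chunk. The sketch below shows only the chunking step, in Python rather than PHP; `paginate` is a hypothetical helper, and the rows are placeholder values, not the query results from the question.

```python
def paginate(rows, per_page=20):
    """Split a list of rows into consecutive pages of at most per_page rows."""
    if per_page < 1:
        raise ValueError("per_page must be at least 1")
    return [rows[start:start + per_page] for start in range(0, len(rows), per_page)]


if __name__ == "__main__":
    rows = ["record %d" % n for n in range(1, 46)]  # 45 placeholder records
    for page_number, page in enumerate(paginate(rows), start=1):
        # In the real script, this is where a new PDF page would be started
        # and the rows of this chunk written out.
        print("page", page_number, "holds", len(page), "records")
```

In the PHP script, the equivalent would be to fetch the rows once, loop over chunks of 20, and add a page before writing each chunk.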

wkhtmltopdf displaying text as blocks

- by making3

We're using wkhtmltopdf in a web project (nodejs/compoundjs). We've gotten it working how we wanted on our machines (using the --use-xserver switch). However, when I try to run this on our Ubuntu Server 12.04 machine (without the ubuntu-desktop package), wkhtmltopdf cannot use the switch. When we disable the switch, the PDF displays all characters as blocks (image below). How do I resolve this without installing ubuntu-desktop and running an X server? I've found the Liberation fonts, but installing ttf-liberation and fonts-liberation did not help. I've also found urw-fonts, but I have yet to find an Ubuntu equivalent.

EDIT: It just hit me that it doesn't matter whether I'm on the server or not. On my development machine (Ubuntu 13.04 desktop), I can run the following, which produces the same blocks:

    wkhtmltopdf http://google.com google1.pdf

While this prints out the PDF properly:

    wkhtmltopdf --use-xserver http://google.com google2.pdf

My version of wkhtmltopdf is 0.12.0.

Can review changes in Acrobat Reader (Pro, or not) be 'applied' to a PDF?

- by Danjah

Hi there, as part of an enhancement to my workplace processes, we're trying to streamline the review of various documents. Yeah, there are way better alternatives to what I'm about to suggest, but the reality is that I have no time allocated to investigate things like DAV, repo setups and such. What I do have time allocated for is improving the workflow around tools we already use. So I tried to work through the Adobe PDF collaborative review cycle. I have to say it was pretty amazing, from the notify toolbar icon to doc merging, to user access control. They offer it all, EXCEPT the ability to actually apply review changes to a PDF!?! To clarify, after sending a PDF through the collab review cycle (involving a bunch of external editors and internal staff), the end result was a PDF full of rich feedback, but I can see no way to finalise and apply those 'accepted' review points to the PDF in question. I hope this is clear enough; feel free to ask questions to clarify. Perhaps I'm just missing something obvious, or perhaps applying changes to an already existing PDF is not possible? -d

Cocoa-Created PDF Not Rendering Correctly

- by Matthew Roberts

I've created a PDF on the iPad, but the problem is that when the text is longer than one line, the content just goes off the page. This is my code:

    void CreatePDFFile (CGRect pageRect, const char *filename)
    {
        CGContextRef pdfContext;
        CFStringRef path;
        CFURLRef url;
        CFMutableDictionaryRef myDictionary = NULL;

        path = CFStringCreateWithCString (NULL, filename, kCFStringEncodingUTF8);
        url = CFURLCreateWithFileSystemPath (NULL, path, kCFURLPOSIXPathStyle, 0);
        CFRelease (path);

        myDictionary = CFDictionaryCreateMutable(NULL, 0, &kCFTypeDictionaryKeyCallBacks, &kCFTypeDictionaryValueCallBacks);
        // CFSTR() only accepts string literals, so the title is passed as one
        CFDictionarySetValue(myDictionary, kCGPDFContextTitle, CFSTR("Title"));
        CFDictionarySetValue(myDictionary, kCGPDFContextCreator, CFSTR("Author"));

        pdfContext = CGPDFContextCreateWithURL (url, &pageRect, myDictionary);
        CFRelease(myDictionary);
        CFRelease(url);

        CGContextBeginPage (pdfContext, &pageRect);
        CGContextSelectFont (pdfContext, "Helvetica", 12, kCGEncodingMacRoman);
        CGContextSetTextDrawingMode (pdfContext, kCGTextFill);
        CGContextSetRGBFillColor (pdfContext, 0, 0, 0, 1);

        NSString *body = @"text goes here";
        const char *text = [body UTF8String];
        // Draws a single line starting at (30, 750); no wrapping is performed
        CGContextShowTextAtPoint (pdfContext, 30, 750, text, strlen(text));

        CGContextEndPage (pdfContext);
        CGContextRelease (pdfContext);
    }

The issue lies in how I write the text to the page: I can't seem to specify a multi-line "text view".

Convert HTML + CSS to PDF with PHP?

- by cletus

Ok, I'm now banging my head against a brick wall with this one. I have an HTML (not XHTML) document that renders fine in Firefox 3 and IE 7. It uses fairly basic CSS to style it and renders fine in HTML. I'm now after a way of converting it to PDF. I have tried:

- DOMPDF: it had huge problems with tables. I factored out my large nested tables and it helped (before, it was just consuming up to 128M of memory then dying; that's my limit on memory in php.ini), but it makes a complete mess of tables and doesn't seem to get images. The tables were just basic stuff with some border styles to add some lines at various points;
- HTML2PDF and HTML2PS: I actually had better luck with this. It rendered some of the images (all the images are Google Chart URLs) and the table formatting was much better, but it seemed to have some complexity problem I haven't figured out yet and kept dying with unknown node_type() errors. Not sure where to go from here; and
- Htmldoc: this seems to work fine on basic HTML but has almost no support for CSS whatsoever, so you have to do everything in HTML (I didn't realize it was still 2001 in Htmldoc-land...), so it's useless to me.

I tried a Windows app called Html2Pdf Pilot that actually did a pretty decent job, but I need something that at a minimum runs on Linux and ideally runs on demand via PHP on the web server. I really can't believe I'm this stuck. Am I missing something?

Search Results

Search found 4479 results on 180 pages for 'pdf scraping'.
