pdf scraping - Page 47 - Developer IT

Converting Visio (.vsd) files to pdf automatically

- by Aseques

I am trying to create a scheduled task that converts all my .vsd files to PDF so that all of our devices (Linux, Mac, smartphones, etc.) can read them, and I would prefer not to pay for something that can be done with Visio + PDFCreator. The OpenOffice approach doesn't work with .vsd files because it's not a supported format (see "Method/tools for batch-converting Microsoft Word files into PDF?"). What I currently have is this:

    'C:\Program Files\Microsoft Office\Visio11\VISIO.EXE' /pt "Z:\Archive\Files.vsd",-PPDFCREATORPRINTER /nologo

This automatically opens the document I want and prepares it for printing; the only missing part is that I still have to confirm the print dialog. There is some information at http://support.microsoft.com/kb/314392, but it doesn't explain non-interactive printing.

Read the article

Clone/mirror a textbox in a PDF form

- by Tim Pietzcker

I've got a PDF form in Acrobat X Pro where users can enter their name in a textbox on the first page. I would like the contents of that box to be cloned/mirrored in another box on the second page of the same form. However, in the Properties dialog of the second textbox, I can't find a way to reference the first one. I do have options to calculate numerical values, perform validation and so on, but I can't seem to simply have it display the contents of another textbox. Is this not possible in PDF forms, or am I overlooking something obvious?

Read the article

Update a PDF to include an encrypted, hidden, unique identifier?

- by Dave Jarvis

Background

The idea is this:

1. Person provides contact information for online book purchase
2. Book, as a PDF, is marked with a unique hash
3. Person downloads book

PDF passwords are annoying and extremely easy to circumvent. The ideal process would be something like:

1. Generate hash based on contact information
2. Store contact information and hash in database
3. Acquire book lock
4. Update an "include" file with hash text
5. Generate book as PDF (using pdflatex)
6. Apply hash to book
7. Release book lock
8. Send email with book download link

Technologies

The following technologies can be used (other programming languages are possible, but libraries will likely be limited to those supplied by the host):

- C, Java, PHP
- LaTeX files
- PDF files
- Linux

Question

What programming techniques (or open source software) should I investigate to:

- Embed a unique hash (or other mark) in a PDF
- Create a collusion-attack resistant mark
- Develop a non-fragile (e.g., PDF -> EPS -> PDF still contains the mark) solution

Research

I have looked at the following possibilities:

- Steganography
- Natural Language Processing (NLP)
- Convert blank pages in PDF to images; mark those images; reassemble PDF
- LaTeX watermark package
- ImageMagick

Steganography requires keeping a master copy of the images, and I'm not sure if the watermark would survive PDF -> EPS -> PDF, or other types of conversion. LaTeX creates an image cache, so any steganographic process would have to intercept that process somehow. NLP introduces grammatical errors. Inserting blank pages as images is immediately suspect; it is easy to replace suspicious blank pages. The LaTeX watermark package draws visible marks. ImageMagick draws visible marks.

What other solutions are possible?

Related Links

- http://www.tcpdf.org/
- invisible watermarks in images

Thank you!
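The first step of the ideal process ("Generate hash based on contact information") could be sketched as below. This is only an illustration in Python, which is not one of the languages listed in the question; the function name `purchase_mark`, the choice of HMAC-SHA256 and the example key are all assumptions. It does not address how to embed the mark in the PDF, nor collusion resistance or fragility.

```python
import hashlib
import hmac


def purchase_mark(name: str, email: str, secret_key: bytes) -> str:
    """Return a hex mark derived from the buyer's contact information.

    A keyed hash (HMAC) is used so that a buyer cannot recompute or forge
    the mark without the server-side key. This is an assumed design, not
    something specified in the question.
    """
    # Normalise so that differences in case or surrounding spaces
    # still produce the same mark for the same person.
    contact = "\n".join(part.strip().lower() for part in (name, email))
    return hmac.new(secret_key, contact.encode("utf-8"), hashlib.sha256).hexdigest()


if __name__ == "__main__":
    key = b"replace-with-a-real-secret"  # hypothetical key, kept on the server
    print(purchase_mark("Jane Reader", "jane@example.com", key))
```

The resulting 64-character string is what would be stored in the database next to the contact information and written to the "include" file before pdflatex runs.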

Read the article

Generate or update a PDF to include an encrypted, hidden watermark?

- by Dave Jarvis

Background

Using LaTeX to write a book. When a user purchases the book, the PDF will be generated automatically.

Problem

The PDF should have a watermark that includes the person's name and contact information.

Question

What software meets the following criteria:

- Applies encrypted, invisible watermarks to a PDF
- Open Source
- Platform independent (Linux, Windows)
- Fast (marks a 200 page PDF in under 1 second)
- Batch processing (exclusively command-line driven)
- Collusion-attack resistant
- Non-fragile (e.g., PDF -> EPS -> PDF still contains the watermark)
- Well documented (shows example usages)

Ideas & Resources

Some thoughts and findings:

- Natural language processing (NLP) watermarks.
- Apply steganography on a randomly selected image.
- http://openstego.sourceforge.net/cmdline.html

The problem with NLP is that grammatical errors can be introduced. The problem with steganography is that the images are sourced from an image cache, and so recreating that cache with watermarked images will impart a delay when generating the PDF (I could just delete one image from the cache, but that's not an elegant solution).

Thank you!

Read the article

Setting the default zoom level for Acrobat Reader browser plugin

- by Joe

I have a large monitor and when a PDF file is opened in my browser with the PDF plugin the letters are tiny with 100% zoom level. Every time I have to set it to 200% manually to be able to read the text. Is there a way to set the default zoom level to 200% in the plugin?

Read the article

Adobe Reader not loading form content

- by wullxz

We have an FDL file which is used to offer an online application. The FDL is filled out and sent to a mailbox. When I open the received file, Adobe Reader starts, loads the document in Internet Explorer (I had to change my default browser because it doesn't work in Chrome; the customer uses IE as default) and displays a warning that Adobe Reader has blocked the connection to the server where the initial document is saved. I can then click on "Trust this document once" or "Add this host to trusted hosts" (both translated by me). The second option doesn't work at all. The first option works but is a little bit annoying.

I looked into Adobe Reader's options (Edit - "Voreinstellungen" in German, the last option - Security (advanced)) and found the possibility to add hosts, files and directories, or to allow Adobe Reader to use the "Trusted Websites" list from Internet Options. When I add the website either to Trusted Websites or to the trusted list in Adobe Reader's options, the warning doesn't pop up, but the content of the input boxes prefilled by the applicant doesn't show up on Windows 7, although it does show up on Windows XP. In the settings window described here, the big input box at the bottom normally holds the list of trusted files/directories/hosts.

System Information:

- Windows 7 Enterprise x64
- Adobe Reader X
- multiple IE versions (mine is the latest, but there's also IE 7 or 8)

How do I get Adobe Reader to load the content of the form?

This behaviour can be reproduced on a PC. When opening an fdf from a command line, the form fields are blank even though there is data in the fdf and the pdf is located in a manually entered trusted folder.

Steps to reproduce:

1. Clean install a Windows 7 PC (or use a virtual box)
2. Map a network drive to a shared folder with a subfolder, e.g. c:\test\docs becomes m:\docs
3. Set security permissions to allow full control to everyone
4. Add an fdf and a matching pdf file in the subfolder
5. Manually add m:\docs to each of the trusted folders in the trust manager registry settings
6. Ensure that Enhanced Security is on
7. Run a command line to open the fdf file

Expected result: pdf is opened in Adobe Reader with form fields filled out with data

Actual results:

- pdf is opened with blank fields
- 'Yellow bar' appears asking to add document to trusted locations

It appears that Adobe Reader XI is ignoring the privileged locations entries in the registry. Adding the document via the 'yellow bar' adds the individual document, in the same folder, to the privileged locations, but this means that the process has to be repeated for every document that needs to be opened from the folder.

Read the article

doxilion document converter alternative

- by Nrew

Do you know of any alternative to doxilion document converter? When I try to convert .doc files into .pdf, the images are removed and the output .pdf file contains only text. Please don't suggest online converters, because I have a slow internet connection.

Read the article

iPhoto printing PDFs sideways

- by Marcelo Cantos

When I print to PDF from iPhoto (7.1.5), my scans (which are all portrait) all end up sideways in the PDF file. I've searched high and low for some setting that will alter this. The only one that I could find that would plausibly affect this is the Layout menu item in the Print customisation view, but this has no effect. Even rotating the images doesn't change it. Why is iPhoto printing my PDFs sideways, and how do I get it to stop?

Read the article

Can anyone recommend a PDF->(rtf,doc,html) application for windows

- by Nifle

I would like an application able to convert a PDF to either doc, html or rtf. I don't want an online solution. (I have tried a few but I think my PDF's are too large). I don't need any OCR capability. I have 64bit win7 so anything that works on that is preferable but any os back to win2k will do.

Read the article

Flatten Word document

- by user126389

I have a document with some precise formatting, created in Word. This doc was converted to PDF for distribution. Now the original is lost, and reconverting to Word using a PDF to word add-on from Microsoft results in many text boxes in the new DOC file. How can I 'flatten' this to remove the text boxes and retain most of the formatting in order to update the contents? Recreating the original formatting would take a long time.

Read the article

Is there a way to change the order of tabs in Foxit Reader?

- by Harold

In web browsers you can drag the tabs to change the order. But I can't do that in Foxit Reader (version 3.1.4.1125, using Windows Vista Home, Chinese Traditional).

Example: I open 3 files:

    Page2.pdf
    Page3.pdf
    Page1.pdf

which opens Foxit Reader with a tab for each file, in the order

    Page2.pdf  Page3.pdf  Page1.pdf

Is there a way to change the order of the tabs to

    Page1.pdf  Page2.pdf  Page3.pdf

? This would really be helpful when you have many files open...

TIA! Harold

Read the article

openSSL tutorial not fully working - Can sign but cannot restore original file

- by djechelon

I'm writing, and testing, a little tutorial for my groupmates involved in an openSSL homework. We have a bunch of PDF files; I'm the CA, and each of them should send me a signed PDF for me to verify. I've told them to do the following (and tried to do it myself):

1. Request and obtain a certificate (I'll skip this part)
2. Create a MIME message with the PDF file in it:

    makemime -c "text/pdf" -a "Content-Disposition: attachment; filename=”Elaborato.pdf" Elaborato.pdf > Elaborato.pdf.msg

3. Sign with openSSL:

    openssl smime -sign -in Elaborato.pdf.msg -out Elaborato.pdf.p7m -certfile ca.pem -certfile nomegruppo.crt -inkey nomegruppo.key -signer nomegruppo.crt

4. Verify with:

    openssl smime -verify -in Elaborato.pdf.p7m -out Elaborato-verified.msg -CAfile ca.pem -signer nomegruppo.crt

5. Extract the attachment with:

    munpack Elaborato-verified.msg

6. View with Acrobat Reader

The problem is that even though I get a file that (from its binary content) resembles a PDF file, my current Ubuntu PDF viewer doesn't read it. The XXXElaborato.pdf extracted by munpack is a little bit smaller than the original. What's the problem with this procedure? In theory, they should send me the signed S/MIME message and I should be able to read the PDF within it. Why can't I restore the original content of the PDF file?

Read the article

scrape data from a website and post it on the blog (wordpress)

- by Pennf0lio

This could be in DocType, but I'm looking for software or just a plugin for WordPress. I want to fetch data from a website and automatically post it on my blog (WordPress powered). The site doesn't have RSS or an API for getting the data, so I have to copy and paste it manually, one item at a time, and post it on WordPress. Do you know of an alternative to my process, or of software or a plugin that does the job? Thanks!

Read the article

IE won't load PDF in a window created with window.open

- by Dean

Here's the problem, which only occurs in Internet Explorer (IE). I have a page that has links to several different types of files. The links to these files execute a JavaScript function that opens a new window and loads the specific file. This works great, unless the file I need to open in the new window is a PDF, in which case the window is blank even though the URL is in the address field. Refreshing that window using F5 doesn't help. However, if I put the cursor in the address field and press <enter>, the PDF loads right up.

This problem only occurs in IE. I have seen it in IE 7 and 8 and am using Adobe Acrobat Reader 9. In Firefox (PC and Mac) everything works perfectly. In Chrome (Mac), the PDF is downloaded. In Safari (Mac) it works. In Opera (Mac) it prompts me to open or save. Basically, everything works fine except in IE.

I have searched for similar problems and have seen some posts where it was suggested to adjust some of the Internet Options in IE. I have tried this but it doesn't help, and the problem wasn't exactly the same anyway.

Here's the JavaScript function I use to open the new window:

    function newwin(url,w,h) {
        win = window.open(url,"temp","width="+w+",height="+h+",menubar=yes,toolbar=yes,location=yes,status=yes,scrollbars=auto,resizable=yes");
        win.focus();
    }

You can see that I pass in the URL as well as the width, w, and height, h, of the window. I've used a function like this for years and as far as I know have never had a problem. I call the newwin() function using this:

    <a href="javascript:newwin('/path/document.pdf',400,300)">document.pdf</a>

(Yes, I know there are other, better ways than using inline JS, and I've even tried some of them because I've run out of things to try, but nothing works.)

So, if anyone has an idea as to what might be causing this problem, I'd love to hear it.

Read the article

How to detect Javascript pop-up notifications in WatiN?

- by Ian P

I have what seems to be a rather common scenario that I'm trying to work through. I have a site that accepts input through two different text fields. If the input is malformed or invalid, I receive a JavaScript pop-up notification. I will not always receive one, but I should in the event of (like I said earlier) malformed data, or when a search result couldn't be found. How can I detect this in WatiN? A quick Google search produced results that show how to click through these pop-ups, but I'm curious as to whether or not I can detect when I get one. In case anyone is wondering, I'm using WatiN to do some screen scraping for me, rather than integration testing :) Thanks in advance! Ian

Read the article

Scrape HTML tables from a given URL into CSV

- by dreeves

I seek a tool that can be run on the command line like so:

    tablescrape 'http://someURL.foo.com' [n]

If n is not specified and there's more than one HTML table on the page, it should summarize them (header row, total number of rows) in a numbered list. If n is specified or if there's only one table, it should parse the table and spit it to stdout as CSV or TSV.

Potential additional features:

- To be really fancy you could parse a table within a table, but for my purposes -- fetching data from wikipedia pages and the like -- that's overkill. The Perl module HTML::TableExtract can do this and may be a good place to start for writing the tool I have in mind.
- An option to asciify any unicode.
- An option to apply an arbitrary regex substitution for fixing weirdnesses in the parsed table.

Related questions:

- http://stackoverflow.com/questions/259091/how-can-i-scrape-an-html-table-to-csv
- http://stackoverflow.com/questions/1403087/how-can-i-convert-an-html-table-to-csv
- http://stackoverflow.com/questions/2861/options-for-html-scraping
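The behaviour described above (summarize when there are several tables, otherwise emit CSV) could be sketched with nothing but the Python standard library, roughly as follows. This is a minimal sketch, not the Perl HTML::TableExtract approach mentioned in the question; the names `TableCollector`, `tables_from_html`, `summarize` and `to_csv` are made up for illustration, nested tables are skipped rather than parsed, and the asciify and regex options are left out.

```python
import csv
import io
import sys
import urllib.request
from html.parser import HTMLParser


class TableCollector(HTMLParser):
    """Collect every top-level <table> as a list of rows of cell text."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.tables = []   # finished and in-progress tables
        self._depth = 0    # nesting depth of <table> tags
        self._row = None   # cells of the row being read
        self._cell = None  # text fragments of the cell being read

    def _close_cell(self):
        if self._cell is not None:
            # Collapse runs of whitespace inside the cell.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None

    def _close_row(self):
        self._close_cell()
        if self._row is not None:
            self.tables[-1].append(self._row)
            self._row = None

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            if self._depth == 0:
                self.tables.append([])
            self._depth += 1
        elif self._depth == 1:
            if tag == "tr":
                self._close_row()  # tolerate a missing </tr>
                self._row = []
            elif tag in ("td", "th"):
                if self._row is None:
                    self._row = []  # tolerate a missing <tr>
                self._close_cell()  # tolerate a missing </td>
                self._cell = []

    def handle_endtag(self, tag):
        if tag == "table":
            if self._depth == 1:
                self._close_row()
            if self._depth > 0:
                self._depth -= 1
        elif self._depth == 1:
            if tag in ("td", "th"):
                self._close_cell()
            elif tag == "tr":
                self._close_row()

    def handle_data(self, data):
        # Text inside nested tables (depth > 1) is deliberately ignored.
        if self._cell is not None and self._depth == 1:
            self._cell.append(data)


def tables_from_html(html):
    parser = TableCollector()
    parser.feed(html)
    parser.close()
    return parser.tables


def summarize(tables):
    """One line per table: 1-based number, row count, first row."""
    return [f"{i}: {len(t)} rows, first row: {t[0] if t else []}"
            for i, t in enumerate(tables, start=1)]


def to_csv(table):
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(table)
    return buffer.getvalue()


def main(argv):
    url = argv[1]
    n = int(argv[2]) if len(argv) > 2 else None
    with urllib.request.urlopen(url) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        html = response.read().decode(charset, errors="replace")
    tables = tables_from_html(html)
    if n is None and len(tables) != 1:
        print("\n".join(summarize(tables)))
    else:
        sys.stdout.write(to_csv(tables[(n or 1) - 1]))


if __name__ == "__main__" and len(sys.argv) > 1:
    main(sys.argv)
```

Saved as, say, tablescrape.py (a hypothetical file name), it would be called as `python tablescrape.py 'http://someURL.foo.com' 2` to print the second table as CSV.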

Read the article

Screen scrape a web page that uses JavaScript and frames

- by Mello

Hi, I want to scrape data from www.marktplaats.nl . I want to analyze the scraped description, price, date and views in Excel/Access. I tried to scrape the data with Ruby (nokogiri, scrapi) but nothing worked (on other sites it worked well). The main problem is that tools such as selectorgadget and the Firefox add-on Firebug don't find any CSS I can use to scrape the page. On other sites I can extract the CSS with selectorgadget or Firebug and use it with nokogiri or scrapi. Because of my lack of experience it is difficult to identify the problem, and so searching for a solution isn't easy. Can you tell me where to start solving this problem, and where I might find more info about a similar scraping process? Thanks in advance!

Read the article

Automating WebTrends analysis

- by tridium

Every week I access server logs processed by WebTrends (for about 7 profiles) and copy ad clickthrough and visitor information into Excel spreadsheets. A lot of it is just accessing certain sections and finding the right title and then copying the unique visitor information. I tried using WebTrends' built-in query tool but that is really poorly done (only uses a drag-and-drop system instead of text-based) and it has a maximum number of parameters and maximum length of queries to query with. As far as I know, the tools in WebTrends are not suitable to my purpose of automating the entire web metrics gathering process. I've gotten access to the raw server logs, but it seems redundant to parse that given that they are already being processed by WebTrends. To me it seems very scriptable, but how would I go about doing that? Is screen-scraping an option?

Read the article

Python GUI Scraper hanging issues.

- by bball

I wrote a scraper using Python a while back, and it worked fine on the command line. I have now made a GUI for the application, but I am having trouble with one issue. When I attempt to update text inside the GUI (e.g. 'fetching URL 12/50'), I am unable to, seeing as the function within the scraper is busy grabbing 100+ links. Also, when going from one scraping function, to a function that should update the GUI, to another scraping function, the GUI update function seems to be skipped while the next scrape function is run. An example would be:

    scrapeLinksA()  # takes 20 seconds
    updateInfo("LinksA done")
    scrapeLinksB()  # takes another 20 seconds

In the above example, updateInfo is never executed unless I end the program with a KeyboardInterrupt. I'm thinking my solution is threading, but I'm not sure. What can I do to fix this?

I am using:

- PyQt4
- urllib2
- BeautifulSoup
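The threading idea raised in the question could look roughly like the sketch below: the slow scraping runs in a worker thread and only sends progress messages, while the main thread stays free to update the display. The sketch is written for Python 3 and uses only the standard `threading` and `queue` modules as a stand-in; in PyQt4 the same pattern is usually written with a QThread that emits a signal connected to the update function. `scrape_links`, `worker` and `run` are hypothetical names, and the sleep merely simulates fetching a URL.

```python
import queue
import threading
import time


def scrape_links(name, count, progress):
    """Stand-in for a slow scraping function (hypothetical)."""
    for i in range(1, count + 1):
        time.sleep(0.01)  # pretend to fetch one URL
        progress.put(f"{name}: fetched URL {i}/{count}")
    progress.put(f"{name} done")


def worker(progress):
    """Runs in the background thread; never touches the GUI directly."""
    scrape_links("LinksA", 3, progress)
    scrape_links("LinksB", 3, progress)
    progress.put(None)  # sentinel: no more messages


def run(update_info):
    """Start the worker and hand each progress message to update_info.

    This loop plays the role of the GUI event loop. In a real GUI it
    would be replaced by a timer callback that drains the queue with
    progress.get_nowait(), so the main thread never blocks.
    """
    progress = queue.Queue()
    threading.Thread(target=worker, args=(progress,), daemon=True).start()
    while True:
        message = progress.get()
        if message is None:
            break
        update_info(message)


if __name__ == "__main__":
    run(print)
```

The point of the design is that only the main thread calls the update function, which matters because GUI toolkits generally expect widgets to be modified from the thread that owns the event loop.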

Read the article

Python Scraper for Javascript?

- by Diego

Hey all, can anyone direct me to a good Python screen scraping library for JavaScript code (hopefully one with good documentation/tutorials)? I'd like to see what options are out there, but most of all the easiest to learn with the fastest results... wondering if anyone has experience. I've heard some stuff about spidermonkey, but maybe there are better ones out there?

Specifically, I use BeautifulSoup and Mechanize to get to here, but need a way to open the JavaScript popup, submit data, and download/parse the results in the JavaScript popup.

    <a href="javascript:openFindItem(12510109)" onclick="s_objectID='javascript:openFindItem(12510109)_1';return this.s_oc?this.s_oc(e):true">Find Item</a>

I'd like to implement this with Google App Engine and Django. Thanks!
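BeautifulSoup and Mechanize do not execute JavaScript, so one workaround, sketched here under assumptions, is to read the function name and argument out of the `javascript:` href and then request whatever URL the popup would load. The helper `parse_js_href` is hypothetical, and the URL that `openFindItem` actually opens is not shown in the question, so it would have to be found by reading the page's own script.

```python
import re

# Matches hrefs of the form "javascript:someFunction(arg1, arg2)".
JS_CALL = re.compile(r"^javascript:\s*(?P<func>\w+)\((?P<args>[^)]*)\)")


def parse_js_href(href):
    """Return (function_name, [arguments]) or None if href is not a JS call."""
    match = JS_CALL.match(href.strip())
    if match is None:
        return None
    # Split the argument list and drop surrounding spaces and quotes.
    args = [arg.strip().strip("'\"")
            for arg in match.group("args").split(",") if arg.strip()]
    return match.group("func"), args


if __name__ == "__main__":
    print(parse_js_href("javascript:openFindItem(12510109)"))
```

With the item ID in hand, the popup's page can be fetched like any other URL once its address pattern is known.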

Read the article

How to get InnerText of IFrame from another site?

- by Eclipsed4utoo

I am trying to do some screen-scraping of a website. The content that I want to get is inside of an IFrame. How do I get the InnerText or HTML that is being displayed inside of the IFrame? I am using .Net 4.0 and C#. I want to be able to do this from a WinForm. I tried this, but can't find where to get the actual data from...

    void PageCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
    {
        WebBrowser b = sender as WebBrowser;
        string response = b.DocumentText;
        HtmlElement element = b.Document.GetElementById("profileFrame");
        if (element != null)
        {
            // do something with the data
        }
    }

I've tried searching through the element but couldn't find any of the HTML. Is this possible?
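An iframe's content is a separate document, so the parent page's HTML normally contains only the `<iframe>` tag and its `src` attribute. One language-agnostic approach, sketched here in Python rather than C#, is to read that `src`, resolve it against the page URL and request it as a second page. `IframeFinder` and `iframe_url` are hypothetical names; only the frame ID `profileFrame` comes from the question, and the sketch assumes the frame has a `src` attribute rather than being filled in by script.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class IframeFinder(HTMLParser):
    """Remember the src attribute of the first iframe with a given id."""

    def __init__(self, frame_id):
        super().__init__()
        self.frame_id = frame_id
        self.src = None

    def handle_starttag(self, tag, attrs):
        if tag == "iframe" and self.src is None:
            attributes = dict(attrs)
            if attributes.get("id") == self.frame_id:
                self.src = attributes.get("src")


def iframe_url(page_html, page_url, frame_id):
    """Return the absolute URL loaded by the iframe, or None if not found."""
    finder = IframeFinder(frame_id)
    finder.feed(page_html)
    if finder.src is None:
        return None
    # The src may be relative, so resolve it against the page's own URL.
    return urljoin(page_url, finder.src)


if __name__ == "__main__":
    sample = '<iframe id="profileFrame" src="/profiles/42"></iframe>'
    print(iframe_url(sample, "http://example.com/page", "profileFrame"))
```

The returned URL can then be downloaded and parsed on its own, which gives the HTML that the browser displays inside the frame.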

Read the article

xvfb on a machine with a display, can an application run 'in the background?'

- by marfarma

I'm setting up to cron a web scraping job, using xvfb, firefox, and watir on my Mac OS X. In testing the script so far, firefox pops up visibly on the local desktop, the watir script executes, and then firefox exits (I quit firefox in my script). I'd like to set the xvfb DISPLAY such that firefox will run, but won't be seen on the local desktop, running 'in the background' so to speak. Nothing I've been able to find online discusses such a possibility - nor explains that it's not possible. Is it possible? If so, what do I need to do to make it work?

Read the article

Sending Adobe PDF attachments from Adobe Reader (in Outlook 2003) takes too long

- by White Island

I have a customer who is using Outlook 2003 (Microsoft Online Services) and Adobe reader 9+. When they send a PDF from Adobe reader to Outlook (via the Send as attachment to e-mail feature in Adobe), it freezes for 30 seconds to 5 minutes before the new e-mail pops up with the PDF attachment. I'm pretty sure the issue is on the Outlook side of things, as I've tried Adobe reader 8 and Foxit Reader with the same results (Windows XP/7 doesn't seem to make a difference, either). I tried Outlook in safe mode on the first (Win7) machine I was working on, and the e-mail attachment worked a lot faster, but when I tried to replicate the results on another machine, one wouldn't go into safe mode, the other didn't seem to show a difference. In an effort to fix the problem in Outlook normal mode, I tried disabling all add-ins, Com add-in (Office Communicator is the only one), reading pane, Word 2003 as e-mail editor... but none of these seemed to address the issue. Does anyone have any other ideas? I need to get this resolved as soon as possible, and it doesn't seem practical to make them run in safe mode. :P

Search Results

Search found 4479 results on 180 pages for 'pdf scraping'.
