Search Results

Search found 4479 results on 180 pages for 'pdf scraping'.

Page 6/180 | < Previous Page | 2 3 4 5 6 7 8 9 10 11 12 13  | Next Page >

  • Effect of HOME on libreoffice to convert to pdf as non-root user

    - by user1032531
    I installed libreoffice-headless and can convert documents when logged on as root. I then tried doing so as another user, and it didn't show an error, but didn't convert the file. I then found that if I get rid of the HOME=/tmp/ayb, it works with the other user. Doesn't HOME=/tmp/ayb just allow files to default to this directory if not specified? (Sorry, I tried to search "Linux HOME", but as you probably expect, received a bunch of non-relevant results). If not, what is the purpose of specifying HOME? Why does setting HOME prevent it from converting on non-root users? Note that /tmp and /tmp/ayb or both 0777. Thank you [root@desktop ~]# yum install libreoffice-headless [root@desktop ~]# yum install libreoffice-writer [root@desktop ~]# ls -l total 48 -rwxrwxrwx. 1 NotionCommotion NotionCommotion 48128 Jul 30 02:38 document_34.doc [root@desktop ~]# HOME=/tmp/ayb; /usr/bin/libreoffice --headless -convert-to pdf --outdir /tmp/ayb /tmp/ayb/document_34.doc convert /tmp/ayb/document_34.doc -> /tmp/ayb/document_34.pdf using writer_pdf_Export [root@desktop ~]# rm d*.pdf rm: remove regular file `document_34.pdf'? y [root@desktop ~]# /usr/bin/libreoffice --headless -convert-to pdf --outdir /tmp/ayb /tmp/ayb/document_34.doc convert /tmp/ayb/document_34.doc -> /tmp/ayb/document_34.pdf using writer_pdf_Export [root@desktop ~]# rm d*.pdf rm: remove regular file `document_34.pdf'? y [root@desktop ~]# su NotionCommotion sh-4.1$ HOME=/tmp/ayb; /usr/bin/libreoffice --headless -convert-to pdf --outdir /tmp/ayb /tmp/ayb/document_34.doc sh-4.1$ rm d*.pdf rm: cannot remove `d*.pdf': No such file or directory sh-4.1$ /usr/bin/libreoffice --headless -convert-to pdf --outdir /tmp/ayb /tmp/ayb/document_34.doc sh-4.1$ rm d*.pdf rm: cannot remove `d*.pdf': No such file or directory sh-4.1$ exit exit [root@desktop ~]# su NotionCommotion sh-4.1$ /usr/bin/libreoffice --headless -convert-to pdf --outdir /tmp/ayb /tmp/ayb/document_34.doc convert /tmp/ayb/document_34.doc -> /tmp/ayb/document_34.pdf using writer_pdf_Export sh-4.1$ rm d*.pdf sh-4.1$ HOME=/tmp/ayb; /usr/bin/libreoffice --headless -convert-to pdf --outdir /tmp/ayb /tmp/ayb/document_34.doc sh-4.1$ rm d*.pdf rm: cannot remove `d*.pdf': No such file or directory sh-4.1$ /usr/bin/libreoffice --headless -convert-to pdf --outdir /tmp/ayb /tmp/ayb/document_34.doc sh-4.1$ rm d*.pdf rm: cannot remove `d*.pdf': No such file or directory sh-4.1$

    Read the article

  • Recommendation for a simple no-frills Windows PDF printer driver?

    - by Scott Bussinger
    I'm looking for an extremely simple Windows PDF printer driver that I can recommend to clients. Ideally it would have these characteristics: When you print something, it should just create it as a temporary file and then display it in their default PDF viewer with no prompting. If they want to save it, they can save it manually from inside the viewer. This workflow should be with no special post-install configuration. Installation should be very simple. A double click the installation program and click "Finish" sort of thing. No complicated multi-step installation, no asking questions your grandmother wouldn't know the answer to (preferably no questions at all), no extra crap being installed. An option for a completely silent installation would be nice, but not necessary. Ideally it would be free to simplify their installing on a small network, but low cost is an option. I've tried a quite a few but none really fit the bill. Some can achieve the first goal but only after careful configuration, some try to install extra toolbars, some have other installation complexities that would make it hard for extremely novice users to succeed. Any suggestions? Thanks!

    Read the article

  • PDF writeHTML for rendering tables

    - by sami
    I'm using TCPDF and following this example which uses writeHTML and heredoc syntax to write a table. http://www.tcpdf.org/examples/example_048.phps I'm trying to do though is switch the font using setFont before writing each column. This means I have to break the html and write it on different pieces (see pseudo code). But once I break the HTML like that, it doesn't work, I think because it becomes invalid. writeHTML <tr> writeHTML first column setFont() writeHTML second column writeHTML </tr> I want to use writeHTML because it helps me write the table without having to manually construct. But how can I make modifications before I output each cell?

    Read the article

  • Adding page title to each page while creating a PDF file using itextsharp in VB.NET

    - by Snowy
    I have recently started using itextsharp and gradually learning it. So far I created a PDF file and it seems great. I have added a table and some subtables as the first table cells to hold data. It is done using two for loops. The first one loops through all data and the second one is each individual data displayed in columns. The html outcome looks like the following: <table> <tr> <td>Page title in center</td> </tr> <tr> <td> <table> <tr> <td>FirstPersonName</td> <td>Rank1</td> <td>info1a</td> <td>infob</td> <td>infoc</td> </tr> </table> </td> <td> <table> <tr> <td>SecondPersonName</td> <td>Rank2</td> <td>info1a</td> <td>infob</td> <td>infoc</td> <td>infod</td> <td>infoe</td> </tr> </table> </td> <td> <table> <tr> <td>ThirdPersonName</td> <td>Rank2</td> <td>info1a</td> <td>infob</td> <td>infoc</td> <td>infod</td> <td>infoe</td> <td>infof</td> <td>infog</td> </tr> </table> </td> </tr> </table> For page headings, I added a cell at the top before any other cells. I need to add this heading to all pages. Depending on the size of data, some pages have two rows and some pages have three rows of data. So I can not tell exactly when the new page starts to add the heading/title. My question is how to add the heading/title to all pages. I use VB.net. I searched for answer online and had no success. Your help would be greatly appreciated.

    Read the article

  • Screen scraping over SSL with .NET

    - by Even Mien
    What solutions exist for screen scraping a site over SSL for use with .NET? My use case is that I need to login to a partner website (https), navigate through a dynamic hierarchy, and download a zipped file of reports. I certainly could use other screen scrapers if there are no good viable options in .NET, either though the framework or OSS.

    Read the article

  • screen scraping

    - by sam
    hello folks., i am screen scraping a website which is in danish language.. i am unable to scrape certain characters as like må .. any idea to solve this? thanks

    Read the article

  • Scraping data from Flash (Games)

    - by awegawef
    I saw this video, and I am really curious how it was performed. Does anyone have any ideas? My intuition is that he scraped pixels from the screen (one per 'box'), and then fed that into some program to determine the next move. Is scraping pixel-by-pixel the way to do this, or is there a better way? I am looking to do something similar with either Java or Python. Thanks

    Read the article

  • HTML Scraping in Php.

    - by tsellon
    I've been doing some html scraping in PHP using regular expressions. This works, but the result is finicky and fragile. Has anyone used any packages that provide a more robust solution? A config driven solution would be ideal, but I'm not picky.

    Read the article

  • Scraping &#151 character (long dash) error in Nokogiri

    - by DavidP6
    I having trouble scraping a certain long dash that is encoded as — ; on the Time magazine site. It looks like this: —. It works fine when this dash is encoded as mdash, but when the problem dash is scraped, it is returned as unknown characters. I am using Nokogiri and am wondering if I have to use some sort of special encoding? The page says it is encoded with UTF-8.

    Read the article

  • Converting PDF eBooks into a Kindle format

    - by Ender
    Over the past couple of years I've amassed quite a collection of guides, tutorials and ebooks in PDF format. A lot of these are quite useful for work, especially PDF documentation, and rather than have to be at a computer every time I want to read how to do something in Sitecore or to read through a software testing ebook I'd like to do it on my brand-spanking-new Kindle. However, even though there is now a native PDF reader on the Kindle due to the nature of PDF's they are practically unreadable. The text doesn't wrap due to how PDF's are sized and so far after a bunch of Google searches I've yet to find a viable solution to get my PDF's converted into a readable Kindle format. Sometimes these books have code or pictures/tables in them, but most of the time they're text-heavy and to be honest I'd be surprised if there wasn't a free tool to handle the converting of PDF to one of the (seemingly many) Kindle formats. So, can anyone help me out with this? EDIT: I've tried Calibre, and have checked their forums to play with some of the advanced settings, yet the solutions available seem to be extremely poor, especially if the book you're attempting to read contains equations, code, or anything outside of plain text. I've also tried Amazon's conversion service, which wasn't much help with such documents. The best way I have found so far is to build the entire thing over again in ePub or RTF format and convert to MOBI from there. This works for text-heavy books with tables, but anything technical still isn't covered. Can anyone help with this?

    Read the article

  • What is a good Foxit reader equivalent (or other PDF editor)?

    - by Yanick Rochon
    On Windows, I have found Foxit Reader to be quite handy when I need to highlight texts in PDF document, make annotations, etc. etc. Unfortunately, I have not yet found product as user friendly (which also does not corrupt PDF files...) and full-featured as Foxit software... Any recommendations? ** UPDATE ** I just tried the Open Office PDF import extension. It seems to work ok... If anyone used it for a while, I'd appreciate your feedback on that one. Thanks! ** UPDATE ** You can't highlight text with OpenOffice's PDF extension. Doesn't matter, I was reading this thread and found out about Xournal . As it turns out, it's in the repository. It does not natively save in PDF, but once all edits are done, the document can be exported to PDF (and overwrite the old one, just like Gimp with the native .XCE format and original PNG file, for example) I realize that this question is no longer a question in itself, but could be migrated to community wiki. However, feedbacks are still welcome! ** EDIT ** So... to close up this question, I have to say that I adopted Xournal . It is light and works pretty well, even on multi-page PDF documents. Thank you all for your answers!

    Read the article

  • Saving a datawindow as PDF in PB 10.5

    - by gd047
    I have a grid datawindow with a picture in it's background (with dimensions of an A4 page) and I would like to export both data and the picture as a (single page) PDF file. I used several combinations of the following commands but at most I got a 0-sized pdf. //dw_1.Modify("Datawindow.Export.PDF.Method = Distill! ") //dw_1.Modify("DataWindow.Export.PDF.Method = XSLFOP! ") dw_1.Object.DataWindow.Export.PDF.Method = Distill! //dw_1.Object.DataWindow.Printer = "\\prntsrvr\pr-6" dw_1.Object.DataWindow.Export.PDF.Distill.CustomPostScript="No" dw_1.SaveAs("c:\dw_one.pdf", PDF!, false) User’s guide (on page 533) says: … the data is printed to a PostScript file and automatically distilled to PDF using GNU Ghostscript… Installing Ghostscript For licensing reasons, Ghostscript is not installed with PowerBuilder. You (and your users) must download and install it before you can use this technique… Does anyone have any idea what is the procedure?

    Read the article

  • PDF text search and split library

    - by Horace Ho
    I am look for a server side PDF library (or command line tool) which can: split a multi-page PDF file into individual PDF files, based on a search result of the PDF file content Examples: Search "Page ???" pattern in text and split the big PDF into 001.pdf, 002,pdf, ... ???.pdf A server program will scan the PDF, look for the search pattern, save the page(s) which match the patten, and save the file in the disk. It will be nice with integration with PHP / Ruby. Command line tool is also acceptable. It will be a server side (linux or win32) batch processing tool. GUI/login is not supported. i18n support will be nice but no required. Thanks~

    Read the article

  • Print PDF from ASP.Net without preview

    - by nmiranda
    Hi, I've generated a pdf using iTextSharp and I can preview it very well in ASP.Net but I need to send it directly to printer without a preview. I want the user to click the print button and automatically the document prints. I know that a page can be sent directly to printer using the javascript window.print() but I don't know how to make it for a PDF. Edit: it is not embedded, I generate it like this; ... FileStream stream = new FileStream(Request.PhysicalApplicationPath + "~1.pdf", FileMode.Create); Document pdf = new Document(PageSize.LETTER); PdfWriter writer = PdfWriter.GetInstance(pdf, stream); pdf.Open(); pdf.Add(new Paragraph(member.ToString())); pdf.Close(); Response.Redirect("~1.pdf"); ... And here I am.

    Read the article

  • Convert word to "JPEG-like" pdf file

    - by Chheang
    I've got a word document I'm trying to save to an uneditable, unselectable PDF file. Essentially, I'd like it to look like a JPEG, but in PDF format. I'm trying to avoid "printing to tiff, THEN printing to PDF." I'd prefer to go directly from Word to PDF. Additionally, I don't want to add a Password or anything. Does an option exist for this? Thanks!

    Read the article

  • Restrict print copies on a PDF

    - by Chops
    We have a very specific use case for an application we're developing, where a user will be presented with a PDF document, which they can print off. However, we need to be able to restrict the PDF so it can only be printed off once. Does anyone know if there's a way to restrict the number of times a PDF can be printed. I'm aware the PDF spec has lots of security features, but I've not found reference to anything like this before. Many thank.

    Read the article

  • Why using Acrobat 10 resaving a PDF file that was 4MB will become 3MB?

    - by Jian Lin
    I had some PDF files and just try to open it and do some highlighting using Acrobat 10 (also called Adode Reader X)... After highlighting, I save the file (using a different filename), and now the file change from 4MB to 3MB... is it just compression? Or making the images have lower clarity? (thought I cannot see any difference). What is the reason? If it is just compression, then why wasn't it done before, as winzip technology is quite mature more than even 10, 12 year ago.

    Read the article

  • PDF Form Field Manipulation

    - by 108039818756939362532
    I'm making a web interface to autofill pdf forms with user data from a database. The admin needs to be able to upload a pdf (right now targeted at IRS pdf forms) and then associate the fields in the pdf with data fields in the database. I need a way to help the admin associate the field names (stuff like "topmostSubform[0].Page2[0].p2-t66[0]") with the the data fields in the database. I'm looking for a way to modify the PDF programatically to in some way provide this information. Basically I'm open to suggestions on how I might make the field names appear in an obvious manner on a modified version of the original pdf. The closest I've gotten is being able to insert Tooltips into the fields in the pdf by just editting the raw pdf line by line. However when editting the pdf in this manner the field names are gibberish, and so I can't just use them. An optimal solution would be anything that could automatically parse a pdf and set each field's tooltip to be the fields name. Anything that can be run from the command line, or any python tool, or just a basic how to correctly parse a field's name from a raw pdf file would be amazing.

    Read the article

  • Dynamically generated PDF files working in most readers except Adobe Reader

    - by Shane
    I'm trying to dynamically generate PDFs from user input, where I basically print the user input and overlay it on an existing PDF that I did not create. It works, with one major exception. Adobe Reader doesn't read it properly, on Windows or on Linux. QuickOffice on my phone doesn't read it either. So I thought I'd trace the path of me creating the files - 1 - Original PDF of background PDF 1.2 made with Adobe Distiller with the LZW encoding. I didn't make this. 2 - PDF of background PDF 1.4 made with Ghostscript. I used pdf2ps then ps2pdf on the above to strip LZW so that the reportlab and pyPDF libraries would recognize it. Note that this file looks "fuzzy," like a bad scan, in Adobe Reader, but looks fine in other readers. 3 - PDF of user-input text formatted to be combined with background PDF 1.3 made with Reportlab from user input. Opens properly and looks good in every reader I've tried. 4 - Finished PDF PDF 1.3 made from PyPDF's mergePage() function on 2 and 3. Does not open in: Adobe Reader for Windows Adobe Reader for Linux QuickOffice for Android Opens perfectly in: Google Docs' PDF viewer on the web evince for linux ghostscript viewer for linux Foxit reader for Windows Preview for Mac Are there known issues that I should know about? I don't know exactly what "flate" is, but from the internet I gather that it's some sort of open source alternative to LZW for PDF compression? Could that be causing my problem? If so, are there any libraries I could use to fix the cause in my code?

    Read the article

  • page posting issue when working in Screen Scraping

    - by Muhammad Akhtar
    Hi, I am working on screen scraping and done successfully in 3 websites, I have an issue in last website here is my url, When I hit with my parameter, it is showing result on next page, simply posting to other page and showing the result fine on other page Here is My Test However, when I hit from my application, since here I don't have an option to post, it only fetch html of requested page that is obviously my above mention HTML test link, that actually have parameter in URL to get the result. How can I handle this situtation? Please give me hint. Thanks here is my C# code, I am using HTMLAgality String url; HtmlWeb hw = new HtmlWeb(); HtmlDocument doc; url = "http://mysampleURL"; doc = hw.Load(url);

    Read the article

  • Screen Scraping

    - by Sambo
    Hi I'm trying to implement a screen scraping scenario on my website and have the following set so far. What I'm ultimately trying to do is replace all links in the $results variable that have "ResultsDetails.aspx?" to "results-scrape-details/" then output again. Can anyone point me in the right direction? <?php $url = "http://mysite:90/Testing/label/stuff/ResultsIndex.aspx"; $raw = file_get_contents($url); $newlines = array("\t","\n","\r","\x20\x20","\0","\x0B"); $content = str_replace($newlines, "", html_entity_decode($raw)); $start = strpos($content,"<div id='pageBack'"); $end = strpos($content,'</body>',$start) + 6; $results = substr($content,$start,$end-$start); $pattern = 'ResultsDetails.aspx?'; $replacement = 'results-scrape-details/'; preg_replace($pattern, $replacement, $results); echo $results;

    Read the article

< Previous Page | 2 3 4 5 6 7 8 9 10 11 12 13  | Next Page >