pdf parsing - Page 53 - Developer IT

pdf printer in wine ?

- by Arkapravo

I wish to convert (print) my MS Word files to pdf on the fly ! I am on Ubuntu 9.10 and using Wine 1.1.40. Can someone help ? I have heard that pdf printer can be installed using Wine Cups ! Thanks !

Read the article

Encrypt uploaded pdf files with mcrypt and php

- by microchasm

I'm currently set up with a CentOS box that utilizes mcrypt to encrypt/decrypt data to/from the database. In my haste, I forgot that I also need a solution to encrypt files (primarily pdf, with a xls and txt file here and there). Is there a way to utilize mcrypt to encrypt uploaded pdf files? I understand the possibility of file_get_contents() with txt; is a similar solution available for other formats? Thanks!

Read the article

open pdf when usb is plugged in

- by Funky Dude

i have a pdf file in an usb drive. how do i get it to open automatically when i plug the usb drive. no dialog or anything. just open the pdf directly after usb drive is plugged in. let's say we have to do this in win xp. autorun.inf doesnt seem to be able to do it.

Read the article

Software automatic content analysis of pdf files

- by iceman

Is there a commercial software which can perform automatic content analysis in a large collection of pdf documents which have tagged meta-data for easy classification? What is the technology Google uses to parse web-hosted pdf and rank them?

Read the article

Adobe pdf icon disappeared!

- by Nano8Blazex

Starting last week I've noticed that all my pdf files now have a generic white document as an icon instead of the original Adobe pdf icon... I've reinstalled Adobe Reader, repaired it, and have had no success in getting the original icon back. The generic document icon is really getting into my head now... it's just... generic. Is there any way to fix this?

Read the article

Encrypt pdf files with mcrypt and php

- by microchasm

I'm currently set up with a CentOS box that utilizes mcrypt to encrypt/decrypt data to/from the database. In my haste, I forgot that I also need a solution to encrypt files (primarily pdf, with a xls and txt file here and there). Is there a way to utilize mcrypt to encrypt pdf files? I understand the possibility of file_get_contents() with txt; is a similar solution available for other formats? Thanks!

Read the article

ImageMagick's `convert` utility takes too much memory with PDF input

- by o_O Tync

I often use ImageMagick's convert for *-PNG conversion, but when PDF has more than 50 pages — convert eats more that 3 Gib (!!!) of memory. I guess it first loads everything. That's unacceptable. It should read PDF page by page, why the heck all of them at once! Maybe there's a way to tune it somehow? Or any good alternatives?

Read the article

Where can I download a full .pdf copy of the Cisco Internetworking Technology Handbook, rather than

- by flesh

Cisco publish the Internetworking Technology Handbook on their website but they only provide the individual chapters in pdf. Is there a .pdf of the entire thing available?

Read the article

PPTX to PDF -> Hyperlinks failed

- by Joe

Hi, I have a Powerpoint 2007 document with hyperlinks to other slides and external docs both on the master layout and on serveral slides. When I try to convert it to PDF, it loses all the hyperlinks. I have the following Questions: Which PDF-Converter does not kill my Hyperlinks made in the Powerpoint Layout master? I tried Adobe and DoPDF. And how do I have to setup the prog/converter? thanks in advance!

Read the article

how to change pdf reader of chrome on linux

- by chenge

it seems chrome uses xdg-open to open pdf file. how to change to another pdf reader? i use linux mint 8 LXDE. thanks!

Read the article

Populate Multiple PDFs

- by gmcalab

I am using itextsharp to populate my PDFs. I have no issues with this. Basically what I am doing is getting the PDF and populating the fields in memory then passing back the MemoryStream to be displayed on a webpage. All this is working with a single document PDF. What I am trying to figure out now, is merging multiple PDFs into one MemoryStream. The part I cant figure out is, the documents I am populating are identical. So for example, I have a List<Person> that contains 5 persons. I want to fill out a PDF for each person and merge them all into one, in memory. Bare in mind I am going to fill out the same type of document for each person. The problem I am getting is that when I try to add a second copy of the same PDF to be filled out for the second iteration, it just overwrites the first populated PDF, since it's the same document, therefore not adding a second copy for the second Person at all. So basically if I had the 5 people, I would end up with a single page with the data of the 5th person, instead of a PDF with 5 like pages that contain the data of each person respectively. Here's some code... MemoryStream ms = ms = new MemoryStream(); PdfReader docReader = null; PdfStamper Stamper = null; List<Person> persons = new List<Person>() { new Person("Larry", "David"), new Person("Dustin", "Byfuglien"), new Person("Patrick", "Kane"), new Person("Johnathan", "Toews"), new Person("Marian", "Hossa") }; try { // Iterate thru all persons and populate a PDF for each foreach(var person in persons){ PdfCopyFields Copier = new PdfCopyFields(ms); Copier.AddDocument(GetReader("Person.pdf")); Copier.Close(); docReader = new PdfReader(ms.ToArray()); Stamper = new PdfStamper(docReader, ms); AcroFields Fields = Stamper.AcroFields; Fields.SetField("FirstName", person.FirstName); } }catch(Exception e){ // handle error }finally{ if (Stamper != null) { Stamper.Close(); } if (docReader != null) { docReader.Close(); } }

Read the article

how to parse pdf document in iphone or ipad?

- by JohnWhite

I have implemented pdf parsing application for ipad.But content of pdf display not clearly.I dont know why this happning.can you help me for this problem. Edit: I have used below code for pdf parsing.But text and image is not clear. (id)initWithFrame:(CGRect)frame { if (self = [super initWithFrame:frame]) { // Initialization code /* Demo PDF printed directly from Wikipedia without permission; all content (c) respective owners */ CFURLRef pdfURL = CFBundleCopyResourceURL(CFBundleGetMainBundle(), CFSTR("2008_11_mp.pdf"), NULL, NULL); pdf = CGPDFDocumentCreateWithURL((CFURLRef)pdfURL); CFRelease(pdfURL); counts+=1; self.pageNumber = counts; self.backgroundColor = nil; self.opaque = NO; self.userInteractionEnabled = NO; } return self; } (void)drawRect:(CGRect)rect { // Drawing code CGContextRef context = UIGraphicsGetCurrentContext(); // PDF page drawing expects a Lower-Left coordinate system, so we flip the coordinate system // before we start drawing. // Grab the first PDF page CGPDFPageRef page = CGPDFDocumentGetPage(pdf, pageNumber); CGContextTranslateCTM(context, 0, self.bounds.size.height); CGContextScaleCTM(context, 1, -1); // We're about to modify the context CTM to draw the PDF page where we want it, so save the graphics state in case we want to do more drawing CGContextSaveGState(context); // CGPDFPageGetDrawingTransform provides an easy way to get the transform for a PDF page. It will scale down to fit, including any // base rotations necessary to display the PDF page correctly. CGAffineTransform pdfTransform = CGPDFPageGetDrawingTransform(page, kCGPDFMediaBox, self.bounds, 0, true); // And apply the transform. CGContextConcatCTM(context, pdfTransform); // Finally, we draw the page and restore the graphics state for further manipulations! CGContextDrawPDFPage(context, page); CGContextRestoreGState(context); } Please help me solve this problem

Read the article

Idea of an algorithm to detect a website's navigation structure?

- by Uwe Keim

Currently I am in the process of developing an importer of any existing, arbitrary (static) HTML website into the upcoming release of our CMS. While the downloading the files is solved successfully, I'm pulling my hair off when it comes to detect a site structure (pages and subpages) purely from the HTML files, without the user specifying additional hints. Basically I want to get a tree like: + Root page 1 + Child page 1 + Child page 2 + Child child page1 + Child page 3 + Root page 2 + Child page 4 + Root page 3 + ... I.e. I want to be able to detect the menu structure from the links inside the pages. This has not to be 100% accurate, but at least I want to achieve more than just a flat list. I thought of looking at multiple pages to see similar areas and identify these as menu areas and parse the links there, but after all I'm not that satisfied with this idea. My question: Can you imagine any algorithm when it comes to detecting such a structure? Update 1: What I'm looking for is not a web spider, but an algorithm do create a logical tree of the relationship of the pages to be able to create pages and subpages inside my CMS when importing them. Update 2: As of Robert's suggestion I'll solve this by starting at the root page, and then simply parse links as you go and treat every link inside a page simply as a child page. Probably I'll recurse not in a deep-first manner but rather in a breadth-first manner to get a more balanced navigation structure.

Read the article

Download the ZFSSA Objection Handling document (PDF)

- by swalker

View and download the new ZFS Storage Appliance objection handling document from the Oracle HW Technical Resource Centre here. This document aims to address the most common objections encountered when positioning the ZFS Storage Appliance disk systems in production environments. It will help you to be more successful in establishing the undeniable benefits of the Oracle ZFS Storage Appliance in your customers´ IT environments. If you do not already have an account to access the Oracle Hardware Technical Resource Centre, please click here and follow the instructions to register.

Read the article

How should I implement a command processing application?

- by Nini Michaels

I want to make a simple, proof-of-concept application (REPL) that takes a number and then processes commands on that number. Example: I start with 1. Then I write "add 2", it gives me 3. Then I write "multiply 7", it gives me 21. Then I want to know if it is prime, so I write "is prime" (on the current number - 21), it gives me false. "is odd" would give me true. And so on. Now, for a simple application with few commands, even a simple switch would do for processing the commands. But if I want extensibility, how would I need to implement the functionality? Do I use the command pattern? Do I build a simple parser/interpreter for the language? What if I want more complex commands, like "multiply 5 until >200" ? What would be an easy way to extend it (add new commands) without recompiling? Edit: to clarify a few things, my end goal would not be to make something similar to WolframAlpha, but rather a list (of numbers) processor. But I want to start slowly at first (on single numbers). I'm having in mind something similar to the way one would use Haskell to process lists, but a very simple version. I'm wondering if something like the command pattern (or equivalent) would suffice, or if I have to make a new mini-language and a parser for it to achieve my goals?

Read the article

Persisting NLP parsed data

- by tjb1982

I've recently started experimenting with NLP using Stanford's CoreNLP, and I'm wondering what are some of the standard ways to store NLP parsed data for something like a text mining application? One way I thought might be interesting is to store the children as an adjacency list and make good use of recursive queries (postgres supports this and I've found it works really well). Something like this: Component ( id, POS, parent_id ) Word ( id, raw, lemma, POS, NER ) CW_Map ( component_id, word_id, position int ) But I assume there are probably many standard ways to do this depending on what kind of analysis is being done that have been adopted by people working in the field over the years. So what are the standard persistence strategies for NLP parsed data and how are they used?

Read the article

How can I best manage making open source code releases from my company's confidential research code?

- by DeveloperDon

My company (let's call them Acme Technology) has a library of approximately one thousand source files that originally came from its Acme Labs research group, incubated in a development group for a couple years, and has more recently been provided to a handful of customers under non-disclosure. Acme is getting ready to release perhaps 75% of the code to the open source community. The other 25% would be released later, but for now, is either not ready for customer use or contains code related to future innovations they need to keep out of the hands of competitors. The code is presently formatted with #ifdefs that permit the same code base to work with the pre-production platforms that will be available to university researchers and a much wider range of commercial customers once it goes to open source, while at the same time being available for experimentation and prototyping and forward compatibility testing with the future platform. Keeping a single code base is considered essential for the economics (and sanity) of my group who would have a tough time maintaining two copies in parallel. Files in our current base look something like this: > // Copyright 2012 (C) Acme Technology, All Rights Reserved. > // Very large, often varied and restrictive copyright license in English and French, > // sometimes also embedded in make files and shell scripts with varied > // comment styles. > > > ... Usual header stuff... > > void initTechnologyLibrary() { > nuiInterface(on); > #ifdef UNDER_RESEARCH > holographicVisualization(on); > #endif > } And we would like to convert them to something like: > // GPL Copyright (C) Acme Technology Labs 2012, Some rights reserved. > // Acme appreciates your interest in its technology, please contact [email protected] > // for technical support, and www.acme.com/emergingTech for updates and RSS feed. > > ... Usual header stuff... > > void initTechnologyLibrary() { > nuiInterface(on); > } Is there a tool, parse library, or popular script that can replace the copyright and strip out not just #ifdefs, but variations like #if defined(UNDER_RESEARCH), etc.? The code is presently in Git and would likely be hosted somewhere that uses Git. Would there be a way to safely link repositories together so we can efficiently reintegrate our improvements with the open source versions? Advice about other pitfalls is welcome.

Read the article

PDF or ebook Java API documentation

- by AmaDaden

Since I have a long train ride to and from work I was wondering if there is a version of the Java API documentation floating around that I could put on my Kindle. It would be nice on the rare occasion I get something in my head that I want to think about some more. I know I can browse the web through the Kindle but coverage is spotty and slow. I know that the api docs are not really designed for a sequential reading format but I'm curious to see if anyone else has thought about this and given it a shot.

Read the article

How To Make Images, Music, Video, and PDF Files Open On The Desktop in Windows 8

- by Chris Hoffman

Windows 8 opens many types of files in the Windows 8 interface formerly known as Metro by default. If you’re at the desktop and double-click many types of media files, you’ll see a full-screen media viewer. You can easily prevent these media files from opening in the full-screen Windows 8 apps when you double-click them. All you have to do is change your default programs. What Is the Purpose of the “Do Not Cover This Hole” Hole on Hard Drives? How To Log Into The Desktop, Add a Start Menu, and Disable Hot Corners in Windows 8 HTG Explains: Why You Shouldn’t Use a Task Killer On Android

Read the article

What is the simplest human readable configuration file format?

- by Juha

Current configuration file is as follows: mainwindow.title = 'test' mainwindow.position.x = 100 mainwindow.position.y = 200 mainwindow.button.label = 'apply' mainwindow.button.size.x = 100 mainwindow.button.size.y = 30 logger.datarate = 100 logger.enable = True logger.filename = './test.log' This is read with python to a nested dictionary: { 'mainwindow':{ 'button':{ 'label': {'value':'apply'}, ... }, 'logger':{ datarate: {'value': 100}, enable: {'value': True}, filename: {'value': './test.log'} }, ... } Is there a better way of doing this? The idea is to get XML type of behavior and avoid XML as long as possible. The end user is assumed almost totally computer illiterate and basically uses notepad and copy-paste. Thus the python standard "header + variables" type is considered too difficult. The dummy user edits the config file, able programmers handle the dictionaries. Nested dictionary is chosen for easy splitting (logger does not need or even cannot have/edit mainwindow parameters).

Read the article

Persisting natural language processing parsed data

- by tjb1982

I've recently started experimenting with natural language processing (NLP) using Stanford's CoreNLP, and I'm wondering what are some of the standard ways to store NLP parsed data for something like a text mining application? One way I thought might be interesting is to store the children as an adjacency list and make good use of recursive queries (Postgres supports this and I've found it works really well). But I assume there are probably many standard ways to do this depending on what kind of analysis is being done that have been adopted by people working in the field over the years. So what are the standard persistence strategies for NLP parsed data and how are they used?

Read the article

What is this algorithm for converting strings into numbers called?

- by CodexArcanum

I've been doing some work in Parsec recently, and for my toy language I wanted multi-based fractional numbers to be expressible. After digging around in Parsec's source a bit, I found their implementation of a floating-point number parser, and copied it to make the needed modifications. So I understand what this code does, and vaguely why (I haven't worked out the math fully yet, but I think I get the gist). But where did it come from? This seems like a pretty clever way to turn strings into floats and ints, is there a name for this algorithm? Or is it just something basic that's a hole in my knowledge? Did the folks behind Parsec devise it? Here's the code, first for integers: number' :: Integer -> Parser Integer number' base = do { digits <- many1 ( oneOf ( sigilRange base )) ; let n = foldl (\x d -> base * x + toInteger (convertDigit base d)) 0 digits ; seq n (return n) } So the basic idea here is that digits contains the string representing the whole number part, ie "192". The foldl converts each digit individually into a number, then adds that to the running total multiplied by the base, which means that by the end each digit has been multiplied by the correct factor (in aggregate) to position it. The fractional part is even more interesting: fraction' :: Integer -> Parser Double fraction' base = do { digits <- many1 ( oneOf ( sigilRange base )) ; let base' = fromIntegral base ; let f = foldr (\d x -> (x + fromIntegral (convertDigit base d))/base') 0.0 digits ; seq f (return f) Same general idea, but now a foldr and using repeated division. I don't quite understand why you add first and then divide for the fraction, but multiply first then add for the whole. I know it works, just haven't sorted out why. Anyway, I feel dumb not working it out myself, it's very simple and clever looking at it. Is there a name for this algorithm? Maybe the imperative version using a loop would be more familiar?

Read the article

How to add precedence to LALR parser like in YACC?

- by greenoldman

Please note, I am asking about writing LALR parser, not writing rules for LALR parser. What I need is... ...to mimic YACC precedence definitions. I don't know how it is implemented, and below I describe what I've done and read so far. For now I have basic LALR parser written. Next step -- adding precedence, so 2+3*4 could be parsed as 2+(3*4). I've read about precedence parsers, however I don't see how to fit such model into LALR. I don't understand two points: how to compute when insert parenthesis generator how to compute how many parenthesis the generator should create I insert generators when the symbols is taken from input and put at the stack, right? So let's say I have something like this (| denotes boundary between stack and input): ID = 5 | + ..., at this point I add open, so it gives ID = < 5 | + ..., then I read more input ID = < 5 + | 5 ... and more ID = < 5 + 5 | ; ... and more ID = < 5 + 5 ; | ... At this point I should have several reduce moves in normal LALR, but the open parenthesis does not match so I continue reading more input. Which does not make sense. So this was when problem. And about count, let's say I have such data < 2 + < 3 * 4 >. As human I can see that the last generator should create 2 parenthesis, but how to compute this? After all there could be two scenarios: ( 2 + ( 3 *4 )) -- parenthesis is used to show the outcome of generator or (2 + (( 3 * 4 ) ^ 5) because there was more input Please note that in both cases before 3 was open generator, and after 4 there was close generator. However in both cases, after reading 4 I have to reduce, so I have to know what generator "creates".

Read the article

LL(8) and left-recursion

- by Peregring-lk

I want to understand the relation between LL/LR grammars and the left-recursion problem (for any question I know parcially the answer, but I ask them as I don't know nothing, because I am a little confused now, and prefer complete answers) I'm happy with sintetized or short and direct answers (or just links solving it unambiguously): What type of language isn't LL(8) languages? LL(K) and LL(8) have problems with left-recursion? Or only LL(k) parsers? LALR(1) parser have troubles with left or right recursion? What type of troubles? Only in terms of the LL/LALR comparision. What is better, Bison (LALR(1)) or Boost.Spirit (LL(8))? (Let's suppose other features of them are irrelevant in this question) Why GCC use a (hand-made) LL(8) parser? Only for the "handling-error" problem?

Read the article

Getting data from a webpage in a stable and efficient way

- by Mike Heremans

Recently I've learned that using a regex to parse the HTML of a website to get the data you need isn't the best course of action. So my question is simple: What then, is the best / most efficient and a generally stable way to get this data? I should note that: There are no API's There is no other source where I can get the data from (no databases, feeds and such) There is no access to the source files. (Data from public websites) Let's say the data is normal text, displayed in a table in a html page I'm currently using python for my project but a language independent solution/tips would be nice. As a side question: How would you go about it when the webpage is constructed by Ajax calls?

Search Results

Search found 7251 results on 291 pages for 'pdf parsing'.

Page 53/291 | < Previous Page | 49 50 51 52 53 54 55 56 57 58 59 60 | Next Page >

- by Arkapravo

- by microchasm

- by Funky Dude

- by iceman

- by Nano8Blazex

- by microchasm

- by o_O Tync

- by flesh

- by Joe

- by chenge

- by gmcalab

- by JohnWhite

- by Uwe Keim

- by swalker

- by Nini Michaels

- by tjb1982

- by DeveloperDon

- by AmaDaden

- by Chris Hoffman

- by Juha

- by tjb1982

- by CodexArcanum

- by greenoldman

- by Peregring-lk

- by Mike Heremans

< Previous Page | 49 50 51 52 53 54 55 56 57 58 59 60 | Next Page >