I have a PDF file with English and Hindi text in it, and I need to extract the text as raw text (UTF-8).
I tried using OpenOffice, but the Hindi characters get ruined.
Given a URL like www.mysampleurl.com, is it possible to crawl through the site and extract links to all the PDFs that might exist?
I've gotten the impression that Python is good for this kind of thing, but is it feasible? How would one go about implementing something like this?
Also, assume that the site does not let you visit something like www.mysampleurl.com/files/
Hello,
I have:
$date = $actualite['date'];
$actualite['date'] is a TIMESTAMP
I was wondering how I can extract the day, the month, and the year from this timestamp into 3 variables.
Thank you for your help.
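One way to picture the split, sketched in Python rather than the PHP of the question. It assumes the value is a Unix timestamp (seconds since the epoch) and interprets it in UTC; the function name split_timestamp is made up for illustration. If the value is really a MySQL-style 'YYYY-MM-DD HH:MM:SS' string, it would have to be parsed first.

```python
from datetime import datetime, timezone

def split_timestamp(ts):
    """Return (day, month, year) for a Unix timestamp, interpreted in UTC."""
    moment = datetime.fromtimestamp(ts, tz=timezone.utc)
    return moment.day, moment.month, moment.year

# Assumption: 0 is used here only as a sample value (the Unix epoch).
day, month, year = split_timestamp(0)
```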
I have data that looks like this:
foo 78 xxx
bar yyy
qux 99 zzz
xuq xyz
They are tab delimited.
How can I extract lines where column 2 is empty, yielding
bar yyy
xuq xyz
I tried this, but it doesn't seem to work:
awk '$2==""' myfile.txt
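For context: by default awk splits fields on runs of whitespace, so two consecutive tabs collapse and $2 is never empty; telling awk that the separator is a single tab (for example with -F'\t') is the usual remedy. The same idea is sketched below in Python, assuming the empty column really shows up as two consecutive tabs. The function name and the sample list are made up for illustration.

```python
def lines_with_empty_column(lines, column=2):
    """Yield tab-delimited lines whose given 1-based column is empty."""
    for line in lines:
        # Split on single tabs so that an empty field stays visible.
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= column and fields[column - 1] == "":
            yield line

# Assumed sample data, modelled on the rows in the question.
sample = ["foo\t78\txxx\n", "bar\t\tyyy\n", "qux\t99\tzzz\n", "xuq\t\txyz\n"]
selected = list(lines_with_empty_column(sample))
```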
The example Booking application that ships with JBoss Seam builds into an EAR file of about 7 MB. That is quite a lot if you consider deploying this package to a remote JBoss server and possibly redeploying it many times during your regular work.
Library files like richfaces and jsf-facelet make up the lion's share of that EAR size. Why can't we just extract the library files into the jboss-web.deployer directory on a JBoss 4.2.0 GA server?
I'm using timthumb to resize my images because it scales them nicely if I only enter one of the dimensions. However, I want to know if it's possible to extract the dimensions of the resized image so that I can add them dynamically to the img tag attributes.
I tried this with no luck:
$fullpath = '/lib/timthumb.php?src='.$image.'&w=100';
$my_image = array_values(getimagesize($fullpath));
list($width, $height, $type, $attr) = $my_image;
Any ideas?
I have a string that contains some double-quoted substrings (delimited by the " character). The data I want is between the double quotes.
How can I write a regex to extract "the first data i want" and "the second data i want" from this:
'some string with "the first data i want" and "the second data i want"'
I tried the following code.
<cfset mydata = 'some string with "the first data i want" and "the second data i want"'/>
<cfset arrData = ListToArray(mydata ,'"') />
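A small sketch of the regex idea, written in Python rather than ColdFusion. The pattern captures everything between a pair of double quotes and assumes the quoted data never contains an escaped quote; the function name quoted_substrings is made up for illustration.

```python
import re

def quoted_substrings(text):
    """Return the contents of every double-quoted substring in text."""
    # "([^"]*)" matches a quote, captures non-quote characters, then a quote.
    return re.findall(r'"([^"]*)"', text)

mydata = 'some string with "the first data i want" and "the second data i want"'
parts = quoted_substrings(mydata)
```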
Hi,
I am new to Python.
I am trying to extract the blocks of text between the ---- separators that contain a specific piece of text (extractme) from a text file like this:
----
data1
data1
data1
extractme
----
data2
data2
data2
----
data3
data3
extractme
----
and then dump them to a text file, so that the result is:
----
data1
data1
data1
extractme
---
data3
data3
extractme
---
Thanks for the help.
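A minimal sketch of one way to do this, assuming each block is delimited by a line that is exactly ---- and that a block qualifies when one of its lines is exactly extractme. The function names are made up for illustration, and writing the result to a file is left out.

```python
def blocks_containing(lines, needle="extractme", separator="----"):
    """Split lines into separator-delimited blocks; keep those with a needle line."""
    blocks, current = [], []
    for line in lines:
        line = line.rstrip("\n")
        if line.strip() == separator:
            if current:
                blocks.append(current)
            current = []
        else:
            current.append(line)
    if current:
        blocks.append(current)
    # A block is kept only if one of its lines equals the needle exactly.
    return [block for block in blocks if needle in block]

def render(blocks, separator="----"):
    """Join blocks back into text, with a separator before each block and at the end."""
    out = []
    for block in blocks:
        out.append(separator)
        out.extend(block)
    if blocks:
        out.append(separator)
    return "".join(line + "\n" for line in out)
```

The text returned by render(blocks_containing(open_file)) could then be written out with an ordinary file write.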
I would like to extract only the content of a webpage. If the webpage is an article I would like to get only the article content and not all the other stuff. What is the best way to implement this?
Hi. I'm on a shared server with restricted disk space, and I've got a gz file that expands into a HUGE file, larger than the space I have. How can I extract it portion by portion (let's say 10 MB at a time) and process each portion, without extracting the whole thing, even temporarily?
Note: this is just ONE super huge compressed file, not a set of files.
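A rough sketch of streaming decompression using Python's standard gzip module, which decompresses as it reads and so never needs the whole expanded file on disk. The function name process_in_chunks and the handle_chunk callback are made up for illustration; what "processing" means is left to the caller.

```python
import gzip

def process_in_chunks(path, handle_chunk, chunk_size=10 * 1024 * 1024):
    """Decompress a .gz file piece by piece, passing each piece to handle_chunk."""
    with gzip.open(path, "rb") as stream:
        while True:
            # Reads up to chunk_size bytes of decompressed data at a time.
            chunk = stream.read(chunk_size)
            if not chunk:
                break
            handle_chunk(chunk)
```

Note that a chunk boundary can fall in the middle of a line or record, so the callback may need to carry leftover bytes over to the next call.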
I would like to be able to extract the entity and property from the ConstraintViolationException message in an RDBMS-neutral way. Can this be done? If so, how?
I have a BKS file that is over 15 GB in size. How do I extract the data and run queries on it? I was told that this is SQL Server data. Should I try to upload it to SQL Azure? Any ideas?
I've got a text file, and I wish to extract everything above the !--- comment ---! line into a new file, not based on line numbers but by checking for the comment. How would I do this?
test123
bob
ted
mouse
qwerty
!--- comment ---!
123456
098786
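A minimal sketch in Python (the question does not name a language), assuming the marker sits on a line of its own. The function name lines_before_marker is made up for illustration.

```python
def lines_before_marker(lines, marker="!--- comment ---!"):
    """Return the lines that come before the first line equal to marker."""
    kept = []
    for line in lines:
        if line.strip() == marker:
            break  # stop at the marker; nothing after it is kept
        kept.append(line)
    return kept

# Assumed sample, modelled on the file contents in the question.
sample = ["test123\n", "bob\n", "ted\n", "mouse\n", "qwerty\n",
          "!--- comment ---!\n", "123456\n", "098786\n"]
above = lines_before_marker(sample)
```

The resulting list can be passed to writelines on the new file.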
I am looking to use JavaScript to extract the GET parameters from a URL entered by the user.
For example, if a user enters a URL such as:
http://www.youtube.com/watch?v=ee925OTFBCA
I would like to get the v parameter,
'ee925OTFBCA', as a variable.
Thanks in advance.
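To illustrate the parsing step, here is a sketch in Python rather than JavaScript, using the standard urllib.parse module. The function name query_parameter is made up for illustration; it returns the first value when a parameter is repeated.

```python
from urllib.parse import urlparse, parse_qs

def query_parameter(url, name):
    """Return the first value of a query-string parameter, or None if absent."""
    values = parse_qs(urlparse(url).query).get(name)
    return values[0] if values else None

video_id = query_parameter("http://www.youtube.com/watch?v=ee925OTFBCA", "v")
```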
I would like to extract some text between two points in a string, in JavaScript.
Say the string is
"start-extractThis-234"
The numbers at the end can be any number, but the hyphens are always present.
Ideally, I think capturing the text between the two hyphens should be OK.
I would like the result of the regex to be
extractThis
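The pattern itself is short; the sketch below shows it in Python rather than JavaScript, and essentially the same expression should carry over. It assumes the wanted text contains no hyphen of its own. The function name between_hyphens is made up for illustration.

```python
import re

def between_hyphens(text):
    """Return the text between the first two hyphens, or None if there is none."""
    # -([^-]+)- : a hyphen, one or more non-hyphen characters, a hyphen.
    match = re.search(r"-([^-]+)-", text)
    return match.group(1) if match else None

result = between_hyphens("start-extractThis-234")
```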
For example, here is a string representing an expression:
var str = 'total = sum(price * qty) * 1.09875';
I want to extract the variables (i.e., 'total', 'price' and 'qty', but not 'sum', since 'sum' is a function name) from this expression. What is the regexp pattern in JavaScript? A variable name consists of letters, digits, or underscores, and begins with a letter or an underscore.
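One possible pattern, sketched in Python rather than JavaScript: match an identifier and use a negative lookahead to reject it when an opening parenthesis follows. This assumes that any name directly followed by ( is a function call; the function name variable_names is made up for illustration.

```python
import re

# An identifier (letter or underscore first) that is not followed by "(".
IDENTIFIER = re.compile(r"\b[A-Za-z_]\w*\b(?!\s*\()", re.ASCII)

def variable_names(expression):
    """Return identifiers in expression that are not used as function names."""
    return IDENTIFIER.findall(expression)

names = variable_names("total = sum(price * qty) * 1.09875")
```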
I'm using this code:
s = line.match( /ABCD(\d{4})/ ).values_at( 1 )[0]
To extract numbers from strings like:
ABCD1234
ABCD1235
ABCD1236
etc.
It works, but I wonder what other alternatives I have for doing this in Ruby.
The AutoConfig URL says to run the script in the directory that contains your package's configure script. What does that directory mean? Has anybody had experience running this script? Also, how can we extract a tar file through it?
I have a string of text that contains HTML, and I need to extract each URL (most likely in img or a tags) to create a generic list of string objects. Is there an easy way to do this, or will I have to resort to regular expressions?
If I have to resort to regular expressions, would you mind helping me out with that as well? :)
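As an illustration of the parser-based route (as opposed to regular expressions), here is a sketch in Python using the standard html.parser module. It only looks at href on a tags and src on img tags, which is an assumption about where the URLs live; the names UrlCollector and extract_urls are made up for illustration.

```python
from html.parser import HTMLParser

class UrlCollector(HTMLParser):
    """Collect href values from <a> tags and src values from <img> tags."""

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        # Pick the attribute that carries the URL for this tag, if any.
        wanted = {"a": "href", "img": "src"}.get(tag)
        for name, value in attrs:
            if name == wanted and value:
                self.urls.append(value)

def extract_urls(html):
    """Return the URLs found in a string of HTML, in document order."""
    collector = UrlCollector()
    collector.feed(html)
    collector.close()
    return collector.urls
```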
I'd like to extract all the ISBNs on a dynamic web page that I can't feed through a Yahoo Pipe (the user has to log in to see the page). Is there a way to do that with jQuery? How?
Thanks!
I am reading a file by line and need to extract latitude and longitude from it.
This is how the lines can look:
DE 83543 Rott am Inn Bayern BY Oberbayern Landkreis Rosenheim 47.983 12.1278
DE 21147 Hamburg Hamburg HH Kreisfreie Stadt Hamburg 53.55 10
What is certain is that there are no dots surrounded by digits except for the ones in the doubles.
Unfortunately, there are values without a dot, so it's probably best to look for the numbers from the end of the string.
Thanks for your help!
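A small sketch of the "read from the end" idea in Python (the question does not name a language). It assumes the latitude and longitude are always the last two whitespace-separated tokens on the line; the function name lat_lon is made up for illustration.

```python
def lat_lon(line):
    """Return (latitude, longitude) taken from the last two tokens of line."""
    # Unpack from the end so that the number of leading columns does not matter.
    *_, lat, lon = line.split()
    return float(lat), float(lon)

first = lat_lon("DE 83543 Rott am Inn Bayern BY Oberbayern Landkreis Rosenheim 47.983 12.1278")
second = lat_lon("DE 21147 Hamburg Hamburg HH Kreisfreie Stadt Hamburg 53.55 10")
```

A line with fewer than two tokens, or with non-numeric trailing tokens, raises ValueError, which the caller can catch to skip malformed lines.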
We have a designer creating a user interface for an application. The main window has several QStackedWidgets used for in-place panel switching. What I'd like to be able to do is extract each individual panel that makes up each page of the QStackedWidget into its own .ui file.
Is there an easy way to accomplish this from within Qt Designer, or are there any other tools to help accomplish this task short of redesigning all of the panels in their own .ui files?
Basically, I want to extract the keywords (words or tokens) that are present in a web page after removing the stop words. Please help if anybody knows how to do it. Code in C# would be appreciated.
Wikipedia defines a 'typical' ETL cycle as:
Cycle initiation
Build reference data
Extract (from sources)
Validate
Transform (clean, apply business rules, check for data integrity, create aggregates or disaggregates)
Stage (load into staging tables, if used)
Audit reports (for example, on compliance with business rules. Also, in case of failure, helps to diagnose/repair)
Publish (to target tables)
Archive
Clean up
What is meant by 'Build reference data'?
I am trying to get PHP to extract the TOKEN (the uppercase one), USERID (uppercase), and the USER NAME (uppercase) from a web page with the following text.
{"rsp":{"stat":"ok","auth":{"token":"**TOKEN**","perms":"read","user":{"id":"**USERID**","username":"**USER NAME**","fullname":"**NAME OF USER**"}}}}
(This is from the RTM API, getting the authentication token of the user.)
How would I go about doing this? Thanks!
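Since the text is JSON, decoding it and walking the nested keys is usually simpler than pattern matching. The sketch below shows that in Python rather than PHP, using the key names from the sample response above; the function name auth_details is made up for illustration, and fetching the page is left out.

```python
import json

def auth_details(body):
    """Return (token, user id, username) from the JSON text of the response."""
    auth = json.loads(body)["rsp"]["auth"]
    return auth["token"], auth["user"]["id"], auth["user"]["username"]

# The sample response from the question, with its placeholder values.
sample = ('{"rsp":{"stat":"ok","auth":{"token":"**TOKEN**","perms":"read",'
          '"user":{"id":"**USERID**","username":"**USER NAME**",'
          '"fullname":"**NAME OF USER**"}}}}')
token, user_id, username = auth_details(sample)
```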