I have a string in Rails, e.g. "This is a Twitter message. #books War & Peace by Leo Tolstoy. I love this book!", and I want to parse the text and extract only certain phrases, like "War & Peace by Leo Tolstoy".
Is this a matter of using a regex and lifting the text between "#books" and "."?
What if there's no structure to the message, like:
"This is a Twitter message #books War & Peace by Leo Tolstoy I love this book!" or
"This is a Twitter message. I love the book War & Peace by Leo Tolstoy #books"
How can I reliably pull out the phrase "War & Peace by Leo Tolstoy" without knowing the phrase ex ante?
Are there any gems, methods, etc. that can help me do this?
At the very least, what would you call what I'm trying to do? It will help me search for a solution on Google. I've tried a few searches on "parsing" with no luck.
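A minimal Python sketch of the regex idea from the second paragraph, for the structured case only: it captures the text after "#books" up to the next period. The pattern and the function name are illustrative assumptions. What the question describes in general is usually called information extraction, or more specifically named-entity recognition (NER); those are likely better search terms than "parsing".

```python
import re

# Illustrative pattern: "#books", some whitespace, then everything up to the next ".".
# This only works when the phrase is delimited; it knows nothing about book titles.
BOOK_TAG = re.compile(r"#books\s+([^.]+)")

def phrase_after_tag(message):
    """Return the text after '#books' up to the next '.', or None if there is none."""
    match = BOOK_TAG.search(message)
    return match.group(1).strip() if match else None
```

On the second example message it over-captures (the result runs on to "I love this book!"), and on the third it finds nothing, because "#books" comes last. That is the point at which a structural rule stops being enough.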
Hi,
I was wondering how I can extract a thumbnail from a Flash video file and then display it in a list box.
The list box is supposed to hold many videos, and I need to extract their thumbnails programmatically with ActionScript.
The Flash player is going to be on the web, and the extraction must happen while the SWF file is loading, so the method must not be too time-consuming.
How do I go about doing this? Is it even possible?
Thanks in advance.
First of all, I did a search on this and was able to find how to use something like String.Split() to extract a string based on a condition. What I wasn't able to find, however, is how to extract it based on an ending condition as well. For example, I have a file with links to images: http://i594.photobucket.com/albums/tt27/34/444.jpghttp://i594.photobucket.com/albums/as/asfd/ghjk6.jpg
You will notice that all the images start with http:// and end with .jpg. However, each .jpg is immediately followed by the next http:// without a space, which makes this a little more difficult.
So basically I'm trying to find a way (Regex?) to extract every substring that starts with http:// and ends with .jpg.
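A sketch of the regex approach in Python; the same pattern, http://.*?\.jpg, is also valid in .NET's Regex. The lazy .*? stops at the first .jpg after each http://, so the glued-together URLs come apart. This assumes no URL contains ".jpg" earlier in its path; the function name is illustrative.

```python
import re

# Lazy ".*?" stops at the FIRST ".jpg" after each "http://",
# so back-to-back URLs with no separator are split correctly.
JPG_URL = re.compile(r"http://.*?\.jpg")

def extract_jpg_urls(text):
    """Return every substring that starts with http:// and ends with .jpg."""
    return JPG_URL.findall(text)
```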
I have the following code:
f = open(path, 'r')
html = f.read() # no parameters => reads to eof and returns string
soup = BeautifulSoup(html)
schoolname = soup.findAll(attrs={'id':'ctl00_ContentPlaceHolder1_SchoolProfileUserControl_SchoolHeaderLabel'})
print schoolname
which gives:
[<span id="ctl00_ContentPlaceHolder1_SchoolProfileUserControl_SchoolHeaderLabel">A B Paterson College, Arundel, QLD</span>]
When I try to access the value (i.e. 'A B Paterson College, Arundel, QLD') using schoolname['value'], I get the following error:
print schoolname['value']
TypeError: list indices must be integers, not str
What am I doing wrong to get that value?
I need to extract a window's content if it is text-based, or at least the file path associated with that window. To date, I have considered:
1. win32api
2. 3rd party libraries
3. wrapper classes
However, I am not satisfied with the solutions. So any ideas how this can be done in a clean way?
I have arguments like the ones below, which I pass to a PowerShell script:
-arg1 -abc -def -arg2 -ghi -jkl -arg3 -123 -234
Now I need to extract three strings, without leading or trailing whitespace:
string 1: "-abc -def"
string 2: "-ghi -jkl"
string 3: "-123 -234"
I figured this expression would do it, but it doesn't seem to work:
$args -match '-arg1 (?'arg1'.*?) -arg2 (?'arg2'.*?) -arg3 (?'arg3'.*)'
This should populate $matches['arg1'] and so on. So what's wrong with the expression above? Why do I get the error shown below?
runScript.ps1 -arg1 -abc -def -arg2 -ghi -jkl -arg3 -123 -234
Unexpected token 'arg1'.*?) -arg2 (?'arg2'.*?) -arg3 (?'arg3'.*)'' in expression or statement.
At G:\powershell\tools\powershell\runTest.ps1:1 char:71
+ $args -match '-arg1 (?'arg1'.*?) -arg2 (?'arg2'.*?) -arg3 (?'arg3'.*)' <<<<
+ CategoryInfo : ParserError: (arg1'.*?) -arg2...g3 (?'arg3'.*)':String) [], ParseException
+ FullyQualifiedErrorId : UnexpectedToken
My second question is: how do I make arg1, arg2, or arg3 optional?
For example, the arguments to the script could be just:
-arg2 -def -ghi
I'll use default values for any of arg1, arg2, or arg3 that is not supplied.
Thanks
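Two things in the PowerShell line stand out. In a single-quoted PowerShell string, the next single quote ends the string, so '-arg1 (?' is parsed as a complete string and arg1 becomes an unexpected token; doubling the inner quotes, or using the (?<arg1>...) form of .NET named groups, avoids that. Also, $args is an array, and -match applied to an array returns the matching elements rather than populating $matches, so joining it first (e.g. ($args -join ' ')) is likely needed. For illustration, a Python sketch of the same idea with every section optional; Python spells named groups (?P<name>...), and the default values and function name are hypothetical.

```python
import re

# Each "-argN <values>" section is optional. The values are matched lazily
# up to the next "-argN" marker that may follow, or to the end of the string.
ARGS = re.compile(
    r"\s*"
    r"(?:-arg1\s+(?P<arg1>.*?)\s*(?=-arg2\b|-arg3\b|$))?"
    r"(?:-arg2\s+(?P<arg2>.*?)\s*(?=-arg3\b|$))?"
    r"(?:-arg3\s+(?P<arg3>.*?)\s*$)?"
)

# Hypothetical defaults for sections that are not supplied.
DEFAULTS = {"arg1": "default1", "arg2": "default2", "arg3": "default3"}

def parse_args(command_line):
    """Split '-arg1 ... -arg2 ... -arg3 ...' into its three value strings."""
    # Every group is optional, so match() always succeeds (possibly matching nothing).
    values = ARGS.match(command_line).groupdict()
    # A missing (None) or empty section falls back to its default.
    return {name: values[name] or default for name, default in DEFAULTS.items()}
```

The sections are assumed to appear in the order arg1, arg2, arg3; supporting arbitrary order would need a different approach, such as scanning for each marker separately.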
Hi,
I am able to list documents from "Public Folders" using this sample code:
session.LogonExchangeMailbox("[email protected]", "server");
RDOFolder folder = session.GetFolderFromPath(@"\Public Folders\All Public Folders");
Now I want to extract these documents to another location.
Hi.
After writing a large .tex file that uses many packages, I want to archive everything:
not just the .tex and .jpg files, but also the .sty files.
This is because options in the .sty files sometimes change, and then I can no longer compile the file.
The "problem" is that I use Ubuntu, so all the packages are already installed system-wide.
I don't want to have to copy them manually.
Is there a program that can do this automatically?
Thanks.
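One possible approach: pdflatex's -recorder option writes a .fls file next to the output, listing every file the run opened on INPUT lines (and written files on OUTPUT lines). A Python sketch that reads that list and copies the matching files into an archive folder; the extension filter and function names are illustrative assumptions. Files read by other programs, such as BibTeX's .bib databases, are not recorded by pdflatex and would need to be added separately. Relative paths in the .fls are relative to the directory on its PWD line, so the script is meant to run from the build directory.

```python
import os
import shutil

# Hypothetical filter: sources, packages, classes and common image formats.
KEEP = (".tex", ".sty", ".cls", ".jpg", ".png")

def recorded_inputs(fls_text, extensions=KEEP):
    """Return the distinct INPUT paths from a .fls file, in first-seen order."""
    seen = []
    for line in fls_text.splitlines():
        if line.startswith("INPUT "):
            path = line[len("INPUT "):].strip()
            if path.endswith(extensions) and path not in seen:
                seen.append(path)
    return seen

def archive(fls_path, dest_dir):
    """Copy every recorded input into dest_dir (flat layout, so names may collide)."""
    with open(fls_path, encoding="utf-8", errors="replace") as f:
        paths = recorded_inputs(f.read())
    os.makedirs(dest_dir, exist_ok=True)
    for path in paths:
        shutil.copy2(path, dest_dir)
    return paths
```

Usage would be something like running pdflatex -recorder thesis.tex once and then calling archive("thesis.fls", "archive").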
Hi, I have a small problem. I have some XML with a CDATA section, and this CDATA section contains fragments of HTML. I would like to extract some of the data inside this CDATA element. Right now I have an XSLT transformation that outputs the rest of the document as HTML, but I need only a small part of the CDATA HTML, not the entire thing, e.g. my title tag. How do I do this?
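Inside CDATA, the HTML is not markup as far as the XML parser (or XSLT) is concerned: it is a single text node, so XSLT cannot select the title element inside it (in XSLT 1.0 you would be limited to string functions such as substring-before and substring-after). One way around this is to parse twice: first the XML, then the CDATA text as HTML. A Python sketch of that two-pass idea, swapping in xml.etree.ElementTree and html.parser for XSLT; the element name and the class and function names are illustrative.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

class TitleGrabber(HTMLParser):
    """Collect the text of the first <title> element in an HTML fragment."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self.title = None

    def handle_starttag(self, tag, attrs):
        if tag == "title" and self.title is None:
            self._in_title = True
            self.title = ""

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data

def title_from_cdata(xml_text, element_name):
    # Pass 1: to the XML parser, the CDATA section is just the element's text.
    root = ET.fromstring(xml_text)
    fragment = root.findtext(element_name) or ""
    # Pass 2: parse that text as HTML and pull out the <title>.
    grabber = TitleGrabber()
    grabber.feed(fragment)
    grabber.close()
    return grabber.title
```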
I am trying to take a string of text like so:
$string = "This (1) is (2) my (3) example (4) text";
In every instance where there is a positive integer inside of parentheses, I'd like to replace that with simply the integer itself.
The code I'm using now is:
$result = preg_replace("((\d+))", "$0", $string);
But I keep getting a "Delimiter must not be alphanumeric or backslash" error.
Any thoughts? I know there are other questions on here that sort of answer the question, but my knowledge of regex is not enough to switch it over to this example.
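PHP's preg_* functions expect the pattern to be wrapped in delimiters such as /.../, and literal parentheses have to be escaped with a backslash, because unescaped ( ) only form a group. In the replacement, $0 is the whole match (parentheses included), while $1 is the first captured group, so the PHP call would look something like preg_replace('/\((\d+)\)/', '$1', $string). For illustration, the same pattern in Python, where \1 plays the role of $1; the function name is illustrative.

```python
import re

# "\(" and "\)" match literal parentheses; "(\d+)" captures the integer inside.
PAREN_INT = re.compile(r"\((\d+)\)")

def unwrap_integers(text):
    """Replace every "(N)" with plain "N"."""
    return PAREN_INT.sub(r"\1", text)
```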
I was trying to extract a particular set of nodes from the following XML structure using XML::Twig, but I have been stuck ever since. I need to extract the 'player' nodes from the following structure and do a string match/replace on each of these node values.
<pep:record>
<agency>
<subrecord type="scout">
<isnum>123XXX (print)</isnum>
<isnum>234YYY (mag)</isnum>
</subrecord>
<subrecord type="group">
</subrecord>
</agency>
</pep:record>
I tried the following code, but I get a hash reference rather than the actual string.
my $parser = XML::Twig->new(twig_handlers => {
isnum => sub { print $_->text."::" },
});
foreach my $rec (split(/::/, $parser->parse($my_xml))) {
if ($rec =~ m/print/) {
($print = $rec) =~ s/( \(print\))//;
}
elsif($rec =~ m/mag/) {
($mag = $rec) =~ s/( \(mag\))//;
}
}
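The hash reference most likely comes from parse: XML::Twig's parse method returns the twig object itself, not the text the handler printed, so splitting its return value cannot give you the isnum strings. Collecting the values into an array inside the handler avoids that. For comparison, a Python sketch of the same extraction with xml.etree.ElementTree (a swapped-in library, not XML::Twig). The sample above uses the pep: prefix without declaring a namespace, which a namespace-aware parser rejects, so the test input assumes a plain record root; isnum_values is an illustrative name.

```python
import re
import xml.etree.ElementTree as ET

# Matches the trailing " (print)" or " (mag)" label on an isnum value.
SUFFIX = re.compile(r"\s*\((print|mag)\)$")

def isnum_values(xml_text):
    """Map each isnum label ("print" or "mag") to its value with the label removed."""
    root = ET.fromstring(xml_text)
    result = {}
    for node in root.iter("isnum"):
        text = (node.text or "").strip()
        match = SUFFIX.search(text)
        if match:
            result[match.group(1)] = text[:match.start()]
    return result
```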
I have just started programming and have made a few small applications in C and C#. My understanding is that programming for the web, and anything related to the web, is a very easy task nowadays.
Please note this is for personal learning, not for Rent-A-Coder or any money-making purpose.
What I want is an application that can run on any Windows platform, even Windows 98.
The application should start automatically at a scheduled time and do the following:
Connect to a site that displays a stock price summary (high, low, current, open).
Capture the data (excluding everything else on the site).
Save it to disk (in an SQL database).
Please note:
Internet connection is assumed to be there always.
I do not need to know how to design the database or its schema.
The stock exchange has no law prohibiting the use of the data provided on its site, but I would rather not name it in case I am wrong; in any case, it's for personal, private use only.
The price summary data is arranged in a table, such that when it is copied and pasted into MS Excel it automatically forms a table.
I need step-by-step guidance, examples, and libraries, please.
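A minimal Python sketch of the three steps (fetch, extract the table, store), assuming the price table has exactly five cells per data row in the order name, high, low, current, open; the URL, table layout, table and column names are assumptions to adjust. It uses only the standard library (urllib.request, html.parser, sqlite3). Scheduling would be handled outside the script, e.g. by the Windows Task Scheduler. Note that current Python releases do not run on Windows 98, so treat this as an outline of the steps rather than a drop-in solution.

```python
import sqlite3
import urllib.request
from html.parser import HTMLParser

def fetch(url):
    """Download a page as text (not called in the test below)."""
    with urllib.request.urlopen(url, timeout=30) as response:
        return response.read().decode("utf-8", errors="replace")

class TableRows(HTMLParser):
    """Collect the text of each <td> cell, row by row, from every table on the page."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the <tr> being read
        self._cell = None  # text pieces of the <td> being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        # Close the open cell on </td>, or on </tr> if </td> was omitted.
        if tag in ("td", "tr") and self._cell is not None and self._row is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        if tag == "tr" and self._row is not None:
            if self._row:  # header rows (<th> only) end up empty and are skipped
                self.rows.append(self._row)
            self._row = None

def save_prices(conn, html):
    """Store every five-cell row (assumed: name, high, low, current, open) in SQLite."""
    parser = TableRows()
    parser.feed(html)
    parser.close()
    rows = [row for row in parser.rows if len(row) == 5]
    conn.execute("CREATE TABLE IF NOT EXISTS prices (name TEXT, high_price TEXT, "
                 "low_price TEXT, current_price TEXT, open_price TEXT)")
    conn.executemany("INSERT INTO prices VALUES (?, ?, ?, ?, ?)", rows)
    conn.commit()
    return len(rows)
```

A scheduled run would then amount to save_prices(sqlite3.connect("prices.db"), fetch(url)) with the real URL filled in.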
Consider this html snippet:
<tr>
<td valign=top class="tim_new"><a href="/stocks/company_info/pricechart.php?sc_did=MI42" class="tim_new">3M India</a></td>
<td class="tim_new" valign=top><a href='/stocks/marketstats/indcomp.php?optex=NSE&indcode=Diversified' class=tim>Diversified</a></td>
<td class="tim_new" align=right valign=top>2,487.25</td>
<td class="tim_new" align=right valign=top><font color=#16a903>187.25</font></td>
<td class="tim_new" align=right valign=top><font color=#16a903>8.14</font></td>
<td class="tim_new" align=right valign=top>2,801.90</td>
<td class="tim_new" align=right valign=top>0.06</td>
</tr>
Note these three things:
The HTML file from which this snippet was taken contains multiple HTML tables.
The table from which this snippet was extracted contains not only rows of the format shown, but also rows of other formats, for example:
<tr><td colspan=7><img src="http://img1.moneycontrol.com/images/blank.gif"height="5"></td></tr>
This same table contains multiple rows of the format I need to extract.
Given this scenario, is it possible to write code that extracts the links with the class name "tim_new"?
Help appreciated,
Soham
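This should be possible with an HTML parser rather than regular expressions, since the parser only reacts to the a tags and ignores the other row formats. A minimal Python sketch with the standard library's html.parser that collects the href and link text of every a element whose class attribute contains tim_new; the class and function names are illustrative.

```python
from html.parser import HTMLParser

class ClassLinkParser(HTMLParser):
    """Collect (href, text) for every <a> whose class list contains class_name."""

    def __init__(self, class_name):
        super().__init__()
        self.class_name = class_name
        self.links = []
        self._current = None  # [href, text pieces] while inside a matching <a>

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if self.class_name in classes:
            self._current = [attrs.get("href"), []]

    def handle_data(self, data):
        if self._current is not None:
            self._current[1].append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._current is not None:
            href, parts = self._current
            self.links.append((href, "".join(parts).strip()))
            self._current = None

def links_with_class(html, class_name="tim_new"):
    parser = ClassLinkParser(class_name)
    parser.feed(html)
    parser.close()
    return parser.links
```

Only the a tag's own class is checked, so the link with class=tim in the second cell is skipped even though its td has class tim_new.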
I am trying to implement a module, in either C# or classic ASP, that extracts the XMP data from an EPS file.
Is there a framework or component (not necessarily free) that can help me create this module?
Any advice / direction will be greatly appreciated.
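XMP packets are wrapped in <?xpacket begin=...?> and <?xpacket end=...?> processing instructions, a wrapper designed so that a file can often be scanned for packets without understanding its format. A Python sketch of that byte-scanning idea; the same search could be written in C# over a byte array. It assumes the packet is stored as plain UTF-8 in the file, so UTF-16 packets or packets inside compressed or encoded sections would be missed; the function names are illustrative.

```python
import re

# A packet runs from "<?xpacket begin=" to the closing "<?xpacket end=...?>".
XPACKET = re.compile(rb"<\?xpacket begin=.*?<\?xpacket end=.*?\?>", re.DOTALL)

def find_xmp_packets(data):
    """Return each raw XMP packet (as bytes) found in the file's bytes."""
    return XPACKET.findall(data)

def read_xmp(path):
    """Read a file (EPS or otherwise) and return its XMP packets as text."""
    with open(path, "rb") as f:
        return [p.decode("utf-8", errors="replace") for p in find_xmp_packets(f.read())]
```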
I can't figure this out. I need to extract the second-level domain from an FQDN. For example, all of these need to return "example.com":
example.com
foo.example.com
bar.foo.example.com
example.com:8080
foo.example.com:8080
bar.foo.example.com:8080
Here's what I have so far:
Dim host = Request.Headers("Host")
Dim pattern As String = "(?<hostname>(\w+)).(?<domainname>(\w+.\w+))"
Dim theMatch = Regex.Match(host, pattern)
ViewData("Message") = "Domain is: " + theMatch.Groups("domainname").ToString
It fails for example.com:8080 and bar.foo.example.com:8080. Any ideas?
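In the pattern, the unescaped . matches any character, including the colon, and the match starts at the first label rather than the last two. For example.com:8080 the domainname group therefore comes out as com:8080, and for bar.foo.example.com:8080 as foo.example. A regex-free alternative, sketched in Python: drop the port, split on dots, and keep the last two labels. This swaps in plain string splitting for the regex, and it is naive: it assumes a single-label TLD (foo.example.co.uk would give co.uk, which would need the Public Suffix List) and does not handle IPv6 literals. The function name is illustrative.

```python
def last_two_labels(host):
    """Return the last two dot-separated labels of a Host header value."""
    hostname = host.partition(":")[0]         # drop ":8080" if present
    labels = hostname.rstrip(".").split(".")  # tolerate a trailing root dot
    return ".".join(labels[-2:])
```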
I have around 800 files, each between 55 KB and 100 KB, where the data is in this format:
Date/Time/Float1/Float2/Float3/Float4/Integer
Date is in DD/MM/YYYY format and Time is in HH:MM format.
The dates range from, say, 1 May to 1 June, and on each day the time runs from 09:00 to 15:30.
I want to run a program that, for each file, extracts the data for a particular given date and writes it to a file.
I won't have any problem with writing files or with the directory operations.
What I can't work out is the search-and-extract operation itself. I don't know how to do it and would like some ideas.
Thanks
Soham
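Because each record begins with its date, the search itself can be a simple prefix test on each line, with no need to parse the other fields. A Python sketch, assuming one record per line, zero-padded DD/MM/YYYY dates at the start of each line, and UTF-8 text; the glob pattern, output directory, and function names are illustrative.

```python
import glob
import os

def lines_for_date(lines, date):
    """Keep the records whose line starts with the given DD/MM/YYYY date."""
    return [line for line in lines if line.startswith(date)]

def extract_date(pattern, date, out_dir):
    """For each file matching `pattern`, write its records for `date` into out_dir."""
    os.makedirs(out_dir, exist_ok=True)
    for path in glob.glob(pattern):
        with open(path, encoding="utf-8") as src:
            matches = lines_for_date(src, date)  # iterating a file yields its lines
        out_path = os.path.join(out_dir, os.path.basename(path))
        with open(out_path, "w", encoding="utf-8") as dst:
            dst.writelines(matches)
```

For example, extract_date("data/*.txt", "15/05/2010", "out") would write one filtered copy of each input file into out.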
Hi
I need to extract musical features (note details-pitch, duration, rhythm, loudness, note start time) from a polyphonic (having 2 scores for treble and bass - bass may also have chords) MIDI file. I'm using the jMusic API to extract these details from a MIDI file. My approach is to go through each score, into parts, then phrases and finally notes and extract the details.
With my approach, it reads all the treble notes first and then the bass notes, but chords are not captured (i.e. only a single note of each chord is taken), and I cannot tell where the bass notes begin.
So what I tried was to get the note onsets (i.e. the time at which each note starts playing), since the start times of the treble and bass notes at the beginning of the piece should be the same. But I cannot extract the note onsets using the jMusic API: each time it shows 0.0.
Is there any way I can identify the voice (treble or bass) of a note? And also all the notes of a chord? How is the voice or note onset for each note stored in MIDI? Is this different for each MIDI file?
Any insight is greatly appreciated. Thanks in advance
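In a MIDI file, each event carries a delta time in ticks relative to the previous event in the same track, so a note's onset is the running sum of the deltas, and notes whose note-on events share an onset sound together as a chord. MIDI itself has no notion of treble or bass: parts are typically kept in separate tracks (in a type 1 file) or on separate channels, and which one a given file uses depends on the software that wrote it. The Python sketch below does not use jMusic; it works on a hypothetical, already-decoded list of events for one track (reading the binary file is left to a MIDI library), and the tuple layout and function name are assumptions.

```python
from collections import defaultdict

def chords_by_onset(track):
    """Group note-on pitches of one track by absolute onset time (in ticks).

    `track` is a hypothetical decoded event list, in file order:
    (delta_ticks, kind, pitch, velocity), with kind "note_on" or "note_off".
    """
    groups = defaultdict(list)
    now = 0
    for delta, kind, pitch, velocity in track:
        now += delta  # deltas accumulate into an absolute tick position
        # By convention, a note_on with velocity 0 acts as a note_off.
        if kind == "note_on" and velocity > 0:
            groups[now].append(pitch)
    return dict(sorted(groups.items()))
```

Running this per track keeps the treble and bass parts apart, and any onset with more than one pitch is a chord.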
Hi,
I have a raster file (basically a 2D array) with close to a million points. I am trying to extract a circle from the raster (i.e. all the points that lie within the circle). Using ArcGIS for this is exceedingly slow. Can anyone suggest an image processing library that is easy to learn, and also powerful and fast enough for something like this?
Thanks!
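One commonly used option in Python is NumPy: building a boolean mask from each cell's squared distance to the circle's centre selects every cell inside the circle in a single vectorized operation, with no Python-level loop over the cells. A minimal sketch, assuming the raster is already loaded as a 2D NumPy array and that the centre and radius are given in cell units (map coordinates would need converting to row and column indices first); the function and parameter names are illustrative.

```python
import numpy as np

def values_in_circle(raster, center_row, center_col, radius):
    """Return the cells of a 2D array whose centres lie within `radius` of the centre."""
    # Open grids of row and column indices that broadcast to the raster's shape.
    rows, cols = np.ogrid[:raster.shape[0], :raster.shape[1]]
    mask = (rows - center_row) ** 2 + (cols - center_col) ** 2 <= radius ** 2
    return raster[mask]  # 1D array of the selected values, in row-major order
```

Keeping `mask` instead of (or as well as) the values gives the circle's footprint, which can be reused to extract the same region from other rasters of the same shape.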
Can I parse HTML tables by giving only the column names?
That is, only the data in the columns whose names match the ones I give should be extracted from the table.
For example, I have a table with the columns serial no., name, address, phone no., and total Rs.
I want to extract only the name, phone no., and total Rs. information. How can I do that?
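This is possible as long as the table has a header row. A minimal Python sketch using the standard library's html.parser: it reads every row, takes the first row as the header, and keeps only the requested columns by name. It assumes a single simple table without colspan or nested tables, and that every requested name appears in the header exactly as given; the class and function names are illustrative.

```python
from html.parser import HTMLParser

class SimpleTable(HTMLParser):
    """Collect each row's <th>/<td> cell texts (no support for colspan or nesting)."""

    def __init__(self):
        super().__init__()
        self.rows, self._row, self._cell = [], None, None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        # Close the open cell on its end tag, or on </tr> if the end tag was omitted.
        if tag in ("td", "th", "tr") and self._cell is not None and self._row is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        if tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None

def select_columns(html, wanted):
    """Return one dict per data row, holding only the `wanted` columns."""
    parser = SimpleTable()
    parser.feed(html)
    parser.close()
    header, *body = parser.rows
    positions = [header.index(name) for name in wanted]  # ValueError if a name is missing
    return [{name: row[i] for name, i in zip(wanted, positions)}
            for row in body if len(row) == len(header)]
```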
Consider this piece of code:
<tr>
<td valign=top class="tim_new"><a href="/stocks/company_info/pricechart.php?sc_did=MI42" class="tim_new">3M India</a></td>
<td class="tim_new" valign=top><a href='/stocks/marketstats/indcomp.php?optex=NSE&indcode=Diversified' class=tim>Diversified</a></td>
I want to write a piece of code using the HTML Agility Pack that extracts the link in the first line.
I am not very good at regular expressions, but I want to do something like this:
string="c test123 d split"
I want to split the string based on "c" and "d". These can be any words, which I already have; the string itself will be given by the user. I want "test123" and "split" as my output. There can be any number of words, e.g. "c test123 d split e new". I already have c, d, and e; I want just the word that comes right after each of them: after c comes test123, after d comes split, and after e comes new, so I need test123, split, and new. How can I do this? One more thing: I will pass c first, then d, and then e, not all of them together. I tried:
string strSearchWord = "c ";
Regex testRegex1 = new Regex(strSearchWord);
List<string> lstValues = testRegex1.Split("c test123 d split").ToList();
But it only works for the last keyword: for d it gives the last word, but for c the result still includes "test123 d split".
How should I do this?
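Regex.Split cuts the string at every match and returns everything in between, which is why splitting on "c " returns the whole remainder. Matching the keyword and capturing the next word is closer to what is described. A Python sketch of that idea; the pattern \bKEYWORD\s+(\S+) should also work with .NET's Regex.Match (reading the word from Groups[1]), though that C# usage is not shown here. word_after is an illustrative name, and keywords are assumed to be ordinary words.

```python
import re

def word_after(text, keyword):
    """Return the word that follows `keyword` in `text`, or None if absent."""
    # \b stops the keyword matching inside a longer word (the "e" in "test123"),
    # \s+ skips the spaces, and (\S+) captures the next run of non-space characters.
    match = re.search(r"\b" + re.escape(keyword) + r"\s+(\S+)", text)
    return match.group(1) if match else None
```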
I am looking at an embedded system where secrets are stored in flash that is internal to the chip package, and there is no physical interface to get that information out - all access to this flash is policed by program code.
DMA attacks, JTAG, and similar access paths are all disabled. This seems to be a common locked-down configuration for systems-on-a-chip.
How might an attacker recover the secrets in that Flash?
I understand that they could fuzz the application code for vulnerabilities and exploit them, or that there might be some general side-channel attack.
But how would an attacker really go about trying to recover those keys? Are there viable approaches for a determined attacker, such as shaving down the chip or some kind of microscope attack?
I am having some problems parsing this piece of XML using SimpleXML. There is always exactly one Series element, and a variable number of Episode elements beneath it. I want to parse the XML so I can store the Series data in one table and all the Episode data in another table.
XML:
<Data>
<Series>
<id>80348</id>
<Genre>|Action and Adventure|Comedy|Drama|</Genre>
<IMDB_ID>tt0934814</IMDB_ID>
<SeriesID>68724</SeriesID>
<SeriesName>Chuck</SeriesName>
<banner>graphical/80348-g.jpg</banner>
</Series>
<Episode>
<id>935481</id>
<Director>Robert Duncan McNeill</Director>
<EpisodeName>Chuck Versus the Third Dimension 2D</EpisodeName>
<EpisodeNumber>1</EpisodeNumber>
<seasonid>27984</seasonid>
<seriesid>80348</seriesid>
</Episode>
<Episode>
<id>935483</id>
<Director>Robert Duncan McNeill</Director>
<EpisodeName>Buy More #15: Employee Health</EpisodeName>
<EpisodeNumber>2</EpisodeNumber>
<seasonid>27984</seasonid>
<seriesid>80348</seriesid>
</Episode>
</Data>
When I attempt to access just the first Series element and its child nodes, or to iterate through only the Episode elements, it does not work. I have also tried using DOMDocument with SimpleXML, but could not get that to work at all.
PHP Code:
<?php
if(file_exists('en.xml'))
{
$data = simplexml_load_file('en.xml');
foreach($data as $series)
{
echo 'id: <br />' . $series->id;
echo 'imdb: <br />' . $series->IMDB_ID;
}
}
?>
Output:
id:80348
imdb:tt0934814
id:935481
imdb:
id:1534641
imdb:
Any help would be greatly appreciated.
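In the PHP code, foreach($data as $series) walks over every child of Data, the Series element and each Episode alike, which is why episode ids are printed and IMDB_ID comes out empty for them. In SimpleXML, $data->Series refers to the Series element directly, and foreach($data->Episode as $episode) visits only the Episode elements. For comparison, a Python sketch of the same split with xml.etree.ElementTree (a swapped-in library, not SimpleXML); the function name is illustrative.

```python
import xml.etree.ElementTree as ET

def split_series_and_episodes(xml_text):
    """Return (series_dict, [episode_dict, ...]) from a <Data> document."""
    root = ET.fromstring(xml_text)
    # There is exactly one <Series>, so look it up directly...
    series = {child.tag: child.text for child in root.find("Series")}
    # ...and iterate only over the <Episode> children, not every child of <Data>.
    episodes = [{child.tag: child.text for child in ep} for ep in root.findall("Episode")]
    return series, episodes
```

Each dict can then be inserted into its own table, one row for the series and one row per episode.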
I am getting a string from the user and then doing some checks to make sure it is valid. Here is the code I have been using:
char digit= userInput.charAt(0) - '0';
This had been working fine until I did some work on another method. When I went to compile, I started receiving a 'possible loss of precision' error, and I have been getting it ever since.
What am I doing wrong?