What is the best way to extract the last two characters of a string using a regular expression?
For example, I want to extract the state code from the following:
"A_IL"
I want to extract IL as a string.
Please provide C# code showing how to do it.
string fullexpression = "A_IL";
string StateCode = some regular expression code....
thanks
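A minimal sketch of one possible approach, written in Python rather than the C# asked for. The pattern `.{2}$` anchors two arbitrary characters at the end of the string; the function name `last_two` is hypothetical, and the same pattern should carry over to .NET's Regex class.

```python
import re

def last_two(text):
    """Return the last two characters of text, or None if it is shorter than that."""
    match = re.search(r".{2}$", text)
    return match.group(0) if match else None

print(last_two("A_IL"))  # IL
```

For a fixed-length suffix like this, plain slicing (`text[-2:]`) is arguably simpler than a regular expression.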
Hi all
I have written a web crawler using simplehtmldom, and the crawl process works quite nicely. It crawls the start page, adds all links to a database table, sets a session pointer, and meta-refreshes the page to carry on to the next page. That keeps going until it runs out of links.
That works fine, but the crawl time for larger websites is pretty tedious. I would like to speed things up a bit, and possibly make it a cron job.
Any ideas on making it as quick and efficient as possible other than setting the memory limit / execution time higher?
Hi all,
I have a set of data that contains garbled text fields because of encoding errors during many imports/exports from one database to another. Most of the errors were caused by converting UTF-8 to ISO-8859-1. Strangely enough, the errors are not consistent: the word 'München' appears as 'MÃ¼nchen' in some places and as 'MÜnchen' in others.
Is there a trick in SQL Server to correct this kind of crap? The first thing that I can think of is to exploit the COLLATE clause, so that Ã¼ is interpreted as ü, but I don't exactly know how. If it isn't possible to do it at the DB level, do you know any tool that helps with bulk correction? (Not a manual find/replace tool, but a tool that somehow guesses the garbled text and corrects it.)
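Outside the database, one common way to undo this kind of damage is to reverse the wrong decoding step. The sketch below is in Python and assumes the text was UTF-8 that got read as ISO-8859-1 exactly once; `repair_mojibake` is a hypothetical helper name, and values that were damaged differently (or not at all) are returned unchanged.

```python
def repair_mojibake(text):
    """Undo one round of 'UTF-8 bytes decoded as ISO-8859-1'.

    If the text cannot be round-tripped, it is assumed to be fine
    (or damaged in some other way) and is returned as-is.
    """
    try:
        return text.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return text

print(repair_mojibake("M\u00c3\u00bcnchen"))  # München
```

If the wrong step was Windows-1252 rather than ISO-8859-1, swapping "latin-1" for "cp1252" may be needed; that is a guess to verify against the real data.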
I need help with a regular expression, specifically with condition (4) below:
1. Begin with a-z
2. End with a-z0-9
3. Allow the 3 special characters ._-
4. The characters in (3) must be followed by alphanumeric characters, and cannot be followed by any of the characters in (3) themselves.
I'm not sure how to do this. Any help is appreciated, with a sample and some explanation.
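One way to read the four conditions is "a lowercase letter, then any number of units, where each unit is an optional special character followed by exactly one alphanumeric". A minimal Python sketch under that reading follows; it assumes "alphanumeric" means lowercase a-z and 0-9 only, and `is_valid_name` is a hypothetical name.

```python
import re

# [a-z]                first character must be a lowercase letter
# (?:[._-]?[a-z0-9])*  each later unit: optional special character,
#                      then one alphanumeric, so specials can never be
#                      doubled and can never end the string
NAME = re.compile(r"[a-z](?:[._-]?[a-z0-9])*")

def is_valid_name(name):
    return NAME.fullmatch(name) is not None

print(is_valid_name("a.b-c_9"))  # True
print(is_valid_name("a..b"))     # False
```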
Hi,
I'm using a regular expression to count the spaces at the start of a line (the first occurrence).
match(/^\s*/)[0].length;
However, this reads the line from start to end. How can I read it from end to start?
Thanks
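A regular expression cannot really scan backwards, but anchoring at the end gives the same count. The sketch below is Python rather than JavaScript, and `trailing_whitespace` is a hypothetical name; the pattern `\s*$` itself should behave the same way in a JavaScript `match` call.

```python
import re

def trailing_whitespace(line):
    """Count whitespace characters at the end of line."""
    # \s*$ can match the empty string, so search() always succeeds.
    return len(re.search(r"\s*$", line).group(0))

print(trailing_whitespace("  abc   "))  # 3
```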
I need to target the starting tag of the last top-level LI in a list that may or may not contain sublists in various positions, without using CSS or JavaScript.
Is there a simple/elegant regexp that can help with this? I'm no guru w/ them, but it appears the need for greedy/non-greedy selectors when I'm selecting all the middle text (.*) / (.+) changes as nested lists are added and moved around in the list - and this is throwing me off.
$pattern = '/^(<ul>.*)<li>(.+<\/li><\/ul>)$/';
$replacement = '$1<li id="lastLi">$3';
Perhaps there is an easier approach, such as converting to XML to target the LI and then converting back?
For example:
Single Element
<ul>
<li>TARGET</li>
</ul>
Multiple Elements
<ul>
<li>foo</li>
<li>TARGET</li>
</ul>
Nested Lists before end
<ul>
<li>
foo
<ul>
<li>bar</li>
</ul>
</li>
<li>TARGET</li>
</ul>
Nested List at end
<ul>
<li>foo</li>
<li>
TARGET
<ul>
<li>bar</li>
</ul>
</li>
</ul>
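Because the nesting depth varies, a single greedy or non-greedy pattern tends to break; an alternative is to walk the tags and track the UL depth. The sketch below does that in Python (not the PHP used above) and assumes well-formed markup containing a single top-level list. `mark_last_top_level_li` is a hypothetical name, and any attributes already on the target LI would be overwritten.

```python
import re

TAG = re.compile(r"<(/?)(ul|li)\b[^>]*>", re.IGNORECASE)

def mark_last_top_level_li(html, li_id="lastLi"):
    """Give the last <li> that sits directly inside the outer <ul> an id."""
    depth = 0    # how many <ul> elements are currently open
    last = None  # most recent opening <li> seen at depth 1
    for match in TAG.finditer(html):
        closing, name = match.group(1), match.group(2).lower()
        if name == "ul":
            depth += -1 if closing else 1
        elif not closing and depth == 1:
            last = match
    if last is None:
        return html
    return html[:last.start()] + '<li id="%s">' % li_id + html[last.end():]

print(mark_last_top_level_li("<ul><li>foo</li><li>TARGET<ul><li>bar</li></ul></li></ul>"))
# <ul><li>foo</li><li id="lastLi">TARGET<ul><li>bar</li></ul></li></ul>
```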
I'm new to Python scripting, so please forgive me in advance if the answer to this question seems inherently obvious.
I'm trying to put together a large-scale find-and-replace script using Python. I'm using code similar to the following:
findreplace = [
('term1', 'term2'),
]
inF = open(infile,'rb')
s=unicode(inF.read(),charenc)
inF.close()
for couple in findreplace:
outtext=s.replace(couple[0],couple[1])
s=outtext
outF = open(outFile,'wb')
outF.write(outtext.encode('utf-8'))
outF.close()
How would I go about having the script do a find and replace for regular expressions?
Specifically, I want it to find some information (metadata) specified at the top of a text file. Eg:
Title: This is the title
Author: This is the author
Date: This is the date
and convert it into LaTeX format. Eg:
\title{This is the title}
\author{This is the author}
\date{This is the date}
Maybe I'm tackling this the wrong way. If there's a better way than regular expressions please let me know!
Thanks!
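A minimal sketch of how the metadata conversion could look with `re.sub` in multiline mode. It assumes each metadata field sits on its own line and that only the three keys shown above occur; `to_latex` is a hypothetical name.

```python
import re

# ^ and $ match at every line boundary because of re.MULTILINE.
META = re.compile(r"^(Title|Author|Date):[ \t]*(.*)$", re.MULTILINE)

def to_latex(text):
    """Rewrite 'Key: value' metadata lines as \\key{value}."""
    return META.sub(
        lambda m: "\\%s{%s}" % (m.group(1).lower(), m.group(2)),
        text,
    )

print(to_latex("Title: This is the title\nAuthor: This is the author"))
# \title{This is the title}
# \author{This is the author}
```

A replacement function is used instead of a template string so that backslashes in the output do not need double escaping.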
Does anyone have experience measuring the performance of the glibc regexp functions?
Are there any generic tests I need to run to make such measurements (in addition to testing the exact patterns I intend to search)?
Thanks.
I want to validate a login name using a regular expression in Ruby on Rails. The special characters !@#S%^*()+_-?/<:"';. and the space character should not be accepted. What is the code for that?
Thanks,
Pallavi
I have some code that looks more or less like this:
while(scanner.hasNext())
{
if(scanner.findInLine("Test") !=null) {
//do some things
}else{
scanner.nextLine();
}
}
I am using this to parse a ~10MB text file. The problem is, if I put a breakpoint on the while() and the scanner.nextLine(), I can see that sometimes the scanner's position (in the debug window) goes back to zero. I think this is causing some kind of loop blow-up, because the regex in findInLine() starts at zero, looks through some amount of text, advancing the position, and then it randomly gets set back to zero, so it has to re-parse all that text again.
Any ideas what can be causing that? Am I even doing this the right way?
Thanks
Some additional info:
The Scanner is instantiated from an InputStream. After doing some debugging, it appears that there is a HeapCharBuffer that Scanner uses, and it only allows 1024 characters at a time and then resets. Is there a way to avoid this, or to do things differently? That seems like a small number of characters to be able to scan.
Derek
I'm migrating wiki pages from the FlexWiki engine to the FOSwiki engine using Python regular expressions to handle the differences between the two engines' markup languages.
The FlexWiki markup and the FOSwiki markup, for reference.
Most of the conversion works very well, except when I try to convert the renamed links.
Both wikis support renamed links in their markup.
For example, Flexwiki uses:
"Link To Wikipedia":[http://www.wikipedia.org/]
FOSwiki uses:
[[http://www.wikipedia.org/][Link To Wikipedia]]
both of which render as a link labeled "Link To Wikipedia".
I'm using the regular expression
renameLink = re.compile ("\"(?P<linkName>[^\"]+)\":\[(?P<linkTarget>[^\[\]]+)\]")
to parse out the link elements from the FlexWiki markup, which after running through something like
"Link Name":[LinkTarget]
is reliably producing groups
<linkName> = Link Name
<linkTarget> = LinkTarget
My issue occurs when I try to use re.sub to insert the parsed content into the FOSwiki markup.
My experience with regular expressions isn't anything to write home about, but I'm under the impression that, given the groups
<linkName> = Link Name
<linkTarget> = LinkTarget
a line like
line = renameLink.sub ( "[[\g<linkTarget>][\g<linkName>]]" , line )
should produce
[[LinkTarget][Link Name]]
However, in the output to the text files I'm getting
[[LinkTarget [[Link Name]]
which breaks the renamed links.
After a little bit of fiddling I managed a workaround, where
line = renameLink.sub ( "[[\g<linkTarget>][ [\g<linkName>]]" , line )
produces
[[LinkTarget][ [[Link Name]]
which, when displayed in FOSwiki looks like
<[[Link Name> <--- Which WORKS, but isn't very pretty.
I've also tried
line = renameLink.sub ( "[[\g<linkTarget>]" + "[\g<linkName>]]" , line )
which is producing
[[linkTarget [[linkName]]
There are probably thousands of instances of these renamed links in the pages I'm trying to convert, so fixing it by hand isn't any good.
For the record I've run the script under Python 2.5.4 and Python 2.7.3, and gotten the same results.
Am I missing something really obvious with the syntax? Or is there an easy workaround?
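For comparison, here is the same pattern and replacement run in isolation, using raw strings so the backslashes reach the `re` module untouched. In this form the substitution appears to give the expected output, which suggests (as an assumption, not something confirmed above) that the stray bracket comes from another substitution applied to the same line later in the script. `convert_link` is a hypothetical name.

```python
import re

rename_link = re.compile(r'"(?P<linkName>[^"]+)":\[(?P<linkTarget>[^\[\]]+)\]')

def convert_link(line):
    """Turn "Name":[Target] into [[Target][Name]]."""
    return rename_link.sub(r"[[\g<linkTarget>][\g<linkName>]]", line)

print(convert_link('"Link To Wikipedia":[http://www.wikipedia.org/]'))
# [[http://www.wikipedia.org/][Link To Wikipedia]]
```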
Hi,
I'm converting patch scripts using a command-line script. Within these scripts there is a combination of two lines like:
--- /dev/null
+++ filename.txt
which needs to be converted to:
--- filename.txt
+++ filename.txt
Initially I tried:
less file.diff | sed -e "s/---\/dev\null\n+++ \(.*\)/--- \1\n+++ \1/"
But I found out that multiline handling is much more complex in sed :(
Any help is appreciated...
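If sed is not a hard requirement, the two-line match is straightforward once the whole file is read into one string. The sketch below uses Python; `fix_new_file_headers` is a hypothetical name, and it assumes the `+++` line always directly follows the `--- /dev/null` line.

```python
import re

# With re.MULTILINE, ^ and $ match at line boundaries, while the
# literal \n lets the pattern span exactly two lines.
NEW_FILE = re.compile(r"^--- /dev/null\n\+\+\+ (.*)$", re.MULTILINE)

def fix_new_file_headers(diff_text):
    """Replace '--- /dev/null' with the file name from the following '+++' line."""
    return NEW_FILE.sub(
        lambda m: "--- %s\n+++ %s" % (m.group(1), m.group(1)),
        diff_text,
    )

print(fix_new_file_headers("--- /dev/null\n+++ filename.txt\n"))
# --- filename.txt
# +++ filename.txt
```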
Hello,
Please help me with the following HTML:
<html>
<body>
http://domainname.com/abc/xyz.zip
http://domainname2.com/abc/xyz.zip
</body>
</html>
I want to replace each URL with a link, so that the output looks like:
<html>
<body>
<a href="http://domainname.com/abc/xyz.zip">http://domainname.com/abc/xyz.zip</a>
<a href="http://domainname2.com/abc/xyz.zip">http://domainname2.com/abc/xyz.zip</a>
</body>
</html>
Many thanks
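A minimal sketch of the replacement in Python. It assumes the URLs appear as bare text and are not already inside an href attribute (existing links would get wrapped a second time); `linkify` is a hypothetical name.

```python
import re

# A URL here is "http(s)://" followed by anything up to whitespace,
# an angle bracket, or a double quote.
URL = re.compile(r'https?://[^\s<>"]+')

def linkify(html):
    """Wrap every bare URL in an anchor tag pointing at itself."""
    return URL.sub(lambda m: '<a href="%s">%s</a>' % (m.group(0), m.group(0)), html)

print(linkify("http://domainname.com/abc/xyz.zip"))
# <a href="http://domainname.com/abc/xyz.zip">http://domainname.com/abc/xyz.zip</a>
```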
I want to convert
<p>Code is following</p>
<pre>
<html><br></html>
</pre>
to
<p>Code is following</p>
<pre>
<html>
</html>
</pre>
I don't know how to write a regular expression that replaces only between pre tags in PHP.
I tried the code from http://stackoverflow.com/questions/1517102/replace-newlines-with-br-tags-but-only-inside-pre-tags
but it's not working for me.
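One way to restrict a replacement to pre blocks is to match each block first and run the inner replacement in a callback. The sketch below shows the idea in Python; in PHP, preg_replace_callback plays the same role. `br_to_newline_in_pre` is a hypothetical name, and nested pre tags are assumed not to occur.

```python
import re

PRE_BLOCK = re.compile(r"(<pre\b[^>]*>)(.*?)(</pre>)", re.DOTALL | re.IGNORECASE)
BR_TAG = re.compile(r"<br\s*/?>", re.IGNORECASE)

def br_to_newline_in_pre(html):
    """Replace <br> with a newline, but only between <pre> and </pre>."""
    def fix_block(match):
        opening, body, closing = match.groups()
        return opening + BR_TAG.sub("\n", body) + closing
    return PRE_BLOCK.sub(fix_block, html)

print(br_to_newline_in_pre("<p>a<br>b</p><pre>x<br>y</pre>"))
# <p>a<br>b</p><pre>x
# y</pre>
```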
Given a string that ends in a whitespace character, return true.
I'm sure I should be able to do this with regular expressions but I'm not having any luck. MSDN reference for regular expressions tells me that \s should match on a whitespace, but I can't figure the rest out.
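The missing piece is an end-of-string anchor after `\s`. A minimal Python sketch follows (`ends_with_whitespace` is a hypothetical name); in .NET the pattern `\s$` is the usual equivalent.

```python
import re

def ends_with_whitespace(text):
    r"""True if the very last character of text is whitespace (\Z = end of string)."""
    return re.search(r"\s\Z", text) is not None

print(ends_with_whitespace("abc "))  # True
print(ends_with_whitespace("abc"))   # False
```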
I'm working on 2 cases.
Assume I have these variables:
a = "hello"
b = "hello-SP"
c = "not_hello"
1 - Any partial matches
I want to accept any string that has the var a inside, so b and c would match.
2 - Patterned match
I want to match a string that has a inside, followed by '-', so b would match, c does not.
I am having a problem because I have always used the /expression/ syntax to define a Regexp, so how do I define a Regexp dynamically in Ruby?
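The general idea is to build the pattern from the variable at run time and escape it so that any metacharacters in the value are taken literally. The sketch below shows this in Python; in Ruby the corresponding calls are Regexp.new and Regexp.escape. The function names are hypothetical.

```python
import re

def contains(needle, text):
    """Case 1: needle appears anywhere in text."""
    return re.search(re.escape(needle), text) is not None

def contains_followed_by_dash(needle, text):
    """Case 2: needle appears in text and is immediately followed by '-'."""
    return re.search(re.escape(needle) + "-", text) is not None

a = "hello"
print(contains(a, "not_hello"))                   # True
print(contains_followed_by_dash(a, "hello-SP"))   # True
print(contains_followed_by_dash(a, "not_hello"))  # False
```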
I have a list of phone numbers in different formats. I need to grab only the numbers that start with one of the prefixes/formats below, using PHP:
020 8
07974
+44 (0) 20
+44 0
440203
Any help will be appreciated.
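A minimal sketch that builds one alternation from the listed prefixes, escaping each so that +, ( and ) are matched literally. It is written in Python; the same pattern could be passed to PHP's preg_match. It assumes the prefixes must match character for character, including spaces, and `has_known_prefix` is a hypothetical name.

```python
import re

PREFIXES = ["020 8", "07974", "+44 (0) 20", "+44 0", "440203"]

# re.escape keeps "+", "(" and ")" from being treated as regex syntax.
PREFIX_RE = re.compile("|".join(re.escape(p) for p in PREFIXES))

def has_known_prefix(number):
    """True if number (ignoring surrounding whitespace) starts with a listed prefix."""
    return PREFIX_RE.match(number.strip()) is not None

print(has_known_prefix("+44 (0) 20 7946 0000"))  # True
print(has_known_prefix("0161 496 0000"))         # False
```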
I have about 40000 records in a table containing plain text, and within the plain text there are tags like the one below, whose only common characteristic is that they are enclosed in [ ]:
[caption id="attachment_2948" align="alignnone" width="480" caption="the caption goes here"]
How could I remove those? (replace by nothing)
I could also run a PHP program if necessary to do the cleanup.
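If the cleanup is done in a script, a single pattern covers it: an opening bracket, any run of non-bracket characters, and a closing bracket. The sketch is in Python (PHP's preg_replace accepts the same pattern) and assumes the tags are never nested and that no legitimate text is written in square brackets. `strip_bracket_tags` is a hypothetical name.

```python
import re

BRACKET_TAG = re.compile(r"\[[^\[\]]*\]")

def strip_bracket_tags(text):
    """Remove every [...] tag from text."""
    return BRACKET_TAG.sub("", text)

print(strip_bracket_tags('before [caption id="attachment_2948"] after'))
# before  after
```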
I am working on a project where I need to find a frequency in a given text. I wrote a regular expression that tries to detect the frequency; however, I am stuck on how C# handles it and how exactly to use it in my software.
My regular expression is (\d*)(([,\.]?\s*((k|m)?hz)*)|(\s*((k|m)?hz)*))$
And I am trying to find the value in:
23,2 Hz
24,4Hz
25,0 Hzsadf
26 Hz
27Khz
28hzzhzhzhdhdwe
29
30.4Hz
31.8 Hz
4343.34.234 Khz
65SD
Further explanation:
The system needs to work for both US and Belgian cultures, hence 23.2 (US) = 23,2 (BE).
I am trying to find a digit followed by either khz, mhz, hz, a space, a comma, or a period.
If it is a comma or a period, then it should be followed by another digit and then khz, mhz, or hz.
Any help is appreciated.
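The rules above leave some of the sample lines open to interpretation, so the following is only a sketch under stated assumptions: the whole string must be a number with an optional unit, a bare number counts as Hz, "mhz" means megahertz, and anything with trailing junk is rejected. It is written in Python, and `parse_frequency` and `SCALE` are hypothetical names.

```python
import re

# number: digits, optionally a comma or period and more digits
# unit:   optional hz / khz / mhz, case-insensitive
FREQ = re.compile(r"(\d+(?:[.,]\d+)?)\s*([km]?hz)?", re.IGNORECASE)
SCALE = {"hz": 1.0, "khz": 1e3, "mhz": 1e6}

def parse_frequency(text):
    """Return the frequency in Hz, or None if text is not a clean frequency."""
    match = FREQ.fullmatch(text.strip())
    if match is None:
        return None
    value = float(match.group(1).replace(",", "."))  # 23,2 and 23.2 are equal
    unit = (match.group(2) or "hz").lower()
    return value * SCALE[unit]

print(parse_frequency("23,2 Hz"))      # 23.2
print(parse_frequency("27Khz"))        # 27000.0
print(parse_frequency("25,0 Hzsadf"))  # None
```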
I have a file with the following format:
1234, 'US', 'IN',......
324, 'US', 'IN',......
...
...
53434, 'UK', 'XX', ....
...
...
253, 'IN', 'UP',....
253, 'IN', 'MH',....
Here I want to extract only those lines having 'IN' as the 2nd field, i.e.
253, 'IN', 'UP',....
253, 'IN', 'MH',....
Can anyone please tell me a grep command for this?
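The key is to anchor at the start of the line and skip exactly one comma-separated field before looking for 'IN'. The sketch below tests the pattern in Python; it assumes fields are separated by a comma plus optional spaces, and `lines_with_in_second` is a hypothetical name. The same pattern should also be usable with grep -E.

```python
import re

# ^[^,]*   the first field (everything up to the first comma)
# , *'IN'  the second field must be exactly 'IN'
SECOND_IS_IN = re.compile(r"^[^,]*, *'IN' *,")

def lines_with_in_second(lines):
    """Keep only the lines whose second field is 'IN'."""
    return [line for line in lines if SECOND_IS_IN.match(line)]

sample = ["1234, 'US', 'IN',......", "253, 'IN', 'UP',....", "253, 'IN', 'MH',...."]
print(lines_with_in_second(sample))
# ["253, 'IN', 'UP',....", "253, 'IN', 'MH',...."]
```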
I am trying to replace all the special characters, including white space, hyphens, etc., with underscores in a string variable in Tcl. I wrote the code below, but it doesn't seem to be working.
set varname $origVar
puts "Variable Name :>> $varname"
if {$varname != ""} {
regsub -all {[\s-\]\[$^?+*()|\\%&#]} $varname "_" $newVar
}
puts "New Variable :>> $newVar"
One issue is that, instead of replacing the string in $varname, it replaces the data inside $origVar, and I have no idea why. I also read the example code (for proper syntax) in my Tcl book, and according to that it should be something like this:
regsub -all {[\s-][$^?+*()|\\%&#]} $varname "_" newVar
So I used the same syntax, but it didn't work and gave the same result: it modified $origVar instead of the required $varname value.
I'm using the following to load images based on two IDs: one is the book ID and the other is the client.
My folder structure is this:
root path = flipbooks
The subfolders under flipbooks are books and clients.
In the books subfolder I have an ASP.NET page titled tablet.
The tablet code-behind checks the book ID of the client and renders the tablet page with images in a flipbook fashion.
Because we have over 15000 records and flipbooks already created and stored in the database, I don't want to move the client folder under the books subfolder. I need the code below to get to the client subfolder in the query string, and any help changing this would be appreciated.
The result now is http://www.somewebsite.com/books/client/images/someimage1.jpg
I need the result to be http://www.somewebsite.com/client/images/someimage1.jpg
I tried moving the tablet.aspx file to the root flipbooks folder and it works, but I have to provide a user name and password each time. This needs to be accessible to the public and my root is protected. I don't want to have to change permissions.
I am trying to remove the /books segment.
function getParameterByName(name) {
var results = RegExp('[?&]' + name + '=([^&]*)').exec(window.location.search);
return results ?
decodeURIComponent(results[1].replace(/\+/g, ' '))
: null;
}
Thanks Mission Critical
I've started the process of moving my blog to Octopress, but unfortunately a limitation of Jekyll doesn't allow me to use abbreviated month names in my permalinks. Therefore I'm looking to just get rid of the month and day bits altogether.
I've read in this article that you can use rack-rewrite to take care of the redirection, since I am using Heroku to host this.
So how would I turn:
This: example.com/journal/2012/jan/03/post-of-the-day/
Into this: example.com/journal/2012/post-of-the-day/
Extra points: If I had another rule that redirected /blog/ to /journal/, would that rule still adhere to the above one as well? So from:
This: example.com/blog/2012/jan/03/post-of-the-day/
To this: example.com/journal/2012/jan/03/post-of-the-day/
And finally to: example.com/journal/2012/post-of-the-day/
Thanks for the assistance in advance. :)
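The rewrite itself comes down to one pattern with two captures, the year and the post slug. The sketch below only demonstrates that pattern in Python and is not rack-rewrite configuration. It assumes three-letter lowercase month names and two-digit days, handles the /blog/ prefix in the same step rather than as a chained rule, and uses the hypothetical name `new_path`.

```python
import re

# (\d{4})          year, kept
# [a-z]{3}/\d{2}   month and day, dropped
# (.+)             the rest of the path (post slug), kept
OLD_PATH = re.compile(r"^/(?:blog|journal)/(\d{4})/[a-z]{3}/\d{2}/(.+)$")

def new_path(path):
    """Map an old dated permalink to /journal/<year>/<slug>; leave other paths alone."""
    match = OLD_PATH.match(path)
    if match is None:
        return path
    return "/journal/%s/%s" % (match.group(1), match.group(2))

print(new_path("/blog/2012/jan/03/post-of-the-day/"))
# /journal/2012/post-of-the-day/
```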