I am trying to break the http://stackoverflow.com/questions/2711961/decoding-algorithm-wanted question down into smaller questions. This is Part I.
Question:
two strings: s1 and s2
part of s1 is identical to part of s2
space is separator
how to extract the identical part(s)?
example 1:
s1 = "12 November 2010 - 1 visitor"
s2 = "6 July 2010 - 100 visitors"
the identical parts are "2010", "-", "1" and "visitor"
example 2:
s1 = "Welcome, John!"
s2 = "Welcome, Peter!"
the identical parts are "Welcome," and "!"
Python and Ruby preferred. Thanks
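One way to approach this, sketched here under the assumption that a character-level alignment is acceptable (the examples match "1" inside "100", so whole-token comparison would not be enough), is Python's difflib. The helper name common_parts is made up for illustration.

```python
from difflib import SequenceMatcher


def common_parts(s1, s2):
    """Return the space-separated pieces that difflib aligns in both strings."""
    # autojunk=False stops difflib from discarding frequent characters.
    matcher = SequenceMatcher(None, s1, s2, autojunk=False)
    parts = []
    for block in matcher.get_matching_blocks():
        run = s1[block.a:block.a + block.size]
        # A run may span several tokens; split() also drops blank runs
        # and the zero-length sentinel block at the end.
        parts.extend(run.split())
    return parts


print(common_parts("12 November 2010 - 1 visitor", "6 July 2010 - 100 visitors"))
# ['2010', '-', '1', 'visitor']
print(common_parts("Welcome, John!", "Welcome, Peter!"))
# ['Welcome,', '!']
```

Because the alignment is greedy (longest run first), other inputs may yield stray one-character matches that need filtering.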
How do I split on all non-alphanumeric characters EXCEPT the apostrophe?
re.split('\W+',text)
works, but will also split on apostrophes. How do I add an exception to this rule?
Thanks!
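A minimal sketch of one possible exception rule: replace \W with a negated character class that lists both word characters and the apostrophe. The function name split_words and the sample sentence are only illustrative.

```python
import re


def split_words(text):
    # [^\w']+ means: one or more characters that are neither word
    # characters nor apostrophes, so apostrophes stay inside tokens.
    return [token for token in re.split(r"[^\w']+", text) if token]


print(split_words("Don't stop, it's John's turn!"))
# ["Don't", 'stop', "it's", "John's", 'turn']
```

The list comprehension drops the empty string that re.split leaves when the text starts or ends with a separator.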
Hi
Please help me with a regular expression to validate the following format
dd/mm
This is for validating a Birthday field and the year is not required.
Thanks
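A possible sketch, assuming two-digit day and month are both required and that 29/02 should be accepted because the year is unknown. The names is_valid_birthday and MAX_DAY are made up for this example.

```python
import re

# dd: 01-31, mm: 01-12, separated by a slash.
DD_MM = re.compile(r"(0[1-9]|[12][0-9]|3[01])/(0[1-9]|1[0-2])")

# Largest day number per month; February allows 29 since no year is given.
MAX_DAY = {1: 31, 2: 29, 3: 31, 4: 30, 5: 31, 6: 30,
           7: 31, 8: 31, 9: 30, 10: 31, 11: 30, 12: 31}


def is_valid_birthday(text):
    match = DD_MM.fullmatch(text)
    if match is None:
        return False
    day, month = int(match.group(1)), int(match.group(2))
    # The regex alone would accept 31/04, so check the month length too.
    return day <= MAX_DAY[month]


print(is_valid_birthday("31/12"))  # True
print(is_valid_birthday("31/04"))  # False
```

The regular expression checks the shape; the dictionary lookup rejects days that do not exist in the given month.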
In Brzozowski's "Derivatives of Regular Expressions" and elsewhere, the function δ(R), which returns λ if R is nullable and Ø otherwise, includes clauses such as the following:
δ(R1 + R2) = δ(R1) + δ(R2)
δ(R1 · R2) = δ(R1) ∧ δ(R2)
Clearly, if both R1 and R2 are nullable then (R1 · R2) is nullable, and if either R1 or R2 is nullable then (R1 + R2) is nullable. It is unclear to me what the above clauses are supposed to mean, however. My first thought, mapping (+), (·), or the Boolean operations to regular sets, is nonsensical, since in the base case,
δ(a) = Ø (for all a ∈ Σ)
δ(λ) = λ
δ(Ø) = Ø
and λ is not a set (nor is the return type of δ, which is a regular expression). Furthermore, this mapping isn't indicated, and there is a separate notation for it. I understand nullability, but I'm lost on the definition of the sum, product, and Boolean operations in the definition of δ: how are λ or Ø returned from δ(R1) ∧ δ(R2), for instance, in the definition of δ(R1 · R2)?
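One possible reading, offered as an assumption rather than as what the paper states, is to treat the empty-string expression (λ in Brzozowski's notation) as "true" and Ø as "false", so that the sum behaves like OR and the product clause like AND. The class and function names below are invented for this sketch.

```python
from dataclasses import dataclass


class Regex:
    """Base class for a tiny regular-expression syntax tree."""


@dataclass(frozen=True)
class Empty(Regex):      # the empty set
    pass


@dataclass(frozen=True)
class Lam(Regex):        # the expression matching only the empty string
    pass


@dataclass(frozen=True)
class Sym(Regex):        # a single alphabet symbol
    char: str


@dataclass(frozen=True)
class Alt(Regex):        # R1 + R2
    left: Regex
    right: Regex


@dataclass(frozen=True)
class Cat(Regex):        # R1 . R2
    left: Regex
    right: Regex


@dataclass(frozen=True)
class Star(Regex):       # R*
    inner: Regex


def nullable(r):
    """True if r accepts the empty string."""
    if isinstance(r, (Lam, Star)):
        return True
    if isinstance(r, (Empty, Sym)):
        return False
    if isinstance(r, Alt):
        return nullable(r.left) or nullable(r.right)    # sum read as OR
    if isinstance(r, Cat):
        return nullable(r.left) and nullable(r.right)   # product read as AND
    raise TypeError(f"unknown node: {r!r}")


def delta(r):
    """Map the Boolean answer back to one of the two expressions."""
    return Lam() if nullable(r) else Empty()


print(delta(Cat(Star(Sym("a")), Sym("b"))))  # Empty()
print(delta(Alt(Sym("a"), Lam())))           # Lam()
```

Under this reading the result is always one of two expressions, which is why the Boolean connectives can be applied to them.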
I currently have a machine with a dual-core Opteron 275 (2.2 GHz), 4 GB of RAM, and a very fast hard drive. When compiling even fairly simple projects that use C++ templates (think Boost, etc.), my compile times can be long (minutes for small things, much longer for bigger projects). Only one of the cores is pegged at 100%, so I know I/O is not the bottleneck. Is there really no way to take advantage of the other core for C++ compilation?
I'm parsing some big log files and have some very simple string matches, for example:
if(m/Some String Pattern/o){
#Do something
}
It seems simple enough, but in fact most of my matches could be anchored to the start of the line, although the pattern would then be longer, for example:
if(m/^Initial static string that matches Some String Pattern/o){
#Do something
}
Obviously this is a longer regular expression and so more work to match. However, I can use the start-of-line anchor, which would allow an expression to be discarded as a failed match sooner.
It is my hunch that the latter would be more efficient. Can anyone back me up or shoot me down? :-)
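A hunch like this is probably best settled by measuring. The sketch below uses Python's re and timeit purely as an analogy; Perl's engine has its own optimizations, so the numbers will not transfer directly. The sample log lines are invented.

```python
import re
import timeit

unanchored = re.compile(r"Some String Pattern")
anchored = re.compile(r"^Initial static string that matches Some String Pattern")

# Hypothetical log lines: one that matches and one that does not.
hit = "Initial static string that matches Some String Pattern and more"
miss = "2010-01-01 12:00:00 unrelated log entry " * 5


def time_search(pattern, line, number=20_000):
    """Seconds taken to run pattern.search(line) `number` times."""
    return timeit.timeit(lambda: pattern.search(line), number=number)


for label, pattern in (("unanchored", unanchored), ("anchored", anchored)):
    for kind, line in (("hit", hit), ("miss", miss)):
        print(f"{label:10s} {kind:4s} {time_search(pattern, line):.4f}s")
```

The interesting case is usually the non-matching line, since that is where an anchor lets the engine give up after one attempt.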
Regular expressions are often pointed to as the classic example of a language that is not Turing complete. For example, "regular expressions" is given as an answer to this SO question looking for languages that are not Turing complete.
In my, perhaps somewhat basic, understanding of the notion of Turing completeness, this means that regular expressions cannot be used to check for patterns that are "balanced", meaning that they have an equal number of opening and closing characters. This is because doing so would require some kind of state to let you match the opening and closing characters.
However, the .NET implementation of regular expressions introduces the notion of a balanced group. This construct is designed to let you backtrack and see if a previous group was matched. This means that a .NET regular expression such as:
^(?<p>a)*(?<-p>b)*(?(p)(?!))$
could match strings such as:
ab
aabb
aaabbb
aaaabbbb
... etc. ...
Does this mean .NET's regular expressions are Turing complete? Or are there other things missing that would be required for the language to be Turing complete?
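To make the "state" concrete: for this particular language the state is just a counter. The sketch below is plain Python, not a regex engine, and the function name is invented; it only illustrates what the balancing group has to keep track of.

```python
def equal_as_then_bs(text):
    """True if text is n copies of 'a' followed by n copies of 'b' (n >= 0)."""
    pending = 0          # how many 'a's still need a matching 'b'
    position = 0

    # Phase 1: count the leading 'a's.
    while position < len(text) and text[position] == "a":
        pending += 1
        position += 1

    # Phase 2: each 'b' cancels one pending 'a'.
    while position < len(text) and text[position] == "b" and pending > 0:
        pending -= 1
        position += 1

    # Success only if the whole input was consumed and nothing is pending.
    return position == len(text) and pending == 0


for sample in ("ab", "aabb", "aaabbb", "aab", "abb", "abab"):
    print(sample, equal_as_then_bs(sample))
```

A single unbounded counter like this is already more than a finite automaton has, but it is still far less than what a Turing machine provides.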
I've got a problem where preg_replace only replaces the first match it finds on a line, then jumps to the next line and skips the remaining parts on the same line that I also want replaced.
I read a CSS file that sometimes has multiple "url(media/pic.gif)" occurrences on one line and replace "media/pic.gif" (the file is then saved as a copy with the replaced parts). The content of the CSS file is put into the variable $resource_content:
$resource_content = preg_replace('#(url\((\'|")?)(.*)((\'|")?\))#i', '${1}'.url::base(FALSE).'${3}'.'${4}', $resource_content);
Does anyone know why it only replaces the first match per line, and how to fix it?
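A likely suspect is the greedy (.*), which runs to the last ")" on the line. The Python sketch below reproduces that behaviour with an equivalent pattern and compares it with a version whose middle group cannot cross a quote or a closing parenthesis. BASE is a hypothetical stand-in for url::base(FALSE), and the CSS line is invented.

```python
import re

BASE = "http://example.com/"  # hypothetical stand-in for url::base(FALSE)

css = 'a { background: url(media/a.gif); } b { background: url("media/b.gif"); }'

# Same shape as the original pattern: the middle group is a greedy ".*".
greedy = re.compile(r"""(url\(['"]?)(.*)(['"]?\))""", re.I)

# Middle group limited to characters that cannot end the url(...) token.
bounded = re.compile(r"""(url\(['"]?)([^'")]*)(['"]?\))""", re.I)


def rebase(pattern, text):
    """Prefix every captured path with BASE; return (new_text, match_count)."""
    return pattern.subn(
        lambda m: m.group(1) + BASE + m.group(2) + m.group(3), text
    )


print(rebase(greedy, css)[1])   # 1
print(rebase(bounded, css)[1])  # 2
print(rebase(bounded, css)[0])
```

With the greedy pattern the two url(...) tokens are swallowed by a single match, which would look like "only the first one is replaced".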
I have a regular expression, links = re.compile('<a(.+?)href=(?:"|\')?((?:https?://|/)[^\'"]+)(?:"|\')?(.*?)>(.+?)</a>',re.I).findall(data)
to find links in some HTML. It takes a long time on certain pages; any optimization advice?
One that it chokes on is http://freeyourmindonline.net/Blog/
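One alternative that avoids regex backtracking altogether is the standard library's html.parser. This is only a sketch: the class and function names are invented, and it keeps the same filter as the pattern above (hrefs starting with http://, https:// or /).

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect (href, link text) pairs for <a> tags."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None      # href of the <a> currently open, if any
        self._text = []        # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href and href.startswith(("http://", "https://", "/")):
            self._href = href
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def find_links(html):
    collector = LinkCollector()
    collector.feed(html)
    collector.close()
    return collector.links


print(find_links('<a class="x" href="/about">About <b>us</b></a>'))
# [('/about', 'About us')]
```

The parser walks the document once, so its running time should grow roughly linearly with the page size.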
Are there any programs for parsing the C++ error messages generated by gcc and displaying them in a nice format?
I'm really looking for something like less that I can pipe my errors into that will collapse the template parameter lists by default, maybe with some nice highlighting so that my errors are actually readable.
(Yes, it's boost's fault I have such incomprehensible errors, in case you were wondering)
We can easily check for a match in a string:
<?php
// Case-insensitive search for "happy" in a literal string.
if (preg_match("/happy/i", "happy is he who has ")) {
echo "match found.";
} else {
echo "match not found.";
}
?>
But how do I check for a match in a web page, given its URL?
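The general idea is to download the page into a string first and then run the same kind of match on it. A Python sketch of that idea follows; the function names are invented, and the fetcher parameter exists only so the matching logic can be tried without network access.

```python
import re
from urllib.request import urlopen


def fetch(url, timeout=10):
    """Download a page and return its body as text."""
    with urlopen(url, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def page_matches(url, pattern, fetcher=fetch):
    """True if the pattern occurs anywhere in the page (case-insensitive)."""
    return re.search(pattern, fetcher(url), re.IGNORECASE) is not None


# Example call (performs a real HTTP request, so it is left commented out):
# print(page_matches("http://example.com/", r"happy"))
```

Note that this searches the raw HTML, including tags and attributes, not just the visible text.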
I've taken a regular expression from jQuery that detects whether a browser's engine is WebKit and gets its version number. It returns 3 values extracted from the userAgent string: webkit/….…, webkit and ….… [“….…” being the version number].
I would like the regular expression to return just 2 values: webkit and ….….
I'm rubbish at regular expressions, so please can you give an explanation of the expression with your answer.
The regular expression I'm currently working with and wish to improve is: /(webkit)[\/]([\w.]+)/.
I appreciate all your help, thanks in advance!
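For reference, the expression reads as: (webkit) captures the literal text, [\/] matches one slash, and ([\w.]+) captures one or more word characters or dots. The first of the three values is the whole match; the other two are the captured groups. A Python sketch showing the same split, with an invented user agent string:

```python
import re

WEBKIT = re.compile(r"(webkit)[\/]([\w.]+)")

# Hypothetical user agent, lower-cased as a browser-sniffing script might do.
user_agent = ("Mozilla/5.0 (Macintosh) AppleWebKit/534.30 "
              "(KHTML, like Gecko) Safari/534.30").lower()

match = WEBKIT.search(user_agent)
print(match.group(0))   # webkit/534.30   (the whole match)
print(match.groups())   # ('webkit', '534.30')   (only the two groups)
```

In JavaScript the equivalent of groups() would be to drop the first element of the returned array.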
Take this string as input:
string s="planets {Sun|Mercury|Venus|Earth|Mars|Jupiter|Saturn|Uranus|Neptune}"
How would I randomly choose N options from the set and join them with commas? The set is defined between {} and the options are separated by a | pipe.
The original order must be maintained.
Some output could be:
string output1="planets Sun, Venus";
string output2="planets Neptune";
string output3="planets Earth, Saturn, Uranus, Neptune";
string output4="planets Uranus, Saturn";// bad example, order is not correct
Java 1.5
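The approach can be sketched as: pick N distinct positions at random, sort the positions, then join the options at those positions. The sketch below is in Python rather than Java, and the function name pick_from_set is invented.

```python
import random
import re


def pick_from_set(template, n, rng=random):
    """Replace the first {a|b|c} group with n randomly chosen options,
    kept in their original order and joined with ', '."""
    match = re.search(r"\{([^{}]*)\}", template)
    if match is None:
        return template
    options = match.group(1).split("|")
    n = max(0, min(n, len(options)))
    # Sample positions rather than values, then sort to preserve order.
    positions = sorted(rng.sample(range(len(options)), n))
    chosen = ", ".join(options[i] for i in positions)
    return template[:match.start()] + chosen + template[match.end():]


s = "planets {Sun|Mercury|Venus|Earth|Mars|Jupiter|Saturn|Uranus|Neptune}"
print(pick_from_set(s, 3))
```

Sorting the sampled indices is what rules out results like "Uranus, Saturn".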
Hi,
Could anyone please tell me why I get ab as output for the following RegExp code, which uses a reluctant quantifier?
Pattern p = Pattern.compile("abc*?");
Matcher m = p.matcher("abcfoo");
while(m.find())
System.out.println(m.group()); // ab
and why I get empty matches for the following code?
Pattern p = Pattern.compile(".*?");
Matcher m = p.matcher("abcfoo");
while(m.find())
System.out.println(m.group());
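A reluctant quantifier takes as few characters as it can while still letting the overall match succeed. The Python sketch below shows single matches only; the way repeated find() calls step past empty matches differs between Java and Python, so that part is not reproduced here.

```python
import re

text = "abcfoo"

# Greedy: c* takes the 'c'.
print(re.search(r"abc*", text).group())    # abc

# Reluctant: c*? is satisfied with zero 'c's, so the match stops at 'ab'.
print(re.search(r"abc*?", text).group())   # ab

# With nothing after it, .*? is satisfied with zero characters.
print(re.match(r".*?", text).span())       # (0, 0)

# Something after the quantifier forces it to expand just far enough.
print(re.match(r".*?o", text).group())     # abcfo
```

In other words, a reluctant quantifier at the very end of a pattern always settles for its minimum count.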
My string looks like this:
expression1/field1+expression2*expression3+expression4/field2*expression5*expression6/field3
A real example might look like this:
computer/(100)+web*mail+explorer/(200)*bbs*solution/(300)
"+" and "*" are operators.
"computer", "web", ... are expressions.
(100), (200) are field numbers. A field number may be absent.
I want to process the string into this:
/(100)+web*+explorer/(200)bbs/(300)
The rule is:
if an expression's length is more than 3 and its field is not (200), then add brackets to it.
I have used RegExp quite a few times but am still far from being an expert. This time I want to validate a formula (or math expression) with a RegExp. The difficult part is validating that the opening and closing parentheses within the formula are properly matched.
I believe there must be some sample on the web, but I could not find one. Can somebody post a link to such an example, or help me by some other means?
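The parenthesis check on its own can be done with a depth counter instead of a regular expression. This is a sketch of only that part, with an invented function name; it does not validate operators or operands.

```python
def parentheses_balanced(formula):
    """True if every '(' has a later matching ')' and vice versa."""
    depth = 0
    for ch in formula:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:        # a ')' appeared before its '('
                return False
    return depth == 0            # no '(' left unclosed


print(parentheses_balanced("(a + b) * (c - (d / e))"))  # True
print(parentheses_balanced("(a + b)) * (c"))            # False
```

The early return matters: ")(" has equal counts of both characters but is still not balanced.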
OK, so I'm executing the following line of code in JavaScript:
RegExp('(http:\/\/t.co\/)[a-zA-Z0-9\-\.]{8}').exec(tcont);
where tcont is equal to some string like 'Test tweet to http://t.co/GXmaUyNL' (the content of a tweet obtained via jQuery).
However it is returning, in the case above for example, 'http://t.co/GXmaUyNL,http://t.co/'.
This is frustrating because I want the URL without the bit on the end, that is, without the comma and everything after it.
Any ideas why this is appearing? Thanks
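In JavaScript, exec returns an array whose first element is the whole match and whose remaining elements are the captured groups; turning that array into a string joins the elements with commas. The Python sketch below shows the same two pieces separately, using an equivalent pattern with the dot in t.co escaped.

```python
import re

tcont = "Test tweet to http://t.co/GXmaUyNL"

match = re.search(r"(http://t\.co/)[a-zA-Z0-9\-.]{8}", tcont)

print(match.group(0))  # http://t.co/GXmaUyNL  (whole match: element 0 in JS)
print(match.group(1))  # http://t.co/          (the parenthesised group)
```

So the text after the comma is the capture group, and the whole match alone is the first element of the result.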
Hi,
How can I make sure a certain keyword occurs only once in the input, using a regular expression?
I think there are some mistakes in the expression below, as it still accepts input where the keyword is repeated:
if (!preg_match('/\b(.php?){1}\b/', $cfg_path))
{
$error = true;
echo '<error elementid="cfg_path" message="PATH - make sure you have a \'.php?\' in the path."/>';
}
I just want this to be true,
form.php?category=something or form.php?
but not this,
form.php?.php?category=something or form.php?.php?
please let me know how to fix it.
thanks.
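The requirement amounts to counting non-overlapping occurrences of the literal text and requiring exactly one. A Python sketch of that check, with an invented function name; the keyword is escaped because "." and "?" are regex metacharacters.

```python
import re


def occurs_exactly_once(text, keyword=".php?"):
    """True if the literal keyword appears exactly one time in text."""
    # re.escape makes '.' and '?' match themselves instead of acting
    # as "any character" and "optional".
    return len(re.findall(re.escape(keyword), text)) == 1


for path in ("form.php?category=something", "form.php?",
             "form.php?.php?category=something", "form.php?.php?"):
    print(path, occurs_exactly_once(path))
```

The unescaped form matters for the original pattern too: there, ".php?" means any character, then "ph", then an optional "p".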
Open a file that contains a null byte (0x00) in the Visual Studio binary editor, then use the Quick Find feature (Ctrl+F) to find null bytes.
I would have thought I could use a regular expression such as \x00 to match null bytes, but it doesn't work. Searching for any other hex value using this method works fine.
Is this a VS bug, a 'feature', or am I just missing something? Is there a workaround?
In my program I have a DataTable and I'd like to know whether it has a column whose name starts with abc.
For example, the DataTable has a column named abcdef. I would like to find this column using something like this:
DataTable.Columns.Contains(ColumnName.StartsWith("abc"))
Because I know only part of the column name, I cannot use the Contains method.
Is there a simple way to do that?
Thanks a lot.
Hi, I have some forms on which I want to use some basic PHP validation (regular expressions). How do you go about doing it? I have general text input, usernames, passwords and dates to validate. I would also like to know how to check for empty input boxes. I have looked on the internet for this but haven't found any good tutorials.
Thanks
if (preg_match('(\p{Nd}{4}/\p{Nd}{2}/\p{Nd}{2}/\p{L}+)', '2010/02/14/this-is-something'))
{
// do stuff
}
The above code works. However this one doesn't.
if (preg_match('/\p{Nd}{4}/\p{Nd}{2}/\p{Nd}{2}/\p{L}+/u', '2010/02/14/this-is-something'))
{
// do stuff
}
Maybe someone could shed some light as to why the one below doesn't work. This is the error that is being produced:
A PHP Error was encountered
Severity: Warning
Message: preg_match() [function.preg-match]: Unknown modifier '\'