I need a regular expression that matches braces correctly, i.e. every opening brace is paired with its closing brace.
From abc{abc{bc}xyz} I need to get the whole of {abc{bc}xyz}, not just {abc{bc}. I tried this:
({.*?})
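The lazy `.*?` stops at the first closing brace, which is why the attempt above returns {abc{bc}. Python's built-in re module has no recursive patterns, so the following is a minimal sketch that swaps the regex for a depth counter. The function name `balanced_groups` is made up for illustration.

```python
def balanced_groups(text):
    """Return every outermost {...} group in text, nested braces included."""
    groups = []
    depth = 0
    start = None
    for i, ch in enumerate(text):
        if ch == "{":
            if depth == 0:
                start = i  # remember where the outermost group opens
            depth += 1
        elif ch == "}" and depth > 0:
            depth -= 1
            if depth == 0:
                # the outermost group just closed
                groups.append(text[start:i + 1])
    return groups


print(balanced_groups("abc{abc{bc}xyz}"))  # ['{abc{bc}xyz}']
```

Groups that never close are silently dropped in this sketch; whether that is the right behaviour depends on the input you expect.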
I know the character-wise positions of matches, like 1 3 7 8. I need to know their corresponding line numbers.
Example: file.txt
Match: X
Matches: 1 3 7 8
Want: 1 2 4 4
$ cat file.txt
X2
X
4
56XX
[Added: this does not notice multiple matches on the same line; there is probably an easier way to do it with stacks]
$ java testt
1
2
4
$ cat testt.java
import java.io.*;
import java.util.*;

public class testt {
    public static String data = "X2\nX\n4\n56XX";
    public static String[] ar = data.split("\n");

    public static void main(String[] args) {
        HashSet<Integer> hs = new HashSet<Integer>();
        Integer numb = 1;
        for (String s : ar) {
            if (s.contains("X")) {
                hs.add(numb);
                numb++;
            } else {
                numb++;
            }
        }
        for (Integer i : hs) {
            System.out.println(i);
        }
    }
}
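One possible way to get one line number per match (so that line 4 is reported twice) is to precompute where each line ends and binary-search each position. This is a sketch in Python rather than Java, and it assumes what the example suggests: positions are 1-based and newline characters are not counted. The name `lines_for_positions` is hypothetical.

```python
from bisect import bisect_left
from itertools import accumulate


def lines_for_positions(text, positions):
    # Cumulative character count at the end of each line, newlines excluded.
    ends = list(accumulate(len(line) for line in text.split("\n")))
    # A 1-based position p lies on the first line whose cumulative end is >= p.
    return [bisect_left(ends, p) + 1 for p in positions]


print(lines_for_positions("X2\nX\n4\n56XX", [1, 3, 7, 8]))  # [1, 2, 4, 4]
```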
For example, if I'm doing some form input validation and I'm using the following code for the name field.
preg_match("/^[a-zA-Z .-]$/", $firstname);
If someone types in Mr. (Awkward) Double-Barrelled I want to be able to display a message saying Invalid character(s): (, )
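A common approach is to negate the character class and collect whatever matches it. The sketch below is in Python rather than PHP, and the function name `invalid_characters` is invented for illustration; the same negated-class idea should carry over to PHP's preg_match_all.

```python
import re


def invalid_characters(name, allowed=r"a-zA-Z .\-"):
    # Negating the allowed class matches every character that is NOT allowed.
    bad = re.findall("[^" + allowed + "]", name)
    # dict.fromkeys drops duplicates while keeping first-seen order.
    return list(dict.fromkeys(bad))


bad = invalid_characters("Mr. (Awkward) Double-Barrelled")
if bad:
    print("Invalid character(s): " + ", ".join(bad))  # Invalid character(s): (, )
```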
Don't ask how this works, but currently it does ("^\|(.?)\|*$")... kind of. This removes all extra pipes; that is part one. I have searched all over and found no answer yet. I am using VB2011 beta in an ASP web form, with VB coding though!
I want to capture the special character pipe (|), which is used to separate words, e.g. car|truck|van|cycle
The problem is that users lead with pipes, trail with them, use multiple pipes, and put spaces before and after them, e.g. |||car||truck | van || cycle.
Another example: george bush|micheal jordon|bill gates|steve jobs <-- this would be correct, but when I remove spaces it takes the correct spaces out too.
So I want to get rid of leading and trailing whitespace and any space before or after a |, and allow only one pipe (|) between alphanumeric items.
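One way to sidestep a single large regex is to split on the pipe, trim each piece, drop the empty ones, and join again. This is a sketch in Python rather than VB, with a made-up function name; spaces inside an item such as "george bush" survive because only the ends of each piece are trimmed.

```python
def normalize_pipes(text):
    # Trim each piece; pieces that were only pipes or spaces become empty.
    parts = (part.strip() for part in text.split("|"))
    # Keep the non-empty pieces and rejoin them with exactly one pipe.
    return "|".join(part for part in parts if part)


print(normalize_pipes("|||car||truck | van || cycle"))  # car|truck|van|cycle
```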
Hi There,
Does anyone have a regular expression available which only accepts dates in the format dd/mm/yy but also has strict checking to make sure that the date is valid, including leap year support?
I am coding in vb.net and am struggling to work this one out.
Many Thanks
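Leap-year rules are awkward to encode in a pattern, so one option is to let a regex check only the shape and hand the calendar check to a date parser. The sketch below does this in Python, not VB.NET. The function name is hypothetical, and the century chosen for a two-digit year follows Python's `%y` convention (00-68 become 2000-2068, 69-99 become 1969-1999), which may not be the rule you need.

```python
import re
from datetime import datetime


def is_valid_ddmmyy(text):
    # Shape check: exactly two digits, slash, two digits, slash, two digits.
    if not re.fullmatch(r"\d{2}/\d{2}/\d{2}", text):
        return False
    try:
        # Calendar check, including 29 February in leap years only.
        datetime.strptime(text, "%d/%m/%y")
    except ValueError:
        return False
    return True


print(is_valid_ddmmyy("29/02/04"))  # True
print(is_valid_ddmmyy("29/02/05"))  # False
```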
I have several very large XML files and I'm trying to find the lines that contain non-ASCII characters. I've tried the following:
grep -e "[\x{00FF}-\x{FFFF}]" file.xml
But this returns every line in the file, regardless of whether the line contains a character in the range specified.
Do I have the syntax wrong or am I doing something else wrong? I've also tried:
egrep "[\x{00FF}-\x{FFFF}]" file.xml
(with both single and double quotes surrounding the pattern).
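In POSIX bracket expressions `\x{...}` is not an escape, so the class is probably being read as a set of literal characters such as x, {, }, 0 and F, which would explain why nearly every line matches. As an alternative outside grep, here is a small Python sketch that reports lines containing any character above U+007F. The function names and the UTF-8 encoding are assumptions.

```python
import re

# Anything outside the 7-bit ASCII range.
NON_ASCII = re.compile(r"[^\x00-\x7F]")


def non_ascii_lines(lines):
    """Return (line_number, line) pairs for lines with a non-ASCII character."""
    return [
        (number, line)
        for number, line in enumerate(lines, start=1)
        if NON_ASCII.search(line)
    ]


def scan_file(path, encoding="utf-8"):
    # Assumes the file really is in the given encoding.
    with open(path, encoding=encoding) as handle:
        for number, line in non_ascii_lines(handle):
            print(number, line.rstrip("\n"))
```

Calling `scan_file("file.xml")` would print each offending line with its line number.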
Hello,
I would like to sanitize a string into a URL, so this is what I basically need.
Everything must be removed except alphanumeric characters, spaces and dashes.
Spaces should be converted into dashes.
Eg.
This, is the URL!
must return
this-is-the-url
Thanks
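The two rules can be applied as two substitutions. This is a minimal Python sketch with an invented function name; lowercasing is taken from the example output, and collapsing runs of spaces or dashes into a single dash is an assumption beyond what the question states.

```python
import re


def slugify(text):
    text = text.lower()
    # Rule 1: drop everything except letters, digits, spaces and dashes.
    text = re.sub(r"[^a-z0-9 -]", "", text)
    # Rule 2: turn each run of spaces/dashes into one dash (assumed behaviour).
    text = re.sub(r"[ -]+", "-", text.strip())
    return text.strip("-")


print(slugify("This, is the URL!"))  # this-is-the-url
```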
I have a string like this:
&lt;![CDATA[&lt;ClinicalDocument&gt;rest of CCD here&lt;/ClinicalDocument&gt;]]&gt;
I'd like to replace the escape sequences with their non-escaped characters, to end up with:
<![CDATA[<ClinicalDocument>rest of CCD here</ClinicalDocument>]]>
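If the escape sequences are HTML/XML character entities such as `&lt;` and `&gt;` (an assumption about the input), a library routine is usually safer than a hand-written replace. A sketch in Python using the standard `html.unescape`:

```python
from html import unescape

# Assumed input: the CDATA wrapper with its angle brackets entity-escaped.
escaped = (
    "&lt;![CDATA[&lt;ClinicalDocument&gt;rest of CCD here"
    "&lt;/ClinicalDocument&gt;]]&gt;"
)

restored = unescape(escaped)
print(restored)  # <![CDATA[<ClinicalDocument>rest of CCD here</ClinicalDocument>]]>
```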
Hi all,
I have a set of data that contains garbled text fields because of encoding errors during many import/exports from one database to another. Most of the errors were caused by converting UTF-8 to ISO-8859-1. Strangely enough, the errors are not consistent: the word 'München' appears as 'MÃ¼nchen' in some places and as 'MÃœnchen' in others.
Is there a trick in SQL server to correct this kind of crap? The first thing that I can think of is to exploit the COLLATE clause, so that Ã¼ is interpreted as ü, but I don't exactly know how. If it isn't possible to do it at the DB level, do you know any tool that helps with a bulk correction? (Not a manual find/replace tool, but a tool that somehow guesses the garbled text and corrects it.)
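Outside the database, one possible bulk repair is to reverse the wrong decoding: turn the garbled characters back into bytes and decode those bytes as UTF-8. The Python sketch below assumes the text was misread as Windows-1252 (a superset of the printable ISO-8859-1 range) and leaves a value untouched when the round trip fails. The function name is hypothetical, and this is not a SQL Server feature.

```python
def repair_mojibake(text):
    """Undo UTF-8 text that was mistakenly decoded as Windows-1252."""
    try:
        # Recover the original bytes, then decode them the way they were meant.
        return text.encode("cp1252").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Not this kind of garbling (or already correct): leave it alone.
        return text


# "M\u00c3\u00bcnchen" is a garbled spelling of "München".
print(repair_mojibake("M\u00c3\u00bcnchen"))
```

Because correct values fail the round trip and are returned unchanged, the function can be run over a whole column, but it is worth spot-checking the results before writing them back.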
I know this may be the simplest question ever asked on Stack Overflow, but what is the regular expression for a decimal with a precision of 2?
Valid examples:
123.12
2
56754
92929292929292.12
0.21
3.1
Invalid examples:
12.1232
2.23332
e666.76
Sorry for the lame question, but for the life of me I haven't been able to find anyone that can help!
The decimal place may be optional, and integers may also be included.
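A pattern along the lines of `\d+(\.\d{1,2})?`, anchored to the whole string, fits the examples above. Here is a Python sketch that checks it against them; the names are invented, and whether inputs like ".5" or "-1.20" should pass is left open because the examples do not cover them.

```python
import re

# One or more digits, optionally followed by a dot and one or two digits.
TWO_DECIMALS = re.compile(r"\d+(\.\d{1,2})?")


def is_valid(text):
    # fullmatch anchors the pattern to the entire string.
    return TWO_DECIMALS.fullmatch(text) is not None


for sample in ["123.12", "2", "3.1", "12.1232", "e666.76"]:
    print(sample, is_valid(sample))
```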
I'm new to Python scripting, so please forgive me in advance if the answer to this question seems inherently obvious.
I'm trying to put together a large-scale find-and-replace script using Python. I'm using code similar to the following:
findreplace = [
    ('term1', 'term2'),
]
inF = open(infile,'rb')
s=unicode(inF.read(),charenc)
inF.close()
for couple in findreplace:
    outtext=s.replace(couple[0],couple[1])
    s=outtext
outF = open(outFile,'wb')
outF.write(outtext.encode('utf-8'))
outF.close()
How would I go about having the script do a find and replace for regular expressions?
Specifically, I want it to find some information (metadata) specified at the top of a text file. Eg:
Title: This is the title
Author: This is the author
Date: This is the date
and convert it into LaTeX format. Eg:
\title{This is the title}
\author{This is the author}
\date{This is the date}
Maybe I'm tackling this the wrong way. If there's a better way than regular expressions please let me know!
Thanks!
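For the metadata case, `re.sub` with a multiline pattern may be enough. The sketch below is written for Python 3 (the snippet above is Python 2), uses made-up names, and only handles the three keys shown in the example.

```python
import re

# Matches "Title: ...", "Author: ..." or "Date: ..." at the start of a line.
METADATA = re.compile(r"^(Title|Author|Date):[ \t]*(.*)$", re.MULTILINE)


def metadata_to_latex(text):
    # A function replacement avoids backslash-escaping rules in the template.
    return METADATA.sub(
        lambda m: "\\%s{%s}" % (m.group(1).lower(), m.group(2)), text
    )


source = "Title: This is the title\nAuthor: This is the author\nDate: This is the date\n"
print(metadata_to_latex(source))
```

To fit the existing loop, the `(find, replace)` pairs could hold compiled patterns and the `s.replace(...)` call could become `pattern.sub(replacement, s)`.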
Hi all, I'm basically trying to create my own tags - and replace them with the right HTML tags. So {B} {/B} would turn into <b> </b>
I have only got so far with this, here: http://www.nacremedia.com/text2.htm
Use the [B] button to bold stuff the current selection... it creates two bold tags and one closing for some reason.
I'm so close! But I just need a bit of direction to get the final bugs out - can anyone please help??
Also, if there is a better way of doing this altogether, then I am more than open to new ideas.
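Leaving the editor button aside, the replacement step itself can be done with one substitution that handles the opening and the closing tag together. This is a sketch in Python rather than the page's JavaScript, it covers only the {B} tag mentioned above, and the function name is hypothetical.

```python
import re


def tags_to_html(text):
    # The optional slash is captured so {B} and {/B} share one pattern.
    return re.sub(r"\{(/?)B\}", lambda m: "<%sb>" % m.group(1), text)


print(tags_to_html("some {B}bold{/B} text"))  # some <b>bold</b> text
```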
I have a text that contains strings of the following structure:
text I do not care about, a person's name followed by two IDs.
I know that:
a person's name is always preceded by XYZ code and that is always followed by
two, space separated numbers.
Name is not always just a last name and first name. It can be multiple last or first names
(think Latin american names).
So, I am looking to extract the string that follows the constant XYZ code and that is always terminated by two separate numbers.
You can say that my delimiter is XYZ and two numbers, but numbers need to be part of the extracted value as well.
From
blah, blah XYZ names, names 122322 344322 blah blah
I want to extract:
names, names 122322 344322
Would someone please advise on the regular expression for this that would work with Python's re package.
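One candidate pattern uses a lazy match for the name so that it stops at the first pair of space-separated numbers. The sketch below assumes names never contain two consecutive standalone numbers themselves; the names `PATTERN` and `extract` are invented.

```python
import re

# XYZ, whitespace, then lazily anything up to two whitespace-separated numbers.
PATTERN = re.compile(r"XYZ\s+(.+?\s+\d+\s+\d+)\b")


def extract(text):
    match = PATTERN.search(text)
    return match.group(1) if match else None


print(extract("blah, blah XYZ names, names 122322 344322 blah blah"))
# names, names 122322 344322
```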
I'm trying to parse various info from log files, some of which is placed within square brackets. For example:
Tue, 06 Nov 2007 10:04:11 INFO processor:receive: [someuserid], [somemessage] msgtype=[T]
What's an elegant way to grab 'someuserid' from these lines, using sed, awk, or other unix utility?
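As an alternative to sed or awk, here is a short Python sketch that pulls every bracketed field out of a line; the first one is the user id. The function name is hypothetical.

```python
import re


def bracket_fields(line):
    # Capture the text between each [ and the next ].
    return re.findall(r"\[([^\]]*)\]", line)


line = (
    "Tue, 06 Nov 2007 10:04:11 INFO processor:receive: "
    "[someuserid], [somemessage] msgtype=[T]"
)
fields = bracket_fields(line)
print(fields[0])  # someuserid
```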
My primary concern is with the Java flavor, but I'd also appreciate information regarding others.
Let's say you have a subpattern like this:
(.*)(.*)
Not very useful as is, but let's say these two capture groups (say, \1 and \2) are part of a bigger pattern that matches with backreferences to these groups, etc.
So both are greedy, in that they try to capture as much as possible, only taking less when they have to.
My question is: who's greedier? Does \1 get first priority, giving \2 its share only if it has to?
What about:
(.*)(.*)(.*)
Let's assume that \1 does get first priority. Let's say it got too greedy, and then spit out a character. Who gets it first? Is it always \2 or can it be \3?
Let's assume it's \2 that gets \1's rejection. If this still doesn't work, who spits out now? Does \2 spit to \3, or does \1 spit out another to \2 first?
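A small experiment can make the order visible. The sketch below uses Python's re module, which is a backtracking engine; I would expect Java's java.util.regex to behave the same way for these patterns, but that is an assumption not verified here. Swapping `.*` for `.+` in one group forces that group to take at least one character, which shows who receives what the first group gives back.

```python
import re

# The leftmost group takes everything; the others are left with nothing.
print(re.fullmatch(r"(.*)(.*)(.*)", "abc").groups())  # ('abc', '', '')

# Group 2 must be non-empty, so group 1 gives up one character to it.
print(re.fullmatch(r"(.*)(.+)(.*)", "abc").groups())  # ('ab', 'c', '')

# Group 3 must be non-empty: group 1 gives up 'c', group 2 grabs it first,
# then has to release it again so that group 3 can match.
print(re.fullmatch(r"(.*)(.*)(.+)", "abc").groups())  # ('ab', '', 'c')
```

In this engine the most recently matched quantifier backtracks first, so a later group gives back characters before an earlier group gives up another one.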
I have a php variable that comes from a form that needs tidying up. I hope you can help.
The variable contains a list of items (possibly two or three word items with a space in between words).
I want to convert it to a comma separated list with no superfluous white space. I want the divisions to fall only at commas, semi-colons or new-lines. Blank cannot be an item.
Here's a comprehensive example (with a deliberately messy input):
Variable In: "dog, cat ,car,tea pot,, ,,, ;;(++NEW LINE++)fly, cake"
Variable Out: "dog,cat,car,tea pot,fly,cake"
Can anyone help?
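The tidy-up can be done as split, trim, filter, join. This is a sketch in Python rather than PHP, with an invented function name; the same steps should map onto preg_split, trim and implode.

```python
import re


def tidy_list(raw):
    # Divide only at commas, semicolons or newlines.
    items = (item.strip() for item in re.split(r"[,;\n]", raw))
    # Blank items are dropped; spaces inside an item are kept.
    return ",".join(item for item in items if item)


messy = "dog, cat ,car,tea pot,, ,,, ;;\nfly, cake"
print(tidy_list(messy))  # dog,cat,car,tea pot,fly,cake
```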
I've been working on this for a few hours now and can't find any help on it. Basically, I'm trying to split a SQL string into various parts (fields, from, where, having, groupBy, orderBy). I refuse to believe that I'm the first person to ever try to do this, so I'd like to ask for some advice from the StackOverflow community. :)
To understand what I need, assume the following SQL string:
select * from table1 inner join table2 on table1.id = table2.id
where field1 = 'sam' having table1.field3 > 0
group by table1.field4 order by table1.field5
I created a regular expression to group the parts accordingly:
select\s+(?<fields>.+)\s+from\s+(?<from>.+)\s+where\s+(?<where>.+)\s+having\s+(?<having>.+)\s+group\sby\s+(?<groupby>.+)\s+order\sby\s+(?<orderby>.+)
This gives me the following results:
fields => *
from => table1 inner join table2 on table1.id = table2.id
where => field1 = 'sam'
having => table1.field3 > 0
groupby => table1.field4
orderby => table1.field5
The problem that I'm faced with is that if any part of the SQL string is missing after the 'from' clause, the regular expression doesn't match.
To fix that, I've tried putting each optional part in its own (...)? group but that doesn't work. It simply puts all the optional parts (where, having, groupBy, and orderBy) into the 'from' group.
Any ideas?
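The optional groups get swallowed because the greedy `.+` in the 'from' part runs to the end of the string before any `(...)?` is tried. One possible fix is to make every capture lazy and anchor the pattern at the end, so each part stops as soon as the next keyword can match. The sketch below uses Python syntax (`(?P<name>...)` instead of `(?<name>...)`), keeps the clause order from the example, and renames the group to `from_`; it will still be fooled by keywords inside string literals or subqueries.

```python
import re

SQL_PARTS = re.compile(
    r"""
    select\s+(?P<fields>.+?)
    \s+from\s+(?P<from_>.+?)
    (?:\s+where\s+(?P<where>.+?))?
    (?:\s+having\s+(?P<having>.+?))?
    (?:\s+group\s+by\s+(?P<groupby>.+?))?
    (?:\s+order\s+by\s+(?P<orderby>.+?))?
    \s*$
    """,
    re.IGNORECASE | re.DOTALL | re.VERBOSE,
)


def split_sql(sql):
    """Return a dict of clause name -> text, with None for missing clauses."""
    match = SQL_PARTS.search(sql)
    return match.groupdict() if match else None


print(split_sql("select a, b from t order by a"))
```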
Does anyone have experience measuring the performance of the glibc regexp functions?
Are there any generic tests I need to run to make such measurements (in addition to testing the exact patterns I intend to search)?
Thanks.
I use preg_match_all and need to grab all a href="" tags in my code, but I don't really understand how it works.
I have this regular expression ( /(<([\w]+)[^])(.?)(<\/\2)/ ), but it matches all HTML tags; I need only the a href tags.
I hope I can get help :)
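An HTML parser tends to be more robust than a regex for this. The following sketch uses Python's standard `html.parser` instead of PHP's preg_match_all; the class and function names are made up.

```python
from html.parser import HTMLParser


class HrefCollector(HTMLParser):
    """Collects the href value of every <a> start tag."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # The parser lowercases tag and attribute names before calling this.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.hrefs.append(value)


def extract_hrefs(html):
    collector = HrefCollector()
    collector.feed(html)
    collector.close()
    return collector.hrefs


print(extract_hrefs('<p><a href="/one">1</a> <b>x</b> <a href="/two">2</a></p>'))
# ['/one', '/two']
```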
I have a string in my code that I receive that contains some html tags. It is not part of the HTML page being displayed so I cannot grab the html tag contents using the DOM (i.e. document.getElementById('tag id').firstChild.data);
So, for example within the string of text would appear a tag like this:
<span id="myQty">12</span>
My question is how would I use a regular expression to access the '12' numeric digit in this example? This quantity could be any number of digits (i.e. it is not always a double digit).
I have tried some regular expressions, but always end up getting the full span tag returned along with the contents. I only want the '12' in the example above, not the surrounding tag. The id of the tags will always be 'myQty' in the string of text I receive.
Thanks in advance for any help!
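Getting the whole tag back usually means the full match is being used instead of a capture group. The sketch below shows the idea in Python rather than JavaScript: the digits sit in group 1. It assumes the tag is a span whose id attribute is myQty, as described in the question; the exact markup and the function name are assumptions.

```python
import re

# Group 1 holds only the digits between the opening and closing span tags.
QTY = re.compile(r"""<span[^>]*\bid=["']myQty["'][^>]*>\s*(\d+)\s*</span>""")


def extract_qty(text):
    match = QTY.search(text)
    return int(match.group(1)) if match else None


print(extract_qty('You ordered <span id="myQty">12</span> items'))  # 12
```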
Hi,
I'm converting patch scripts using a command-line script. Within these scripts there is a combination of two lines like:
--- /dev/null
+++ filename.txt
which needs to be converted to:
--- filename.txt
+++ filename.txt
Initially I tried:
less file.diff | sed -e "s/---\/dev\null\n+++ \(.*\)/--- \1\n+++ \1/"
But I had to find out that multiline-handling is much more complex in sed :(
Any help is appreciated...
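If a scripting language is an option alongside sed, the two-line rewrite is straightforward once the whole file is read as one string. A Python sketch with invented names:

```python
import re

# A "--- /dev/null" line immediately followed by a "+++ <name>" line.
DEV_NULL_PAIR = re.compile(r"^--- /dev/null\n\+\+\+ (.*)$", re.MULTILINE)


def fix_headers(diff_text):
    # A function replacement keeps backslashes in file names literal.
    return DEV_NULL_PAIR.sub(
        lambda m: "--- %s\n+++ %s" % (m.group(1), m.group(1)), diff_text
    )


print(fix_headers("--- /dev/null\n+++ filename.txt\n"))
```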
I have this site. Let's call it htp://www.mysite.com
I have a rewrite rule to change htp://www.mysite.com/?q=words%20etc/0/10 into http://www.mysite.com/words%20etc/0/10 (or http://www.mysite.com//0/10 or http://www.mysite.com/0/10)
.htaccess:
ErrorDocument 404 htp://www.mysite.com/404.html
options +FollowSymlinks
rewriteEngine on
rewriteCond %{REQUEST_URI} !-f
rewriteCond %{REQUEST_URI} !-d
rewriteCond %{REQUEST_URI} !index\.php
rewriteRule ^/?([^/]+?)?/?([0-9]+?)/([0-9]+?)$ index.php/%{THE_REQUEST} [NC]
Now, this works on my local apache 2.2.11 server, no errors. However on my host's apache 1.3.41 server, I get the following error:
[Sat Mar 5 21:42:14 2011] [alert] [client [ip]] /home/_/public_html/mysite.com/.htaccess: RewriteRule: cannot compile regular expression '^/?([^/]+?)?/?([0-9]+?)/([0-9]+?)$'\n
I imagine it's something quirky about the apache version as other sites on this host use mod_rewrite without a hitch.
I've tried removing the +followSymlinks line, even the rewrite engine line. I haven't tried removing the conditions because I don't think I should have to, though I'm probably wrong.