regex cookbook - Page 56

Select different parts of an line

- by Ricardo Sa

I'm new to regexes and have a file that looks like this one: |60|493,93|1,6500| |60|95,72|1,6500| |60|43,88|1,6500| |60|972,46|1,6500| I used the regex (\|60|.*)(1,65) and I was able to find all the lines that have the information that I wanted to changed. How can I make an replace that when Notepad++ finds (\|60|.*)(1,65), the 60 should be replaced with 50: |50|493,93|1,6500| |50|95,72|1,6500| |50|43,88|1,6500| |50|972,46|1,6500| PS: here's an example of the full line: |C170|002|34067||44,14000|KG|493,93|0|0|020|1102||288,11|12,00|34,57|0|0|0|0|||0|0|0|60|493,93|1,6500|||8,15|60|493,93|7,6000|||37,54||

Read the article

htaccess filesMatch exclusion

- by Hikari

I have the following directive in my htaccess <filesMatch "\.(gif|jpe?g|png|js|css|swf|php|ico|txt|pdf|xml|html?)$"> FileETag None <ifModule mod_headers.c> Header unset ETag Header set Cache-Control "max-age=0, no-cache, no-store, must-revalidate" Header set Pragma "no-cache" Header set Expires "Wed, 11 Jan 1984 05:00:00 GMT" </ifModule> </filesMatch> I copied that regex from someplace in Web months ago. It should add those headers to any HTTP Response that does NOT have those extensions. But it's not working, it's adding them to any Response. I also need to create another directive to add Header set Cache-Control "max-age=3600, public" to Responses of files that DOES have them. Could anybody help me make proper fileMatch regexes?

Read the article

How to Convert Non-English Characters to English Using JavaScript

- by Adam Right

I have a c# function which converts all non-english characters to proper characters for a given text. like as follows public static string convertString(string phrase) { int maxLength = 100; string str = phrase.ToLower(); int i = str.IndexOfAny( new char[] { 's','ç','ö','g','ü','i'}); //if any non-english charr exists,replace it with proper char if (i > -1) { StringBuilder outPut = new StringBuilder(str); outPut.Replace('ö', 'o'); outPut.Replace('ç', 'c'); outPut.Replace('s', 's'); outPut.Replace('i', 'i'); outPut.Replace('g', 'g'); outPut.Replace('ü', 'u'); str = outPut.ToString(); } // if there are other invalid chars, convert them into blank spaces str = Regex.Replace(str, @"[^a-z0-9\s-]", ""); // convert multiple spaces and hyphens into one space str = Regex.Replace(str, @"[\s-]+", " ").Trim(); // cut and trim string str = str.Substring(0, str.Length <= maxLength ? str.Length : maxLength).Trim(); // add hyphens str = Regex.Replace(str, @"\s", "-"); return str; } but i should use same function on client side with javascript. is it possible to convert above function to js ? waiting all kinds of suggestion. thanks in advance..

Read the article

Getting text between quotes using regular expression

- by Camsoft

I'm having some issues with a regular expression I'm creating. I need a regex to match against the following examples and then sub match on the first quoted string: Input strings ("Lorem ipsum dolor sit amet, consectetur adipiscing elit.") ('Lorem ipsum dolor sit amet, consectetur adipiscing elit. ') ('Lorem ipsum dolor sit amet, consectetur adipiscing elit. ', 'arg1', "arg2") Must sub match Lorem ipsum dolor sit amet, consectetur adipiscing elit. Regex so far: $(["'])([^"']+)\1,?.*$ The regex does a sub match on the text between the first set of quotes and returns the sub match displayed above. This is almost working perfectly, but the problem I have is that if the quoted string contains quotes in the text the sub match stops at the first instance, see below: Failing input strings ("Lorem ipsum dolor \"sit\" amet, consectetur adipiscing elit.") Only sub matches: Lorem ipsum dolor ("Lorem ipsum dolor 'sit' amet, consectetur adipiscing elit.") The entire match fails. Notes The input strings are actually php code function calls. I'm writing a script that will scan .php source files for a specific function and grab the text from the first parameter.

Read the article

php regex word boundary matching in utf-8

- by dontomaso

Hi, I have the following php code in a utf-8 php file: var_dump(setlocale(LC_CTYPE, 'de_DE.utf8', 'German_Germany.utf-8', 'de_DE', 'german')); var_dump(mb_internal_encoding()); var_dump(mb_internal_encoding('utf-8')); var_dump(mb_internal_encoding()); var_dump(mb_regex_encoding()); var_dump(mb_regex_encoding('utf-8')); var_dump(mb_regex_encoding()); var_dump(preg_replace('/\bweiß\b/iu', 'weiss', 'weißbier')); I would like the last regex to replace only full words and not parts of words. On my windows computer, it returns: string 'German_Germany.1252' (length=19) string 'ISO-8859-1' (length=10) boolean true string 'UTF-8' (length=5) string 'EUC-JP' (length=6) boolean true string 'UTF-8' (length=5) string 'weißbier' (length=9) On the webserver (linux), I get: string(10) "de_DE.utf8" string(10) "ISO-8859-1" bool(true) string(5) "UTF-8" string(10) "ISO-8859-1" bool(true) string(5) "UTF-8" string(9) "weissbier" Thus, the regex works as I expected on windows but not on linux. So the main question is, how should I write my regex to only match at word boundaries? A secondary questions is how I can let windows know that I want to use utf-8 in my php application.

Read the article

Matching a Repeating Sub Series using a Regular Expression with PowerShell

- by Hinch

I have a text file that lists the names of a large number of Excel spreadsheets, and the names of the files that are linked to from the spreadsheets. In simplified form it looks like this: "Parent File1.xls" Link: ChildFileA.xls Link: ChildFileB.xls "ParentFile2.xls" "ParentFile3.xls" Blah Link: ChildFileC.xls Link: ChildFileD.xls More Junk Link: ChildFileE.xls "Parent File4.xls" Link: ChildFileF.xls In this example, ParentFile1.xls has embedded links to ChildFileA.xls and ChildFileB.xls, ParentFile2.xls has no embedded links, and ParentFile3.xls has 3 embedded links. I am trying to write a regular expression in PowerShell that will parse the text file producing output in the following form: ParentFile1.xls:ChildFileA.xls,ChildFileB.xls ParentFile3.xls:ChildFileC.xls,ChildFileD.xls,ChildFileE.xls etc The task is complicated by the fact that the text file contains a lot of junk between each of the lines, and a parent may not always have a child. Furthermore, a single file name may pass over multiple lines. However, it's not as bad as it sounds, as the parent and child file names are always clearly demarcated (the parent with quotes and the child with a prefix of Link: ). The PowerShell code I've been using is as follows: $content = [string]::Join([environment]::NewLine, (Get-Content C:\Temp\text.txt)) $regex = [regex]'(?im)\s*\"(.*)\r?\n?\s*(.*)\"[\s\S]*?Link: (.*)\r?\n?' $regex.Matches($content) | %{$_.Groups[1].Value + $_.Groups[2].Value + ":" + $_.Groups[3].Value} Using the example above, it outputs: ParentFile1.xls:ChildFileA.xls ParentFile2.xls""ParentFile3.xls:ChildFileC.xls ParentFile4.xls:ChildFileF.xls There are two issues. Firstly, the inclusion of the "" instead of a newline whenever a Parent without a Child is processed. And the second issue, which is the most important, is that only a single child is ever shown for each parent. I'm guessing I need to somehow recursively capture and display the multiple child links that exist for each parent, but I'm totally stumped as to how to do this with a regular expression. Amy help would be greatly appreciated. The file contains 100's of thousands of lines, and manual processing is not an option :)

Read the article

Overlapping matches with finditer() in Python

- by Raphink

Hi there, I'm using a regex to match Bible verse references in a text. The current regex is REF_REGEX = re.compile(r'(?<!\w)((?i)q(?:uote)?\s+)?((?:(?:[1-3]|I{1,3})\s*)?[A-Za-z]+)\.?(?:\s*(\d+)(?:[:.](\d+)(?:-(\d+))?)?)(?:\s+(?:(?i)(?:from\s+)|(?:in\s+)|(?P<lbrace>$))\s*(\w+)(?(lbrace)$))?', re.UNICODE) This matches the following expressions fine: "jn 3:16": (None, 'jn', '3', '16', None, None, None), "matt. 18:21-22": (None, 'matt', '18', '21', '22', None, None), "q matt. 18:21-22": ('q ', 'matt', '18', '21', '22', None, None), "QuOTe jn 3:16": ('QuOTe ', 'jn', '3', '16', None, None, None), "q 1co13:1": ('q ', '1co', '13', '1', None, None, None), "q 1 co 13:1": ('q ', '1 co', '13', '1', None, None, None), "quote 1 co 13:1": ('quote ', '1 co', '13', '1', None, None, None), "quote 1co13:1": ('quote ', '1co', '13', '1', None, None, None), "jean 3:18 (PDV)": (None, 'jean', '3', '18', None, '(', 'PDV'), "quote malachie 1.1-2 fRom Colombe": ('quote ', 'malachie', '1', '1', '2', None, 'Colombe'), "quote malachie 1.1-2 In Colombe": ('quote ', 'malachie', '1', '1', '2', None, 'Colombe'), "cinq jn 3:16 (test)": (None, 'jn', '3', '16', None, '(', 'test'), "Q IIKings5.13-58 from wolof": ('Q ', 'IIKings', '5', '13', '58', None, 'wolof'), "This text is about lv5.4-6 in KJV only": (None, 'lv', '5', '4', '6', None, 'KJV'), but it fails to parse: "Found in 2 Cor. 5:18-21 ( Ministers": (None, '2 Cor', '5', '18', '21', None, None), because it returns (None, 'in', '2', None, None, None, None) instead. Is there a way to get finditer() to return all matches, even if they overlap, or is there a way to improve my regex so it matches this last bit properly? Thanks.

Read the article

EOL Special Char not matching

- by Aurélien Ribon

Hello, I am trying to find every "a - b, c, d" pattern in an input string. The pattern I am using is the following : "^[ \t]*(\\w+)[ \t]*->[ \t]*(\\w+)((?:,[ \t]*\\w+)*)$" This pattern is a C# pattern, the "\t" refers to a tabulation (its a single escaped litteral, intepreted by the .NET String API), the "\w" refers to the well know regex litteral predefined class, double escaped to be interpreted as a "\w" by the .NET STring API, and then as a "WORD CLASS" by the .NET Regex API. The input is : a -> b b -> c c -> d The function is : private void ParseAndBuildGraph(String input) { MatchCollection mc = Regex.Matches(input, "^[ \t]*(\\w+)[ \t]*->[ \t]*(\\w+)((?:,[ \t]*\\w+)*)$", RegexOptions.Multiline); foreach (Match m in mc) { Debug.WriteLine(m.Value); } } The output is : c -> d Actually, there is a problem with the line ending "$" special char. If I insert a "\r" before "$", it works, but I thought "$" would match any line termination (with the Multiline option), especially a \r\n in a Windows environment. Is it not the case ?

Read the article

Replace apostrophe in json string with empty string

- by user572844

Hi, I have problem with deserialization of json string, because string is bad format. For example json object consist string property statusMessage with value "Hello "dog" ". The correct format should be "Hello \" dog \" " . I would like remove apostrophes from this property. Something Like this. "Hello "dog" ". - "Hello dog ". Here is it original json string which I work. "{\"jancl\":{\"idUser\":18438201,\"nick\":\"JANCl\",\"photo\":\"1\",\"sex\":1,\"photoAlbums\":1,\"videoAlbums\":0,\"sefNick\":\"jancl\",\"profilPercent\":75,\"emphasis\":false,\"age\":\"-\",\"isBlocked\":false,\"PHOTO\":{\"normal\":\"http://u.aimg.sk/fotky/1843/82/n_18438201.jpg?v=1\",\"medium\":\"http://u.aimg.sk/fotky/1843/82/m_18438201.jpg?v=1\",\"24x24\":\"http://u.aimg.sk/fotky/1843/82/s_18438201.jpg?v=1\"},\"PLUS\":{\"active\":false,\"activeTo\":\"0000-00-00\"},\"LOCATION\":{\"idRegion\":\"6\",\"regionName\":\"Trenciansky kraj\",\"idCity\":\"138\",\"cityName\":\"Trencianske Teplice\"},\"STATUS\":{\"isLoged\":true,\"isChating\":false,\"idChat\":0,\"roomName\":\"\",\"lastLogin\":1294925369},\"PROJECT_STATUS\":{\"photoAlbums\":1,\"photoAlbumsFavs\":0,\"videoAlbums\":0,\"videoAlbumsFavs\":0,\"videoAlbumsExts\":0,\"blogPosts\":0,\"emailNew\":0,\"postaNew\":0,\"clubInvitations\":0,\"dashboardItems\":1},\"STATUS_MESSAGE\":{\"statusMessage\":\"\"Status\"\",\"addTime\":\"1294872330\"},\"isFriend\":false,\"isIamFriend\":false}}" Problem is here, json string consist this object: "STATUS_MESSAGE": {"statusMessage":" "some "bad" value" ", "addTime" :"1294872330"} Condition of string which I want modified: string start with "statusMessage":" string can has any *lenght from 0 -N * string end with ", "addTime So I try write pattern for string which start with "statusMessage":", has any lenght and is ended with ", "addTime. Here is it: const string pattern = " \" statusMessage \" : \" .*? \",\"addTime\" "; var regex = new Regex(pattern, RegexOptions.IgnoreCase); //here i would replace " with empty string string result = regex.Replace(jsonString, match => ???); But I think pattern is wrong, also I don’t know how replace apostrophe with empty string (remove apostrophne). My goal is : "statusMessage":" "some "bad" value" to "statusMessage":" "some bad value" Thank for advice

Read the article

How can I optimize this or is there a better way to do it?(HTML Syntax Highlighter)

- by Tanner

Hello every one, I have made a HTML syntax highlighter in C# and it works great, but there's one problem. First off It runs pretty fast because it syntax highlights line by line, but when I paste more than one line of code or open a file I have to highlight the whole file which can take up to a minute for a file with only 150 lines of code. I tried just highlighting visible lines in the richtextbox but then when I try to scroll I can't it to highlight the new visible text. Here is my code:(note: I need to use regex so I can get the stuff in between < & characters) Highlight Whole File: public void AllMarkup() { int selectionstart = richTextBox1.SelectionStart; Regex rex = new Regex("<html>|</html>|<head.*?>|</head>|<body.*?>|</body>|<div.*?>|</div>|<span.*?>|</span>|<title.*?>|</title>|<style.*?>|</style>|<script.*?>|</script>|<link.*?/>|<meta.*?/>|<base.*?/>|<center.*?>|</center>|<a.*?>|</a>"); foreach (Match m in rex.Matches(richTextBox1.Text)) { richTextBox1.Select(m.Index, m.Value.Length); richTextBox1.SelectionColor = Color.Blue; richTextBox1.Select(selectionstart, -1); richTextBox1.SelectionColor = Color.Black; } richTextBox1.SelectionStart = selectionstart; } private void pasteToolStripMenuItem_Click(object sender, EventArgs e) { try { LockWindowUpdate(richTextBox1.Handle);//Stops text from flashing flashing richTextBox1.Paste(); AllMarkup(); }finally { LockWindowUpdate(IntPtr.Zero); } } I want to know if there's a better way to highlight this and make it faster or if someone can help me make it highlight only the visible text. Please help. :) Thanks, Tanner.

Read the article

Whats the wrong with this code?

- by girinie

Hi in this code first I am downloading a web-page source code then I am storing the code in text file. Again I am reading that file and matching with the regex to search a specific string. There is no compiler error. Exception in thread "main" java.lang.NoClassDefFoundError: java/lang/CharSequence Can anybody tell me Where I am wrong. import java.io.*; import java.net.*; import java.lang.*; import java.util.regex.Matcher; import java.util.regex.Pattern; public class WebDownload { public void getWebsite() { try{ URL url=new URL("www.gmail.com");// any URL can be given URLConnection urlc=url.openConnection(); BufferedInputStream buffer=new BufferedInputStream(urlc.getInputStream()); StringBuffer builder=new StringBuffer(); int byteRead; FileOutputStream fout; StringBuffer contentBuf = new StringBuffer(); while((byteRead=buffer.read()) !=-1) { builder.append((char)byteRead); fout = new FileOutputStream ("myfile3.txt"); new PrintStream(fout).println (builder.toString()); fout.close(); } BufferedReader in = new BufferedReader(new FileReader("myfile3.txt")); String buf = null; while ((buf = in.readLine()) != null) { contentBuf.append(buf);contentBuf.append("\n"); } in.close(); Pattern p = Pattern.compile("<div class=\"summarycount\">([^<]*)</div>"); Matcher matcher = p.matcher(contentBuf); if(matcher.find()) { System.out.println(matcher.group(1)); } else System.out.println("could not find"); } catch(MalformedURLException ex) { ex.printStackTrace(); } catch(IOException ex){ ex.printStackTrace(); } } public static void main(String [] args) { WebDownload web=new WebDownload(); web.getWebsite(); } }

Read the article

Best way to split a string by word (SQL Batch separator)

- by Paul Kohler

I have a class I use to "split" a string of SQL commands by a batch separator - e.g. "GO" - into a list of SQL commands that are run in turn etc. ... private static IEnumerable<string> SplitByBatchIndecator(string script, string batchIndicator) { string pattern = string.Concat("^\\s*", batchIndicator, "\\s*$"); RegexOptions options = RegexOptions.Compiled | RegexOptions.IgnoreCase | RegexOptions.Multiline; foreach (string batch in Regex.Split(script, pattern, options)) { yield return batch.Trim(); } } My current implementation uses a Regex with yield but I am not sure if it's the "best" way. It should be quick It should handle large strings (I have some scripts that are 10mb in size for example) The hardest part (that the above code currently does not do) is to take quoted text into account Currently the following SQL will incorrectly get split: var batch = QueryBatch.Parse(@"-- issue... insert into table (name, desc) values('foo', 'if the go is on a line by itself we have a problem...')"); Assert.That(batch.Queries.Count, Is.EqualTo(1), "This fails for now..."); I have thought about a token based parser that tracks the state of the open closed quotes but am not sure if Regex will do it. Any ideas!?

Read the article

How to catch YouTube embed code and turn into URL

- by Jonathan Vanasco

I need to strip YouTube embed codes down to their URL only. This is the exact opposite of all but one question on StackOverflow. Most people want to turn the URL into an embed code. This question addresses the usage patttern I want, but is tied to a specific embed code's regex ( Strip YouTube Embed Code Down to URL Only ) I'm not familiar with how YouTube has offered embeds over the years - or how the sizes differ. According to their current site, there are 2 possible embed templates and a variety of options. If that's it, I can handle a regex myself -- but I was hoping someone had more knowledge they could share, so I could write a proper regex pattern that matches them all and not run into endless edge-cases. The full use case scenario : user enters content in web based wysiwig editor backend cleans out youtube & other embed codes; reformats approved embeds into an internal format as the text is all converted to markdown. on display, appropriate current template/code display for youtube or other 3rd party site is generated At a previous company, our tech-team devised a plan where YouTube videos were embedded by listing the URL only. That worked great , but it was in a CMS where everyone was trained. I'm trying to create a similar storage, but for user-generated-content.

Read the article

regexp in java problem

- by Staszek28

Hello! I found some problem while testing my NLP system. I have a java regex "(.\.\s)*Dendryt.*" and for string "v Table of Contents List of Tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . " it just dont stop computing. Its clear that this regex complexity is very high, I will try to refactor it. Have you some suggestions for me for a future regex development ??? Thanks.

Read the article

Which is the correct design pattern for my PHP application?

- by user1487141

I've been struggling to find good way to implement my system which essentially matches the season and episode number of a show from a string, you can see the current working code here: https://github.com/huddy/tvfilename I'm currently rewriting this library and want a nicer way to implement how the the match happens, currently essentially the way it works is: There's a folder with classes in it (called handlers), every handler is a class that implements an interface to ensure a method called match(); exists, this match method uses the regex stored in a property of that handler class (of which there are many) to try and match a season and episode. The class loads all of these handlers by instantiating each one into a array stored in a property, when I want to try and match some strings the method iterates over these objects calling match(); and the first one that returns true is then returned in a result set with the season and episode it matched. I don't really like this way of doing it, it's kind of hacky to me, and I'm hoping a design pattern can help, my ultimate goal is to do this using best practices and I wondered which one I should use? The other problems that exist are: More than one handler could match a string, so they have to be in an order to prevent the more greedy ones matching first, not sure if this is solvable as some of the regex patterns have to be greedy, but possibly a score system, something that shows a percentage of how likely the match is correct, i'd have no idea how to actually implement this though. I'm not if instantiating all those handlers is a good way of doing it, speed is important, but using best practices and sticking to design patterns to create good, extensible and maintainable code is my ultimate priority. It's worth noting the handler classes sometimes do other things than just regex matching, they sometimes prep the string to be matched by removing common words etc. Cheers for any help Billy

Read the article

C# regular expression for finding forms with input tags in HTML?

- by johnrl

Hi all. I have a simple problem: I want to construct a regex that matches a form in HTML, but only if the form has any input tags. Example: The following should be matched (ignoring attributes): .. <form> .. <input/> .. </form> .. But the following should not (ignoring attributes): .. <form> .. </form> .. I have tried everything from look-arounds to capture groups but it quickly gets complicated. I want to believe there is a simple regex to capture the problem. Please note that it is important that the regex pairs the opening and closing tags according to the HTML code which means the following does not work: <form>.+<input/>.+</form> because it matches wrongly like this: .. <form> <--- This is wrongly matched as the opening tag .. </form> <form> <-- This is the correct opening tag of the correct form .. <input/> .. </form> <--- This is matched as the closing tag ..

Read the article

Online job-searching is tedious. Help me automate it.

- by ehsanul

Many job sites have broken searches that don't let you narrow down jobs by experience level. Even when they do, it's usually wrong. This requires you to wade through hundreds of postings that you can't apply for before finding a relevant one, quite tedious. Since I'd rather focus on writing cover letters etc., I want to write a program to look through a large number of postings, and save the URLs of just those jobs that don't require years of experience. I don't require help writing the scraper to get the html bodies of possibly relevant job posts. The issue is accurately detecting the level of experience required for the job. This should not be too difficult as job posts are usually very explicit about this ("must have 5 years experience in..."), but there may be some issues with overly simple solutions. In my case, I'm looking for entry-level positions. Often they don't say "entry-level", but inclusion of the words probably means the job should be saved. Next, I can safely exclude a job the says it requires "5 years" of experience in whatever, so a regex like /\d\syears/ seems reasonable to exclude jobs. But then, I realized some jobs say they'll take 0-2 years of experience, matches the exclusion regex but is clearly a job I want to take a look at. Hmmm, I can handle that with another regex. But some say "less than 2 years" or "fewer than 2 years". Can handle that too, but it makes me wonder what other patterns I'm not thinking of, and possibly excluding many jobs. That's what brings me here, to find a better way to do this than regexes, if there is one. I'd like to minimize the false negative rate and save all the jobs that seem like they might not require many years of experience. Does excluding anything that matches /[3-9]\syears|1\d\syears/ seem reasonable? Or is there a better way? Training a bayesian filter maybe?

Read the article

how to read string part in java

- by Gandalf StormCrow

Hello everyone, I have this string : <meis xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" uri="localhost/naro-nei" onded="flpSW531213" identi="lemenia" id="75" lastStop="bendi" xsi:noNamespaceSchemaLocation="http://localhost/xsd/postat.xsd xsd/postat.xsd"> How can I get lastStop property value in JAVA? This regex worked when tested on http://www.myregexp.com/ But when I try it in java I don't see the matched text, here is how I tried : import java.util.regex.Pattern; import java.util.regex.Matcher; public class SimpleRegexTest { public static void main(String[] args) { String sampleText = <meis xmlns:xsi=\"http://www.w3.org/2001/XMLSchema-instance\" uri=\"localhost/naro-nei\" onded=\"flpSW531213\" identi=\"lemenia\" id=\"75\" lastStop=\"bendi\" xsi:noNamespaceSchemaLocation=\"http://localhost/xsd/postat.xsd xsd/postat.xsd\">"; String sampleRegex = "(?<=lastStop=[\"']?)[^\"']*"; Pattern p = Pattern.compile(sampleRegex); Matcher m = p.matcher(sampleText); if (m.find()) { String matchedText = m.group(); System.out.println("matched [" + matchedText + "]"); } else { System.out.println("didn’t match"); } } }

Read the article

How to remove invalid UTF-8 characters from a JavaScript string?

- by msielski

I'd like to remove all invalid UTF-8 characters from a string in JavaScript. I've tried using the approach described here (link removed) and came up with the JavaScript: strTest = strTest.replace(/([\x00-\x7F]|[\xC0-\xDF][\x80-\xBF]|[\xE0-\xEF][\x80-\xBF]{2}|[\xF0-\xF7][\x80-\xBF]{3})|./, "$1"); It seems that the UTF-8 validation regex described here (link removed) is more complete and I adapted it in the same way like: strTest = strTest.replace(/([\x09\x0A\x0D\x20-\x7E]|[\xC2-\xDF][\x80-\xBF]|\xE0[\xA0-\xBF][\x80-\xBF]|[\xE1-\xEC\xEE\xEF][\x80-\xBF]{2}|\xED[\x80-\x9F][\x80-\xBF]|\xF0[\x90-\xBF][\x80-\xBF]{2}|[\xF1-\xF3][\x80-\xBF]{3}|\xF4[\x80-\x8F][\x80-\xBF]{2})|./, "$1"); Both of these pieces of code seem to be allowing valid UTF-8 through, but aren't filtering out hardly any of the bad UTF-8 characters from my test data: UTF-8 decoder capability and stress test. Either the bad characters come through unchanged or seem to have some of their bytes removed creating a new, invalid character. I'm not very familiar with the UTF-8 standard or with multibyte in JavaScript so I'm not sure if I'm failing to represent proper UTF-8 in the regex or if I'm applying that regex improperly in JavaScript. Any help appreciated. Thanks!

Read the article

What would be the best approach to finding a date in a freeform text?

- by Matthew DeVos

What would be the best approach to finding a date in a freeform text? A post where a user may place a date in it in several different ways such as: July 14th & 15th 7/14 & 7/15 7-14 & 7-15 Saturday 14th and Sunday 15th Saturday July 14th and 15th and so on. Is regex my best choice for this type of thing with preg_match? I would also like to search if there are two dates, one for a start date and a second for an end date, but in the text I'm searching there may be one date or two. This is my PHP code so far: $dates1 = '01-01'; $dates2 = 'July 14th & 15th'; $dates3 = '7/14 & 7/15'; $dates4 = '7-14 & 7-15'; $dates5 = 'Saturday 14th and Sunday 15th'; $dates6 = 'Saturday July 14th and 15th'; $regexes = array( '/\s(1|2|3|4|5|6|7|8|9|10|11|12)\/\d{1,2}/', //finds a date '/\s(1|2|3|4|5|6|7|8|9|10|11|12)-\d{1,2}/', //finds another date '%\b(0?[1-9]|[12][0-9]|3[01])[- /.](0?[1-9]|1[012])\b%', //finds date format dd-mm or dd.mm ); foreach($regexes as $regex){ preg_match($regex,$dates,$matches); } var_dump($matches);

Read the article

Find Lines with N occurrences of a char

- by Martín Marconcini

I have a txt file that I’m trying to import as flat file into SQL2008 that looks like this: “123456”,”some text” “543210”,”some more text” “111223”,”other text” etc… The file has more than 300.000 rows and the text is large (usually 200-500 chars), so scanning the file by hand is very time consuming and prone to error. Other similar (and even more complex files) were successfully imported. The problem with this one, is that “some lines” contain quotes in the text… (this came from an export from an old SuperBase DB that didn’t let you specify a text quantifier, there’s nothing I can do with the file other than clear it and try to import it). So the “offending” lines look like this: “123456”,”this text “contains” a quote” “543210”,”And the “above” text is bad” etc… You can see the problem here. Now, 300.000 is not too much if I could perform a search using a text editor that can use regex, I’d manually remove the quotes from each line. The problem is not the number of offending lines, but the impossibility to find them with a simple search. I’m sure there are less than 500, but spread those in a 300.000 lines txt file and you know what I mean. Based upon that, what would be the best regex I could use to identify these lines? My first thought is: Tell me which lines contain more than 4 quotes (“). But I couldn’t come up with anything (I’m not good at Regex beyond the basics).

Read the article

Get the package of a Java source file

- by Oak

My goal is to find the package (as string) of a Java source file, given as plaintext and not already sorted in folders. I can't just locate the first instance of the keyword package in the file, because it may appear inside a comment. So I was thinking about two alternatives: Scan the file word-by-word, maintaining a "inside-a-comment" flag for the scanner. The first time the package keyword is encountered while not inside a comment, stop the scanning and report the result. Use a regex - should be theoretically possible because block comments do not next in Java, but I tried making such a regex and it turned out to be quite complicated - for me, at least. Another difference between the two approaches is that when scanning manually I can stop the scan when I can be certain the package keyword can no longer appear, saving some time... and I'm not sure I can do something similar with regexes. On the other hand, the decision "when it can no longer appear" is not necessarily simple, though I could use some heuristic for that. I would like to hear any input on this problem, and would welcome any help with the regex. My solution is written in Java as well.

Read the article

automatically execute an Excel macro on a cell change

- by namin

How can I automatically execute an Excel macro each time a value in a particular cell changes? Right now, my working code is: Private Sub Worksheet_Change(ByVal Target As Range) If Not Intersect(Target, Range("H5")) Is Nothing Then Macro End Sub where "H5" is the particular cell being monitored and Macro is the name of the macro. Is there a better way?

Read the article

htaccess not properly rewriting urls

- by Cameron Ball

This is a bit of a weird one. I'm doing some work on a server, and I need rewrite rules for directories that actually exist (in some cases, they are more than one level deep) At the moment my .htaccess looks like this: RewriteEngine on RewriteRule ^simfiles/([-\ a-zA-Z0-9:/]+)$ http://mydomain.com/?portal=simfiles&folder=$1 [L] And this is working OK, for example, a url like: mydomain.com/sifmiles/my-files Will get redirected to mydomain.com/?portal=simfiles&folder=my-files Or in the case of a directory structure that is deeper than one level: mydomain.com/sifmiles/my-files/more-of-my-files Will get redirected to mydomain.com/?portal=simfiles&folder=my-files/more-of-my-files I wrote the regex so that it won't match things with a . in the path, because there are css and js files which reside in simfiles/somedirectory, and if I redirect everything then these cannot be loaded. I tried a configuration like this: RewriteEngine on RewriteCond %{REQUEST_FILENAME} !-f RewriteRule ^simfiles/([-\ a-zA-Z0-9:/\.]+)$ http://mydomain.com/?portal=simfiles&folder=$1 [L] But that doesn't work, things still don't load properly. So my first question is, how can I achieve this "properly"? I don't like my solution because it means redirects won't occur if the folder has a . in its name. My second problem, is that while the redirection is happening properly, the url becomes: http://mydomain.com/?portal=simfiles&folder=my-files I want the URL to remain clean, like: http://mydomain.com/sifmiles/my-files How can I achieve this?

Read the article

Input field separator in awk

- by Matthijs

I have many large data files. The delimiter between the fields is a semicolon. However, I have found that there are semicolons in some of the fields, so I cannot simply use the semicolon as a field separator. The following example has 4 fields, but awk sees only 3, because the '1' in field 3 is stripped by the regex (which includes a '-' because some of the numerical data are negative): echo '"This";"is";1;"line of; data"' | awk -F'[0-9"-];[0-9"-]' '{print "No. of fields:\t"NF; print "Field 3:\t" $3}' No. of fields: 3 Field 3: ;"line of; data" Of course, echo '"This";"is";1;"line of; data"' | awk -F';' '{print "No. of fields:\t"NF}' No. of fields: 5 solves that problem, but counts the last field as two separate fields. Does anyone know a solution to this? Thanks! Matthijs

Search Results

Search found 3956 results on 159 pages for 'regex cookbook'.

Page 56/159 | < Previous Page | 52 53 54 55 56 57 58 59 60 61 62 63 | Next Page >

- by Ricardo Sa

- by Hikari

- by Adam Right

- by Camsoft

- by dontomaso

- by Hinch

- by Raphink

- by Aurélien Ribon

- by user572844

- by Tanner

- by girinie

- by Paul Kohler

- by Jonathan Vanasco

- by Staszek28

- by user1487141

- by johnrl

- by ehsanul

- by Gandalf StormCrow

- by msielski

- by Matthew DeVos

- by Martín Marconcini

- by Oak

- by namin

- by Cameron Ball

- by Matthijs

< Previous Page | 52 53 54 55 56 57 58 59 60 61 62 63 | Next Page >