regex group - Page 86 - Developer IT

php regex word boundary matching in utf-8

- by dontomaso

Hi, I have the following PHP code in a UTF-8 PHP file:

    var_dump(setlocale(LC_CTYPE, 'de_DE.utf8', 'German_Germany.utf-8', 'de_DE', 'german'));
    var_dump(mb_internal_encoding());
    var_dump(mb_internal_encoding('utf-8'));
    var_dump(mb_internal_encoding());
    var_dump(mb_regex_encoding());
    var_dump(mb_regex_encoding('utf-8'));
    var_dump(mb_regex_encoding());
    var_dump(preg_replace('/\bweiß\b/iu', 'weiss', 'weißbier'));

I would like the last regex to replace only full words and not parts of words. On my Windows computer, it returns:

    string 'German_Germany.1252' (length=19)
    string 'ISO-8859-1' (length=10)
    boolean true
    string 'UTF-8' (length=5)
    string 'EUC-JP' (length=6)
    boolean true
    string 'UTF-8' (length=5)
    string 'weißbier' (length=9)

On the web server (Linux), I get:

    string(10) "de_DE.utf8"
    string(10) "ISO-8859-1"
    bool(true)
    string(5) "UTF-8"
    string(10) "ISO-8859-1"
    bool(true)
    string(5) "UTF-8"
    string(9) "weissbier"

Thus, the regex works as I expected on Windows but not on Linux. So the main question is: how should I write my regex to match only at word boundaries? A secondary question is how I can let Windows know that I want to use UTF-8 in my PHP application.
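For comparison only, here is a minimal sketch in Python (not PHP) of the two behaviours described above. It assumes the difference comes down to whether "ß" counts as a word character when `\b` is evaluated; the function name `replace_whole_word` is hypothetical.

```python
import re

def replace_whole_word(text, ascii_only=False):
    # With ascii_only=True, \w and \b only know ASCII letters, so "ß" is
    # treated as a non-word character and a boundary appears right after it.
    flags = re.ASCII if ascii_only else 0
    return re.sub(r"\bweiß\b", "weiss", text, flags=flags)

print(replace_whole_word("weißbier"))                   # Unicode-aware boundaries
print(replace_whole_word("weißbier", ascii_only=True))  # ASCII-only boundaries
print(replace_whole_word("ein weiß bier"))              # a standalone word
```

The Unicode-aware call leaves "weißbier" alone, which is the result the question asks for; the ASCII-only call reproduces the unwanted partial replacement.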

Matching a Repeating Sub Series using a Regular Expression with PowerShell

- by Hinch

I have a text file that lists the names of a large number of Excel spreadsheets, and the names of the files that are linked to from the spreadsheets. In simplified form it looks like this:

    "Parent File1.xls"
    Link: ChildFileA.xls
    Link: ChildFileB.xls
    "ParentFile2.xls"
    "ParentFile3.xls"
    Blah
    Link: ChildFileC.xls
    Link: ChildFileD.xls
    More Junk
    Link: ChildFileE.xls
    "Parent File4.xls"
    Link: ChildFileF.xls

In this example, ParentFile1.xls has embedded links to ChildFileA.xls and ChildFileB.xls, ParentFile2.xls has no embedded links, and ParentFile3.xls has 3 embedded links. I am trying to write a regular expression in PowerShell that will parse the text file, producing output in the following form:

    ParentFile1.xls:ChildFileA.xls,ChildFileB.xls
    ParentFile3.xls:ChildFileC.xls,ChildFileD.xls,ChildFileE.xls
    etc

The task is complicated by the fact that the text file contains a lot of junk between each of the lines, and a parent may not always have a child. Furthermore, a single file name may span multiple lines. However, it's not as bad as it sounds, as the parent and child file names are always clearly demarcated (the parent with quotes and the child with a prefix of Link: ). The PowerShell code I've been using is as follows:

    $content = [string]::Join([environment]::NewLine, (Get-Content C:\Temp\text.txt))
    $regex = [regex]'(?im)\s*\"(.*)\r?\n?\s*(.*)\"[\s\S]*?Link: (.*)\r?\n?'
    $regex.Matches($content) | %{$_.Groups[1].Value + $_.Groups[2].Value + ":" + $_.Groups[3].Value}

Using the example above, it outputs:

    ParentFile1.xls:ChildFileA.xls
    ParentFile2.xls""ParentFile3.xls:ChildFileC.xls
    ParentFile4.xls:ChildFileF.xls

There are two issues. The first is the inclusion of the "" instead of a newline whenever a parent without a child is processed. The second, which is the most important, is that only a single child is ever shown for each parent.

I'm guessing I need to somehow recursively capture and display the multiple child links that exist for each parent, but I'm totally stumped as to how to do this with a regular expression. Any help would be greatly appreciated. The file contains hundreds of thousands of lines, and manual processing is not an option :)
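One way to think about the grouping, sketched in Python rather than PowerShell: scan for either a quoted parent or a `Link:` child with a single alternation and keep a running "current parent". The function name `group_links` is hypothetical, and the sketch assumes that junk lines never contain double quotes or the text `Link:`, and that whitespace inside a quoted name is only line-wrap residue.

```python
import re

# Either a quoted parent name (which may span lines) or a "Link:" child.
TOKEN = re.compile(r'"([^"]*)"|Link:\s*(\S+)')

def group_links(text):
    parents = []  # list of (parent_name, [children]) in file order
    for m in TOKEN.finditer(text):
        parent, child = m.group(1), m.group(2)
        if parent is not None:
            # Assumption: whitespace inside the quotes is wrap residue.
            parents.append((re.sub(r"\s+", "", parent), []))
        elif parents:
            parents[-1][1].append(child)
    # Parents without children are skipped, as in the desired output.
    return [f"{name}:{','.join(kids)}" for name, kids in parents if kids]
```

Because the state lives in ordinary code instead of in one large pattern, a parent can collect any number of children without nested or recursive captures.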

Overlapping matches with finditer() in Python

- by Raphink

Hi there, I'm using a regex to match Bible verse references in a text. The current regex is

    REF_REGEX = re.compile(r'(?<!\w)((?i)q(?:uote)?\s+)?((?:(?:[1-3]|I{1,3})\s*)?[A-Za-z]+)\.?(?:\s*(\d+)(?:[:.](\d+)(?:-(\d+))?)?)(?:\s+(?:(?i)(?:from\s+)|(?:in\s+)|(?P<lbrace>$))\s*(\w+)(?(lbrace)$))?', re.UNICODE)

This matches the following expressions fine:

    "jn 3:16": (None, 'jn', '3', '16', None, None, None),
    "matt. 18:21-22": (None, 'matt', '18', '21', '22', None, None),
    "q matt. 18:21-22": ('q ', 'matt', '18', '21', '22', None, None),
    "QuOTe jn 3:16": ('QuOTe ', 'jn', '3', '16', None, None, None),
    "q 1co13:1": ('q ', '1co', '13', '1', None, None, None),
    "q 1 co 13:1": ('q ', '1 co', '13', '1', None, None, None),
    "quote 1 co 13:1": ('quote ', '1 co', '13', '1', None, None, None),
    "quote 1co13:1": ('quote ', '1co', '13', '1', None, None, None),
    "jean 3:18 (PDV)": (None, 'jean', '3', '18', None, '(', 'PDV'),
    "quote malachie 1.1-2 fRom Colombe": ('quote ', 'malachie', '1', '1', '2', None, 'Colombe'),
    "quote malachie 1.1-2 In Colombe": ('quote ', 'malachie', '1', '1', '2', None, 'Colombe'),
    "cinq jn 3:16 (test)": (None, 'jn', '3', '16', None, '(', 'test'),
    "Q IIKings5.13-58 from wolof": ('Q ', 'IIKings', '5', '13', '58', None, 'wolof'),
    "This text is about lv5.4-6 in KJV only": (None, 'lv', '5', '4', '6', None, 'KJV'),

but it fails to parse:

    "Found in 2 Cor. 5:18-21 ( Ministers": (None, '2 Cor', '5', '18', '21', None, None),

because it returns (None, 'in', '2', None, None, None, None) instead. Is there a way to get finditer() to return all matches, even if they overlap, or is there a way to improve my regex so it matches this last bit properly? Thanks.
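On the overlapping-matches half of the question, a common trick is to wrap the pattern in a lookahead with a capturing group, so each match is zero-width and the scan can restart one character later. The sketch below shows the idea on a toy pattern, not on the verse regex itself; the helper name `overlapping` is hypothetical.

```python
import re

def overlapping(pattern, text):
    # The lookahead consumes nothing, so finditer() advances one position
    # at a time; the real text is read from the group inside the lookahead.
    wrapped = re.compile(f"(?=({pattern}))")
    return [m.group(1) for m in wrapped.finditer(text)]

print(overlapping(r"\d\d", "1234"))
```

Note that wrapping adds one outer group, so any groups inside the original pattern shift by one when read back.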

Querying container with Linq + group by?

- by Prix

    public class ItemList
    {
        public int GuID { get; set; }
        public int ItemID { get; set; }
        public string Name { get; set; }
        public entityType Status { get; set; }

        public class Waypoint
        {
            public int Zone { get; set; }
            public int SubID { get; set; }
            public int Heading { get; set; }
            public float PosX { get; set; }
            public float PosY { get; set; }
            public float PosZ { get; set; }
        }

        public List<Waypoint> Routes = new List<Waypoint>();
    }

I have a list of items using the above class, and now I need to group it by ItemID and join the first entry of Routes for each equal ItemID. So for example, let's say my list contains:

    GUID  ItemID  ListOfRoutes
    1     23      first entry only
    2     23      first entry only
    3     23      first entry only
    4     23      first entry only
    5     23      first entry only
    6     23      first entry only
    7     23      first entry only

This means I have to group entries 1 to 7 as one item with all the Routes entries. So I would have one ItemID 23 with 7 routes on it, where each route is the first element of the Routes list of the corresponding GUID. My question is whether it is possible to write a LINQ statement that does something like this:

    var query = from ItemList entry in myList
                where status.Contains(entry.Status)
                group entry by entry.ItemID into result
                select new
                {
                    items = new { ID = entry.ItemID, Name = entry.Name },
                    routes = from ItemList m in entry
                             group m.Routes.FirstOrDefault() by n.NpcID into m2
                };

So basically I would have a list of unique ID information, each with an inner list of the first route entry of every GUID that had the same ItemID.
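The grouping described above can be sketched outside LINQ to make the intended result concrete. This Python version uses plain dictionaries as hypothetical stand-ins for `ItemList` and `Waypoint`; the key names (`item_id`, `name`, `routes`) and the function name are assumptions for illustration only.

```python
def group_first_routes(items):
    # One entry per item_id; each entry collects the first route of every
    # record that shares that id. Records with no routes contribute nothing.
    grouped = {}
    for item in items:
        entry = grouped.setdefault(
            item["item_id"], {"name": item["name"], "routes": []}
        )
        if item["routes"]:
            entry["routes"].append(item["routes"][0])
    return grouped

items = [
    {"guid": 1, "item_id": 23, "name": "guard", "routes": ["r1a", "r1b"]},
    {"guid": 2, "item_id": 23, "name": "guard", "routes": ["r2a"]},
    {"guid": 3, "item_id": 42, "name": "merchant", "routes": []},
]
print(group_first_routes(items))
```

In LINQ terms this corresponds to grouping by `ItemID` and projecting each group onto the first `Routes` element of its members.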

EOL Special Char not matching

- by Aurélien Ribon

Hello, I am trying to find every "a -> b, c, d" pattern in an input string. The pattern I am using is the following:

    "^[ \t]*(\\w+)[ \t]*->[ \t]*(\\w+)((?:,[ \t]*\\w+)*)$"

This is a C# pattern: the "\t" refers to a tab (it is a single escaped literal, interpreted by the .NET String API), and the "\w" refers to the well-known predefined regex word class, double-escaped so that the .NET String API reads it as "\w" and the .NET Regex API then reads it as the word class.

The input is:

    a -> b
    b -> c
    c -> d

The function is:

    private void ParseAndBuildGraph(String input)
    {
        MatchCollection mc = Regex.Matches(input, "^[ \t]*(\\w+)[ \t]*->[ \t]*(\\w+)((?:,[ \t]*\\w+)*)$", RegexOptions.Multiline);
        foreach (Match m in mc)
        {
            Debug.WriteLine(m.Value);
        }
    }

The output is:

    c -> d

Actually, there is a problem with the line-ending "$" special character. If I insert a "\r" before "$", it works, but I thought "$" would match any line termination (with the Multiline option), especially a \r\n in a Windows environment. Is that not the case?
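The behaviour can be reproduced in Python, whose multiline `$` (like .NET's) matches only before a "\n" or at the end of the string, so a trailing "\r" blocks the match. The sketch below compares the original pattern with a variant that tolerates an optional "\r"; the names `STRICT`, `TOLERANT` and `edges` are hypothetical.

```python
import re

STRICT = re.compile(
    r"^[ \t]*(\w+)[ \t]*->[ \t]*(\w+)((?:,[ \t]*\w+)*)$", re.MULTILINE
)
# Same pattern, but an optional carriage return is allowed before the end of line.
TOLERANT = re.compile(
    r"^[ \t]*(\w+)[ \t]*->[ \t]*(\w+)((?:,[ \t]*\w+)*)\r?$", re.MULTILINE
)

def edges(pattern, text):
    return [(m.group(1), m.group(2)) for m in pattern.finditer(text)]

windows_text = "a -> b\r\nb -> c\r\nc -> d"
print(edges(STRICT, windows_text))    # only the last line, which has no "\r"
print(edges(TOLERANT, windows_text))  # all three lines
```

Using `\r?$` keeps the pattern working for both "\n" and "\r\n" input, which is usually preferable to hard-coding "\r".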

Replace apostrophe in json string with empty string

- by user572844

Hi, I have a problem deserializing a JSON string, because the string is badly formatted. For example, the JSON object contains a string property statusMessage with the value "Hello "dog" ". The correct format would be "Hello \" dog \" ". I would like to remove the inner quotation marks from this property, something like this: "Hello "dog" " becomes "Hello dog ".

Here is the original JSON string I am working with:

    "{\"jancl\":{\"idUser\":18438201,\"nick\":\"JANCl\",\"photo\":\"1\",\"sex\":1,\"photoAlbums\":1,\"videoAlbums\":0,\"sefNick\":\"jancl\",\"profilPercent\":75,\"emphasis\":false,\"age\":\"-\",\"isBlocked\":false,\"PHOTO\":{\"normal\":\"http://u.aimg.sk/fotky/1843/82/n_18438201.jpg?v=1\",\"medium\":\"http://u.aimg.sk/fotky/1843/82/m_18438201.jpg?v=1\",\"24x24\":\"http://u.aimg.sk/fotky/1843/82/s_18438201.jpg?v=1\"},\"PLUS\":{\"active\":false,\"activeTo\":\"0000-00-00\"},\"LOCATION\":{\"idRegion\":\"6\",\"regionName\":\"Trenciansky kraj\",\"idCity\":\"138\",\"cityName\":\"Trencianske Teplice\"},\"STATUS\":{\"isLoged\":true,\"isChating\":false,\"idChat\":0,\"roomName\":\"\",\"lastLogin\":1294925369},\"PROJECT_STATUS\":{\"photoAlbums\":1,\"photoAlbumsFavs\":0,\"videoAlbums\":0,\"videoAlbumsFavs\":0,\"videoAlbumsExts\":0,\"blogPosts\":0,\"emailNew\":0,\"postaNew\":0,\"clubInvitations\":0,\"dashboardItems\":1},\"STATUS_MESSAGE\":{\"statusMessage\":\"\"Status\"\",\"addTime\":\"1294872330\"},\"isFriend\":false,\"isIamFriend\":false}}"

The problem is here; the JSON string contains this object:

    "STATUS_MESSAGE": {"statusMessage":" "some "bad" value" ", "addTime" :"1294872330"}

Conditions for the part of the string I want to modify:

- it starts with "statusMessage":"
- it can have any length from 0 to N
- it ends with ", "addTime

So I tried to write a pattern for a string that starts with "statusMessage":", has any length, and ends with ", "addTime. Here it is:

    const string pattern = " \" statusMessage \" : \" .*? \",\"addTime\" ";
    var regex = new Regex(pattern, RegexOptions.IgnoreCase);
    //here i would replace " with empty string
    string result = regex.Replace(jsonString, match => ???);

But I think the pattern is wrong, and I also don't know how to replace the quotation marks with an empty string (that is, remove them). My goal is to turn "statusMessage":" "some "bad" value" into "statusMessage":" "some bad value". Thanks for any advice.

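A possible shape for the replacement, sketched in Python rather than C#: capture the fixed prefix, the message body and the fixed suffix as three groups, and strip quotes only from the middle group inside a replacement callback. The function name `strip_inner_quotes` is hypothetical, and the sketch assumes `"addTime"` always follows `statusMessage` directly, as stated in the question.

```python
import json
import re

PATTERN = re.compile(
    r'("statusMessage"\s*:\s*")(.*?)("\s*,\s*"addTime")', re.DOTALL
)

def strip_inner_quotes(raw):
    # Keep the delimiters (groups 1 and 3) and drop every quote in between.
    return PATTERN.sub(
        lambda m: m.group(1) + m.group(2).replace('"', "") + m.group(3), raw
    )

broken = '{"STATUS_MESSAGE":{"statusMessage":""Status"","addTime":"1294872330"}}'
fixed = strip_inner_quotes(broken)
print(fixed)
print(json.loads(fixed)["STATUS_MESSAGE"]["statusMessage"])
```

The same three-group idea should carry over to `Regex.Replace` with a match evaluator in C#.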

How can I optimize this or is there a better way to do it?(HTML Syntax Highlighter)

- by Tanner

Hello everyone, I have made an HTML syntax highlighter in C# and it works great, but there's one problem. It runs pretty fast while typing because it highlights line by line, but when I paste more than one line of code or open a file I have to highlight the whole file, which can take up to a minute for a file with only 150 lines of code. I tried highlighting just the visible lines in the RichTextBox, but then when I scroll I can't get it to highlight the newly visible text. Here is my code (note: I need to use regex so I can get the stuff in between the < and > characters).

Highlight whole file:

    public void AllMarkup()
    {
        int selectionstart = richTextBox1.SelectionStart;
        Regex rex = new Regex("<html>|</html>|<head.*?>|</head>|<body.*?>|</body>|<div.*?>|</div>|<span.*?>|</span>|<title.*?>|</title>|<style.*?>|</style>|<script.*?>|</script>|<link.*?/>|<meta.*?/>|<base.*?/>|<center.*?>|</center>|<a.*?>|</a>");
        foreach (Match m in rex.Matches(richTextBox1.Text))
        {
            richTextBox1.Select(m.Index, m.Value.Length);
            richTextBox1.SelectionColor = Color.Blue;
            richTextBox1.Select(selectionstart, -1);
            richTextBox1.SelectionColor = Color.Black;
        }
        richTextBox1.SelectionStart = selectionstart;
    }

    private void pasteToolStripMenuItem_Click(object sender, EventArgs e)
    {
        try
        {
            LockWindowUpdate(richTextBox1.Handle); // Stops text from flashing
            richTextBox1.Paste();
            AllMarkup();
        }
        finally
        {
            LockWindowUpdate(IntPtr.Zero);
        }
    }

I want to know if there's a better way to highlight this and make it faster, or if someone can help me make it highlight only the visible text. Please help. :) Thanks, Tanner.
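The "visible text only" idea can be sketched independently of WinForms: compile the pattern once, then search only the character range currently on screen while still getting offsets relative to the whole document. This Python sketch uses a generic tag pattern rather than the tag list above, and the function name `visible_tag_spans` and its parameters are hypothetical.

```python
import re

# Compiled once and reused; matches any opening or closing tag.
TAG = re.compile(r"</?[a-zA-Z][^>]*>")

def visible_tag_spans(text, first_visible, last_visible):
    # finditer(text, pos, endpos) restricts the scan to the visible window
    # but reports absolute offsets, ready for a Select(index, length) call.
    return [
        (m.start(), m.end() - m.start())
        for m in TAG.finditer(text, first_visible, last_visible)
    ]

document = '<html>\n<body class="x">hi</body>\n</html>'
print(visible_tag_spans(document, 7, 32))
```

In the C# version, the window bounds would have to come from the control (for example from the character indices at the top and bottom of the client area), which is outside what this sketch covers.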

What's wrong with this code?

- by girinie

Hi, in this code I first download a web page's source code and store it in a text file. Then I read that file back and match it against a regex to search for a specific string. There is no compiler error, but at run time I get:

    Exception in thread "main" java.lang.NoClassDefFoundError: java/lang/CharSequence

Can anybody tell me where I am going wrong?

    import java.io.*;
    import java.net.*;
    import java.lang.*;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    public class WebDownload
    {
        public void getWebsite()
        {
            try {
                URL url = new URL("www.gmail.com"); // any URL can be given
                URLConnection urlc = url.openConnection();
                BufferedInputStream buffer = new BufferedInputStream(urlc.getInputStream());
                StringBuffer builder = new StringBuffer();
                int byteRead;
                FileOutputStream fout;
                StringBuffer contentBuf = new StringBuffer();
                while ((byteRead = buffer.read()) != -1)
                {
                    builder.append((char) byteRead);
                    fout = new FileOutputStream("myfile3.txt");
                    new PrintStream(fout).println(builder.toString());
                    fout.close();
                }
                BufferedReader in = new BufferedReader(new FileReader("myfile3.txt"));
                String buf = null;
                while ((buf = in.readLine()) != null)
                {
                    contentBuf.append(buf);
                    contentBuf.append("\n");
                }
                in.close();
                Pattern p = Pattern.compile("<div class=\"summarycount\">([^<]*)</div>");
                Matcher matcher = p.matcher(contentBuf);
                if (matcher.find())
                {
                    System.out.println(matcher.group(1));
                }
                else
                    System.out.println("could not find");
            }
            catch (MalformedURLException ex) {
                ex.printStackTrace();
            }
            catch (IOException ex) {
                ex.printStackTrace();
            }
        }

        public static void main(String[] args)
        {
            WebDownload web = new WebDownload();
            web.getWebsite();
        }
    }

ABGroupAddMember - Cannot Add Contact to Group (iPhone Address Book)

- by cookeecut

Hi, I wonder if someone can help me out on this: I'm writing some code to copy a set of contacts received through a web application into the Address Book. I want to put all these new contacts under a certain group. My code successfully creates the group and adds the contacts to the address book. However, the 'ABGroupAddMember' operation fails. It fails without an error and the result it returns is true, meaning that according to the debugger the contact should have been added to the group. However, the contact does not appear in the group. The portion of my loop code that adds the contact to the address book and then assigns it to the group is this:

    ABAddressBookAddRecord(addressBook,person, &anError);
    ABAddressBookSave(addressBook,&anError);
    ABGroupAddMember(fusionLiveGroupRef,person, &anError);
    ABAddressBookSave(addressBook,&anError);

All references are valid. No errors are returned. All operations return true. What is going wrong?

Best way to split a string by word (SQL Batch separator)

- by Paul Kohler

I have a class I use to "split" a string of SQL commands by a batch separator - e.g. "GO" - into a list of SQL commands that are run in turn etc.

    ...
    private static IEnumerable<string> SplitByBatchIndecator(string script, string batchIndicator)
    {
        string pattern = string.Concat("^\\s*", batchIndicator, "\\s*$");
        RegexOptions options = RegexOptions.Compiled | RegexOptions.IgnoreCase | RegexOptions.Multiline;
        foreach (string batch in Regex.Split(script, pattern, options))
        {
            yield return batch.Trim();
        }
    }

My current implementation uses a Regex with yield but I am not sure if it's the "best" way.

- It should be quick
- It should handle large strings (I have some scripts that are 10mb in size for example)
- The hardest part (that the above code currently does not do) is to take quoted text into account

Currently the following SQL will incorrectly get split:

    var batch = QueryBatch.Parse(@"-- issue...
    insert into table (name, desc)
    values('foo', 'if the
    go
    is on a line by itself we have a problem...')");
    Assert.That(batch.Queries.Count, Is.EqualTo(1), "This fails for now...");

I have thought about a token-based parser that tracks the state of the open/closed quotes but am not sure if Regex will do it. Any ideas!?
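The quote-tracking idea mentioned at the end can be sketched as a small line scanner. This is a Python illustration, not the author's implementation: `split_batches` is a hypothetical name, and the sketch assumes single-quoted SQL strings with `''` as the escape, and it deliberately ignores comments (an apostrophe inside a `--` comment would confuse it).

```python
def split_batches(script, separator="GO"):
    batches, current, in_string = [], [], False
    for line in script.splitlines():
        # A separator only counts when we are not inside a quoted string.
        if not in_string and line.strip().upper() == separator.upper():
            batches.append("\n".join(current).strip())
            current = []
            continue
        current.append(line)
        # An odd number of quotes on a line flips the state; an escaped
        # quote ('') adds two and therefore leaves the parity unchanged.
        if line.count("'") % 2 == 1:
            in_string = not in_string
    batches.append("\n".join(current).strip())
    return [b for b in batches if b]
```

The scan is a single pass over the lines, so it should stay linear in script size, which matters for the 10 MB scripts mentioned above.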

ldap login form works, but need to add active-directory group access

- by Brad

I created a form that asks you to log in, then verifies the user/pass against the LDAP server (Active Directory). If successful, it creates a session, which will be checked on every page. Now I want to take the session value, which is the username of the person who is logged in, and search for that user with ldap_search, so I can check which group they belong to and pass that group through a function to verify that they can view the page. Each page will be accessible to a certain group or groups of users, and those groups are defined within Active Directory. I am unsure how to do that using ldap_search, or maybe that is just one piece of the puzzle I am trying to solve. Any help is appreciated - thank you!

The example code below checks whether the user belongs to the student Active Directory group (I do not know if this code works, but it should give you an idea of what I want to accomplish).

    $filter = "CN=StudentCN=Users,dc=domain,dc=control";
    $result = ldap_search($ldapconn,$filter,$valid_session_username);
    if($result == TRUE)
    {
        print $valid_session_username.' does have access to this page';
    }
    else
    {
        print $valid_session_username.' does NOT have access to this page';
    }

How to catch YouTube embed code and turn into URL

- by Jonathan Vanasco

I need to strip YouTube embed codes down to their URL only. This is the exact opposite of all but one question on StackOverflow; most people want to turn the URL into an embed code. That one question addresses the usage pattern I want, but is tied to a specific embed code's regex ( Strip YouTube Embed Code Down to URL Only ).

I'm not familiar with how YouTube has offered embeds over the years, or how the sizes differ. According to their current site, there are 2 possible embed templates and a variety of options. If that's it, I can handle a regex myself -- but I was hoping someone had more knowledge they could share, so I could write a proper regex pattern that matches them all and not run into endless edge cases.

The full use case scenario:

- user enters content in a web-based WYSIWYG editor
- backend cleans out YouTube & other embed codes; reformats approved embeds into an internal format as the text is all converted to Markdown
- on display, the appropriate current template/code for YouTube or another 3rd-party site is generated

At a previous company, our tech team devised a plan where YouTube videos were embedded by listing the URL only. That worked great, but it was in a CMS where everyone was trained. I'm trying to create a similar storage scheme, but for user-generated content.
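As a starting point, the two embed URL shapes that are widely seen, `/embed/<id>` in iframe embeds and `/v/<id>` in the older object/embed markup, can be handled by matching the URL rather than the surrounding HTML. The sketch below is an assumption-laden illustration: the function name `embed_to_urls` is hypothetical, the video id is assumed to consist of letters, digits, `_` and `-`, and other embed variants may exist that it does not cover.

```python
import re

EMBED_URL = re.compile(
    r"(?:https?:)?//(?:www\.)?youtube(?:-nocookie)?\.com/(?:embed|v)/([\w-]+)"
)

def embed_to_urls(html):
    # The old object/embed markup repeats the URL, so de-duplicate while
    # keeping first-seen order.
    ids = dict.fromkeys(EMBED_URL.findall(html))
    return ["https://www.youtube.com/watch?v=" + video_id for video_id in ids]
```

Matching only the URL keeps the pattern independent of attribute order, sizes and player options, which is where most of the edge cases in the markup itself would come from.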

Running single test class or group with Surefire and TestNG

- by Slartibartfast

I want to run a single test class from the command line using Maven and TestNG.

Things that don't work:

    mvn -Dtest=ClassName test

I have defined groups in pom.xml, and this class isn't in one of those groups, so it got excluded on those grounds.

    mvn -Dgroups=skipped-group test
    mvn -Dsurefire.groups=skipped-group test

when the config is

    <plugin>
      <groupId>org.apache.maven.plugins</groupId>
      <artifactId>maven-surefire-plugin</artifactId>
      <version>2.7.1</version>
      <configuration>
        <groups>functest</groups>
      </configuration>
    </plugin>

Parameters work fine if there are no groups defined in pom.xml. Similarly, when Surefire is configured with

    <configuration>
      <includes>
        <include>**/*UnitTest.java</include>
      </includes>
    </configuration>

I can add another test with the -Dtest parameter, but cannot add a group. In any combination, I can narrow down the tests to be executed with groups, but not expand them. What's wrong with my configuration? Is there a way to run a single test or group outside of those defined in pom.xml?

Tried on Ubuntu 10.04 with Maven 2.2.1, TestNG 5.14.6 and Surefire 2.7.1

MS Office Excel Ribbon - Cannot change/hide Editing group in Home tab

- by A9S6

I have a .NET add-in for Excel. The add-in creates the Ribbon UI for Excel 2007 and re-purposes some existing commands such as Cut, Copy, Paste, Sort etc. For Cut, Copy and Paste I am just overriding their OnAction value to call my own procedure when the buttons are clicked. But for the Sort, Sort Asc and Sort Desc commands the case is a little different. When any of the Sort, Sort Asc or Sort Desc buttons is clicked, I want to get notified and then call the default functionality. This was possible with Excel 2003 command bars by calling the Execute() method on the CommandBarControl. In Excel 2007, there is an ExecuteMso() method to programmatically click a ribbon element, but when the OnAction is overridden, this ExecuteMso() method just executes my own procedure and not the default functionality of that button.

So I thought that I would HIDE the Sort buttons in the "Editing" group in the Home tab and add my own Sort, Sort Asc and Sort Desc buttons to it. The buttons will call into my procedure first, from where I will call the default behavior. Now the problem is that I am unable to change/hide the Editing group (idMso="GroupEditing"). Is this built-in group not editable? I can, however, HIDE the Clipboard and other groups (but can't add buttons to them).

    <?xml version="1.0" encoding="UTF-8" standalone="yes"?>
    <customUI xmlns="http://schemas.microsoft.com/office/2006/01/customui">
      <ribbon>
        <tabs>
          <tab idMso="TabHome">
            <group idMso="GroupEditing" visible="false" />
          </tab>
        </tabs>
      </ribbon>
    </customUI>

regexp in java problem

- by Staszek28

Hello! I found a problem while testing my NLP system. I have a Java regex "(.\.\s)*Dendryt.*" and for the string "v Table of Contents List of Tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . " it just doesn't stop computing. It's clear that the complexity of this regex is very high, and I will try to refactor it. Do you have any suggestions for my future regex development? Thanks.

Which is the correct design pattern for my PHP application?

- by user1487141

I've been struggling to find a good way to implement my system, which essentially matches the season and episode number of a show from a string. You can see the current working code here: https://github.com/huddy/tvfilename

I'm currently rewriting this library and want a nicer way to implement how the match happens. Currently, the way it works is essentially this: there's a folder with classes in it (called handlers). Every handler is a class that implements an interface to ensure a method called match() exists; this match method uses the regex stored in a property of that handler class (of which there are many) to try to match a season and episode. The main class loads all of these handlers by instantiating each one into an array stored in a property. When I want to match some strings, the method iterates over these objects calling match(), and the first one that returns true is then returned in a result set with the season and episode it matched.

I don't really like this way of doing it; it feels kind of hacky to me, and I'm hoping a design pattern can help. My ultimate goal is to do this using best practices, and I wondered which pattern I should use.

The other problems that exist are:

- More than one handler could match a string, so they have to be in an order that prevents the more greedy ones from matching first. I'm not sure if this is solvable, as some of the regex patterns have to be greedy. Possibly a score system, something that shows a percentage of how likely the match is to be correct, but I'd have no idea how to actually implement this.
- I'm not sure if instantiating all those handlers is a good way of doing it. Speed is important, but using best practices and sticking to design patterns to create good, extensible and maintainable code is my ultimate priority.
- It's worth noting that the handler classes sometimes do other things than just regex matching; they sometimes prep the string to be matched by removing common words etc.

Cheers for any help, Billy
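The handler arrangement described above resembles an ordered chain of matchers with explicit priorities. The sketch below shows that shape in Python rather than PHP; every name in it (`Handler`, `HANDLERS`, `parse`, the two patterns and their priorities) is hypothetical and is not taken from the linked library.

```python
import re
from dataclasses import dataclass

@dataclass(frozen=True)
class Handler:
    name: str
    priority: int        # lower number = tried earlier (more specific)
    pattern: re.Pattern

    def match(self, filename):
        m = self.pattern.search(filename)
        if m is None:
            return None
        return int(m.group("season")), int(m.group("episode"))

# Handlers are sorted once by priority, so greedier patterns run last.
HANDLERS = sorted(
    [
        Handler("nxnn", 20, re.compile(r"(?P<season>\d{1,2})x(?P<episode>\d{2})")),
        Handler("sxxeyy", 10, re.compile(r"[Ss](?P<season>\d{1,2})[Ee](?P<episode>\d{1,2})")),
    ],
    key=lambda h: h.priority,
)

def parse(filename):
    for handler in HANDLERS:
        result = handler.match(filename)
        if result is not None:
            return handler.name, result
    return None
```

Making the order an explicit `priority` field, rather than an accident of load order, is one way to address the "greedy handler matches first" concern; a scoring scheme could replace the early return by collecting all results and picking the best.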

C# regular expression for finding forms with input tags in HTML?

- by johnrl

Hi all. I have a simple problem: I want to construct a regex that matches a form in HTML, but only if the form has any input tags. Example:

The following should be matched (ignoring attributes):

    ..
    <form>
    ..
    <input/>
    ..
    </form>
    ..

But the following should not (ignoring attributes):

    ..
    <form>
    ..
    </form>
    ..

I have tried everything from look-arounds to capture groups but it quickly gets complicated. I want to believe there is a simple regex that captures the problem. Please note that it is important that the regex pairs the opening and closing tags according to the HTML code, which means the following does not work:

    <form>.+<input/>.+</form>

because it matches wrongly like this:

    ..
    <form>      <--- This is wrongly matched as the opening tag
    ..
    </form>
    <form>      <-- This is the correct opening tag of the correct form
    ..
    <input/>
    ..
    </form>     <--- This is matched as the closing tag
    ..
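One way to stop the match from running across a `</form>` is to guard every consumed character with a negative lookahead. The sketch below shows this in Python; the same construct exists in .NET regexes. The names are hypothetical, and the usual caveat applies that regexes handle only well-behaved HTML, so an HTML parser is the safer tool for arbitrary input.

```python
import re

FORM_WITH_INPUT = re.compile(
    r"<form\b[^>]*>"            # opening tag
    r"(?:(?!</form>).)*?"       # anything, but never step over a closing tag
    r"<input\b"                 # at least one input inside this form
    r"(?:(?!</form>).)*?"       # rest of the form body
    r"</form>",
    re.DOTALL | re.IGNORECASE,
)

def forms_with_inputs(html):
    return [m.group(0) for m in FORM_WITH_INPUT.finditer(html)]

html = "<form> a </form>\n<form> <input/> </form>"
print(forms_with_inputs(html))
```

Because the guarded sections cannot cross `</form>`, the empty first form fails on its own and the second form is matched with its own opening tag.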

Online job-searching is tedious. Help me automate it.

- by ehsanul

Many job sites have broken searches that don't let you narrow down jobs by experience level. Even when they do, the results are usually wrong. This forces you to wade through hundreds of postings that you can't apply for before finding a relevant one, which is quite tedious. Since I'd rather focus on writing cover letters etc., I want to write a program to look through a large number of postings, and save the URLs of just those jobs that don't require years of experience.

I don't need help writing the scraper to get the HTML bodies of possibly relevant job posts. The issue is accurately detecting the level of experience required for the job. This should not be too difficult, as job posts are usually very explicit about this ("must have 5 years experience in..."), but there may be some issues with overly simple solutions.

In my case, I'm looking for entry-level positions. Often they don't say "entry-level", but inclusion of the words probably means the job should be saved. Next, I can safely exclude a job that says it requires "5 years" of experience in whatever, so a regex like /\d\syears/ seems reasonable to exclude jobs. But then I realized some jobs say they'll take 0-2 years of experience, which matches the exclusion regex but is clearly a job I want to take a look at. Hmmm, I can handle that with another regex. But some say "less than 2 years" or "fewer than 2 years". I can handle that too, but it makes me wonder what other patterns I'm not thinking of, and whether I'm possibly excluding many jobs.

That's what brings me here: to find a better way to do this than regexes, if there is one. I'd like to minimize the false negative rate and save all the jobs that seem like they might not require many years of experience. Does excluding anything that matches /[3-9]\syears|1\d\syears/ seem reasonable? Or is there a better way? Training a Bayesian filter, maybe?
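The rules listed above (keep "entry-level", treat ranges by their lower bound, ignore "less than N years") can be combined into one small function instead of a growing pile of separate regexes. This is a sketch under those stated rules only; the names and the `max_years` threshold are hypothetical, and phrasing not covered here will slip through as "keep", which errs on the side of fewer false negatives.

```python
import re

ENTRY_LEVEL = re.compile(r"entry[- ]level", re.IGNORECASE)
UPPER_BOUND = re.compile(r"(?:less|fewer)\s+than\s+\d+\s+years?", re.IGNORECASE)
# Optional lower bound of a range ("0-2", "3 to 5"), then the number itself.
YEARS = re.compile(r"(?:(\d+)\s*(?:-|to)\s*)?(\d+)\+?\s*years?", re.IGNORECASE)

def looks_entry_level(posting, max_years=2):
    if ENTRY_LEVEL.search(posting):
        return True
    # "less than N years" is an upper bound, so it never excludes a job.
    remaining = UPPER_BOUND.sub(" ", posting)
    for m in YEARS.finditer(remaining):
        lower = int(m.group(1) or m.group(2))
        if lower > max_years:
            return False
    return True
```

Extracting the number and comparing it in code avoids encoding numeric ranges as character classes such as `[3-9]`.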

how to read string part in java

- by Gandalf StormCrow

Hello everyone, I have this string:

    <meis xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" uri="localhost/naro-nei" onded="flpSW531213" identi="lemenia" id="75" lastStop="bendi" xsi:noNamespaceSchemaLocation="http://localhost/xsd/postat.xsd xsd/postat.xsd">

How can I get the lastStop property value in Java? This regex worked when tested on http://www.myregexp.com/ but when I try it in Java I don't see the matched text. Here is how I tried:

    import java.util.regex.Pattern;
    import java.util.regex.Matcher;

    public class SimpleRegexTest {
        public static void main(String[] args) {
            String sampleText = "<meis xmlns:xsi=\"http://www.w3.org/2001/XMLSchema-instance\" uri=\"localhost/naro-nei\" onded=\"flpSW531213\" identi=\"lemenia\" id=\"75\" lastStop=\"bendi\" xsi:noNamespaceSchemaLocation=\"http://localhost/xsd/postat.xsd xsd/postat.xsd\">";
            String sampleRegex = "(?<=lastStop=[\"']?)[^\"']*";
            Pattern p = Pattern.compile(sampleRegex);
            Matcher m = p.matcher(sampleText);
            if (m.find()) {
                String matchedText = m.group();
                System.out.println("matched [" + matchedText + "]");
            } else {
                System.out.println("didn’t match");
            }
        }
    }
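A capture group is often simpler than a lookbehind for this kind of extraction. One plausible reading of the empty result is that the lookbehind, with its optional quote, is already satisfied directly after `lastStop=`, where `[^"']*` can only match the empty string in front of the quote. The Python sketch below sidesteps that by consuming the quote explicitly; the names `ATTR` and `last_stop` are hypothetical.

```python
import re

# Group 1 is the opening quote, group 2 the value, \1 the matching close quote.
ATTR = re.compile(r"""\blastStop\s*=\s*(["'])(.*?)\1""")

def last_stop(tag):
    m = ATTR.search(tag)
    return m.group(2) if m else None

tag = '<meis uri="localhost/naro-nei" id="75" lastStop="bendi" identi="lemenia">'
print(last_stop(tag))
```

The same pattern should work in Java with `matcher.group(2)`; for real XML, an XML parser and `getAttribute("lastStop")` would be more robust than any regex.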

How to remove invalid UTF-8 characters from a JavaScript string?

- by msielski

I'd like to remove all invalid UTF-8 characters from a string in JavaScript. I've tried using the approach described here (link removed) and came up with this JavaScript:

    strTest = strTest.replace(/([\x00-\x7F]|[\xC0-\xDF][\x80-\xBF]|[\xE0-\xEF][\x80-\xBF]{2}|[\xF0-\xF7][\x80-\xBF]{3})|./, "$1");

It seems that the UTF-8 validation regex described here (link removed) is more complete, and I adapted it in the same way:

    strTest = strTest.replace(/([\x09\x0A\x0D\x20-\x7E]|[\xC2-\xDF][\x80-\xBF]|\xE0[\xA0-\xBF][\x80-\xBF]|[\xE1-\xEC\xEE\xEF][\x80-\xBF]{2}|\xED[\x80-\x9F][\x80-\xBF]|\xF0[\x90-\xBF][\x80-\xBF]{2}|[\xF1-\xF3][\x80-\xBF]{3}|\xF4[\x80-\x8F][\x80-\xBF]{2})|./, "$1");

Both of these pieces of code seem to let valid UTF-8 through, but they filter out hardly any of the bad UTF-8 characters from my test data: UTF-8 decoder capability and stress test. Either the bad characters come through unchanged, or they seem to have some of their bytes removed, creating a new, invalid character. I'm not very familiar with the UTF-8 standard or with multibyte handling in JavaScript, so I'm not sure if I'm failing to represent proper UTF-8 in the regex or if I'm applying that regex improperly in JavaScript. Any help appreciated. Thanks!
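Two things may be worth checking in the snippets above: a regex passed to `replace` without the `g` flag replaces only the first match, and the byte-range patterns only make sense if each character of the string holds one raw byte. For comparison, here is how the same clean-up looks when done at the byte level in Python, where the decoder can simply be told to skip invalid sequences; the function name `drop_invalid_utf8` is hypothetical.

```python
def drop_invalid_utf8(raw: bytes) -> str:
    # errors="ignore" makes the decoder skip bytes that do not form valid
    # UTF-8, including overlong forms and truncated sequences.
    return raw.decode("utf-8", errors="ignore")

sample = b"ok \xc3\xa9 \xff\xfe bad \xc3"
print(drop_invalid_utf8(sample))
```

The point of the comparison is that validation belongs at the decoding step, on bytes; once text is already a decoded string, the original invalid bytes may no longer be recoverable.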

Optimising ruby regexp -- lots of match groups

- by Farcaller

I'm working on a Ruby-based lexer. To improve performance, I joined all the tokens' regexps into one big regexp with named match groups. The resulting regexp looks like:

    /\A(?<__anonymous_-1038694222803470993>(?-mix:\n+))|\A(?<__anonymous_-1394418499721420065>(?-mix:\/\/[\A\n]*))|\A(?<__anonymous_3077187815313752157>(?-mix:include\s+"[\A"]+"))|\A(?<LET>(?-mix:let\s))|\A(?<IN>(?-mix:in\s))|\A(?<CLASS>(?-mix:class\s))|\A(?<DEF>(?-mix:def\s))|\A(?<DEFM>(?-mix:defm\s))|\A(?<MULTICLASS>(?-mix:multiclass\s))|\A(?<FUNCNAME>(?-mix:![a-zA-Z_][a-zA-Z0-9_]*))|\A(?<ID>(?-mix:[a-zA-Z_][a-zA-Z0-9_]*))|\A(?<STRING>(?-mix:"[\A"]*"))|\A(?<NUMBER>(?-mix:[0-9]+))/

I'm matching it against my string, producing a MatchData where exactly one token is parsed:

    bigregex =~ "\n ... garbage"
    puts $~.inspect

Which outputs

    #<MatchData "\n" __anonymous_-1038694222803470993:"\n" __anonymous_-1394418499721420065:nil __anonymous_3077187815313752157:nil LET:nil IN:nil CLASS:nil DEF:nil DEFM:nil MULTICLASS:nil FUNCNAME:nil ID:nil STRING:nil NUMBER:nil>

So, the regex actually matched the "\n" part. Now I need to figure out which match group it belongs to (it's clearly visible from the #inspect output that it's __anonymous_-1038694222803470993, but I need to get it programmatically). I could not find any option other than iterating over #names:

    m.names.each do |n|
      if m[n]
        type = n.to_sym
        resolved_type = (n.start_with?('__anonymous_') ? nil : type)
        val = m[n]
        break
      end
    end

which verifies that the match group did have a match. The problem here is that it's slow: I spend about 10% of the time in the loop, and another 8% grabbing @input[@pos..-1] to make sure that \A works as expected to match the start of the string (I do not discard input, I just shift @pos in it). You can check the full code at the GH repo. Any ideas on how to make it at least a bit faster? Is there an easier way to figure out the "successful" match group?
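For comparison, Python's `re` module addresses both costs directly: `Match.lastgroup` names the alternative that matched without a loop, and `Pattern.match(text, pos)` anchors at an offset without slicing the input. The sketch below shows that design in Python with a reduced, hypothetical token list; it is an analogue, not a drop-in answer for Ruby.

```python
import re

TOKEN_SPEC = [
    ("NEWLINE", r"\n+"),
    ("LET", r"let\s"),
    ("ID", r"[a-zA-Z_][a-zA-Z0-9_]*"),
    ("NUMBER", r"[0-9]+"),
    ("SKIP", r"[ \t]+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{rx})" for name, rx in TOKEN_SPEC))

def tokenize(text):
    pos, tokens = 0, []
    while pos < len(text):
        m = MASTER.match(text, pos)   # anchored at pos, no substring copy
        if m is None:
            raise ValueError(f"unexpected character at offset {pos}")
        if m.lastgroup != "SKIP":     # name of the group that matched
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens

print(tokenize("let x 42\n"))
```

In Ruby, an approach in the same spirit might be to look at `StringScanner` for position-based scanning, though whether it removes the group lookup cost would need measuring.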

What would be the best approach to finding a date in a freeform text?

- by Matthew DeVos

What would be the best approach to finding a date in a freeform text? This is a post where a user may write a date in several different ways, such as:

- July 14th & 15th
- 7/14 & 7/15
- 7-14 & 7-15
- Saturday 14th and Sunday 15th
- Saturday July 14th and 15th

and so on. Is regex with preg_match my best choice for this type of thing? I would also like to detect whether there are two dates, one for a start date and a second for an end date, but the text I'm searching may contain one date or two. This is my PHP code so far:

    $dates1 = '01-01';
    $dates2 = 'July 14th & 15th';
    $dates3 = '7/14 & 7/15';
    $dates4 = '7-14 & 7-15';
    $dates5 = 'Saturday 14th and Sunday 15th';
    $dates6 = 'Saturday July 14th and 15th';
    $regexes = array(
        '/\s(1|2|3|4|5|6|7|8|9|10|11|12)\/\d{1,2}/', //finds a date
        '/\s(1|2|3|4|5|6|7|8|9|10|11|12)-\d{1,2}/', //finds another date
        '%\b(0?[1-9]|[12][0-9]|3[01])[- /.](0?[1-9]|1[012])\b%', //finds date format dd-mm or dd.mm
    );
    foreach($regexes as $regex){
        preg_match($regex,$dates,$matches);
    }
    var_dump($matches);
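As a partial illustration, two of the listed shapes (numeric month/day pairs and "Month 14th") can be handled with two patterns and merged by position, which also answers the "one date or two" question by counting the results. This Python sketch is deliberately incomplete: the names are hypothetical, it assumes month-first numeric dates, and it does not resolve a bare trailing "15th" or weekday-only forms such as "Saturday 14th".

```python
import re

MONTHS = ["January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December"]
NUMERIC = re.compile(r"\b(1[0-2]|0?[1-9])[/-](3[01]|[12]\d|0?[1-9])\b")
NAMED = re.compile(
    r"\b(" + "|".join(MONTHS) + r")\s+(\d{1,2})(?:st|nd|rd|th)?\b",
    re.IGNORECASE,
)

def find_dates(text):
    found = []  # (position, month, day) so results can be ordered by position
    for m in NUMERIC.finditer(text):
        found.append((m.start(), int(m.group(1)), int(m.group(2))))
    lowered = [name.lower() for name in MONTHS]
    for m in NAMED.finditer(text):
        month = lowered.index(m.group(1).lower()) + 1
        found.append((m.start(), month, int(m.group(2))))
    return [(month, day) for _, month, day in sorted(found)]
```

With this shape, the first tuple can serve as the start date and the second, if present, as the end date.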


Find Lines with N occurrences of a char

- by Martín Marconcini

I have a txt file that I’m trying to import as a flat file into SQL2008. It looks like this:

    “123456”,”some text”
    “543210”,”some more text”
    “111223”,”other text”
    etc…

The file has more than 300.000 rows and the text is large (usually 200-500 chars), so scanning the file by hand is very time consuming and prone to error. Other similar (and even more complex) files were successfully imported. The problem with this one is that “some lines” contain quotes in the text. (This came from an export from an old SuperBase DB that didn’t let you specify a text qualifier; there’s nothing I can do with the file other than clean it and try to import it.) So the “offending” lines look like this:

    “123456”,”this text “contains” a quote”
    “543210”,”And the “above” text is bad”
    etc…

You can see the problem here. Now, 300.000 is not too much: if I could run a search in a text editor that supports regex, I’d manually remove the quotes from each line. The problem is not the number of offending lines, but the impossibility of finding them with a simple search. I’m sure there are fewer than 500, but spread those across a 300.000-line txt file and you know what I mean. Based upon that, what would be the best regex I could use to identify these lines? My first thought is: tell me which lines contain more than 4 quotes (“). But I couldn’t come up with anything (I’m not good at regex beyond the basics).
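The "more than 4 quotes" idea can be written as a pattern that repeats "any run of non-quote characters followed by a quote" at least five times from the start of the line. The sketch below tries it in Python and assumes the real file uses plain ASCII double quotes (the curly quotes above look like a rendering artifact). The function name `offending_lines` is made up for the example.

```python
import re

# A well-formed line has exactly 4 quotes (two quoted fields).
# Five or more repetitions of [non-quotes]" means at least one stray quote.
MORE_THAN_FOUR_QUOTES = re.compile(r'^(?:[^"]*"){5,}')


def offending_lines(lines):
    """Return (line_number, line) pairs for lines with more than 4 quotes."""
    return [
        (number, line)
        for number, line in enumerate(lines, start=1)
        if MORE_THAN_FOUR_QUOTES.match(line)
    ]
```

The same expression, `^(?:[^"]*"){5,}`, should work in a text editor whose regex search supports non-capturing groups and counted repetition, matched one line at a time.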


Get the package of a Java source file

- by Oak

My goal is to find the package (as a string) of a Java source file, given as plaintext and not already sorted into folders. I can't just locate the first instance of the keyword package in the file, because it may appear inside a comment. So I was thinking about two alternatives:

- Scan the file word-by-word, maintaining an "inside-a-comment" flag for the scanner. The first time the package keyword is encountered while not inside a comment, stop the scanning and report the result.
- Use a regex. This should be theoretically possible because block comments do not nest in Java, but I tried making such a regex and it turned out to be quite complicated - for me, at least.

Another difference between the two approaches is that when scanning manually I can stop the scan when I can be certain the package keyword can no longer appear, saving some time... and I'm not sure I can do something similar with regexes. On the other hand, the decision "when it can no longer appear" is not necessarily simple, though I could use some heuristic for that. I would like to hear any input on this problem, and would welcome any help with the regex. My solution is written in Java as well.
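A rough sketch of the regex alternative, written in Python for illustration (the lookahead syntax used here is also available in `java.util.regex`). It relies on the package declaration, when present, being preceded only by whitespace and comments. It assumes no annotations before `package` (as in a package-info.java file) and no comments inside the package name itself. `find_package` is a made-up name.

```python
import re

PACKAGE_RE = re.compile(
    r"""\A
    (?:                      # skip anything allowed before the declaration:
        \s                   #   whitespace
      | //[^\n]*             #   a line comment
      | /\*(?:(?!\*/).)*\*/  #   a block comment (they do not nest)
    )*
    package\s+([A-Za-z_][\w.]*)\s*;
    """,
    re.VERBOSE | re.DOTALL,
)


def find_package(source):
    """Return the declared package name, or None for the default package."""
    m = PACKAGE_RE.match(source)
    return m.group(1) if m else None
```

Because the pattern is anchored at the start and only consumes whitespace and comments, it gives up at the first real token that is not `package`. That is roughly the early stop described for the manual scan.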


Why is my global security group being filtered out of my logon token?

- by Jay Michaud

While investigating the effects of filtered tokens on my file permissions, I noticed that one of my global security groups is being filtered in addition to the regular system-defined filtered groups. My Active Directory environment is a single-domain forest on the Windows Server 2003 functional level. I'll call the domain "mydomain.example.com". I am logged onto a Windows Server 2008 Enterprise Edition machine (not a domain controller) as a member of the "MYDOMAIN\Domain Admins" group and the "MYDOMAIN\MySecurityGroup" global security group (among others). When I run "whoami /groups" from an elevated command prompt, I see the full list of groups to which my account belongs as expected. When I run "whoami /groups" from a regular, non-elevated command prompt, I see the same list of groups, but the following groups are described as "Group used for deny only".

1. BUILTIN\Administrators
2. MYDOMAIN\Schema Admins
3. MYDOMAIN\Offer Remote Assistance Helpers
4. MYDOMAIN\MySecurityGroup

Numbers 1 through 3 above are expected based on Microsoft documentation; number 4 is not. The "MYDOMAIN\MySecurityGroup" global security group is a group that I created. It contains three non-built-in global security groups, and these security groups contain only non-built-in user accounts. (That is, I created all of the accounts and groups that are members of the "MYDOMAIN\MySecurityGroup" global security group.) There are other, similar groups of which my account is a member that are not being filtered out of my logon token, and this group is not granted any specific user rights in the security settings of this computer or in Group Policy. What would cause this one group to be filtered out of my logon token?
