Search Results

Search found 3804 results on 153 pages for 'regex'.

Page 59/153 | < Previous Page | 55 56 57 58 59 60 61 62 63 64 65 66  | Next Page >

  • Merging two Regular Expressions to Truncate Words in Strings

    - by Alix Axel
    I'm trying to come up with the following function that truncates string to whole words (if possible, otherwise it should truncate to chars): function Text_Truncate($string, $limit, $more = '...') { $string = trim(html_entity_decode($string, ENT_QUOTES, 'UTF-8')); if (strlen(utf8_decode($string)) > $limit) { $string = preg_replace('~^(.{1,' . intval($limit) . '})(?:\s.*|$)~su', '$1', $string); if (strlen(utf8_decode($string)) > $limit) { $string = preg_replace('~^(.{' . intval($limit) . '}).*~su', '$1', $string); } $string .= $more; } return trim(htmlentities($string, ENT_QUOTES, 'UTF-8', true)); } Here are some tests: // Iñtërnâtiônàlizætiøn and then the quick brown fox... (49 + 3 chars) echo dyd_Text_Truncate('Iñtërnâtiônàlizætiøn and then the quick brown fox jumped overly the lazy dog and one day the lazy dog humped the poor fox down until she died.', 50, '...'); // Iñtërnâtiônàlizætiøn_and_then_the_quick_brown_fox_... (50 + 3 chars) echo dyd_Text_Truncate('Iñtërnâtiônàlizætiøn_and_then_the_quick_brown_fox_jumped_overly_the_lazy_dog and one day the lazy dog humped the poor fox down until she died.', 50, '...'); They both work as it is, however if I drop the second preg_replace() I get the following: Iñtërnâtiônàlizætiøn_and_then_the_quick_brown_fox_jumped_overly_the_lazy_dog and one day the lazy dog humped the poor fox down until she died.... I can't use substr() because it only works on byte level and I don't have access to mb_substr() ATM, I've made several attempts to join the second regex with the first one but without success. Please help S.M.S., I've been struggling with this for almost an hour. EDIT: I'm sorry, I've been awake for 40 hours and I shamelessly missed this: $string = preg_replace('~^(.{1,' . intval($limit) . '})(?:\s.*|$)?~su', '$1', $string); Still, if someone has a more optimized regex (or one that ignores the trailing space) please share: "Iñtërnâtiônàlizætiøn and then " "Iñtërnâtiônàlizætiøn_and_then_" EDIT 2: I still can't get rid of the trailing whitespace, can someone help me out?

    Read the article

  • preg_match_all and newlines inside quotes

    - by David
    Another noob regex problem/question. I'm probably doing something silly so I thought I'd exploit the general ingenuity of the SO regulars ;) Trying to match newlines but only if they occur within either double quotes or single quotes. I also want to catch strings that are between quotes but contain no newlines. Okay so there's what i got, with output. Below that, will be the output I would like to get. Any help would be greatly appreciated! :) I use Regex Coach to help me create my patterns, being a novice and all. According to RC, The pattern I supply does match all occurances within the data, but in my PHP, it skips over the multi-line part. I have tried with the 'm' pattern modifier already, to no avail. Contents of $CompressedData: <?php $Var = "test"; $Var2 = "test2"; $Var3 = "blah blah blah blah blah blah blah blah blah"; $Var4 = "hello"; ?> Pattern / Code: preg_match_all('!(\'|")(\b.*\b\n*)*(\'|")!', $CompressedData, $Matches); Current print_r output of $Matches: Array ( [0] => Array ( [0] => "test" [1] => "test2" [2] => "hello" ) ... } DESIRED print_r output of $Matches: Array ( [0] => Array ( [0] => "test" [1] => "test2" [2] => "blah blah blah blah blah blah blah blah blah" [3] => "hello" ) ... }

    Read the article

  • Extracting email addresses in an html block in ruby/rails

    - by corroded
    I am creating a parser that wards off against spamming and harvesting of emails from a block of text that comes from tinyMCE (so it may or may not have html tags in it) I've tried regexes and so far this has been successful: /\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b/i problem is, i need to ignore all email addresses with mailto hrefs. for example: <a href="mailto:[email protected]">[email protected]</a> should only return the second email add. To get a background of what im doing, im reversing the email addresses in a block so the above example would look like this: <a href="mailto:[email protected]">moc.liam@tset</a> problem with my current regex is that it also replaces the one in href. Is there a way for me to do this with a single regex? Or do i have to check for one then the other? Is there a way for me to do this just by using gsub or do I have to use some nokogiri/hpricot magicks and whatnot to parse the mailtos? Thanks in advance! Here were my references btw: so.com/questions/504860/extract-email-addresses-from-a-block-of-text so.com/questions/1376149/regexp-for-extracting-a-mailto-address im also testing using this: http://rubular.com/ edit here's my current helper code: def email_obfuscator(text) text.gsub(/\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b/i) { |m| m = "<span class='anti-spam'>#{m.reverse}</span>" } end which results in this: <a target="_self" href="mailto:<span class='anti-spam'>moc.liamg@tset</span>"><span class="anti-spam">moc.liamg@tset</span></a>

    Read the article

  • Need variable width negative lookbehind replacement

    - by Technoh
    I have looked at many questions here (and many more websites) and some provided hints but none gave me a definitive answer. I know regular expressions but I am far from being a guru. This particular question deals with regex in PHP. I need to locate words in a text that are not surrounded by a hyperlink of a given class. For example, I might have This <a href="blabblah" class="no_check">elephant</a> is green and this elephant is blue while this <a href="blahblah">elephant</a> is red. I would need to match against the second and third elephants but not the first (identified by test class "no_check"). Note that there could more attributes than just href and class within hyperlinks. I came up with ((?<!<a .*class="no_check".*>)\belephant\b) which works beautifully in regex test software but not in PHP. Any help is greatly appreciated. If you cannot provide a regular expression but can find some sort of PHP code logic that would circumvent the need for it, I would be equally grateful.

    Read the article

  • Extracting one word based on special character using Regular Expression in C#

    - by Jankhana
    I am not very good at regular expression but want to do some thing like this : string="c test123 d split" I want to split the word based on "c" and "d". this can be any word which i already have. The string will be given by the user. i want "test123" and "split" as my output. and there can be any number of words i.e "c test123 d split e new" etc. c d e i have already with me. I want just the next word after that word i.e after c i have test123 and after d i have split and after e i have new so i need test123 and split and new. how can I do this??? And one more thing I will pass just c first than d and than e. not together all of them. I tried string strSearchWord="c "; Regex testRegex1 = new Regex(strSearchWord); List lstValues = testRegex1.Split("c test123 d split").ToList(); But it's working only for last character i.e for d it's giving the last word but for c it includes test123 d split. How shall I do this???

    Read the article

  • Regular expression either/or not matching everything

    - by dwatransit
    I'm trying to parse an HTTP GET request to determine if the url contains any of a number of file types. If it does, I want to capture the entire request. There is something I don't understand about ORing. The following regular expression only captures part of it, and only if .flv is the first int the list of ORd values. (I've obscured the urls with spaces because Stackoverflow limits hyperlinks) regex: GET.?(.flv)|(.mp4)|(.avi).? test text: GET http: // foo.server.com/download/0/37/3000016511/.flv?mt=video/xy match output: GET http: // foo.server.com/download/0/37/3000016511/.flv I don't understand why the .*? at the end of the regex isnt callowing it to capture the entire text. If I get rid of the ORing of file types, then it works. Here is the test code in case my explanation doesn't make sense: public static void main(String[] args) { // TODO Auto-generated method stub String sourcestring = "GET http: // foo.server.com/download/0/37/3000016511/.flv?mt=video/xy"; Pattern re = Pattern.compile("GET .?\.flv."); // this works //output: // [0][0] = GET http :// foo.server.com/download/0/37/3000016511/.flv?mt=video/xy // the match from the following ends with the ".flv", not the entire url. // also it only works if .flv is the first of the 3 ORd options //Pattern re = Pattern.compile("GET .?(\.flv)|(\.mp4)|(\.avi).?"); // output: //[0][0] = GET http: // foo.server.com/download/0/37/3000016511/.flv // [0][1] = .flv // [0][2] = null // [0][3] = null Matcher m = re.matcher(sourcestring); int mIdx = 0; while (m.find()){ for( int groupIdx = 0; groupIdx < m.groupCount()+1; groupIdx++ ){ System.out.println( "[" + mIdx + "][" + groupIdx + "] = " + m.group(groupIdx)); } mIdx++; } } }

    Read the article

  • How to delete sentences starting with a lower case letter?

    - by Ron
    Hello: In the example below the following regex (".*?") was used to remove all dialogue first. The next step is to remove all remaining sentences starting with a lower case letter. Only sentences starting with an upper case letter should remain. Example: exclaimed Wade. Indeed, below them were villages, of crude huts made of timber and stone and mud. Rubble work walls, for they needed little shelter here, and the people were but savages. asked Arcot, his voice a bit unsteady with suppressed excitement. replied Morey without turning from his station at the window. Below them now, less than half a mile down on the patchwork of the Nile valley, men were standing, staring up, collecting in little groups, gesticulating toward the strange thing that had materialized in the air above them. In the example above the following should be deleted only: exclaimed Wade. asked Arcot, his voice a bit unsteady with suppressed excitement. replied Morey without turning from his station at the window. A useful regex or simple Perl or python code is appreciated. I'm using version 7 of Textpipe. Thanks.

    Read the article

  • Python Regular Expressions: Capture lookahead value (capturing text without consuming it)

    - by Lattyware
    I wish to use regular expressions to split words into groups of (vowels, not_vowels, more_vowels), using a marker to ensure every word begins and ends with a vowel. import re MARKER = "~" VOWELS = {"a", "e", "i", "o", "u", MARKER} word = "dog" if word[0] not in VOWELS: word = MARKER+word if word[-1] not in VOWELS: word += MARKER re.findall("([%]+)([^%]+)([%]+)".replace("%", "".join(VOWELS)), word) In this example we get: [('~', 'd', 'o')] The issue is that I wish the matches to overlap - the last set of vowels should become the first set of the next match. This appears possible with lookaheads, if we replace the regex as follows: re.findall("([%]+)([^%]+)(?=[%]+)".replace("%", "".join(VOWELS)), word) We get: [('~', 'd'), ('o', 'g')] Which means we are matching what I want. However, it now doesn't return the last set of vowels. The output I want is: [('~', 'd', 'o'), ('o', 'g', '~')] I feel this should be possible (if the regex can check for the second set of vowels, I see no reason it can't return them), but I can't find any way of doing it beyond the brute force method, looping through the results after I have them and appending the first character of the next match to the last match, and the last character of the string to the last match. Is there a better way in which I can do this? The two things that would work would be capturing the lookahead value, or not consuming the text on a match, while capturing the value - I can't find any way of doing either.

    Read the article

  • Regular expression for finding non-breaking string names in code and then breaking them up for SQL q

    - by Rob Segal
    I am trying to devlop a regex for finding camel case strings in several code files I am working with so I can break them up into separate words for use in a SQL query. I have strings of the form... EmailAddress FirstName MyNameIs And I want them like this... Email Address First Name My Name Is An example SQL query which I currently have is... select FirstName, MyNameIs from MyTables I need the queries in the form... select FirstName as 'First Name', MyNameIs as 'My Name Is' from MyTables Any time a new capital letter appears that should be a new grouping which I can pick out of the matched string. I currently have the following regex... ([A-Z][a-z]+)+ Which does match the cases I have shown above but when I want to perform a replace I need to define groups. Currently I have tried... (([A-Z])([a-z]+))+ Which sort of works. It will pick out "Address" as the first grouping from "EmailAddress" as opposed to "Email" which is what I was expecting. No doubt there is something I'm misunderstanding here so any help is greatly appreciated.

    Read the article

  • Regular Expression Newbie

    - by Registered User
    I need to parse strings inputs where the columns are separated by columns and any field that contains a comma in the data is wrapped in quotes (commas separated, quoted text identifiers). For this project I need to remove the quotes and any commas that occur between pairs of quotes. Basically, I need to remove commas and quotes that are contained in fields while preserving the commas that are used to separate the fields. Here's a little code I put together that handles the simple scenario: // Sample input 1: This works and covers 99% of the records that I need to parse. string str1 = "[email protected],2010/03/27 12:2:02,,some_first_name,some_last_name,,\"This Address Works, Suite 200\",Some City,TN,09876-5432,9795551212x123,XYZ"; str1 = Regex.Replace(str1, "\"([^\"^,]*),([^\"^,]*)\"", "$1$2"); Console.WriteLine(str1); // Outputs: [email protected],2010/03/27 12:2:02,,some_first_name,some_last_name,,This Address Works Suite 200,Some City,TN,09876-5432,9795551212x123,XYZ Although this code works for most of my records, it doesn't work when a field contains more than one commas. What I would like to do is modify the code so that it remove each instance of a comma contained within the column no matter how many commas there are in the field. I don't want to hard code only handling 2 commas, or 3 commas, or 25 commas. The code should just remove all the commas in the field. Below is an example of what my code doesn't handle properly. // Sample input 2: This doesn't work since there is more than 1 comma between the quotes. string str2 = "[email protected],2010/03/27 12:2:02,,some_first_name,some_last_name,,\"i,l,k,e, c,o,m,m,a,s, i,n ,m,y, f,i,e,l,d\",Some City,TN,09876-5432,9795551212x123,XYZ"; str2 = Regex.Replace(str2, "\"([^\"^,]*),([^\"^,]*)\"", "$1$2"); Console.WriteLine(str2); // Desired output: [email protected],2010/03/27 12:2:02,,some_first_name,some_last_name,,i like commas in my field,Some City,TN,09876-5432,9795551212x123,XYZ Any help would be appreciated for this Regular Expression newbie.

    Read the article

  • Detect remote charset in php

    - by yallaa
    Hello, I would like to determine a remote page's encoding through detection of the Content-Type header tag <meta http-equiv="Content-Type" content="text/html; charset=XXXXX" /> if present. I retrieve the remote page and try to do a regex to find the required setting if present. I am still learning hence the problem below... Here is what I have: $EncStart = 'charset='; $EncEnd = '" \/\>'; preg_match( "/$EncStart(.*)$EncEnd/s", $RemoteContent, $RemoteEncoding ); echo = $RemoteEncoding[ 1 ]; The above does indeed echo the name of the encoding but it does not know where to stop so it prints out the rest of the line then most of the rest of the remote page in my test. Example: When testing a remote russian page it printed: windows-1251" / rest of page .... Which means that $EncStart was okay, but the $EncEnd part of the regex failed to stop the matching. This meta header usually ends in 3 different possibility after the name of the encoding. "> | "/> | " /> I do not know weather this is usable to satisfy the end of the maching and if yes how to escape it. I played with different ways of doing it but none worked. Thank you in advance for lending a hand.

    Read the article

  • C# Regex - Match and replace, Auto Increment

    - by Marc Still
    I have been toiling with a problem and any help would be appreciated. Problem: I have a paragraph and I want to replace a variable which appears several times (Variable = @Variable). This is the easy part, but the portion which I am having difficulty is trying to replace the variable with different values. I need for each occurrence to have a different value. For instance, I have a function that does a calculation for each variable. What I have thus far is below: private string SetVariables(string input, string pattern){ Regex rx = new Regex(pattern); MatchCollection matches = rx.Matches(input); int i = 1; if(matches.Count > 0) { foreach(Match match in matches) { rx.Replace(match.ToString(), getReplacementNumber(i)); i++ } } I am able to replace each variable that I need to with the number returned from getReplacementNumber(i) function, but how to I put it back into my original input with the replaced values, in the same order found in the match collection? Thanks in advance! Marcus

    Read the article

  • removing phone number from a document.

    - by Grant Collins
    Hi, I've got a challenge that I am hoping that the SO community is able to help me with. I trying to parse a lot of html documents in my PHP application to remove personal details, such as names, addresses and phone numbers. I can remove most of these details without too much trouble, however the phone number is a real problem for me. My idea is to take the text from these documents and the use a regex to identify the phone numbers and replace them with another value such as 'xxxx'. I've got 2 regex that I am using one for UK landline numbers and one for UK cell/mobile numbers. However when I try and run them against the text it just returns an empty string. I am using the following preg_replace code: $pattens = array( '/^(((\+44\s?\d{4}|\(?0\d{4}\)?)\s?\d{3}\s?\d{3})|((\+44\s?\d{3}|\(?0\d{3}\)?)\s?\d{3}\s?\d{4})|((\+44\s?\d{2}|\(?0\d{2}\)?)\s?\d{4}\s?\d{4}))(\s?\#(\d{4}|\d{3}))?$/', '/^(\+44\s?7\d{3}|\(?07\d{3}\)?)\s?\d{3}\s?\d{3}$/' ); $replace = array('xxxxx', 'xxxxx'); //do the search for the numbers. $updatedContents = preg_replace($pattens, $replace, $htmlContents); At the moment this is causing me a lot of head scratching as I thought that I had this nailed, but at the moment I can't see what's wrong?? I am sure that it is something really simple. Thanks, Grant

    Read the article

  • Regular expression test can't decide between true and false (JavaScript)

    - by nw
    I get this behavior in both Chrome (Developer Tools) and Firefox (Firebug). Note the regex test returns alternating true/false values: > var re = /.*?\bbl.*\bgr.*/gi; undefined > re /.*?\\bbl.*\\bgr.*/gi > re.test("Blue-Green"); true > re.test("Blue-Green"); false > re.test("Blue-Green"); true > re.test("Blue-Green"); false However, testing the same regex as a literal: > /.*?\bbl.*\bgr.*/gi.test("Blue-Green"); true > /.*?\bbl.*\bgr.*/gi.test("Blue-Green"); true > /.*?\bbl.*\bgr.*/gi.test("Blue-Green"); true > /.*?\bbl.*\bgr.*/gi.test("Blue-Green"); true I can't explain this and it's making debugging very difficult. Can anyone explain this behavior?

    Read the article

  • AWK: compare apache dates without using regular expression

    - by smallmeans
    I'm writing a loganalysis application and wanted to grab apache log records between two certain dates. Assume that a date is formated as such: 22/Dec/2009:00:19 (day/month/year:hour:minute) Currently, I'm using a regular expression to replace the month name with its numeric value, remove the separators, so the above date is converted to: 221220090019 making a date comparison trivial.. but.. Running a regex on each record for large files, say, one containing a quarter million records, is extremely costly.. is there any other method not involving regex substitution? Thanks in advance Edit: here's the function doing the convertion/comparison function dateInRange(t, from, to) { sub(/[[]/, "", t); split(t, a, "[/:]"); match("JanFebMarAprMayJunJulAugSepOctNovDec", a[2]); a[2] = sprintf("%02d", (RSTART + 2) / 3); s = a[3] a[2] a[1] a[4] a[5]; return s >= from && s <= to; } "from" and "to" are the intervals in the aforementioned format, and "t" is the raw apache log date/time field (e.g [22/Dec/2009:00:19:36)

    Read the article

  • C# comparing two files regex problem.

    - by Mike
    Hi everyone, what I'm trying to do is open a huge list of files (about 40k records, and match them on a line in a file that contains 2 millions records. And if my line from file A matches a line in file B write out that line. File A contains a bunch of files without extensions and file B contains full file paths including extensions. i'm using this but i cant get it to go... string alphaFilePath = (@"C:\Documents and Settings\g\Desktop\Arrp\Find\natst_ready.txt"); List<string> alphaFileContent = new List<string>(); using (FileStream fs = new FileStream(alphaFilePath, FileMode.Open)) using (StreamReader rdr = new StreamReader(fs)) { while (!rdr.EndOfStream) { alphaFileContent.Add(rdr.ReadLine()); } } string betaFilePath = @"C:\Documents and Settings\g\Desktop\Arryup\Find\eble.txt"; StringBuilder sb = new StringBuilder(); using (FileStream fs = new FileStream(betaFilePath, FileMode.Open)) using (StreamReader rdr = new StreamReader(fs)) { while (!rdr.EndOfStream) { string betaFileLine = rdr.ReadLine(); string matchup = Regex.Match(alphaFileContent, @"(\\)(\\)(\\)(\\)(\\)(\\)(\\)(\\)(.*)(\.)").Groups[9].Value; if (alphaFileContent.Equals(matchup)) { File.AppendAllText(@"C:\array_tech.txt", betaFileLine); } } } This doesnt work because the alphafilecontent is a single line only and i'm having a hard time figuring out how to get my regex to work on the file that contains all the file paths (Betafilepath) here is a sample of the beta file path. C:\arres_i\Grn\Ora\SEC\DBZ_EX1\Nes\001\DZO-EX00001.txt Here is the line i'm trying to compare from my alpha DZO-EX00001

    Read the article

  • need to clean malformed tags using regular expression

    - by Brian
    Looking to find the appropriate regular expression for the following conditions: I need to clean certain tags within free flowing text. For example, within the text I have two important tags: <2004:04:12 and . Unfortunately some of tags have missing "<" or "" delimiter. For example, some are as follows: 1) <2004:04:12 , I need this to be <2004:04:12> 2) 2004:04:12>, I need this to be <2004:04:12> 3) <John Doe , I need this to be <John Doe> I attempted to use the following for situation 1: String regex = "<\\d{4}-\\d{2}-\\d{2}\\w*{2}[^>]"; String output = content.replaceAll(regex,"$0>"); This did find all instances of "<2004:04:12" and the result was "<2004:04:12 ". However, I need to eliminate the space prior to the ending tag. Not sure this is the best way. Any suggestions. Thanks

    Read the article

  • C# : Regular Expression

    - by Pramodh
    I'm having a set of row data as follows List<String> l_lstRowData = new List<string> { "Data 1 32:01805043*0FFFFFFF", "Data 3, 20.0e-3", "Data 2, 1.0e-3 172:?:CRC" , "Data 6" }; and two List namely "KeyList" and "ValueList" like List<string> KeyList = new List<string>(); List<string> ValueList = new List<string>(); I need to fill the two List<String> from the data from l_lstRowData using Pattern Matching And here is my Pattern for this String l_strPattern = @"(?<KEY>(Data|data|DATA)\s[0-9]*[,]?[ ][0-9e.-]*)[ \t\r\n]*(?<Value>[0-9A-Za-z:?*!. \t\r\n\-]*)"; Regex CompiledPattern=new Regex(l_strPattern,RegexOptions.IgnoreCase | RegexOptions.IgnorePatternWhitespace); So finally the two Lists will contain KeyList { "Data 1" } { "Data 3, 20.0e-3" } { "Data 2, 1.0e-3" } { "Data 6" } ValueList { "32:01805043*0FFFFFFF" } { "" } { "172:?:CRC" } { "" } Scenerio: The Group KEY in the Pattern Should match "The data followed by an integer value , and the if there exist a comma(,) then the next string i.e a double value The Group Value in the Pattern should match string after the whitespace.In the first string it should match 32:01805043*0FFFFFFF but in the 3rd 172:?:CRC. Here is my sample code for (int i = 0; i < l_lstRowData.Count; i++) { MatchCollection M = CompiledPattern.Matches(l_lstRowData[i], 0); KeyList.Add(M[0].Groups["KEY"].Value); ValueList.Add(M[0].Groups["Value"].Value); } But my Pattern is not working in this situation. Please help me to rewrite my Pattern.

    Read the article

  • How to remove words based on a word count

    - by Chris
    Here is what I'm trying to accomplish. I have an object coming back from the database with a string description. This description can be up to 1000 characters long, but we only want to display a short view of this. So I coded up the following, but I'm having trouble in actually removing the number of words after the regular expression finds the total count of words. Does anyone have good way of dispalying the words which are less than the Regex.Matches? Thanks! if (!string.IsNullOrEmpty(myObject.Description)) { string original = myObject.Description; MatchCollection wordColl = Regex.Matches(original, @"[\S]+"); if (wordColl.Count < 70) // 70 words? { uxDescriptionDisplay.Text = string.Format("<p>{0}</p>", myObject.Description); } else { string shortendText = original.Remove(200); // 200 characters? uxDescriptionDisplay.Text = string.Format("<p>{0}</p>", shortendText); } }

    Read the article

  • [^.] causing headache in RewriteRule

    - by Ollie2893
    I am struggling with a very basic regex problem in my .htaccess file that I hope someone may be able to shed some light on. The basic premise is that I would like to teach Apache to switch any .html extension into a .var extension. I had thought that the rule would be positively trivial: RewriteRule ^([^.]+)\.html$ $1.var But the [^.] part simply doesn't work. Bizarrely, it works like so RewriteRule ^([^A-Z]+)\.html$ $1.var I do not understand why this latter rule works. Assume I am looking for a file called "index.html" then $1 should match to "index." and the ".html" bit should actually fail to match. To widen the scope of the question slightly, I am actually racking my brain on how to implement a multi-lingual site. I don't like Apache's MultiView option because it forces upon me a flat directory structure with file extensions that aren't recognizable to many development tools. I could go the .var type-map route but am finding that the default config for Apache doesn't support this all that well either (hence my excursions into regex land). So while I am using mod_rewrite, I am thinking that I might go the whole hog: whenever a request for a name.html file is received and this file does not exist, check whether there exists a XX/name.html file instead, where "XX" is the language code according to the user's preferences. This would give me a neater directory structure, though it does perhaps not perform as well as the .var approach in a situation where the language preference of the user's browser is not supported in by my site (in which situation .var would substitute EN or similar). Any thoughts? Thanks.

    Read the article

  • Grab two parts of a single, short string

    - by TankorSmash
    I'm looking to fill a python dict with TAG:definition pairs, and I'm using RegExr http://gskinner.com/RegExr/ to write the regex My first step is to parse a line, from http://www.id3.org/id3v2.3.0, or http://pastebin.com/VJEBGauL and pull out the ID3 tag and the associated definition. For example the first line: 4.20 AENC [#sec4.20 Audio encryption] would look like this myDict = {'AENC' : 'Audio encryption'} To grab the tag name, I've got it looking for at least 3 spaces, then 4 characters, then 4 spaces: {3}[a-zA-Z0-9]{4} {4} That part is easy enough. The second part, the definition, is not working out for me. So far, I've got (?<=(\[#.+?)) A Which should find, but not include the [# as well as an indeterminded set of characters until it finds: _A, but it's failing. If I remove .+? and replace _A with s it works out alright. What is going wrong? *The underscores represent spaces, which don't show up on SO. How do I grab the definition, ie,(Audio encryption) of the ID3v2 tag from the line, using RegEx?

    Read the article

  • Simple syntax question

    - by stabby
    Hey everyone, First off, sorry for my noob-ness. Believe me when i say ive been rtfm'ing. Im not lazy, im just dumb (apparently). On the bright side, this could earn someone some easy points here. I'm trying to do a match/replace with a pattern that contains special characters, and running into syntax errors in a Flex 3 app. I just want the following regex to compile... (while also replacing html tags with "") value.replace(/</?\w+((\s+\w+(\s*=\s*(?:".*?"|'.*?'|[^'">\s]+))?)+\s*|\s*)/?>/g, ""); On a side note, the pattern /<.*?/g wouldn't work in cases where there are html entities between tags, like so: <TEXTFORMAT LEADING="2"> <P ALIGN="LEFT"> <FONT FACE="Arial" SIZE="11" COLOR="#4F4A4A" LETTERSPACING="0" KERNING="0"><one</FONT> </P> </TEXTFORMAT><TEXTFORMAT LEADING="2"> <P ALIGN="LEFT"> <FONT FACE="Arial" SIZE="11" COLOR="#4F4A4A" LETTERSPACING="0" KERNING="0">two</FONT> </P> </TEXTFORMAT> The first regex would get both "<one" and "two", but the second would only get "hi" Thanks! Stabby L

    Read the article

  • PHP form validation function

    - by Barbs
    I am currently writing some PHP form validation (I have already validated clientside) and have some repetitive code that I think would work well in a nice little PHP function. However I have having trouble getting it to work. I'm sure it's just a matter of syntax but I just can't nail it down. Any help appreciated. //Validate phone number field to ensure 8 digits, no spaces. if(0 === preg_match("/^[0-9]{8}$/",$_POST['Phone']) { $errors['Phone'] = "Incorrect format for 'Phone'"; } if(!$errors) { //Do some stuff here.... } I found that I was writing the validation code a lot and I could save some time and some lines of code by creating a function. //Validate Function function validate($regex,$index,$message) { if(0 === preg_match($regex,$_POST[$index],$message) { $errors[$index] = $message; } And call it like so.... validate("/^[0-9]{8}$/","Phone","Incorrect format for Phone"); Can anyone see why this wouldn't work? Note I have disabled the client side validation while I work on this to try to trigger the error, so the value I am sending for 'Phone' is invalid.

    Read the article

  • Why is this exception thrown when trying to match this regex in Java?

    - by Scott Ferguson
    I'm trying to match a specific string out of a an HTML document and have this regex pattern to grab it: Pattern somePattern = Pattern.compile("var json = ({\"r\":\"^d1\".*});"); However when I try to hit that code at runtime, I get this error: FATAL EXCEPTION: Timer-0 java.util.regex.PatternSyntaxException: Syntax error U_REGEX_RULE_SYNTAX near index 13: var json = ({"r":"^d1".*}); ^ at com.ibm.icu4jni.regex.NativeRegEx.open(Native Method) at java.util.regex.Pattern.compileImpl(Pattern.java:383) at java.util.regex.Pattern.<init>(Pattern.java:341) at java.util.regex.Pattern.compile(Pattern.java:317) Can anybody tell me what I'm doing wrong?

    Read the article

  • i need some help with my vb.net codes..plzz

    - by akmalizhar
    currently i need to develop an application that can exctract information from few website.. this is what i have done up until now.. Imports System Imports System.Text.RegularExpressions Imports System.IO Imports System.Net Imports System.Web Imports System.Data.SqlClient Imports System.Threading Imports System.Data.DataSet Imports System.Data.OleDb Module module1 Dim url As String Dim hotelName As String = "" Sub Main() Dim url As String = "" Console.Write("enter url: ") url = Console.ReadLine() extractor(url) End Sub Public Sub extractor(ByVal url As String) Dim strConn As String = "Data Source = localhost; Initial Catalog = knowledgeBase; Integrated Security = True; Connection Timeout = 0;" Dim conn As SqlConnection = New SqlConnection(strConn) conn.Open() Dim strSQL1 As String Dim matchStn1 As String = "" Dim matchstn2 As String = "" Dim matchstn3 As String = "" Dim matchstn4 As String = "" Dim matchstn5 As String = "" Dim matchstn6 As String = "" Dim matchstn7 As String = "" Dim matchstn8 As String = "" Dim matchstn9 As String = "" Dim matchstn10 As String = "" Dim objRequest As WebRequest = HttpWebRequest.Create(url) Dim objResponse As WebResponse = objRequest.GetResponse() Dim objStreamReader As New StreamReader(objResponse.GetResponseStream()) Dim strpage As String = objStreamReader.ReadToEnd Dim RegExStr As String = "<[^>]*>" Dim R As New Regex(RegExStr) Dim sourcestring As String = strpage Dim re As Regex = New Regex("<h2 class=""name hotel""[^>]*>[\s\S]+?</h2>") Dim mc As MatchCollection = re.Matches(sourcestring) Dim mIdx As Integer = 0 For Each m As Match In mc For groupIdx As Integer = 0 To m.Groups.Count - 1 matchStn1 = m.Groups(groupIdx).Value matchStn1 = R.Replace(matchStn1, " ") matchStn1 = matchStn1.Trim() Next mIdx = mIdx + 1 Next Dim re9 As Regex = New Regex("<li class=""cuisine""[^>]*>[^>]+</li>") Dim mc9 As MatchCollection = re9.Matches(sourcestring) Dim mIdx9 As Integer = 0 For Each m As Match In mc9 For groupIdx As Integer = 0 To m.Groups.Count - 1 matchstn9 = m.Groups(groupIdx).Value matchstn9 = R.Replace(matchstn9, " ") matchstn9 = matchstn9.Trim() Next mIdx = mIdx + 1 Next Dim re2 As Regex = New Regex("<span class=""street-address""[^>]*>[^>]+</span>") Dim mc2 As MatchCollection = re2.Matches(sourcestring) Dim mIdx2 As Integer = 0 For Each m As Match In mc2 For groupIdx As Integer = 0 To m.Groups.Count - 1 matchstn2 = m.Groups(groupIdx).Value matchstn2 = R.Replace(matchstn2, " ") matchstn2 = matchstn2.Trim() Next mIdx2 = mIdx2 + 1 Next Dim re3 As Regex = New Regex("<span class=""locality""[^>]*>[\s\S]+?</span>") Dim mc3 As MatchCollection = re3.Matches(sourcestring) Dim mIdx3 As Integer = 0 For Each m As Match In mc3 For groupIdx As Integer = 0 To m.Groups.Count - 1 matchstn3 = m.Groups(groupIdx).Value matchstn3 = R.Replace(matchstn3, " ") matchstn3 = matchstn3.Trim() Next mIdx3 = mIdx3 + 1 Next Dim re4 As Regex = New Regex("<span property=""v:postal-code""[^>]*>[\s\S]+?</span>") Dim mc4 As MatchCollection = re4.Matches(sourcestring) Dim mIdx4 As Integer = 0 For Each m As Match In mc4 For groupIdx As Integer = 0 To m.Groups.Count - 1 matchstn4 = m.Groups(groupIdx).Value matchstn4 = R.Replace(matchstn4, " ") matchstn4 = matchstn4.Trim() Next mIdx4 = mIdx4 + 1 Next Dim re5 As Regex = New Regex("<span class=""country-name""[^>]*>[\s\S]+?</span>") Dim mc5 As MatchCollection = re5.Matches(sourcestring) Dim mIdx5 As Integer = 0 For Each m As Match In mc5 For groupIdx As Integer = 0 To m.Groups.Count - 1 matchstn5 = m.Groups(groupIdx).Value matchstn5 = R.Replace(matchstn5, " ") matchstn5 = matchstn5.Trim() Next mIdx5 = mIdx5 + 1 Next Dim re10 As Regex = New Regex("<address class=""adr""[^>]*>[\s\S]+?</address>") Dim mc10 As MatchCollection = re10.Matches(sourcestring) Dim mIdx10 As Integer = 0 For Each m As Match In mc10 For groupIdx As Integer = 0 To m.Groups.Count - 1 matchstn10 = m.Groups(groupIdx).Value matchstn10 = R.Replace(matchstn10, " ") matchstn10 = matchstn10.Trim() strSQL1 = "insert into infoRestaurant (nameRestaurant, cuisine, streetAddress, locality, postalCode, countryName, addressFull, tel, attractionType) values (N" & _ FormatSqlParam(matchStn1) & ",N" & _ FormatSqlParam(matchstn9) & ",N" & _ FormatSqlParam(matchstn2) & ",N" & _ FormatSqlParam(matchstn3) & ",N" & _ FormatSqlParam(matchstn4) & ",N" & _ FormatSqlParam(matchstn5) & ",N" & _ FormatSqlParam(matchstn10) & ",N" & _ FormatSqlParam(matchstn6) & ",N" & _ FormatSqlParam(matchstn7) & ")" Dim objCommand1 As New SqlCommand(strSQL1, conn) objCommand1.ExecuteNonQuery() Next mIdx4 = mIdx4 + 1 Next Dim re6 As Regex = New Regex("<span class=""tel""[^>]*>[\s\S]+?</span>") Dim mc6 As MatchCollection = re6.Matches(sourcestring) Dim mIdx6 As Integer = 0 For Each m As Match In mc6 For groupIdx As Integer = 0 To m.Groups.Count - 1 matchstn6 = m.Groups(groupIdx).Value matchstn6 = R.Replace(matchstn6, " ") matchstn6 = matchstn6.Trim() Next mIdx6 = mIdx6 + 1 Next Dim re7 As Regex = New Regex("<div><b>Attraction type:[^>]*>[\s\S]+?</div>") Dim mc7 As MatchCollection = re7.Matches(sourcestring) Dim mIdx7 As Integer = 0 For Each m As Match In mc7 For groupIdx As Integer = 0 To m.Groups.Count - 1 matchstn7 = m.Groups(groupIdx).Value matchstn7 = R.Replace(matchstn7, " ") matchstn7 = matchstn7.Trim() Next mIdx7 = mIdx7 + 1 Next Dim re8 As Regex = New Regex("(?=<p id).*(?<=</p>)") Dim mc8 As MatchCollection = re8.Matches(sourcestring) Dim mIdx8 As Integer = 0 For Each m As Match In mc8 For groupIdx As Integer = 0 To m.Groups.Count - 1 matchstn8 = m.Groups(groupIdx).Value matchstn8 = R.Replace(matchstn8, " ") matchstn8 = matchstn8.Trim() Dim strSQL2 As String = "insert into feedBackRestaurant (feedBackView) values(N" + FormatSqlParam(matchstn8) + ")" Dim objCommand2 As New SqlCommand(strSQL2, conn) objCommand2.ExecuteNonQuery() Next mIdx8 = mIdx8 + 1 Next objStreamReader.Close() conn.Close() End Sub Public Function FormatSqlParam(ByVal strParam As String) As String Dim newParamFormat As String If strParam = String.Empty Then newParamFormat = "'" & "NA" & "'" Else newParamFormat = strParam.Trim() newParamFormat = "'" & newParamFormat.Replace("'", "''") & "'" End If Return newParamFormat End Function End Module ---problems-- problem that i face are 1. the database foreign key is not working here..someone told me that need some codes to be added..but i dunno how. 2. the data repeats as i run the application. i guest it require update database function.but i hv no idea how. 3. i have to add in multithreading function as well..and last, how to make my application is flexible eventhough the HTML code changes..can anyone help me??plzzz website that i need to extract is http://www.tripadvisor.com/Tourism-g293951-Malaysia-Vacations.html i need the information about hotel, restaurant and attraction place..plzz..i need some help here..

    Read the article

< Previous Page | 55 56 57 58 59 60 61 62 63 64 65 66  | Next Page >