Extract and replace named group regex

Posted by user557670 on Stack Overflow
Published on 2010-12-29T22:52:08Z Indexed on 2010/12/29 22:53 UTC

I was able to extract the href values of the anchors in an HTML string. What I now want is to extract each href value and replace it with a new GUID. I need to return both the modified HTML string and a list of the extracted href values with their corresponding GUIDs.

Thanks in advance.

My existing code looks like this:

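' Requires: Imports System.Collections.Generic and Imports System.Text.RegularExpressions

' Matches an anchor's href value whether it is double-quoted, single-quoted, or unquoted.
' \b after "<a" keeps tags such as <abbr> or <area> from matching.
Dim sPattern As String = "<a\b[^>]*href\s*=\s*((\""(?<URL>[^\""]*)\"")|(\'(?<URL>[^\']*)\')|(?<URL>[^\s>]+))"

' Regex.Matches never returns Nothing, so no null check is needed.
Dim matches As MatchCollection = Regex.Matches(html, sPattern, RegexOptions.IgnoreCase Or RegexOptions.IgnorePatternWhitespace)

' Declared outside any conditional block so the list stays in scope after the loop.
Dim urls As New List(Of String)

For Each m As Match In matches
    urls.Add(m.Groups("URL").Value)
Next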

Sample HTML string:

<html><body><a title="http://www.google.com" href="http://www.google.com">http://www.google.com</a><br /><a href="http://www.yahoo.com">http://www.yahoo.com</a><br /><a title="http://www.apple.com" href="http://www.apple.com">Apple</a></body></html>
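
One possible way to get both results in a single pass is Regex.Replace with a MatchEvaluator, which is called once per match and returns the replacement text. The sketch below is only an illustration under some assumptions: the names HrefReplacer, ReplaceHrefs, and map are hypothetical, the pattern is a simplified variant of the one above, and it assumes VB 10 or later for multi-line lambdas. Regex-based HTML handling remains brittle, for example when an attribute value contains ">".

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Text.RegularExpressions

Module HrefReplacer

    ' Hypothetical helper: swaps every anchor href value for a fresh GUID and
    ' records each GUID -> original URL pair in the map argument.
    Function ReplaceHrefs(ByVal html As String,
                          ByVal map As Dictionary(Of String, String)) As String
        ' Assumed pattern: "<a", any attributes, whitespace, then href with a
        ' double-quoted, single-quoted, or unquoted value captured as URL.
        Dim pattern As String =
            "<a\b[^>]*?\shref\s*=\s*(?:""(?<URL>[^""]*)""|'(?<URL>[^']*)'|(?<URL>[^\s>]+))"

        Dim evaluator As MatchEvaluator =
            Function(m As Match) As String
                Dim urlGroup As Group = m.Groups("URL")
                Dim id As String = Guid.NewGuid().ToString()
                map(id) = urlGroup.Value

                ' Rebuild the matched text, replacing only the URL span so that
                ' other attributes (such as title) are left untouched.
                Dim offset As Integer = urlGroup.Index - m.Index
                Return m.Value.Substring(0, offset) & id &
                       m.Value.Substring(offset + urlGroup.Length)
            End Function

        Return Regex.Replace(html, pattern, evaluator, RegexOptions.IgnoreCase)
    End Function

    Sub Main()
        Dim html As String =
            "<a title=""http://www.google.com"" href=""http://www.google.com"">Google</a>"
        Dim map As New Dictionary(Of String, String)

        Dim result As String = ReplaceHrefs(html, map)

        Console.WriteLine(result)
        For Each pair As KeyValuePair(Of String, String) In map
            Console.WriteLine(pair.Key & " -> " & pair.Value)
        Next
    End Sub

End Module
```

The map is keyed by GUID rather than by URL because the same URL can appear in more than one anchor, while each generated GUID is unique.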

© Stack Overflow or respective owner
