Extract and replace named group regex
Posted by user557670 on Stack Overflow
Published on 2010-12-29T22:52:08Z
Indexed on 2010/12/29 22:53 UTC
I was able to extract the href values of the anchors in an HTML string. Now I want to extract each href value and replace it with a new GUID. I need to return both the rewritten HTML string and the list of extracted href values with their corresponding GUIDs.
Thanks in advance.
My existing code looks like this:
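Imports System.Text.RegularExpressions

' html holds the input markup. The href value, whether double-quoted,
' single-quoted or unquoted, is captured in the named group "URL".
Dim sPattern As String = "<a[^>]*href\s*=\s*(?:""(?<URL>[^""]*)""|'(?<URL>[^']*)'|(?<URL>[^\s>]+))"
Dim matches As MatchCollection = Regex.Matches(html, sPattern, RegexOptions.IgnoreCase)
' Declared outside any If block so the list is still in scope after the loop.
' Regex.Matches never returns Nothing; with no matches the loop simply does not run.
Dim urls As New List(Of String)
For Each m As Match In matches
    urls.Add(m.Groups("URL").Value)
Next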
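For the replace step, one possible approach (a sketch, not tested against real-world HTML) is Regex.Replace with a MatchEvaluator: for each match, the evaluator generates a new GUID, records the original URL with it, and rebuilds the matched text by swapping only the characters of the "URL" group, located through Group.Index and Group.Length. The module and function names (HrefGuidReplacer, ReplaceHrefs) are hypothetical, and handing the pairs back through a ByRef list of KeyValuePair(Of String, Guid) is just one assumed way to return both results.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Text.RegularExpressions

Module HrefGuidReplacer
    ' Same named-group idea as the question: the href value lands in the "URL" group
    ' whether it is double-quoted, single-quoted or unquoted.
    Private ReadOnly HrefPattern As New Regex(
        "<a[^>]*href\s*=\s*(?:""(?<URL>[^""]*)""|'(?<URL>[^']*)'|(?<URL>[^\s>]+))",
        RegexOptions.IgnoreCase)

    ' Hypothetical helper: replaces every href value with a new GUID and records
    ' (original URL, GUID) pairs in document order. Returns the rewritten HTML;
    ' the pairs come back through the ByRef parameter.
    Public Function ReplaceHrefs(html As String,
                                 ByRef replaced As List(Of KeyValuePair(Of String, Guid))) As String
        ' A ByRef parameter cannot be captured by a lambda, so collect into a local first.
        Dim pairs As New List(Of KeyValuePair(Of String, Guid))
        Dim result As String = HrefPattern.Replace(html,
            Function(m As Match) As String
                Dim url As Group = m.Groups("URL")
                Dim id As Guid = Guid.NewGuid()
                pairs.Add(New KeyValuePair(Of String, Guid)(url.Value, id))
                ' Keep the matched text before and after the URL group and
                ' swap only the group's own characters for the GUID.
                Dim offset As Integer = url.Index - m.Index
                Return m.Value.Substring(0, offset) & id.ToString() &
                       m.Value.Substring(offset + url.Length)
            End Function)
        replaced = pairs
        Return result
    End Function

    Sub Main()
        Dim html As String = "<a href=""http://www.yahoo.com"">http://www.yahoo.com</a>"
        Dim pairs As List(Of KeyValuePair(Of String, Guid)) = Nothing
        Dim newHtml As String = ReplaceHrefs(html, pairs)
        Console.WriteLine(newHtml)
        For Each p In pairs
            Console.WriteLine(p.Key & " -> " & p.Value.ToString())
        Next
    End Sub
End Module
```

Because only the group's span is rewritten, link text and any title attribute that repeat the URL are left untouched. A list of pairs (rather than a dictionary) keeps document order, and a URL that appears in two links gets two different GUIDs.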
Sample HTML string:
<html><body><a title="http://www.google.com" href="http://www.google.com">http://www.google.com</a><br /><a href="http://www.yahoo.com">http://www.yahoo.com</a><br /><a title="http://www.apple.com" href="http://www.apple.com">Apple</a></body></html>
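Applied to the sample above, a variant that assumes each distinct URL should get a single GUID could collect a Dictionary(Of String, Guid) instead of a list. This is a hypothetical sketch (SampleRun and ReplaceHrefsDistinct are illustrative names); it uses the static Regex.Replace(input, pattern, evaluator, options) overload and the same group-splicing idea of replacing only the "URL" capture.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Text.RegularExpressions

Module SampleRun
    ' Hypothetical variant: each distinct href value is assigned one GUID,
    ' so repeated links share the same id. The URL -> GUID map is returned ByRef.
    Public Function ReplaceHrefsDistinct(html As String,
                                         ByRef map As Dictionary(Of String, Guid)) As String
        Dim pattern As String =
            "<a[^>]*href\s*=\s*(?:""(?<URL>[^""]*)""|'(?<URL>[^']*)'|(?<URL>[^\s>]+))"
        Dim seen As New Dictionary(Of String, Guid)(StringComparer.Ordinal)
        Dim result As String = Regex.Replace(html, pattern,
            Function(m As Match) As String
                Dim url As Group = m.Groups("URL")
                Dim id As Guid
                ' Reuse the GUID if this URL was already seen; otherwise create one.
                If Not seen.TryGetValue(url.Value, id) Then
                    id = Guid.NewGuid()
                    seen.Add(url.Value, id)
                End If
                Dim offset As Integer = url.Index - m.Index
                Return m.Value.Substring(0, offset) & id.ToString() &
                       m.Value.Substring(offset + url.Length)
            End Function,
            RegexOptions.IgnoreCase)
        map = seen
        Return result
    End Function

    Sub Main()
        Dim html As String = "<html><body><a title=""http://www.google.com"" href=""http://www.google.com"">http://www.google.com</a><br /><a href=""http://www.yahoo.com"">http://www.yahoo.com</a><br /><a title=""http://www.apple.com"" href=""http://www.apple.com"">Apple</a></body></html>"
        Dim map As Dictionary(Of String, Guid) = Nothing
        Dim newHtml As String = ReplaceHrefsDistinct(html, map)
        Console.WriteLine(newHtml)
        For Each kv In map
            Console.WriteLine(kv.Key & " -> " & kv.Value.ToString())
        Next
    End Sub
End Module
```

On the sample, the three href values each get their own GUID, while the title attributes and link text keep the original URLs.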
© Stack Overflow or respective owner