RegEx to ignore / skip everything in html tags
- by Scott Sumpter
I'm looking for a way to combine two regular expressions: one to catch the URLs and the other to ensure it skips text within HTML tags. See the sample text below the functions.
I need to pass in a block of news text and format it by wrapping URLs and email addresses in HTML tags so users don't have to. The code below works great until there are already HTML tags within the text; in that case it doubles the HTML tags.
There are plenty of examples to strip HTML, but I want to just ignore it since the URL is already linkified. Also, if there is an easier way to accomplish this, with or without regex, please let me know. None of my attempts to combine regexes have worked.
I'm coding in ASP.NET VB but will take any workable example/direction.
Thanks!
===== Functions =============
Public Shared Function InsertHyperlinks(ByVal inText As String) As String
    Dim strBuf As String = ""
    'iStart is a 1-based position for Mid(); Match.Index is 0-based
    Dim iStart As Integer = 1
    Dim iEnd As Integer

    'RegEx to find urls and email addresses
    Dim strRegUrlEmail As String = "\b(www|http|\S+@)\S+\b"
    Dim objRegExp As New Regex(strRegUrlEmail, RegexOptions.IgnoreCase)

    'Match URLs and emails
    Dim MatchList As MatchCollection = objRegExp.Matches(inText)
    If MatchList.Count <> 0 Then
        For Each m As Match In MatchList
            iEnd = m.Index
            'Copy the text between the previous match and this one
            strBuf = strBuf & Mid(inText, iStart, iEnd - iStart + 1)
            If InStr(1, m.Value, "@") > 0 Then
                strBuf = strBuf & HrefGet(m.Value, "EMAIL", "_BLANK")
            Else
                strBuf = strBuf & HrefGet(m.Value, "WEB", "_BLANK")
            End If
            iStart = iEnd + m.Length + 1
        Next
        'Append whatever follows the last match
        strBuf = strBuf & Mid(inText, iStart)
        InsertHyperlinks = strBuf
    Else
        'No hyperlinks to replace
        InsertHyperlinks = inText
    End If
End Function
Shared Function HrefGet(ByVal url As String, ByVal urlType As String, ByVal Target As String) As String
    Dim strBuf As String
    strBuf = "<a href="""
    If UCase(urlType) = "WEB" Then
        If LCase(Left(url, 3)) = "www" Then
            strBuf = "<a href=""http://" & url & """ Target=""" & _
                Target & """>" & url & "</a>"
        Else
            strBuf = "<a href=""" & url & """ Target=""" & _
                Target & """>" & url & "</a>"
        End If
    ElseIf UCase(urlType) = "EMAIL" Then
        strBuf = "<a href=""mailto:" & url & """ Target=""" & _
            Target & """>" & url & "</a>"
    End If
    HrefGet = strBuf
End Function
===== Sample Text =============
This would be the inText parameter.
Midway through the ride, we see a Skip this too. But sometimes we go here [insert normal www dot link dot com]. If you'd like to join us contact Bill Smith at [email protected]. Thanks!
Sorry, Stack Overflow won't allow multiple hyperlinks to be added.
===== End Sample Text =============
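===== Possible Direction (sketch) =============
To make the "combine two regular expressions" idea concrete, here is a rough sketch of one direction that might work: a single pattern with two alternatives, where the first swallows an existing <a>...</a> element (or any other single tag) and the second catches URLs and emails. A MatchEvaluator then returns the first kind unchanged and wraps only the second. The names LinkifySketch, InsertHyperlinksSkippingHtml, KeepOrLink and WrapLink are made up for illustration, and WrapLink is only a stand-in for HrefGet. The URL part is also changed from \S+ to [^\s<>]+ on the assumption that a link should never run into an adjacent tag. It is regex-based and not a real HTML parser, so malformed markup may still trip it up.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Public Module LinkifySketch

    ' Group "skip": an existing <a ...>...</a> element, or any other single tag.
    ' Group "link": the URL/email pattern, with \S+ narrowed to [^\s<>]+
    ' (an assumption, so a match cannot swallow a neighbouring tag).
    Private ReadOnly rxSkipOrLink As New Regex( _
        "(?<skip><a\b[^>]*>.*?</a>|<[^>]+>)" & _
        "|(?<link>\b(?:www|http|[^\s<>]+@)[^\s<>]+\b)", _
        RegexOptions.IgnoreCase Or RegexOptions.Singleline)

    Public Function InsertHyperlinksSkippingHtml(ByVal inText As String) As String
        ' Regex.Replace calls KeepOrLink once per match, left to right.
        Return rxSkipOrLink.Replace(inText, AddressOf KeepOrLink)
    End Function

    Private Function KeepOrLink(ByVal m As Match) As String
        If m.Groups("skip").Success Then
            ' Existing markup: hand it back untouched.
            Return m.Value
        End If
        Return WrapLink(m.Value)
    End Function

    ' Hypothetical stand-in for HrefGet, so the sketch compiles on its own.
    Private Function WrapLink(ByVal url As String) As String
        Dim href As String
        If url.Contains("@") Then
            href = "mailto:" & url
        ElseIf url.StartsWith("www", StringComparison.OrdinalIgnoreCase) Then
            href = "http://" & url
        Else
            href = url
        End If
        Return "<a href=""" & href & """ target=""_blank"">" & url & "</a>"
    End Function

End Module
```

Called as InsertHyperlinksSkippingHtml("See <a href=""http://www.a.com"">www.a.com</a> or www.b.com"), the existing anchor should come back as-is and only www.b.com should get wrapped. Because the tag alternative comes first and matching runs left to right, anything inside a tag or an existing anchor is consumed before the URL alternative gets a chance to see it.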