I am looking for help creating a regular expression so that I can replace text with an anchor tag. The text comes from a SQL field (VarChar(max)) and is formatted like this:

Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua (1954, c. 12; 1968, c. 300; 1994, c. 98)

Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua (1998, cc. 553, 568; 2001, c. 300)

In the above text, I need to replace all chapter numbers from 1994 onward with anchor tags. So, for example, 98, 553, 568 and 300 would all be replaced. The following code matches the entire text "1994, c. 98", for example, but I'm not sure how I would replace just the "98" in that text.

Public Shared Function ReplaceChapterTag1(lang As String) As String
    Dim l As String = lang
    Dim r As Regex = New Regex("199[4-9][/,][/ ][/c]*[/.][/ ][0-9]+(?:\.[0-9]*)?")

    Dim applyEvaluator As MatchEvaluator = New MatchEvaluator(AddressOf applyCodeLink)
    l = r.Replace(l, applyEvaluator)

    Return l

End Function

Private Shared Function applyCodeLink(ByVal m As Match) As String
    Dim r As Regex = New Regex("^[0-9]*[\-][0-9]*")
    Dim str As String = m.ToString
    Dim strReturn As String = ""

    Dim match As Match = r.Match(str)
    If match.Success Then
        strReturn = str
    Else
        strReturn = "<a href='link?id=" & m.Value & "'>" & m.Value & "</a>"
    End If

    Return strReturn
End Function

1 Answer

Solution

I'm not sure how I would replace just the "98" in that text.

You can use Regex.Replace. However, the regex you have built needs to be adjusted like this:

(?<=199[4-9][^;]+)(?<=[/c]*[/.][/\x20]|,\x20)(\d+(?:\.\d*)?)(?=[,;)])

Description

[Image: regular expression visualization]

Sample code

' Requires "Imports System.Text.RegularExpressions" at the top of the file

' Input
Dim InputText As String = "..." ' Lorem ipsum...

' Regex: two lookbehinds, the captured chapter number, then a lookahead
Dim r As Regex = New Regex( _
      "(?<=199[4-9][^;]+)" & _
      "(?<=[/c]*[/.][/\x20]|,\x20)" & _
      "(\d+(?:\.\d*)?)" & _
      "(?=[,;)])", _
    RegexOptions.IgnoreCase _
    Or RegexOptions.CultureInvariant _
    Or RegexOptions.Compiled _
    )

' This is the replacement string; $1 is the captured chapter number
Dim Replacement As String = "<a href='link?id=$1'>$1</a>"

' Replace the matched text in InputText using the replacement pattern
Dim Result As String = r.Replace(InputText, Replacement)

Input

Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua (1954, c. 12; 1968, c. 300; 1994, c. 98)

Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua (1998, cc. 553, 568; 2001, cc. 17, 300)

Output

Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua (1954, c. 12; 1968, c. 300; 1994, c. <a href='link?id=98'>98</a>)

Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua (1998, cc. <a href='link?id=553'>553</a>, <a href='link?id=568'>568</a>; 2001, cc. 17, 300)

Discussion

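Basically, the idea behind the tuned regex in my answer is to match one or more digits (\d+, with an optional decimal part) only when they are preceded by a year from 1994 to 1999 with no semicolon in between, and by either a chapter marker such as "c. " or a comma and a space, AND followed by a comma, a semicolon, or a closing parenthesis. Because these conditions are written as lookbehinds and a lookahead, they are not part of the match, so only the number itself gets replaced.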
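To see that the surrounding conditions are checked without being consumed, here is a small sketch that runs the regex from the answer against the first sample line. The module and function names (LookaroundDemo, FindChapterNumbers) are made up for illustration.

```vb
Imports System
Imports System.Collections.Generic
Imports System.Text.RegularExpressions

Module LookaroundDemo
    ' Hypothetical helper: returns the text of every match of the answer's regex.
    Public Function FindChapterNumbers(ByVal input As String) As List(Of String)
        Dim r As New Regex( _
            "(?<=199[4-9][^;]+)" & _
            "(?<=[/c]*[/.][/\x20]|,\x20)" & _
            "(\d+(?:\.\d*)?)" & _
            "(?=[,;)])", _
            RegexOptions.IgnoreCase Or RegexOptions.CultureInvariant)

        Dim found As New List(Of String)
        For Each m As Match In r.Matches(input)
            ' m.Value holds only the digits; the year and "c. " are not consumed.
            found.Add(m.Value)
        Next
        Return found
    End Function

    Sub Main()
        Dim sample As String = "aliqua (1954, c. 12; 1968, c. 300; 1994, c. 98)"
        For Each number As String In FindChapterNumbers(sample)
            Console.WriteLine(number)
        Next
    End Sub
End Module
```

For this sample the only match is 98: the citations for 1954 and 1968 fail the first lookbehind, and the years themselves fail the second one.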

I took the liberty of simplifying and clarifying the original regex. Mainly, I replaced:

  • [0-9] with \d
  • (space char) with \x20
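
Note that 199[4-9] only covers 1994 through 1999, so a citation such as "2001, c. 300" is left alone. If later years should be linked as well, one possible alternative (not the method used above) is to match each citation as a whole and decide in a MatchEvaluator, similar to the question's original code. The following is a minimal sketch under some assumptions: entries are separated by semicolons, chapter numbers are whole numbers, and the names ChapterLinker, LinkChapters and LinkOneCitation are hypothetical. It uses [0-9] rather than \d so that the year can be parsed safely with Integer.Parse.

```vb
Imports System
Imports System.Text.RegularExpressions

Module ChapterLinker
    ' One citation entry: a four-digit year, "c." or "cc.", then a comma-separated chapter list.
    Private ReadOnly Citation As New Regex( _
        "(?<prefix>\b(?<year>[0-9]{4}),\x20cc?\.\x20)" & _
        "(?<chapters>[0-9]+(?:,\x20[0-9]+)*)", _
        RegexOptions.CultureInvariant)

    ' Hypothetical helper: links every chapter number cited for 1994 or later.
    Public Function LinkChapters(ByVal input As String) As String
        Return Citation.Replace(input, New MatchEvaluator(AddressOf LinkOneCitation))
    End Function

    Private Function LinkOneCitation(ByVal m As Match) As String
        ' Citations before 1994 are returned unchanged.
        If Integer.Parse(m.Groups("year").Value) < 1994 Then
            Return m.Value
        End If

        ' Wrap each number in the chapter list; $& stands for the matched number.
        Dim linked As String = Regex.Replace( _
            m.Groups("chapters").Value, "[0-9]+", "<a href='link?id=$&'>$&</a>")
        Return m.Groups("prefix").Value & linked
    End Function

    Sub Main()
        Console.WriteLine(LinkChapters("(1998, cc. 553, 568; 2001, c. 300)"))
    End Sub
End Module
```

With this sketch, 553, 568 and 300 are all wrapped in anchor tags, while a citation such as "1968, c. 300" stays as plain text. Because each match covers a single "year, c. numbers" entry, one citation cannot influence another one later in the text.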
3 Comments

Alex, thanks for your fine work, you've gotten me 95% of the way there!
Alex, wanted to add a little:

1986, c. 617
1989, c. 119 (§§ 2.02, 9.03)
1993, c. 3 (§ 2.02)
1994, cc. 129, 239 (§§ 12.02, 12.05 [added])
2003, c. 873 (§§ 11.04, 11.05 [added])
Lorem ipsum dolor sit amet, consectetur (1986, c. 617; 1994, cc. 129, 239)
Lorem ipsum dolor sit amet, consectetur (1986, c. 617; 2003, c. 873)

It finds a match on the first 1994, cc. 129, 239 and the 2003, c. 873. But then it matches on the 1986, c. 617 in the first and second paragraph (and the 1994 value in the first paragraph), but then does not match on the 2003, c. 873 at the end of the 2nd paragraph.
@JohnS.Warrick Post your comment as a new question.
