I have a function pulled from here. My problem is that I don't know what RegEx pattern I need to use to split out the following data:

+1 vorpal unholy longsword +31/+26/+21/+16 (2d6+13)
+1 vorpal flaming whip +30/+25/+20 (1d4+7 plus 1d6 fire and entangle)
2 slams +31 (1d10+12)

I want it to look like:

+1 vorpal unholy longsword, 31 
+1 vorpal flaming whip, 30 
2 slams, 31

Here is the VBA code that does the RegExp validation:

Public Function RXGET(ByRef find_pattern As Variant, _
                        ByRef within_text As Variant, _
                        Optional ByVal submatch As Long = 0, _
                        Optional ByVal start_num As Long = 0, _
                        Optional ByVal case_sensitive As Boolean = True) As Variant
' RXGET - Looks for a match for regular expression pattern find_pattern
' in the string within_text and returns it if found, error otherwise.
' Optional long submatch may be used to return the corresponding submatch
' if specified - otherwise the entire match is returned.
' Optional long start_num specifies the zero-based character offset in
' within_text at which to start searching. Default=0.
' Optional boolean case_sensitive makes the regex pattern case sensitive
' if true, insensitive otherwise. Default=true.

Dim objRegex As VBScript_RegExp_55.RegExp
Dim colMatch As VBScript_RegExp_55.MatchCollection
Dim vbsMatch As VBScript_RegExp_55.Match
Dim colSubMatch As VBScript_RegExp_55.SubMatches
Dim sMatchString As String

Set objRegex = New VBScript_RegExp_55.RegExp

' Initialise Regex object
With objRegex
    .Global = False
    ' Default is case sensitive
    .IgnoreCase = Not case_sensitive
    .Pattern = find_pattern
End With

' Return out of bounds error
If start_num >= Len(within_text) Then
    RXGET = CVErr(xlErrNum)
    Exit Function
End If
sMatchString = Right$(within_text, Len(within_text) - start_num)

' Create Match collection
Set colMatch = objRegex.Execute(sMatchString)
If colMatch.Count = 0 Then ' No match
    RXGET = CVErr(xlErrNA)
Else
    Set vbsMatch = colMatch(0)
    If submatch = 0 Then ' Return match value
        RXGET = vbsMatch.Value
    Else
        Set colSubMatch = vbsMatch.SubMatches ' Use the submatch collection
        If colSubMatch.Count < submatch Then
            RXGET = CVErr(xlErrNum)
        Else
            RXGET = CStr(colSubMatch(submatch - 1))
        End If
    End If
End If
End Function
  • A few things for clarification. Do you have 3 (or more) strings or one multiline string? In the latter case, do you want a multiline string back, or are you good with a bunch of single-line strings (one for each match)? Is it okay if you get each of those lines in the result as one string, or do you need the 3 separate parts (first number, name, last number)? Commented Sep 28, 2012 at 17:03
  • The first grey code block is one long line, with a comma, "and" or "or" separating the entries, i.e. cell B2. The second grey code block should go in three consecutive cells, i.e. C2, D2 and E2. Commented Sep 28, 2012 at 17:10
  • Here is what I have currently, but it gets me the first + and the second. =PERSONAL.xlsb!RXGET("([A-Za-z0-9\+{1} ]+)",CX2,1,0,FALSE) Commented Sep 28, 2012 at 17:23
  • You don't need to escape + within character classes. Also, I don't think that the {1} does any good in there. Have a look at my attempt. If it doesn't work, let me know. Then it's something Excel specific and I need to figure out the specifics of your regex flavor. Commented Sep 28, 2012 at 17:30

1 Answer

I don't know about Excel but this should get you started on the RegEx:

/(?:^|, |and |or )(\+?\d?\s?[^\+]*?) (?:\+|-)(\d+)/

NOTE: There is a slight caveat here. This will also match if an element begins with + only (not being followed by a digit).

Capture groups 1 and 2 contain the strings that go left and right of your comma (the whole match has index 0). So you can do something like capture[1] + ', ' + capture[2] (whatever your syntax for that is).
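
As a quick sanity check of the pattern, here is a minimal sketch in Python rather than VBA. Python's re module accepts the pattern as written; whether the VBScript engine behaves identically is not verified here. The sketch assumes each attack arrives as its own string, and the helper name name_and_bonus is made up for illustration.

```python
import re

# The pattern from the answer, without the surrounding / delimiters.
PATTERN = re.compile(r"(?:^|, |and |or )(\+?\d?\s?[^\+]*?) (?:\+|-)(\d+)")


def name_and_bonus(attack: str) -> str:
    """Join capture groups 1 and 2 as 'name, first bonus' (hypothetical helper)."""
    match = PATTERN.search(attack)
    if match is None:
        raise ValueError(f"no attack found in {attack!r}")
    return f"{match.group(1)}, {match.group(2)}"


# The three sample lines from the question, one attack per string.
samples = [
    "+1 vorpal unholy longsword +31/+26/+21/+16 (2d6+13)",
    "+1 vorpal flaming whip +30/+25/+20 (1d4+7 plus 1d6 fire and entangle)",
    "2 slams +31 (1d10+12)",
]

for sample in samples:
    print(name_and_bonus(sample))
```

With the RXGET function from the question, the same idea would be two calls with submatch set to 1 and 2, joined with ", ".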

Here is an explanation of the regex:

/(?:^|, |and |or )         # make sure that we only start looking at the
                           # beginning of the string, after a comma, after an
                           # "and" or after an "or"; the "?:" makes sure that
                           # this subpattern is not capturing
 (\+?                      # an optional literal "+"
 \d?                       # an optional digit
 \s?                       # an optional whitespace character
 [^\+]*?)                  # arbitrarily many non-plus characters; the ? makes it
                           # non-greedy, otherwise it might span multiple lines
                           # a literal space
 (?:\+|-)                  # a literal "+" or "-" (not capturing)
 (\d+)/                    # at least one digit (the brackets are for capturing)
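
An annotated layout like this can also be kept in runnable form. The sketch below, in Python and not in VBScript, writes the one-line pattern from the answer in verbose mode and checks that both forms capture the same groups on the sample lines from the question. It assumes Python's re.VERBOSE flag, under which whitespace in the pattern is ignored, so each literal space is written as [ ]. The names COMPACT and VERBOSE are made up for illustration.

```python
import re

# One-line pattern from the answer, without the / delimiters.
COMPACT = re.compile(r"(?:^|, |and |or )(\+?\d?\s?[^\+]*?) (?:\+|-)(\d+)")

# The same pattern in verbose mode; literal spaces are written as [ ].
VERBOSE = re.compile(
    r"""
    (?:^|,[ ]|and[ ]|or[ ])  # start of string, or after ', ', 'and ' or 'or '
    (                        # group 1: the attack name
      \+?                    #   optional literal '+'
      \d?                    #   optional digit
      \s?                    #   optional whitespace character
      [^+]*?                 #   as few non-plus characters as possible
    )
    [ ]                      # a literal space
    (?:\+|-)                 # sign of the bonus (not captured)
    (\d+)                    # group 2: the first attack bonus
    """,
    re.VERBOSE,
)

samples = [
    "+1 vorpal unholy longsword +31/+26/+21/+16 (2d6+13)",
    "+1 vorpal flaming whip +30/+25/+20 (1d4+7 plus 1d6 fire and entangle)",
    "2 slams +31 (1d10+12)",
]

for sample in samples:
    compact_groups = COMPACT.search(sample).groups()
    verbose_groups = VERBOSE.search(sample).groups()
    print(compact_groups, compact_groups == verbose_groups)
```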

20 Comments

I receive an error with the above RegEx. Here are the allowed literals for VBScript RegEx: msdn.microsoft.com/en-us/library/…
try removing the ?: at the beginning. however, the capturing groups you're interested in would then be 2 and 3 instead of 1 and 2, respectively
"an error"? Do you have something more specific by any chance?
your input does look like this, doesn't it? +1 vorpal unholy longsword +31/+26/+21/+16 (2d6+13) and +1 vorpal flaming whip +30/+25/+20 (1d4+7 plus 1d6 fire and entangle), 2 slams +31 (1d10+12)
this is getting gradually more complicated. next time you ask a regex question, I suggest that you make a very extensive list of possible inputs, make clear what the variables and what the fixed parts are, and show us what you want as the output. also include these tiny details like how your parts are actually separated (that there are 'and' and 'or' and ',' in between is pretty essential). take this as an example: stackoverflow.com/questions/12608152/…