1

I am looking to split a cell after the first instance of a word with multiple caps.

Example 1:

Input1: Floor 4 InformatiqueNoosavilleSep

Desired Output1: NoosavilleSep

Pattern: The split should occurs at the second instance of uppercase in the last word. "InformatiqueNoosavilleSep"

Example 2:

Input: Floor 13 InformatiqueSurfers ParadiseSep

Output: Surfers ParadiseSep

Pattern: The split shouldn't occurs at the last word but on "InformatiqueSurfers" instead.

The problem: The pattern to find where to split the word differs from a cell to another.

What we know:

1: If the last word contains three uppercase letter it's always at this word that we want to split the string.Example1

2: If the last word contains only two uppercase letter "ParadiseSep" we have to split the string on the the word before it. Example2

I found this code which allow to split string with caps lock and add a spaces.

Dim objRegex As Object
Set objRegex = CreateObject("vbscript.regexp")
With objRegex
    .Global = True
    .Pattern = "([a-z])([A-Z])"
    SplitCaps = .Replace(strIn, "$1 $2")
End With
End Function

As I am trying to learn VBA this Regex is getting a bit out of my realm.

Thanks for reading!

6
  • This seems like more of a parsing problem than a regex one, although I'm no expert. In other words: start from the end and find the "word". If it has three caps then split there. Else find the next to last word and split there. Commented Dec 3, 2017 at 0:07
  • maybe something like this regex101.com/r/aW2enR/3. Looking at your original question, it seems like the requirements may keep changing as you encounter different edge cases Commented Dec 3, 2017 at 0:08
  • Is another way of defining it that the new value is just the part of the cell value after "Informatique "? Commented Dec 3, 2017 at 0:13
  • So in your given example 2, the output is correct (what you want) or incorrect? If it is incorrect, could you tell us what your desired output for example 2 is? Commented Dec 3, 2017 at 0:23
  • Thanks Slai with the great explanation of Peter and your idea I learned a lot, Commented Dec 3, 2017 at 2:02

2 Answers 2

3

You are looking for the first instance of a word with multiple caps. Let's build a regular expression to find that as a first step.

You are looking for uppercase letters: [A-Z]

They must be in the same word, so between the two uppercase letters you can only have lowercase letters: [a-z]

There can be zero or more lowercase letters between the 2 uppercase letters: [A-Z][a-z]*[A-Z]

Anything after this occurrence should be part of the result, so we are also interest in the remainder of this string. This can be zero or more occurrences of any character: [A-Z][a-z]*[A-Z].*

Before this occurrence, we also can have zero or more occurrences of any character. But we want to find the first instance, so we will have to make that part of our regular expression "non-greedy" by appending a ? to it: .*?[A-Z][a-z]*[A-Z].*

Now we have completed a regular expression to find what you were looking for.


Second step: You want to split the word starting at the second uppercase letter within the found pattern. So we put parentheses (round brackets) starting just before that uppercase letter and after the end of the expression: .*?[A-Z][a-z]*([A-Z].*)


Third step: You want to retrieve the part of the string which you isolated in the second step using the parentheses. We do that by using $1, meaning the contents of the first set of parentheses in the regular expression: .Replace(inputString, "$1")


If we put this in a vba function, we would get something like:

Function SplitMultipleCaps(inputString As String)
    Dim objRegex As Object
    Set objRegex = CreateObject("vbscript.regexp")
    With objRegex
        .Global = False
        .Pattern = ".*?[A-Z][a-z]*([A-Z].*)"
        SplitMultipleCaps = .Replace(inputString, "$1")
    End With
End Function

Note: I put .Global = False because you only want to do it once. The function might also work with .Global = True, but I felt that False was more appropriate in this case.

Sign up to request clarification or add additional context in comments.

1 Comment

Great explanation!
0

Try this:

Function afterFirstUpperCaseWord(strIn As String)
    Dim objRegex As Object
    Set objRegex = CreateObject("vbscript.regexp")
    With objRegex
        .Global = True
        .Pattern = "\w+ \d+ [A-Z][a-z]+ ?(.*)"
        afterFirstUpperCaseWord = .Replace(strIn, "$1")
    End With
End Function

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.