I have a very large "work request" data set that I need to clean up. The data set has some consistent elements: each work request contains a series of numbers of a set length. That length changes about halfway through the data set, but the change is predictable. One issue is that the delimiters are inconsistent: sometimes there are multiple delimiters, sometimes none, sometimes text in front of the number, etc. I pulled a sample of the values that I am dealing with and separated them manually to show the desired result.

+----+--------------------------------+------------+--------+----------------------+
|    |               A                |     B      |   C    |          D           |
+----+--------------------------------+------------+--------+----------------------+
|  1 | Work Request                   | Cell 1     | Cell 2 | Cell 3               |
|  2 | 2097947.A                      | 2097947    | A      |                      |
|  3 | 2590082.A/4900 REPLACE DXAC    | 2590082    | A      | 4900 Replace DXAC    |
|  4 | 2679314.C                      | 2679314    | C      |                      |
|  5 | 2864142B/DEMOLISH STRUCTURES   | 2864142    | B      | DEMOLISH STRUCTURES  |
|  6 | 3173618                        | 3173618    |        |                      |
|  7 | 3251628/4800 REPLACE ASPHALT   | 3251628    |        | 4800 REPLACE ASPHALT |
|  8 | 4109066A                       | 4109066    | A      |                      |
|  9 | 4374312D                       | 4374312    | D      |                      |
| 10 | 4465402, Building 4100         | 4465402    |        | Building 4100        |
| 11 | 4881715 DESIGN                 | 4881715    |        | DESIGN               |
| 12 | 4998608\                       | 4998608    |        |                      |
| 13 | ADMIN                          | ADMIN      |        |                      |
| 14 | PGM MGMT                       | PGM MGMT   |        |                      |
| 15 | FWR # 4958989 /Bldg 4000       | 4958989    |        | Bldg 4000            |
| 16 | NICC FEDISR000744416/4000 UPS  | R000744416 |        | 4000 UPS             |
| 17 | R000451086/4300 MODS TO RM5006 | R000451086 |        | 4300 MODS TO RM5006  |
+----+--------------------------------+------------+--------+----------------------+

As you can see, there are a few predictable patterns and some user input errors. In some cases a single character follows the 7-digit work request number, usually separated by a "." but sometimes with no separator, as in A8 and A9. Sometimes there is a delimiter ("/", a space, or ","), but this isn't consistent. I am currently working with a VBA function that manages to strip the numbers for some rows but fails when it encounters no numbers or extra numbers. Eventually the work request numbers were changed to add the R00 prefix; this is the "new" number, and over half of the data uses it in some form.

The VBA that I am using:

Option Explicit
Public Function Strip(ByVal x As String, LeaveNums As Boolean) As Variant
Dim y As String, z As String, n As Long
    For n = 1 To Len(x)
        y = Mid(x, n, 1)
        If LeaveNums = False Then
            If y Like "[A-Za-z ]" Then z = z & y 'False keeps Letters and spaces only
        Else
            If y Like "[0-9. ]" Then z = z & y   'True keeps numbers, decimal points and spaces
        End If
    Next n
Strip = Trim(z)
End Function

Worksheet formulas:

=NUMBERVALUE(Strip(A1,TRUE))
=Strip(A1,FALSE)

This works in some places but not others. It also doesn't separate out columns C and D. The most important issue is stripping out the work request number, as seen in column B.
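To illustrate why the character filter breaks on rows that carry a description, here is a minimal Python sketch of the same logic. The port to Python and the name `strip_keep_nums` are assumptions made for illustration only; the original is the VBA `Strip(x, True)` above.

```python
def strip_keep_nums(text: str) -> str:
    """Mimic Strip(x, True): keep digits, periods and spaces, then trim spaces."""
    kept = "".join(ch for ch in text if ch in "0123456789. ")
    return kept.strip(" ")


if __name__ == "__main__":
    samples = [
        "2097947.A",
        "2590082.A/4900 REPLACE DXAC",
        "FWR # 4958989 /Bldg 4000",
        "ADMIN",
    ]
    for sample in samples:
        print(repr(sample), "->", repr(strip_keep_nums(sample)))
```

Because every digit in the cell is kept, the building number from the description is glued onto the work request number ("2590082.4900"), and a cell with no digits yields an empty string, which matches the "extra numbers" and "no numbers" failures described above.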

Thanks for any help.

3 Answers

Here's a function using Regular Expressions that returns an array of the results.

Option Explicit
'Set reference to Microsoft VBScript Regular Expressions 5.5
'  or use late binding
Function Splitter(S As String) As String()
    Dim re As RegExp, MC As MatchCollection
    Const sPat As String = "^(?:\D*?(?=R?\d)(R?\d+)[,.]?([A-Z])?\s*[/\\]?\s*(.*\S)?)|\s*(.*\S)"
    Dim sTemp(2) As String

    Set re = New RegExp
    With re
        .Global = True
        .MultiLine = True
        .Pattern = sPat
        If .Test(S) = True Then
            Set MC = .Execute(S)
            With MC(0)
                'Work request number, or the whole text when the cell has no digits
                sTemp(0) = .SubMatches(0) & .SubMatches(3)
                'Optional single letter after the number
                sTemp(1) = .SubMatches(1)
                'Remaining description, if any
                sTemp(2) = .SubMatches(2)
            End With
            Splitter = sTemp
        End If
    End With

End Function

With the data in A2:An, if you have Excel O365 with dynamic arrays, you can enter:

B2:  =Splitter(A2)

and fill down. The results of the array will spill right to columns C & D.

If you do not have dynamic arrays, then:

B2: =INDEX(Splitter($A2),COLUMNS($A:A))

Fill Right to D2. Then select B2:D2 and fill down as far as necessary.
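As a rough way to check the pattern against the sample rows outside Excel (for example on a Mac, where the VBScript RegExp library is not available), the same expression can be tried in Python. This is only a sketch: the function name `splitter` and the choice of Python are assumptions, and Python's `re` engine is not guaranteed to behave identically to VBScript's in every edge case.

```python
import re

# Same expression as sPat in the VBA function, written as a raw string.
PATTERN = re.compile(
    r"^(?:\D*?(?=R?\d)(R?\d+)[,.]?([A-Z])?\s*[/\\]?\s*(.*\S)?)|\s*(.*\S)"
)


def splitter(text: str) -> tuple:
    """Return (number or fallback text, letter, description) for one cell."""
    match = PATTERN.search(text)
    if match is None:
        # Blank cells match neither alternative.
        return ("", "", "")
    # Groups that did not take part in the match come back as None.
    number, letter, rest, fallback = (g or "" for g in match.groups())
    return (number + fallback, letter, rest)


if __name__ == "__main__":
    for cell in [
        "2590082.A/4900 REPLACE DXAC",
        "4465402, Building 4100",
        "NICC FEDISR000744416/4000 UPS",
        "PGM MGMT",
    ]:
        print(cell, "->", splitter(cell))
```

The first alternative handles any cell that contains a digit; the second is the fallback that returns cells such as "ADMIN" unchanged in the first column.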

2 Comments

Thank you for the solution. I get an error when trying to use the code: "Ambiguous name detected: Splitter". I made sure that the VBA reference for Regular Expressions was selected, but I couldn't get anything to work. I will note that I started on a Mac before realizing that RegExp doesn't work on the Mac, so I switched over to my Windows machine; the above error is on that version. Both are O365. I'm not very good with VBA, so any help is very much appreciated!
Update: I had the same thing pasted into two modules and failed to notice the second module. I am going to keep tinkering with it, but it looks like it is working now!

Try this code

Private Sub UserForm_Click()
    Dim Sp() As String: Sp = Split(Strip("2590082.A/4900 REPLACE DXAC"), "|")
    Sheet1.Range("B2", Sheet1.Cells(RowIndex:=2, ColumnIndex:=UBound(Sp) + 2)).Value = Sp
End Sub

Function Strip(s As String) As String
    If s = "" Then Exit Function
    Dim tmp As String, i As Long
    tmp = s
    Dim Sp() As String: Sp = Split("0,1,2,3,4,5,6,7,8,9,.", ",")
    For i = 0 To 10
        tmp = Replace(tmp, Sp(i), "|")
    Next
    Dim words As String
    Sp = Split(tmp, "|")
    For i = 0 To UBound(Sp)
        If Sp(i) <> "" Then words = words & Sp(i) & "|"
    Next
    If Right$(words, 1) = "|" Then words = Mid(words, 1, Len(words) - 1)
    
    tmp = s
    Sp = Split(words, "|")
    
    For i = 0 To UBound(Sp)
        tmp = Replace(tmp, Sp(i), "|" & Sp(i) & "|")
    Next
    If Right$(tmp, 1) = "|" Then tmp = Mid(tmp, 1, Len(tmp) - 1)
    Strip = tmp
End Function

Here's an example using a regular expression.

Sub WorkRequests()
    
    Dim re As Object, allMatches, m, rv, sep, c As Range
    
    Set re = CreateObject("VBScript.RegExp")
    re.Pattern = "(((R00)?\d{7})[\.]?([A-Z])?)"
    re.ignorecase = True
    re.MultiLine = True
    re.Global = True
    
    For Each c In Range("B5:B20").Cells 'for example
        c.Offset(0, 1).Resize(1, 3).ClearContents 'clear output cells
        If re.test(c.Value) Then
            Set allMatches = re.Execute(c.Value)
            For Each m In allMatches
                c.Offset(0, 1).Value = m 'order#+letter
                c.Offset(0, 2).Value = m.submatches(1) 'order #
                c.Offset(0, 3).Value = m.submatches(3) 'letter
            Next m
        End If
    Next c
    
End Sub

Regular expressions reference: https://learn.microsoft.com/en-us/previous-versions/windows/internet-explorer/ie-developer/scripting-articles/ms974570(v=msdn.10)?redirectedfrom=MSDN
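
To see what this pattern extracts from the sample rows, it can be tried outside Excel. The sketch below uses Python purely as an assumption for illustration; `order_parts` is a hypothetical name, and it reads the first match in each cell, whereas the VBA loop above keeps the last one.

```python
import re

# Same expression as re.Pattern in the VBA sub, with IgnoreCase enabled.
ORDER = re.compile(r"(((R00)?\d{7})[\.]?([A-Z])?)", re.IGNORECASE)


def order_parts(text: str):
    """Return (number plus letter, number, letter), or None when nothing matches."""
    match = ORDER.search(text)
    if match is None:
        return None
    # group(2) corresponds to submatches(1) and group(4) to submatches(3) in VBA.
    return (match.group(1), match.group(2), match.group(4) or "")


if __name__ == "__main__":
    for cell in ["2590082.A/4900 REPLACE DXAC", "4109066A", "ADMIN"]:
        print(cell, "->", order_parts(cell))
```

Note that this pattern returns only the work request number and the optional letter; cells without a 7-digit number, such as "ADMIN", produce no match, and the trailing description is not captured.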
