The Situation:

I have a column containing information that needs to be extracted. Here is some example content:

Row Content
1 CompanyName1 OrderNumber1 SomeUnimportantStuf1
2 CompanyName2 CompanySurname2 OrderNumber2 SomeUnimportantStuff2
3 CompanyName3 CompanySurname3 CompanyAddition3 OrderNumber_ABC3 SomeUnimportantStuff3 SomeMoreUnimportantStuff3

So basically: the company name (containing 0 to 2 spaces), an order number, and some unnecessary information at the end.

I need to extract the OrderNumber. The Problems:

  • Company names vary from one word to three words
  • No unique separators such as a comma
  • The OrderNumber doesn't always have the same length and sometimes has a suffix like "_v3" or even more (but it contains no spaces, so it is basically always the longest word in each cell)

What I've successfully done so far:

  • Extracted the CompanyName into a new column "CompanyName"

And this is the point where I'm stuck. As I understand it, the easiest way would be:

  • Split the column "Content" using the value in "CompanyName" as the delimiter. Since the OrderNumber contains no spaces, I could then split the column again by space and have the OrderNumber standing alone.

Another idea is to search for the longest word in the column "Content" and extract it. But I was unable to find any solution for that either.

Can anyone give me a helpful hint?

1 Answer

This will return the longest word in a string:

  • split the string
  • get length of each split
  • get maximum length
  • match max length to position
  • return word at that position
let
    Source = Excel.CurrentWorkbook(){[Name="Table3"]}[Content],
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"Content", type text}}),
    #"Added Custom" = Table.AddColumn(#"Changed Type", "Custom", each 
        // index the word list at the position of the maximum word length
        Text.Split([Content]," "){
            List.PositionOf(
                List.Transform(Text.Split([Content]," "), each Text.Length(_)),
                List.Max(
                    List.Transform(Text.Split([Content]," "), each Text.Length(_))))})
in
    #"Added Custom"

Edit: M code rewritten to better show the algorithm. I'm not sure whether it is more or less efficient, but it is easier to understand.

let
    Source = Excel.CurrentWorkbook(){[Name="Table3"]}[Content],
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"Content", type text}}),
    #"Added Custom" = Table.AddColumn(#"Changed Type", "Custom", each 
        let 
          wordList = Text.Split([Content]," "),
          lengthList = List.Transform(wordList, each Text.Length(_)),
          lengthLongestWord = List.Max(lengthList),
          positionLongestWord = List.PositionOf(lengthList,lengthLongestWord),
          longestWord = wordList{positionLongestWord}
        in 
          longestWord)
in
    #"Added Custom"
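
If a company name can contain a word longer than the order number, the longest-word approach picks the wrong word. The idea from the question (cut off the already extracted company name, then take the next word) could be sketched roughly like this. It is only a sketch under assumptions: the table is assumed to also hold a non-empty text column "CompanyName" next to "Content", which the query above does not show, and the step name "Added OrderNumber" is made up for illustration.

```powerquery
let
    Source = Excel.CurrentWorkbook(){[Name="Table3"]}[Content],
    // Assumption: Source contains both "Content" and "CompanyName" as text columns
    #"Added OrderNumber" = Table.AddColumn(Source, "OrderNumber", each
        let
            // everything that follows the first occurrence of the company name
            remainder = Text.Trim(Text.AfterDelimiter([Content], [CompanyName])),
            // the order number contains no spaces, so it is the first word of the remainder
            orderNumber = List.First(Text.Split(remainder, " "))
        in
            orderNumber)
in
    #"Added OrderNumber"
```

This does not depend on word lengths at all, but it only works if the extracted company name matches the start of "Content" exactly.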

2 Comments

Hey @Ron Rosenfeld! Many thanks for your code, it works like a charm! After checking the results, I found a few entries where the CompanyName is longer than the OrderNumber. But this is a minor problem and can be fixed manually. The main part is done with this solution, and I thank you very much!
@DerKamiKatze Depending on your data set, perhaps, before applying this algorithm: 1. Remove leading articles (a, an, the, ... only if they are the first word). 2. Split by space, left-most only, then delete that column. That might remove the part of the company name that is very long. And hopefully none of the company names is the same as an article.
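
The second step of that suggestion (dropping the left-most word before searching) might look something like the sketch below. It reuses the table and column names from the answer and relies on the question's statement that the company name is at least one word long, so the first word can never be the order number. It further assumes every cell has at least two words, and it leaves out the article removal.

```powerquery
let
    Source = Excel.CurrentWorkbook(){[Name="Table3"]}[Content],
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"Content", type text}}),
    #"Added Custom" = Table.AddColumn(#"Changed Type", "Custom", each
        let
            wordList = Text.Split([Content], " "),
            // skip the first word, which always belongs to the company name
            candidates = List.Skip(wordList, 1),
            lengthList = List.Transform(candidates, each Text.Length(_)),
            // List.PositionOf returns the first match, so ties go to the left-most word
            positionLongestWord = List.PositionOf(lengthList, List.Max(lengthList)),
            longestWord = candidates{positionLongestWord}
        in
            longestWord)
in
    #"Added Custom"
```

This only helps when the overly long word is the first word of the company name; a long second or third word would still win.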
