The Situation:
I have a column containing information that needs to be extracted. Here is some example content:
| Row | Content |
|---|---|
| 1 | CompanyName1 OrderNumber1 SomeUnimportantStuff1 |
| 2 | CompanyName2 CompanySurname2 OrderNumber2 SomeUnimportantStuff2 |
| 3 | CompanyName3 CompanySurname3 CompanyAddition3 OrderNumber_ABC3 SomeUnimportantStuff3 SomeMoreUnimportantStuff3 |
So basically: the company name (one to three words, i.e. containing 0 to 2 spaces), an order number, and some unnecessary information at the end.
I need to extract the OrderNumber. The Problems:
- Company names vary from one word to three words
- There are no unique separators such as commas
- The OrderNumber doesn't always have the same length and sometimes has a suffix like "_v3" or even longer (but it contains no spaces, so basically it's always the longest word in each cell)
What I've successfully done so far:
- Extracted the CompanyName into a new column "CompanyName"
And this is the point where I'm stuck. As I understand it, the easiest way would be:
- Split the column "Content" using the value in "CompanyName" as the delimiter. Since the OrderNumber contains no spaces, I could then split the remainder again on the first space and have the OrderNumber standing alone.
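To make the idea concrete, here is a minimal sketch in Python (only as an illustration; the tool I'm actually working in may differ). The function name `order_number_after_company` is made up for this example, and it assumes the company name appears in the cell exactly as it was extracted:

```python
from typing import Optional


def order_number_after_company(content: str, company_name: str) -> Optional[str]:
    """Return the first word that follows company_name in content, or None."""
    if not company_name:
        return None  # str.partition raises ValueError on an empty separator
    # partition() splits at the first occurrence of the company name
    _, found, rest = content.partition(company_name)
    if not found:
        return None  # the company name does not occur in this cell
    words = rest.split()
    # the order number contains no spaces, so it is the first remaining word
    return words[0] if words else None


# (Content, CompanyName) pairs taken from example rows 2 and 3
rows = [
    ("CompanyName2 CompanySurname2 OrderNumber2 SomeUnimportantStuff2",
     "CompanyName2 CompanySurname2"),
    ("CompanyName3 CompanySurname3 CompanyAddition3 OrderNumber_ABC3 "
     "SomeUnimportantStuff3 SomeMoreUnimportantStuff3",
     "CompanyName3 CompanySurname3 CompanyAddition3"),
]

for content, company in rows:
    print(order_number_after_company(content, company))
```

For these two rows this prints `OrderNumber2` and `OrderNumber_ABC3`.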
Another idea is to search for the longest word in each cell of the column "Content" and extract it, but I was unable to find a solution for that either.
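As a rough Python sketch of the longest-word idea (again only an illustration; `longest_word` and the sample string are hypothetical, not my real data):

```python
def longest_word(content: str) -> str:
    """Return the longest space-separated word in content ('' if there is none)."""
    words = content.split()
    # max() with key=len returns the first word of maximal length on a tie
    return max(words, key=len) if words else ""


# Hypothetical cell: two-word company name, order number with suffix, trailing text
sample = "Acme Corp 2023-000123_v3 urgent"
print(longest_word(sample))  # 2023-000123_v3
```

This only works when the order number really is the longest word in the cell. The placeholder rows in the table above don't satisfy that (their filler words are longer than the order numbers), so it relies on the real data behaving as described.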
Can anyone give me a helpful hint?
