I have a dataframe with two text columns. The value in one column (say Col B) is a substring of the string in the other column (say Col A). I want to look for patterns in where the substring sits inside Col A and in the leading characters of Col A. To do that, I want to generate three columns: the position of the substring within Col A, the characters preceding it, and the characters following it.
Here is what the dataframe looks like:

| Col A | Col B |
|-------------|------|
| AGHXXXJ002 | XXX |
| AGHGHJJ002 | GHJ |
| ABCRTGHP001 | RTGH |
| ABCDFFP01 | DFF |
| ABCXGHJD09 | XGH |
Based on the above, I want to generate three columns:

| Col A | Col B | Position | Preceding Chars | Following Chars |
|-------------|------|----------|-----------------|-----------------|
| AGHXXXJ002 | XXX | [3, 5] | AGH | J002 |
| AGHGHJJ002 | GHJ | [3, 5] | AGH | J002 |
| ABCRTGHP001 | RTGH | [3, 6] | ABC | P001 |
| ABCDFFP01 | DFF | [3, 5] | ABC | P01 |
| ABCXGHJD09 | XGH | [3, 5] | ABC | JD09 |
| HGMQQUTV01 | HGM | [0, 2] | NaN | QQUTV01 |
| GBHUJJS099 | BHU | [1, 3] | G | JJS099 |

In the first row, Position is [3, 5] because XXX starts at index 3 and ends at index 5.
This is my desired output. I tried using a for loop to extract the substrings, but it never ran successfully, so I removed the code. Until now I have been doing this manually, but there are more than 50k rows, so that is not feasible. Also, the Position column can be split into two separate columns, start position and end position.
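
To make the target concrete, here is a minimal sketch of one way the columns could be computed. It is not a definitive solution. It assumes pandas, the column names `Col A` and `Col B`, 0-based positions with an inclusive end, and that only the first occurrence of the substring matters. The helper name `locate` and the `Start`/`End` column names are hypothetical.

```python
import numpy as np
import pandas as pd

# Sample rows taken from the tables above.
df = pd.DataFrame({
    "Col A": ["AGHXXXJ002", "AGHGHJJ002", "ABCRTGHP001", "ABCDFFP01",
              "ABCXGHJD09", "HGMQQUTV01", "GBHUJJS099"],
    "Col B": ["XXX", "GHJ", "RTGH", "DFF", "XGH", "HGM", "BHU"],
})


def locate(whole, part):
    """Hypothetical helper: describe the first occurrence of `part` in `whole`."""
    start = whole.find(part)              # str.find returns -1 when `part` is absent
    if start == -1:
        return [np.nan, np.nan, np.nan, np.nan]
    end = start + len(part) - 1           # inclusive end index
    preceding = whole[:start] or np.nan   # an empty prefix is reported as NaN
    following = whole[end + 1:] or np.nan # an empty suffix is reported as NaN
    return [start, end, preceding, following]


# Plain Python loop over the two columns; no per-row DataFrame operations.
rows = [locate(a, b) for a, b in zip(df["Col A"], df["Col B"])]
cols = ["Start", "End", "Preceding Chars", "Following Chars"]
result = df.join(pd.DataFrame(rows, columns=cols, index=df.index))

# Optional combined [start, end] column, as in the desired output.
result["Position"] = result[["Start", "End"]].values.tolist()

print(result)
```

Keeping `Start` and `End` as separate columns covers the split mentioned above, and `Position` is only a convenience built from them. If a substring can occur more than once in Col A, this sketch reports only the first match.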