I have the following dataset:
test_column
AB124
3847937BB
HP111
PG999-HP222
1222HP
HP3333-22HP
111HP3939DN
I want to apply the following logic:
- extract all the letters from each value in test_column and join them into one string (for example, PG999-HP222 gives PGHP)
- if the length of that letter string is greater than 2 and it contains an instance of "HP", then remove "HP" once and keep the rest of the string.
- if the length of that letter string is greater than 2 and there is NO instance of "HP" in it, then keep the entire string.
- if the length of that alphabet string is less than or equal to 2, then keep the entire string.
So my desired output would look like this:
desired_column
AB
BB
HP
PG
HP
HP
DN
I am attempting a loop, but am unsuccessful in generating the desired result.
    for index,row in df.iterrows():
        target_value = row['test_column'] #array
        predefined_code = ['HP'] #array
        for code in re.findall("[a-zA-Z]+", target_value): #find all alphabets in the target_column
            if (len(code)>2) and not (code in predefined_code):
                possible_code = code
            if (len(code)>2) and (code in predefined_code):
                possible_code = possible_code.Select(code.replace(predefined_code,'',1))
            if (len(code)<=2):
                possible_code = code
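
For reference, here is a minimal sketch of how I read the rules above, written as a plain function on one value rather than a row loop. The function name extract_code and its predefined parameter are hypothetical names made up for illustration, and it assumes the letters of each value are joined into a single string before the length and "HP" checks are applied.

```python
import re

def extract_code(value, predefined="HP"):
    """Apply the rules from the question to one value of test_column (hypothetical helper)."""
    # Join every run of letters into one string, e.g. "PG999-HP222" -> "PGHP".
    letters = "".join(re.findall(r"[a-zA-Z]+", value))
    # More than 2 letters and the predefined code is present: remove its first occurrence only.
    if len(letters) > 2 and predefined in letters:
        return letters.replace(predefined, "", 1)
    # Otherwise (2 letters or fewer, or no "HP" present): keep the letters unchanged.
    return letters

values = ["AB124", "3847937BB", "HP111", "PG999-HP222",
          "1222HP", "HP3333-22HP", "111HP3939DN"]
results = [extract_code(v) for v in values]
print(results)
```

If df is the DataFrame holding test_column, something like df['desired_column'] = df['test_column'].apply(extract_code) should apply the same function to every row.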