I have the following dataset:

test_column

AB124
3847937BB
HP111
PG999-HP222
1222HP
HP3333-22HP
111HP3939DN

I want to work the following logic:

  1. Find all alphabetic characters in the test column.
  2. If the length of that alphabetic string is greater than 2 and it contains "HP", remove "HP" once and keep the rest of the string.
  3. If the length of that alphabetic string is greater than 2 and it does NOT contain "HP", keep the entire string.
  4. If the length of that alphabetic string is less than or equal to 2, keep the entire string.

So my desired output would look like this:

desired_column

AB
BB
HP
PG
HP
HP
DN

I am attempting a loop, but am unsuccessful in generating the desired result.

for index,row in df.iterrows():
    target_value = row['test_column']     #array
    predefined_code = ['HP']      #array
    for code in re.findall("[a-zA-Z]+", target_value):  #find all alphabets in the target_column
        if (len(code)>2) and not (code in predefined_code):
            possible_code = code
        if (len(code)>2) and (code in predefined_code):
            possible_code = possible_code.Select(code.replace(predefined_code,'',1))
        if (len(code)<=2):
            possible_code = code

1 Answer

Since the cases are mutually exclusive and exhaustive, the logic can be simplified to:

"If the alphabetical substring is longer than 2 characters and contains 'HP', remove the first 'HP'; otherwise keep the substring as it is."

First use a regex to remove the non-alphabetical parts of each string, then implement the logic with a simple if-else statement.

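import pandas as pd
import re

df = pd.DataFrame({'test_column': ['AB124','3847937BB','HP111','PG999-HP222','1222HP','HP3333-22HP','111HP3939DN']})

# Compile once, outside the loop; matches anything that is not an ASCII letter
# (lowercase included, to mirror the [a-zA-Z] pattern used in the question)
regex = re.compile("[^A-Za-z]")

for index, row in df.iterrows():
    target_value = row['test_column']      # a single string, e.g. 'PG999-HP222'
    code = regex.sub('', target_value)     # keep only the letters, e.g. 'PGHP'

    if len(code) > 2 and 'HP' in code:
        possible_code = code.replace('HP', '', 1)   # drop the first 'HP' only
    else:
        possible_code = code
    print(possible_code)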

This gives the desired output:

AB
BB
HP
PG
HP
HP
DN
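
The loop above only prints each result. If the values should end up in a `desired_column`, as in the question, one possible sketch is to wrap the same logic in a function and apply it to the column. The names `extract_code`, `NON_LETTERS` and the `marker` parameter are made up for this illustration; they are not part of pandas or of the original code.

```python
import re

import pandas as pd

# Hypothetical helper name; matches every character that is not an ASCII letter.
NON_LETTERS = re.compile(r"[^A-Za-z]")


def extract_code(value: str, marker: str = "HP") -> str:
    """Keep only the letters; drop the first `marker` if more than 2 letters remain."""
    letters = NON_LETTERS.sub("", value)
    if len(letters) > 2 and marker in letters:
        return letters.replace(marker, "", 1)
    return letters


df = pd.DataFrame({"test_column": ["AB124", "3847937BB", "HP111", "PG999-HP222",
                                   "1222HP", "HP3333-22HP", "111HP3939DN"]})

# Apply the helper to every value and store the result as a new column.
df["desired_column"] = df["test_column"].apply(extract_code)
print(df)
```

Note that the length check is applied to all letters of a value joined together (`'PGHP'` for `'PG999-HP222'`), not to each separate run of letters as `re.findall("[a-zA-Z]+", ...)` would return them. This assumes every value in `test_column` is a string; missing values would need separate handling.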