how to replace a substring in a loop array [pandas]

Question

I have the following dataset:

test_column

AB124
3847937BB
HP111
PG999-HP222
1222HP
HP3333-22HP
111HP3939DN

I want to apply the following logic:

Find all alphabetic characters in test_column.
If the length of that alphabetic string is greater than 2 and it contains "HP", remove "HP" once and keep the rest of the string.
If the length of that alphabetic string is greater than 2 and it does NOT contain "HP", keep the entire string.
If the length of that alphabetic string is less than or equal to 2, keep the entire string.

So my desired output would look like this:

desired_column

AB
BB
HP
PG
HP
HP
DN

I am attempting a loop, but am unsuccessful in generating the desired result.

for index,row in df.iterrows():
    target_value = row['test_column']     #array
    predefined_code = ['HP']      #array
    for code in re.findall("[a-zA-Z]+", target_value):  #find all alphabets in the target_column
        if (len(code)>2) and not (code in predefined_code):
            possible_code = code
        if (len(code)>2) and (code in predefined_code):
            possible_code = possible_code.Select(code.replace(predefined_code,'',1))
        if (len(code)<=2):
            possible_code = code

Troy · Accepted Answer · 2018-10-11 04:13:38Z


Since the three cases are mutually exclusive and exhaustive, and two of them leave the string unchanged, the logic can be simplified to

"For each alphabetic string that is longer than 2 characters and contains 'HP', remove the first 'HP'; otherwise keep the string as it is."

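As a quick sanity check of that simplification, here is a minimal sketch that compares a literal three-branch version of the rule with the collapsed version. The function names `three_case_rule` and `simplified_rule` and the extra sample `'ABC'` are hypothetical, chosen only for illustration; the inputs are the letters-only strings you get from the sample data.

```python
def three_case_rule(code):
    # Literal transcription of the three cases described in the question.
    if len(code) > 2 and 'HP' in code:
        return code.replace('HP', '', 1)   # remove the first 'HP' only
    if len(code) > 2 and 'HP' not in code:
        return code
    if len(code) <= 2:
        return code


def simplified_rule(code):
    # Collapsed form: only the first case changes the string.
    if len(code) > 2 and 'HP' in code:
        return code.replace('HP', '', 1)
    return code


# Letters-only forms of the sample rows, plus one long string without 'HP'.
samples = ['AB', 'BB', 'HP', 'PGHP', 'HPHP', 'HPDN', 'ABC']
for s in samples:
    assert three_case_rule(s) == simplified_rule(s)
```

Both functions agree on every sample, which is why a single if-else is enough.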
First use a regex to strip every non-letter character from each string (the pattern below keeps uppercase A-Z only, which is enough for the sample data), then apply the rule with a simple if-else statement.

import pandas as pd
import re

df= pd.DataFrame({'test_column': ['AB124','3847937BB','HP111','PG999-HP222','1222HP','HP3333-22HP','111HP3939DN']})

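# Compile once, outside the loop; matches every character that is not an uppercase letter
regex = re.compile("[^A-Z]")

for index, row in df.iterrows():
    target_value = row['test_column']       # a single string, e.g. 'PG999-HP222'
    code = regex.sub('', target_value)      # keep only the uppercase letters

    if len(code) > 2 and 'HP' in code:
        possible_code = code.replace('HP', '', 1)   # drop the first 'HP' only
    else:
        possible_code = code
    print(possible_code)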

gives as desired:

AB
BB
HP
PG
HP
HP
DN
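
If the goal is to store the result in the DataFrame rather than print it, one possible variation is to wrap the same rule in a function and apply it with `Series.map`. This is only a sketch: the helper name `extract_code` and the constant `NON_UPPER` are hypothetical, and `desired_column` is taken from the question's desired output.

```python
import re

import pandas as pd

df = pd.DataFrame({'test_column': ['AB124', '3847937BB', 'HP111', 'PG999-HP222',
                                   '1222HP', 'HP3333-22HP', '111HP3939DN']})

# Matches every character that is not an uppercase letter (assumes uppercase-only codes).
NON_UPPER = re.compile('[^A-Z]')


def extract_code(value):
    """Strip non-letters, then remove the first 'HP' if more than 2 letters remain."""
    code = NON_UPPER.sub('', value)
    if len(code) > 2 and 'HP' in code:
        return code.replace('HP', '', 1)
    return code


# Apply the rule to every row and keep the result as a new column.
df['desired_column'] = df['test_column'].map(extract_code)
print(df)
```

This avoids `iterrows()` and leaves `test_column` untouched, so the input and output can be compared side by side.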

