I have a dataframe with the original column 'All', which I split into RegionName1 and RegionName2 columns. There are duplicate entries, for example 'Duluth' and 'Duluth (University of Minnesota Duluth'. I want to convert strings like 'Duluth (University of Minnesota Duluth' to NaN values. So I tried:

unitown['RegionName2'] = [np.nan if '(' in x else x for x in unitown['RegionName2']]

and got the error TypeError: argument of type 'float' is not iterable. What else can I try?

unitown = pd.read_table('university_towns.txt', header=None).rename(columns={0: 'All'})
unitown['State'] = unitown['All'].apply(lambda x: x.split('[edi')[0].strip() if x.count('[edi') else np.NaN).fillna(method="ffill")
unitown['RegionName1'] = unitown['All'].apply(lambda x: x.split('(')[0].strip() if x.count('(') else np.NaN)
unitown['RegionName2'] = unitown['All'].apply(lambda x: x.split(',')[0].strip() if x.count(',') else np.NaN)
unitown['RegionName2'] = [np.nan if '(' in x else x for x in unitown['RegionName2']]
return unitown[unitown.State == 'Minnesota']

2 Answers

You can either use:

unitown.loc[unitown.RegionName2.str.contains("(", regex=False, na=False), 'RegionName2'] = np.nan

Or add this logic directly to the code that generates RegionName2 as in:

unitown['RegionName2'] = unitown['All'].apply(
    lambda x: x.split(',')[0].strip() if x.count(',') and "(" not in x.split(',')[0] else np.nan
)
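
As a minimal sketch of the first option, assuming a small made-up dataframe in place of the real university_towns.txt data: str.contains treats its pattern as a regular expression by default, so a bare "(" needs regex=False, and na=False makes rows that are already NaN count as non-matches so the result can be used as a boolean mask.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data standing in for the real RegionName2 column
df = pd.DataFrame({
    "RegionName2": ["Duluth (University of Minnesota Duluth", np.nan, "Mankato"]
})

# regex=False matches "(" literally; na=False treats missing values as "no match"
mask = df["RegionName2"].str.contains("(", regex=False, na=False)

# Overwrite only the rows whose text contains "("
df.loc[mask, "RegionName2"] = np.nan
print(df)
```

The existing NaN row is left alone and only the string containing "(" is replaced.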
3 Comments

thanks, foglerit! this is exactly what I was looking for.
My pleasure @MariaBruevich. Could you hit the accept button so others can easily know this answer solves your problem? Thanks
I don't see the accept button. I clicked 'this answer is useful' next to your answer. By the way, I discovered that I should convert the NaNs to type 'string' for my list comprehension to work.
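
On the last comment, a sketch of why the list comprehension failed, using made-up values: NaN is a float, so '(' in x raises the TypeError from the question for those entries. One possible alternative to converting the NaNs to strings is to check the type first, which keeps the existing NaNs as real missing values.

```python
import numpy as np

# Hypothetical values: one string with "(", one existing NaN, one clean string
values = ["Duluth (University of Minnesota Duluth", np.nan, "Mankato"]

# Reproduce the original error: membership tests do not work on a float
try:
    "(" in np.nan
    raised = False
except TypeError:
    raised = True
print("TypeError raised:", raised)

# Only test strings for "("; anything else (such as NaN) passes through unchanged
cleaned = [np.nan if isinstance(x, str) and "(" in x else x for x in values]
print(cleaned)
```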
import numpy as np
import pandas as pd

# input data
d = {'RegionName1': ["a", "b", "c", "d"], 'RegionName2': ['Duluth and Duluth (University of Minnesota Duluth', "Monkato(Minnesota", 'Other1', 'Other2']}
df = pd.DataFrame(data=d)
print("Input dataframe:")
print(df)

# search for '(' in the RegionName2 column and replace matching values with NaN
for i, value in df['RegionName2'].items():
    # str() guards against non-string values such as NaN
    if '(' in str(value):
        df.loc[i, 'RegionName2'] = np.nan
print("Output dataframe:")
print(df)
