How to split strings inside a numpy array?

Question

I have the following table. The 'location' column repeats the state, so I am trying to remove the state from it so that it contains only the city name:

year    location        state   success
2009    New York, NY    NY      1
2009    New York, NY    NY      1
2009    Chicago, IL     IL      1
2009    New York, NY    NY      1
2009    Boston, MA      MA      1
2009    Long Beach, CA  CA      1
2009    Atlanta, GA     GA      1

I have tried the following code:

x = KS_clean.column(1)
np.chararray.split(x, ',')

How can I split the string so the result only contains the city name like the following:

array('New York', 'New York', 'Chicago', ...,)

so that I can put it back inside the table?

Sorry if this is a basic question, but I am new to Python and still learning. Thanks.

Your data looks like a pandas DataFrame, not a numpy array. Please check.
– DYZ, Commented Aug 12, 2017 at 6:37
It is a pandas DataFrame, but when I extract the column (var x) and check its type it says numpy.ndarray.
– Hamza Khawar, Commented Aug 12, 2017 at 6:42
How did you get the dataframe in the first place? It looks odd. When you select a column, you must get a Series, not anything-numpy.
– DYZ, Commented Aug 12, 2017 at 6:54

jezrael · Accepted Answer · 2017-08-12 07:05:12Z

I think you need to work with a DataFrame first (e.g. one created by read_csv):

import pandas as pd
# pandas.compat.StringIO was removed in later pandas versions; io.StringIO works on Python 3
from io import StringIO

temp=u"""year;location;state;success
2009;New York, NY;NY;1
2009;New York, NY;NY;1
2009;Chicago, IL;IL;1
2009;New York, NY;NY;1
2009;Boston, MA;MA;1
2009;Long Beach, CA;CA;1
2009;Atlanta, GA;GA;1"""
# after testing, replace 'StringIO(temp)' with 'filename.csv'
df = pd.read_csv(StringIO(temp), sep=";")

print (type(df))
<class 'pandas.core.frame.DataFrame'>

print (df)
   year        location state  success
0  2009    New York, NY    NY        1
1  2009    New York, NY    NY        1
2  2009     Chicago, IL    IL        1
3  2009    New York, NY    NY        1
4  2009      Boston, MA    MA        1
5  2009  Long Beach, CA    CA        1
6  2009     Atlanta, GA    GA        1

Then split each string with str.split and select the first element of each resulting list with str[0]:

df['location'] = df['location'].str.split(', ').str[0]
print (df)
   year    location state  success
0  2009    New York    NY        1
1  2009    New York    NY        1
2  2009     Chicago    IL        1
3  2009    New York    NY        1
4  2009      Boston    MA        1
5  2009  Long Beach    CA        1
6  2009     Atlanta    GA        1
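
If only the city names are needed as an array, as the question asks, the same split can be applied to the single column and the result converted on its own. This is a minimal sketch, not part of the original answer: the small frame is made up for illustration, and `n=1` is an optional extra that limits the split to the first separator.

```python
import pandas as pd

# Small illustrative frame; the values mirror the sample data above.
df = pd.DataFrame({
    "location": ["New York, NY", "Chicago, IL", "Long Beach, CA"],
    "state": ["NY", "IL", "CA"],
})

# Split at the first ", " only (n=1) and keep the part before it.
cities = df["location"].str.split(", ", n=1).str[0]

# to_numpy() returns the column as a numpy array (available in pandas 0.24+).
city_array = cities.to_numpy()
print(city_array)
```

The original column is left untouched here; assign `cities` back to `df['location']` if the table itself should change.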

Finally, if necessary, convert the DataFrame to a numpy array with values:

arr = df.values
print (arr)
[[2009 'New York' 'NY' 1]
 [2009 'New York' 'NY' 1]
 [2009 'Chicago' 'IL' 1]
 [2009 'New York' 'NY' 1]
 [2009 'Boston' 'MA' 1]
 [2009 'Long Beach' 'CA' 1]
 [2009 'Atlanta' 'GA' 1]]
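
If the column really is already a plain numpy array of strings, as described in the comments, the split can also be done without pandas. The sketch below is an assumption-laden illustration rather than part of the answer: the array `x` is made up to stand in for the extracted column, and it shows two options, a plain list comprehension and `np.char.partition`.

```python
import numpy as np

# Illustrative array standing in for the extracted 'location' column.
x = np.array(["New York, NY", "Chicago, IL", "Boston, MA"])

# Option 1: plain Python split on each element, keeping the text before the comma.
cities_list = np.array([s.split(",")[0].strip() for s in x])

# Option 2: np.char.partition splits each element at the first "," into
# (before, separator, after); column 0 holds the city name.
# astype(str) makes sure an object-dtype array is converted to a string dtype first.
cities_np = np.char.partition(x.astype(str), ",")[:, 0]

print(cities_list)
print(cities_np)
```

Both options give the same city names; the list comprehension is often the easier one to read for small data.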
