Splitting one column into multiple columns with python pandas

Question

I would like to split a column into multiple columns in my data frame. It is separated with commas.

I would like to apply something like the 'text to columns' function in Excel.

I will assign my own headings after splitting the column. 'Turnstile' is the name of my column. Each row holds data like this:

(A006, R079, 00-00-04, 5 AVE-59 ST)

In the end I would like to have:

A006    R079   00-00-04   5 AVE-59 ST

with the headings I will create.

Most recently I tried:

df.Turnstile.str.split().tolist()

But all I get is 'nan'.

When I check the dtype of the 'Turnstile' column, it says 'object'. I tried to convert that Series to strings with:

df['Turnstile'] = df[['Turnstile'].astype(str)]

but it gives me:

AttributeError: 'list' object has no attribute 'astype'

Please advise.

Thank you.

Can you check the type of each of the tuple entries? i.e. [type(df.Turnstile.values[0][i]) for i in range(4)]
– maxymoo, Commented Sep 28, 2015 at 1:43
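
The check suggested in the comment above can be sketched roughly as follows. The sample frame is hypothetical and simply repeats the row shown in the question; the point is that an 'object' dtype does not say whether the elements are strings or tuples, so the elements themselves have to be inspected.

```python
import pandas as pd

# Hypothetical sample data mirroring the row shown in the question.
df = pd.DataFrame({
    'Turnstile': [('A006', 'R079', '00-00-04', '5 AVE-59 ST')] * 3
})

# dtype 'object' only says the column holds arbitrary Python objects,
# so look at the type of the actual elements instead.
element_types = df['Turnstile'].map(type).unique().tolist()
print(element_types)

# Types of the four entries inside the first element.
entry_types = [type(entry) for entry in df['Turnstile'].iloc[0]]
print(entry_types)
```

As a side note, the AttributeError in the question comes from a misplaced bracket: .astype was called on the Python list ['Turnstile'] rather than on the column, which would be df['Turnstile'].astype(str).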

measureallthethings · Accepted Answer · 2015-09-28 16:25:59Z

Maybe another way of looking at this is converting a column of tuples to a DataFrame, like so:

In [10]: DataFrame(df['Turnstile'].tolist())
Out[10]:
      0     1         2            3
0  A006  R079  00-00-04  5 AVE-59 ST
1  A006  R079  00-00-04  5 AVE-59 ST
2  A006  R079  00-00-04  5 AVE-59 ST
3  A006  R079  00-00-04  5 AVE-59 ST
4  A006  R079  00-00-04  5 AVE-59 ST
5  A006  R079  00-00-04  5 AVE-59 ST
6  A006  R079  00-00-04  5 AVE-59 ST
7  A006  R079  00-00-04  5 AVE-59 ST
8  A006  R079  00-00-04  5 AVE-59 ST
9  A006  R079  00-00-04  5 AVE-59 ST

If that's the case, here's an example that converts the column of tuples to a DataFrame and adds it back to the original dataframe:

import numpy as np
import pandas as pd
from pandas import Series, DataFrame

# create a fake dataframe, repeating the tuple given in the example
In [2]: df = DataFrame(data={'Observations': np.random.randn(10) * np.arange(10),
...:     'Turnstile': (('A006', 'R079', '00-00-04', '5 AVE-59 ST'),)*10})

In [3]: df.head()
Out[3]:
   Observations                            Turnstile
0     -0.000000  (A006, R079, 00-00-04, 5 AVE-59 ST)
1     -0.022668  (A006, R079, 00-00-04, 5 AVE-59 ST)
2     -2.380515  (A006, R079, 00-00-04, 5 AVE-59 ST)
3     -4.209983  (A006, R079, 00-00-04, 5 AVE-59 ST)
4      3.932902  (A006, R079, 00-00-04, 5 AVE-59 ST)

# all at once turn the column of tuples into a dataframe and concat that with the original df
In [4]: df = pd.concat([df,DataFrame(df['Turnstile'].tolist())], axis=1, join='outer')

In [5]: df.head()
Out[5]:
   Observations                            Turnstile     0     1         2  \
0     -0.000000  (A006, R079, 00-00-04, 5 AVE-59 ST)  A006  R079  00-00-04
1     -0.022668  (A006, R079, 00-00-04, 5 AVE-59 ST)  A006  R079  00-00-04
2     -2.380515  (A006, R079, 00-00-04, 5 AVE-59 ST)  A006  R079  00-00-04
3     -4.209983  (A006, R079, 00-00-04, 5 AVE-59 ST)  A006  R079  00-00-04
4      3.932902  (A006, R079, 00-00-04, 5 AVE-59 ST)  A006  R079  00-00-04

             3
0  5 AVE-59 ST
1  5 AVE-59 ST
2  5 AVE-59 ST
3  5 AVE-59 ST
4  5 AVE-59 ST

# I assume you don't need this column anymore
In [6]: del df['Turnstile']

If that works you can of course name the new columns as needed.
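
Naming the new columns could look something like the sketch below. The headings (control_area, unit, scp, station) and the sample values are hypothetical placeholders, not names from the question. Passing the original index to the new frame is one way to keep rows aligned when the original frame does not have a default 0..n-1 index.

```python
import pandas as pd

# Hypothetical frame with the same shape as the example above.
df = pd.DataFrame({
    'Observations': [0.1, 0.2, 0.3],
    'Turnstile': [('A006', 'R079', '00-00-04', '5 AVE-59 ST')] * 3,
})

# Hypothetical headings; substitute your own.
new_columns = ['control_area', 'unit', 'scp', 'station']

# Build the split frame with names and the original index so that
# concat aligns the rows correctly.
split = pd.DataFrame(df['Turnstile'].tolist(), columns=new_columns, index=df.index)

# Drop the tuple column and attach the named columns in one step.
df = pd.concat([df.drop(columns='Turnstile'), split], axis=1)
print(df)
```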

@measureallthethings this is a much better answer than mine; I didn't realise that you can create a DataFrame from a list of tuples.

maxymoo · Accepted Answer · 2015-09-27 22:43:21Z

A couple of options here. If your data is in true CSV format, say as an export from Excel, you can use pandas.read_csv to read in the file, and it will be split into columns automatically based on the column delimiter.
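
A minimal sketch of the read_csv route, assuming the data really is comma-delimited text. The in-memory string stands in for a file on disk, and the headings are hypothetical.

```python
import io
import pandas as pd

# Hypothetical file contents standing in for a CSV export on disk.
csv_text = (
    "A006, R079, 00-00-04, 5 AVE-59 ST\n"
    "A006, R079, 00-00-05, 5 AVE-59 ST\n"
)

# Hypothetical headings; header=None because the sample has no header row.
turnstiles = pd.read_csv(
    io.StringIO(csv_text),
    header=None,
    names=['control_area', 'unit', 'scp', 'station'],
    skipinitialspace=True,  # drop the blank after each comma
)
print(turnstiles)
```

With a real file, the path would be passed in place of the io.StringIO object.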

If your data is a string column with commas, you can use str.split to split it, but as far as I know you need to dump the result to a raw Python list and then recast it as a DataFrame:

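import pandas as pd

# one string column holding comma-separated values
df = pd.DataFrame([["A006, R079, 00-00-04, 5 AVE-59 ST"]])
# split on ', ' (comma plus space) so the pieces carry no leading blank
df2 = pd.DataFrame(df[0].str.split(', ').tolist())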
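
As an alternative to going through a Python list, the sketch below assumes a pandas version in which str.split accepts expand=True, which returns the pieces as DataFrame columns directly. The headings are hypothetical.

```python
import pandas as pd

# Hypothetical string column matching the format in the question.
df = pd.DataFrame({'Turnstile': ["A006, R079, 00-00-04, 5 AVE-59 ST"] * 3})

# expand=True returns a DataFrame with one column per piece
# instead of a Series of lists.
parts = df['Turnstile'].str.split(',', expand=True)

# Strip the blank left after each comma.
parts = parts.apply(lambda col: col.str.strip())

# Hypothetical headings; substitute your own.
parts.columns = ['control_area', 'unit', 'scp', 'station']
print(parts)
```

This only applies when the column holds strings. For a column of tuples, building a DataFrame from the list of tuples remains the relevant approach.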


lorelai Over a year ago

It gives me a KeyError. Doesn't work. @maxymoo As I mentioned, when I run df.Turnstile.str.split().tolist() it gives me all 'nan'.

ah bon · Accepted Answer · 2020-02-13 08:04:14Z

Try df.Turnstile.str.split(',')
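
To illustrate why the separator argument matters, here is a small sketch on a hypothetical string value in the question's format. It assumes the column holds strings; if it holds tuples, string splitting does not apply.

```python
import pandas as pd

# Hypothetical string value in the format from the question.
s = pd.Series(["A006, R079, 00-00-04, 5 AVE-59 ST"])

# No argument: split on runs of whitespace, so the commas stay attached
# and the station name is broken apart.
by_whitespace = list(s.str.split().iloc[0])
print(by_whitespace)

# Explicit comma separator: four pieces, each later one keeping
# the blank that followed the comma.
by_comma = list(s.str.split(',').iloc[0])
print(by_comma)
```

The leftover blanks can be removed afterwards with str.strip on each piece.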

edited Feb 13, 2020 at 8:04 by ah bon
answered Jun 2, 2017 at 8:03 by lightyagami96

Nuageux Over a year ago

When answering a question, please provide an explanation along with your code. Some people might not understand your code or see how it answers the question. See how to write a good answer.
