2

I would like to split a column into multiple columns in my data frame. It is separated with commas.

I would like to apply something like 'text to columns' function in excel.

I will give my own headings after I split the columns. 'Turnstile' is the name of my column. I have:

(A006, R079, 00-00-04, 5 AVE-59 ST)

types of data in each row. In the end I would like to have:

A006    R079   00-00-04   5 AVE-59 ST

with the headings I will create.

I lastly tried:

df.Turnstile.str.split().tolist()

But all i have is 'nan'

When I check the type of 'Turnstile' column, it says 'object. I tried to convert that series into string with:

df['Turnstile'] = df[['Turnstile'].astype(str)]

but it gives me:

AttributeError: 'list' object has no attribute 'astype'

Please advise.

Thank you.

7
  • What do you get when you do type(df.Turnstile.values[0])? Commented Sep 27, 2015 at 22:45
  • it says tuple. @maxymoo Commented Sep 28, 2015 at 1:03
  • can you check the dtype of each of the tuple entries ? i.e. [type(df.Turnstile.values[0][i]) for i in range(4) Commented Sep 28, 2015 at 1:43
  • it says it is string: <type 'str'> @maxymoo Commented Sep 28, 2015 at 3:06
  • can you please post the results of df.head()? Commented Sep 28, 2015 at 5:39

3 Answers 3

3

Maybe another way of looking at this is converting a column of tuples to a DataFrame, like so:

In [10]: DataFrame(df['Turnstile'].tolist())
Out[10]:
      0     1         2            3
0  A006  R079  00-00-04  5 AVE-59 ST
1  A006  R079  00-00-04  5 AVE-59 ST
2  A006  R079  00-00-04  5 AVE-59 ST
3  A006  R079  00-00-04  5 AVE-59 ST
4  A006  R079  00-00-04  5 AVE-59 ST
5  A006  R079  00-00-04  5 AVE-59 ST
6  A006  R079  00-00-04  5 AVE-59 ST
7  A006  R079  00-00-04  5 AVE-59 ST
8  A006  R079  00-00-04  5 AVE-59 ST
9  A006  R079  00-00-04  5 AVE-59 ST

If that's the case, here's an example that converts the column of tuples to a DataFrame and adds it back to the original dataframe:

import numpy as np
import pandas as pd
from pandas import Series, DataFrame

# create a fake dataframe, repeating the tuple given in the example
In [2]: df = DataFrame(data={'Observations': np.random.randn(10) * np.arange(10),
...:     'Turnstile': (('A006', 'R079', '00-00-04', '5 AVE-59 ST'),)*10})

In [3]: df.head()
Out[3]:
   Observations                            Turnstile
0     -0.000000  (A006, R079, 00-00-04, 5 AVE-59 ST)
1     -0.022668  (A006, R079, 00-00-04, 5 AVE-59 ST)
2     -2.380515  (A006, R079, 00-00-04, 5 AVE-59 ST)
3     -4.209983  (A006, R079, 00-00-04, 5 AVE-59 ST)
4      3.932902  (A006, R079, 00-00-04, 5 AVE-59 ST)

# all at once turn the column of tuples into a dataframe and concat that with the original df
In [4]: df = pd.concat([df,DataFrame(df['Turnstile'].tolist())], axis=1, join='outer')

In [5]: df.head()
Out[5]:
       Observations                            Turnstile     0     1         2  \
    0     -0.000000  (A006, R079, 00-00-04, 5 AVE-59 ST)  A006  R079  00-00-04
    1     -0.022668  (A006, R079, 00-00-04, 5 AVE-59 ST)  A006  R079  00-00-04
    2     -2.380515  (A006, R079, 00-00-04, 5 AVE-59 ST)  A006  R079  00-00-04
    3     -4.209983  (A006, R079, 00-00-04, 5 AVE-59 ST)  A006  R079  00-00-04
    4      3.932902  (A006, R079, 00-00-04, 5 AVE-59 ST)  A006  R079  00-00-04

         3
0  5 AVE-59 ST
1  5 AVE-59 ST
2  5 AVE-59 ST
3  5 AVE-59 ST
4  5 AVE-59 ST

# i assume you don't need this column anymore
In [6]: del df['Turnstile']

If that works you can of course name the new columns as needed.

Sign up to request clarification or add additional context in comments.

2 Comments

Thank you @measureallthethings
@measureallthethings this is a much better answer than mine; i didn't realise that you can create a data frame from a list of tuples
0

Couple options here, if your data is in true csv format, say as an export from Excel, you can use pandas.read_csv to read in the file, and it will automatically be split into columns based on the column delimiters.

If your data is a string column with commas, you can use str.split to redefine your columns, but as far as I know, you need to dump the resulting column as a raw Python list and then recast as dataframe:

import pandas as pd
df = pd.DataFrame([["A006, R079, 00-00-04, 5 AVE-59 ST"]])
df2 = pd.DataFrame(df[0].str.split(',').tolist())

1 Comment

It gives me KeyError. Doesn't work. @maxymoo As I mentioned, I put ---> df.Turnstile.str.split().tolist() it gives me all 'nan'
0

try doing df.Turnstile.str.split(',')

1 Comment

When answering a question, please provide explanation associated with your code. Some people might not understand your code or don't see how it answers the question. See how to write a good answer

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.