
I am aware of the following questions:

1. How to split a column based on several string indices using pandas?
2. How do I split text in a column into multiple rows?

I want to split these into several new columns though. Suppose I have a dataframe that looks like this:

id    | string
-----------------------------
1     | astring, isa, string
2     | another, string, la
3     | 123, 232, another

I know that using:

df['string'].str.split(',')

I can split a string. But as a next step, I want to efficiently put the split string into new columns like so:

id    | string_1 | string_2 | string_3
---------------------------------------
1     | astring  | isa      | string
2     | another  | string   | la
3     | 123      | 232      | another

I could for example do this:

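for index, row in df.iterrows():
    # split on the ", " separator and number the new columns from 1
    for i, item in enumerate(row['string'].split(', '), start=1):
        # .loc creates the column on first use; set_values does not exist
        df.loc[index, 'string_{0}'.format(i)] = item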

But how could one achieve the same result more elegantly?

2 Answers


The str.split method has an expand argument:

>>> df['string'].str.split(',', expand=True)
         0        1         2
0  astring      isa    string
1  another   string        la
2      123      232   another
>>>
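
One thing to watch: splitting on ',' alone keeps the space that follows each comma, so the second and third columns hold ' isa' rather than 'isa'. A minimal sketch, assuming the frame from the question is rebuilt by hand (the names raw and parts are only illustrative), splits on ', ' instead:

```python
import pandas as pd

# Sample data rebuilt from the question; assumed to match the asker's frame.
df = pd.DataFrame({
    "id": [1, 2, 3],
    "string": ["astring, isa, string", "another, string, la", "123, 232, another"],
})

# Splitting on "," alone keeps the space that follows each comma.
raw = df["string"].str.split(",", expand=True)

# Splitting on ", " (comma plus space) drops it.
parts = df["string"].str.split(", ", expand=True)
print(parts)
```

If the spacing around the commas is irregular, stripping each resulting column afterwards is another option.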

With column names:

>>> df['string'].str.split(',', expand=True).rename(columns = lambda x: "string"+str(x+1))
   string1  string2   string3
0  astring      isa    string
1  another   string        la
2      123      232   another

Much neater with Python >= 3.6 f-strings:

>>> (df['string'].str.split(',', expand=True)
...              .rename(columns=lambda x: f"string_{x+1}"))
  string_1 string_2  string_3
0  astring      isa    string
1  another   string        la
2      123      232   another

2 Comments

How can I add these 'string_x' columns to the original dataframe?
df[['new_column_1', 'new_column_2', 'new_column_3']] = df['string'].str.split(',', expand=True)
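
As an alternative to assigning to a list of column names, the split result could be joined back onto the original frame. This is only a sketch: the sample data is rebuilt from the question, and split_cols and result are illustrative names.

```python
import pandas as pd

# Sample data rebuilt from the question; assumed to match the asker's frame.
df = pd.DataFrame({
    "id": [1, 2, 3],
    "string": ["astring, isa, string", "another, string, la", "123, 232, another"],
})

# Split into columns and name them string_1, string_2, ...
split_cols = (df["string"].str.split(", ", expand=True)
                          .rename(columns=lambda x: f"string_{x + 1}"))

# join aligns on the index, so each row keeps its own pieces.
result = df.drop(columns="string").join(split_cols)
print(result)
```

Because join works on the index, this does not require knowing the number of pieces in advance, unlike the explicit list of column names.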

Slightly less concise than the expand option, but here is an alternative way:

In [29]: cols = ['string_1', 'string_2', 'string_3']   

In [30]: pandas.DataFrame(df.string.str.split(', ').tolist(), columns=cols)
Out[30]: 
  string_1 string_2 string_3
0  astring      isa   string
1  another   string       la
2      123      232  another
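
A caveat with this approach: building a new DataFrame from a list gives it a fresh 0..n-1 index, which matters if the original frame has a different one. A minimal sketch, assuming a hypothetical non-default index (10, 20, 30) and the illustrative name wide, passes the original index through explicitly:

```python
import pandas as pd

# Same strings as in the question, but with a hypothetical non-default index.
df = pd.DataFrame(
    {"string": ["astring, isa, string", "another, string, la", "123, 232, another"]},
    index=[10, 20, 30],
)

cols = ["string_1", "string_2", "string_3"]

# Passing index=df.index keeps the new frame aligned with the original rows.
wide = pd.DataFrame(df["string"].str.split(", ").tolist(),
                    columns=cols, index=df.index)
print(wide)
```

With the index preserved, wide can be joined or assigned back to df without misaligning rows.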
