split multiple columns in pandas dataframe by delimiter

Question

I have survey data which, annoyingly, has returned multiple-choice questions in the following way. It's in an Excel sheet. There are about 60 columns with responses ranging from a single answer to multiple answers, split by /. This is what I have so far. Is there any way to do this quicker, without having to repeat it for each individual column?

import pandas as pd

data = {'q1': ['one', 'two', 'three'],
        'q2': ['one/two/three', 'a/b/c', 'd/e/f'],
        'q3': ['a/b/c', 'd/e/f', 'g/h/i']}

df = pd.DataFrame(data)

# one hand-written split per column, each with hard-coded new column names
df[['q2a', 'q2b', 'q2c']] = df['q2'].str.split('/', expand=True)
df[['q3a', 'q3b', 'q3c']] = df['q3'].str.split('/', expand=True)

clean_df = df.drop(columns=['q2', 'q3'])

What file format is that data in before you read it into memory, or is it in fact a dict? – It_is_Chris, Commented Oct 19, 2020 at 16:29

Erfan · Accepted Answer · 2020-10-19 17:00:09Z

6

We can use a list comprehension with add_prefix, then use pd.concat to concatenate everything into your final df:

splits = [df[col].str.split(pat='/', expand=True).add_prefix(col) for col in df.columns]
clean_df = pd.concat(splits, axis=1)

     q10  q20  q21    q22 q30 q31 q32
0    one  one  two  three   a   b   c
1    two    a    b      c   d   e   f
2  three    d    e      f   g   h   i
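Note that the list comprehension also runs q1 through str.split, which renames it to q10 even though it has nothing to split. A minimal sketch of one way to leave such single-answer columns untouched, assuming a column only needs splitting when at least one of its cells contains '/' (that check is an assumption added here, not part of the answer above):

```python
import pandas as pd

df = pd.DataFrame({'q1': ['one', 'two', 'three'],
                   'q2': ['one/two/three', 'a/b/c', 'd/e/f'],
                   'q3': ['a/b/c', 'd/e/f', 'g/h/i']})

parts = []
for col in df.columns:
    # Assumption: only columns with at least one '/' are multi-answer columns.
    if df[col].astype(str).str.contains('/', regex=False).any():
        parts.append(df[col].str.split('/', expand=True).add_prefix(col))
    else:
        # Keep the column as a one-column DataFrame under its original name.
        parts.append(df[[col]])

clean_df = pd.concat(parts, axis=1)
print(clean_df.columns.tolist())
# ['q1', 'q20', 'q21', 'q22', 'q30', 'q31', 'q32']
```

The check runs once per column, so with 60 columns the cost is negligible compared with the splits themselves.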

If you actually want your column names to be suffixed by a letter, you can do the following with string.ascii_lowercase:

from string import ascii_lowercase

dfs = []
for col in df.columns:
    d = df[col].str.split('/', expand=True)
    c = d.shape[1]
    d.columns = [col + l for l in ascii_lowercase[:c]]
    dfs.append(d)
    
clean_df = pd.concat(dfs, axis=1)

     q1a  q2a  q2b    q2c q3a q3b q3c
0    one  one  two  three   a   b   c
1    two    a    b      c   d   e   f
2  three    d    e      f   g   h   i
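Since the question mentions responses ranging from one answer to several, it may help to see how this loop behaves with uneven rows. The data below is hypothetical. With expand=True, each column becomes as wide as its longest response, and shorter rows are padded with missing values, so the letter suffixes adapt automatically. One caveat: ascii_lowercase has only 26 letters, so a question with more than 26 selected options would need a different naming scheme.

```python
from string import ascii_lowercase

import pandas as pd

# Hypothetical data: respondents picked different numbers of options.
df = pd.DataFrame({'q1': ['one', 'two', 'three'],
                   'q2': ['one/two/three', 'a', 'd/e']})

dfs = []
for col in df.columns:
    d = df[col].str.split('/', expand=True)
    # Width = largest number of answers in this column; shorter rows are
    # padded with missing values.
    d.columns = [col + l for l in ascii_lowercase[:d.shape[1]]]
    dfs.append(d)

clean_df = pd.concat(dfs, axis=1)
print(clean_df)
```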

answered Oct 19, 2020 at 16:37 by Erfan, edited Oct 19, 2020 at 17:00



David Erickson · Accepted Answer · 2020-10-19 17:01:23Z

You can create a dict d that maps the integer column labels produced by str.split(expand=True) to letters. Then loop through the columns and rename the new columns dynamically:

input:

import pandas as pd
df = pd.DataFrame({'q1': ['one', 'two', 'three'],
   'q2' : ['one/two/three', 'a/b/c', 'd/e/f'],
   'q3' : ['a/b/c', 'd/e/f','g/h/i']})

code:

ltrs = list('abcdefghijklmnopqrstuvwxyz')  # the 26 lowercase letters in order
nmbrs = list(range(len(ltrs)))             # 0, 1, ..., 25
d = dict(zip(nmbrs, ltrs))                 # {0: 'a', 1: 'b', ..., 25: 'z'}

cols = df.columns[1:]  # split every column except the first one (q1)
for col in cols:
    df1 = df[col].str.split('/', expand = True)
    df1.columns = df1.columns.map(d)
    df1 = df1.add_prefix(f'{col}')
    df = pd.concat([df,df1], axis=1)
df = df.drop(cols, axis=1)
df

output:

Out[1]: 
      q1  q2a  q2b    q2c q3a q3b q3c
0    one  one  two  three   a   b   c
1    two    a    b      c   d   e   f
2  three    d    e      f   g   h   i
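If the goal is analysing the multiple-choice answers rather than keeping them in positional columns, pandas also offers Series.str.get_dummies, which turns each distinct answer into its own 0/1 indicator column. This produces a different shape from the answers above; it is sketched here only as an alternative, assuming answer order does not matter. The q2_/q3_ naming is a choice made for this example:

```python
import pandas as pd

df = pd.DataFrame({'q1': ['one', 'two', 'three'],
                   'q2': ['one/two/three', 'a/b/c', 'd/e/f'],
                   'q3': ['a/b/c', 'd/e/f', 'g/h/i']})

# One indicator column per distinct answer (columns sorted alphabetically),
# e.g. q2_a, q2_b, ..., q2_one, q2_three, q2_two.
dummies = [df[col].str.get_dummies(sep='/').add_prefix(f'{col}_')
           for col in ['q2', 'q3']]

# Keep the single-answer column as-is and append the indicators.
clean_df = pd.concat([df[['q1']]] + dummies, axis=1)
print(clean_df.columns.tolist())
```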
