10

I have survey data which annoying has returned multiple choice questions in the following way. It's in an excel sheet There is about 60 columns with responses from single to multiple that are split by /. This is what I have so far, is there any way to do this quicker without having to do this for each individual column

data = {'q1': ['one', 'two', 'three'],
   'q2' : ['one/two/three', 'a/b/c', 'd/e/f'],
   'q3' : ['a/b/c', 'd/e/f','g/h/i']}

df = pd.DataFrame(data)

df[['q2a', 'q2b', 'q2c']]= df['q2'].str.split('/', expand = True, n=0)
df[['q3a', 'q3b', 'q3c']]= df['q2'].str.split('/', expand = True, n=0)

clean_df = df.drop(df[['q2', 'q3']], axis=1)
2
  • 1
    What file format is that data in before your read it into memory or is it in fact a dict? Commented Oct 19, 2020 at 16:29
  • Sorry should have said it's in an excel sheet Commented Oct 19, 2020 at 16:55

2 Answers 2

6

We can use list comprehension with add_prefix, then we use pd.concat to concatenate everything to your final df:

splits = [df[col].str.split(pat='/', expand=True).add_prefix(col) for col in df.columns]
clean_df = pd.concat(splits, axis=1)
     q10  q20  q21    q22 q30 q31 q32
0    one  one  two  three   a   b   c
1    two    a    b      c   d   e   f
2  three    d    e      f   g   h   i

If you actually want your column names to be suffixed by a letter, you can do the following with string.ascii_lowercase:

from string import ascii_lowercase

dfs = []
for col in df.columns:
    d = df[col].str.split('/', expand=True)
    c = d.shape[1]
    d.columns = [col + l for l in ascii_lowercase[:c]]
    dfs.append(d)
    
clean_df = pd.concat(dfs, axis=1)
     q1a  q2a  q2b    q2c q3a q3b q3c
0    one  one  two  three   a   b   c
1    two    a    b      c   d   e   f
2  three    d    e      f   g   h   i
Sign up to request clarification or add additional context in comments.

Comments

0

You can create a dict d that transforms numbers to letters. Then loop through the columns and dynamically change their names:

input:

import pandas as pd
df = pd.DataFrame({'q1': ['one', 'two', 'three'],
   'q2' : ['one/two/three', 'a/b/c', 'd/e/f'],
   'q3' : ['a/b/c', 'd/e/f','g/h/i']})

code:

ltrs = list('abcdefghijklmonpqrstuvwxyz')
nmbrs = [i[0] for i in enumerate(ltrs)]
d = dict(zip(nmbrs, ltrs)) 

cols = df.columns[1:]
for col in cols:
    df1 = df[col].str.split('/', expand = True)
    df1.columns = df1.columns.map(d)
    df1 = df1.add_prefix(f'{col}')
    df = pd.concat([df,df1], axis=1)
df = df.drop(cols, axis=1)
df

output:

Out[1]: 
      q1  q2a  q2b    q2c q3a q3b q3c
0    one  one  two  three   a   b   c
1    two    a    b      c   d   e   f
2  three    d    e      f   g   h   i

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.