Split column based on input string into multiple columns in pandas python

Question

I have the pandas data frame below and I am trying to split col1 into multiple columns based on the split_format string.

Inputs:

import pandas as pd

split_format = 'id-id1_id2|id3'

data = {'col1':['a-a1_a2|a3', 'b-b1_b2|b3', 'c-c1_c2|c3', 'd-d1_d2|d3'],
        'col2':[20, 21, 19, 18]}
# Keep df as a plain DataFrame; .style returns a Styler that cannot be assigned new columns.
df = pd.DataFrame(data)
df

col1        col2
a-a1_a2|a3   20
b-b1_b2|b3   21
c-c1_c2|c3   19
d-d1_d2|d3   18

Expected Output:

id  id1 id2 id3 col2
 a   a1  a2  a3  20
 b   b1  b2  b3  21
 c   c1  c2  c3  19
 d   d1  d2  d3  18

Note: The special characters and column names in split_format can change.

@GoldenLion I want to split the column based on a user-supplied string. In this example the user input is split_format = 'id-id1_id2|id3', and the column should be split accordingly.
– Rushabh Patel, Commented Jun 3, 2021 at 14:10
I am parsing split_format for non-alphanumeric symbols to get the column names id, id1, id2, and id3. I will then use a recursive tree to split the value string into the values for those columns.
– ListenSoftware Louise Ai Agent, Commented Jun 3, 2021 at 14:15

Rushabh Patel · Accepted Answer · 2021-06-03 14:46:01Z

2

I think I figured it out.

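import re

# Column names: split the format string on runs of non-alphanumeric characters.
col_name = re.split('[^0-9a-zA-Z]+', split_format)
# Split the values the same way; expand=True returns one column per piece.
df[col_name] = df['col1'].str.split('[^0-9a-zA-Z]+', expand=True)
del df['col1']
df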



   col2 id  id1 id2 id3
0   20  a   a1  a2  a3
1   21  b   b1  b2  b3
2   19  c   c1  c2  c3
3   18  d   d1  d2  d3
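
For reuse, the same idea can be wrapped in a small function. This is only a minimal sketch, not a pandas API: `split_by_format`, the sample frame, and the format string 'key:first/second' are made-up names for illustration. It assumes column names and values contain only letters and digits, so a name such as user_id would itself be split at the underscore.

```python
import re

import pandas as pd


def split_by_format(df, column, split_format):
    """Split `column` into new columns named after the tokens in `split_format`.

    Hypothetical helper for illustration; it is not part of pandas.
    """
    pattern = r'[^0-9a-zA-Z]+'
    # Column names come from the format string itself.
    names = re.split(pattern, split_format)
    # Values are split on the same pattern, one new column per piece.
    parts = df[column].str.split(pattern, expand=True)
    if parts.shape[1] != len(names):
        raise ValueError(
            f"format names {len(names)} columns, but values split into {parts.shape[1]}"
        )
    parts.columns = names
    # New columns first, then whatever is left of the original frame.
    return pd.concat([parts, df.drop(columns=column)], axis=1)


frame = pd.DataFrame({'col1': ['a:a1/a2', 'b:b1/b2'], 'col2': [20, 21]})
out = split_by_format(frame, 'col1', 'key:first/second')
print(out)
```

The length check is there because a value with more or fewer pieces than the format string would otherwise fail with a less obvious error when the column names are assigned.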

answered Jun 3, 2021 at 14:46


ListenSoftware Louise Ai Agent · Accepted Answer · 2021-06-03 15:11:00Z

1

I parse the symbols (the non-alphanumeric characters) out of the format string, then recursively split the value string on each symbol in turn. After each split I flatten the resulting list and recurse on it until all the symbols have been consumed.

import pandas as pd

split_format = 'id-id1_id2|id3'

data = {'col1': ['a-a1_a2|a3', 'b-b1_b2|b3', 'c-c1_c2|c3', 'd-d1_d2|d3'],
        'col2': [20, 21, 19, 18]}
df = pd.DataFrame(data)

# Collect the delimiter characters in the order they appear in split_format.
symbols = []
for x in split_format:
    if not x.isalnum():
        symbols.append(x)

result = []
def parseTree(stringlist, symbols, result):
    # Base case: no delimiters left, so every string is a final value.
    if len(symbols) == 0:
        result.extend(stringlist)
        return
    # Split every string on the next delimiter, flatten, and recurse.
    token = symbols.pop(0)
    elements = []
    for item in stringlist:
        elements.append(item.split(token))

    flat_list = [item for sublist in elements for item in sublist]
    parseTree(flat_list, symbols, result)

rows = []
for key, item in df.iterrows():
    symbols2 = symbols.copy()
    value = item['col1']
    parseTree([value], symbols2, result)
    rows.append(result.copy())
    result.clear()

# DataFrame.append was removed in pandas 2.0, so build the frame from the collected rows.
df2 = pd.DataFrame(rows, columns=["id", "id1", "id2", "id3"])
df2['col2'] = df['col2']
print(df2)

output:

  id id1 id2 id3  col2
0  a  a1  a2  a3    20
1  b  b1  b2  b3    21
2  c  c1  c2  c3    19
3  d  d1  d2  d3    18
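
The same split-and-flatten idea can also be written as a loop instead of recursion. The sketch below is plain Python with a hypothetical helper name, `split_in_order`; it is not the code from the answer. It runs the helper on the format string as well, so the column names are derived rather than hard-coded.

```python
def split_in_order(value, symbols):
    """Split `value` on each symbol in turn, flattening after every step."""
    pieces = [value]
    for symbol in symbols:
        # Every current piece is split on the symbol and the results are flattened.
        pieces = [part for piece in pieces for part in piece.split(symbol)]
    return pieces


split_format = 'id-id1_id2|id3'
# Delimiters in the order they appear in the format string.
symbols = [ch for ch in split_format if not ch.isalnum()]

names = split_in_order(split_format, symbols)
values = split_in_order('a-a1_a2|a3', symbols)
print(dict(zip(names, values)))
```

Like the recursive version, this splits every piece on every symbol, so it assumes a delimiter never appears inside a value.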

answered Jun 3, 2021 at 15:11


2 Comments

Rushabh Patel Over a year ago

Thank you. I ended up using a shorter version based on regex.

ListenSoftware Louise Ai Agent Over a year ago

Great. I took the longer path because of the case analysis I learned in college algebra.
