How to split a row into two rows based on a delimiter in Python

Question

I have the following input CSV file:

A,B,C,D
1,2,|3|4|5|6|7|8,9
11,12,|13|14|15|16|17|18,19

How do I split column C right in the middle into two new rows, adding a column E where the first half of the split gets "0" and the second half gets "1"? The desired output is:

A,B,C,D,E
1,2,|3|4|5,9,0
1,2,|6|7|8,9,1
11,12,|13|14|15,19,0
11,12,|16|17|18,19,1

Thank you

BTW, I think your input sample is nicely crafted: the 2nd row's values are off by 10 from the 1st row's (1/11, 2/12, ..., 9/19), and you skipped 10/20 to make that work :)
– Zach Young, Commented Jul 5, 2022 at 22:35

Zach Young · Accepted Answer · 2022-07-05 22:31:16Z

3

Here's how to do it without Pandas:

import csv

with open("input.csv", newline="") as f_in, open("output.csv", "w", newline="") as f_out:
    reader = csv.reader(f_in)

    header = next(reader)  # read header
    header += ["E"]  # modify header

    writer = csv.writer(f_out)
    writer.writerow(header)

    for row in reader:
        a, b, c, d = row  # assign 4 items for each row

        c_items = [x.strip() for x in c.split("|") if x.strip()]

        n_2 = len(c_items) // 2  # halfway index
        c1 = "|" + "|".join(c_items[:n_2])
        c2 = "|" + "|".join(c_items[n_2:])

        writer.writerow([a, b, c1, d, 0])  # 0 & 1 will be converted to str on write
        writer.writerow([a, b, c2, d, 1])
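The halving step inside the loop can be pulled out into a small function and checked on its own. This is a minimal sketch; `split_c` is a hypothetical helper name, not part of the answer. Because the halfway index uses floor division, an odd number of items puts the extra item in the second half.

```python
def split_c(c):
    """Split a '|'-delimited field like '|3|4|5|6|7|8' into two halves.

    Hypothetical helper mirroring the loop body above: empty pieces
    (from the leading '|') are dropped, and the halfway index uses
    floor division, so with an odd count the second half is longer.
    """
    c_items = [x.strip() for x in c.split("|") if x.strip()]
    n_2 = len(c_items) // 2  # halfway index
    return "|" + "|".join(c_items[:n_2]), "|" + "|".join(c_items[n_2:])


print(split_c("|3|4|5|6|7|8"))  # ('|3|4|5', '|6|7|8')
print(split_c("|1|2|3"))        # ('|1', '|2|3')
```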

edited Jul 5, 2022 at 22:31

answered Jul 5, 2022 at 21:30

Zach Young

11.4k reputation · 4 gold badges · 38 silver badges · 57 bronze badges

Andrej Kesely · Accepted Answer · 2022-07-05 20:17:01Z

2

If I understand you correctly, you can use str.split on column 'C', then .explode() the column and join it again:

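import pandas as pd

df = pd.read_csv("input.csv")  # assumes the question's input file is named input.csv

df["C"] = df["C"].apply(
    lambda x: [
        (vals := x.strip(" |").split("|"))[: len(vals) // 2],  # first half
        vals[len(vals) // 2 :],  # second half
    ]
)
df["E"] = df["C"].apply(lambda x: range(len(x)))  # 0 for the first half, 1 for the second
df = df.explode(["C", "E"])  # one row per half (multi-column explode)
df["C"] = "|" + df["C"].apply("|".join)  # rejoin each half with a leading |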

print(df.to_csv(index=False))

Prints:

A,B,C,D,E
1,2,|3|4|5,9,0
1,2,|6|7|8,9,1
11,12,|13|14|15,19,0
11,12,|16|17|18,19,1
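The assignment expression (:=) inside the lambda needs Python 3.8+, and exploding several columns at once needs pandas 1.3+. A minimal sketch of the same list-building step as a named function (the name `halves` is hypothetical) avoids the walrus and is easier to test in isolation:

```python
def halves(x):
    """Split '|3|4|5|6|7|8' into [['3', '4', '5'], ['6', '7', '8']].

    Same logic as the lambda above: strip surrounding spaces and '|',
    split on '|', then cut the list at the floor-division midpoint.
    """
    vals = x.strip(" |").split("|")
    mid = len(vals) // 2
    return [vals[:mid], vals[mid:]]


print(halves("|3|4|5|6|7|8"))  # [['3', '4', '5'], ['6', '7', '8']]
```

With this helper, df["C"] = df["C"].apply(halves) should be a drop-in replacement for the first statement of the answer.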

answered Jul 5, 2022 at 20:17

Andrej Kesely

196k reputation · 15 gold badges · 60 silver badges · 105 bronze badges

mozway · Accepted Answer · 2022-07-05 20:46:44Z

2

Using a regex with str.findall to break the string into chunks of three items, then explode and GroupBy.cumcount:

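import pandas as pd

df = pd.read_csv("input.csv")  # assumes the question's input file is named input.csv

(df.assign(C=df['C'].str.findall(r'(?:\|[^|]*){3}'))  # raw string avoids an invalid-escape warning
   .explode('C')
   .assign(E=lambda d: d.groupby(level=0).cumcount())
   #.to_csv('out.csv', index=False)
 )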

Output (before CSV export):

    A   B          C   D  E
0   1   2     |3|4|5   9  0
0   1   2     |6|7|8   9  1
1  11  12  |13|14|15  19  0
1  11  12  |16|17|18  19  1

Output CSV:

A,B,C,D,E
1,2,|3|4|5,9,0
1,2,|6|7|8,9,1
11,12,|13|14|15,19,0
11,12,|16|17|18,19,1
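The {3} quantifier hard-codes three items per chunk, so it fits this six-item sample only. A sketch of the same pattern with Python's re.findall shows what str.findall is doing per cell; `chunk_pattern` is a hypothetical helper that builds a half-size quantifier for one field, assuming an even item count (an odd count would yield more than two chunks).

```python
import re

# Same pattern as the answer: a group of three "|<anything but |>" pieces.
PATTERN = r"(?:\|[^|]*){3}"
print(re.findall(PATTERN, "|3|4|5|6|7|8"))  # ['|3|4|5', '|6|7|8']


def chunk_pattern(c):
    """Hypothetical helper: build a half-size pattern for one field.

    Assumes the field starts with '|' and has an even number of items.
    """
    n_items = c.count("|")
    return r"(?:\|[^|]*){%d}" % (n_items // 2)


field = "|1|2|3|4"
print(re.findall(chunk_pattern(field), field))  # ['|1|2', '|3|4']
```

Building the pattern per row gives up the vectorized str.findall call, so it is only worth it when rows have differing item counts.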

edited Jul 5, 2022 at 20:46

answered Jul 5, 2022 at 20:40

mozway

267k reputation · 13 gold badges · 56 silver badges · 106 bronze badges

wwnde · Accepted Answer · 2022-07-05 21:29:00Z

1

Another way, using np.array_split to cut each list into two halves:

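import numpy as np
import pandas as pd

df = pd.read_csv("input.csv")  # assumes the question's input file is named input.csv

df = (df.assign(C=df['C'].str.replace(r'^\|', '', regex=True)  # remove leading | to allow split by the character
                         .str.split('|')  # split to create list
                         .apply(lambda x: np.array_split(x, 2)))  # split list into two sublists
        .explode('C')  # explode into rows
      )
df = df.assign(C="|" + df["C"].apply("|".join),  # rejoin each half with a leading |
               E=df.groupby(level=0).cumcount())  # 0 for the first half, 1 for the second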
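One difference from the other answers: for odd counts, np.array_split puts the extra item in the first chunk (its documented rule is that an array of length l split into n sections yields l % n sub-arrays of size l//n + 1 and the rest of size l//n), whereas slicing at len(x) // 2 puts it in the second. A pure-Python sketch of that sizing rule; `array_split_sizes` and `split_like_array_split` are hypothetical names, not NumPy functions.

```python
def array_split_sizes(length, sections):
    """Chunk sizes following np.array_split's documented rule.

    The first `length % sections` chunks get one extra element.
    """
    base, extra = divmod(length, sections)
    return [base + 1] * extra + [base] * (sections - extra)


def split_like_array_split(items, sections):
    """Split a list into `sections` consecutive chunks using those sizes."""
    chunks, start = [], 0
    for size in array_split_sizes(len(items), sections):
        chunks.append(items[start:start + size])
        start += size
    return chunks


print(split_like_array_split(["1", "2", "3"], 2))  # [['1', '2'], ['3']]
print(split_like_array_split(["3", "4", "5", "6", "7", "8"], 2))  # [['3', '4', '5'], ['6', '7', '8']]
```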

edited Jul 5, 2022 at 21:29

answered Jul 5, 2022 at 21:05

wwnde

26.7k reputation · 6 gold badges · 22 silver badges · 38 bronze badges

2 Comments

DarrylG Over a year ago

Note: Missing column E.

DarrylG Over a year ago

Case issues i.e. a, b, c should be A, B, C.
