4

I have the following input CSV file:

A,B,C,D
1,2,|3|4|5|6|7|8,9
11,12,|13|14|15|16|17|18,19

How do I split column C right in the middle into two new rows, adding a column E in which the first half of the split gets "0" and the second half gets "1"? The expected output is:

A,B,C,D,E
1,2,|3|4|5,9,0
1,2,|6|7|8,9,1
11,12,|13|14|15,19,0
11,12,|16|17|18,19,1

Thank you

1
  • BTW, I think your input sample is nicely crafted: 2nd row's values are off by 10 from 1st row (1/11,2/12,...9/19), and you skipped 10/20 to make that work :) Commented Jul 5, 2022 at 22:35

4 Answers

3

Here's how to do it without Pandas:

import csv

with open("input.csv", newline="") as f_in, open("output.csv", "w", newline="") as f_out:
    reader = csv.reader(f_in)

    header = next(reader)  # read header
    header += ["E"]  # modify header

    writer = csv.writer(f_out)
    writer.writerow(header)

    for row in reader:
        a, b, c, d = row  # assign 4 items for each row

        c_items = [x.strip() for x in c.split("|") if x.strip()]

        n_2 = len(c_items) // 2  # halfway index
        c1 = "|" + "|".join(c_items[:n_2])
        c2 = "|" + "|".join(c_items[n_2:])

        writer.writerow([a, b, c1, d, 0])  # 0 & 1 will be converted to str on write
        writer.writerow([a, b, c2, d, 1])
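
The same logic can also be pulled into a small generator so it can be checked against the sample data without touching the file system. This is only a sketch: `split_rows` is a hypothetical helper name, the sample is read from an in-memory string, and each row is assumed to have exactly four columns. Like the code above, it uses `len(items) // 2`, so with an odd number of items the extra one ends up in the second half.

```python
import csv
import io


def split_rows(rows):
    """Yield two rows per input row: first half of C with E=0, second half with E=1."""
    for a, b, c, d in rows:
        # Drop the empty string produced by the leading "|"
        items = [x.strip() for x in c.split("|") if x.strip()]
        mid = len(items) // 2  # halfway index
        for flag, part in enumerate((items[:mid], items[mid:])):
            yield [a, b, "|" + "|".join(part), d, flag]


# Sample taken from the question, held in memory instead of input.csv
sample = "A,B,C,D\n1,2,|3|4|5|6|7|8,9\n11,12,|13|14|15|16|17|18,19\n"

reader = csv.reader(io.StringIO(sample))
header = next(reader) + ["E"]

out = io.StringIO()
writer = csv.writer(out, lineterminator="\n")
writer.writerow(header)
writer.writerows(split_rows(reader))

print(out.getvalue())
```

Keeping the split in a generator means the reading and writing code stays the same if the halving rule changes later.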

2

If I understand you correctly, you can use str.split on column 'C', then .explode() the column and join it again:

import pandas as pd

df = pd.read_csv("input.csv")  # same input file as in the question

df["C"] = df["C"].apply(
    lambda x: [
        (vals := x.strip(" |").split("|"))[: len(vals) // 2],
        vals[len(vals) // 2 :],
    ]
)
df["E"] = df["C"].apply(lambda x: range(len(x)))
df = df.explode(["C", "E"])
df["C"] = "|" + df["C"].apply("|".join)

print(df.to_csv(index=False))

Prints:

A,B,C,D,E
1,2,|3|4|5,9,0
1,2,|6|7|8,9,1
11,12,|13|14|15,19,0
11,12,|16|17|18,19,1

2

Using a regex and str.findall to break the string, then explode and Groupby.cumcount:

(df.assign(C=df['C'].str.findall(r'(?:\|[^|]*){3}'))  # raw string; {3} is half of the six items per cell
   .explode('C')
   .assign(E=lambda d: d.groupby(level=0).cumcount())
   #.to_csv('out.csv', index=False)
 )

Output (before CSV export):

    A   B          C   D  E
0   1   2     |3|4|5   9  0
0   1   2     |6|7|8   9  1
1  11  12  |13|14|15  19  0
1  11  12  |16|17|18  19  1

Output CSV:

A,B,C,D,E
1,2,|3|4|5,9,0
1,2,|6|7|8,9,1
11,12,|13|14|15,19,0
11,12,|16|17|18,19,1
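
The `{3}` in the pattern is tied to cells that hold exactly six items. As a rough sketch of how the repetition count could be derived per cell instead, here is the same pattern applied with the standard `re` module. `halves` is a hypothetical helper, not part of pandas, and it assumes every item is preceded by exactly one `|`. With an odd number of items, `findall` silently drops the trailing item, so this only suits even counts.

```python
import re


def halves(cell: str) -> list[str]:
    """Split a cell such as '|3|4|5|6|7|8' into two equal '|'-prefixed chunks."""
    n_items = cell.count("|")  # assumption: one "|" in front of every item
    half = n_items // 2
    if half == 0:
        # Nothing to split; a {0} quantifier would only match empty strings
        return [cell]
    # Same pattern as in the answer, with the count computed per cell
    pattern = r"(?:\|[^|]*){%d}" % half
    return re.findall(pattern, cell)


print(halves("|3|4|5|6|7|8"))   # ['|3|4|5', '|6|7|8']
print(halves("|13|14|15|16"))   # ['|13|14', '|15|16']
```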

1

Another way, using np.array_split to cut each list into two halves:

import numpy as np

df = (df.assign(C=df['C'].str.replace(r'^\|', '', regex=True)  # remove the leading | so the split yields no empty first item
                .str.split('|')  # split into a list of items
                .apply(lambda x: np.array_split(x, 2)))  # split the list into two sub-arrays
        .explode('C')  # one row per half
      )
df = df.assign(C="|" + df["C"].apply("|".join),  # restore the leading |
               E=df.groupby(level=0).cumcount())  # 0 for the first half, 1 for the second

2 Comments

Note: Missing column E.
Case issues i.e. a, b, c should be A, B, C.
