Splitting rows in csv with comma separated values

Question

I have a CSV file whose columns (id and text) look like the example below:

1, Šildomos grindys
2, Šildomos grindys, Rekuperacinė sistema
3, 
4, Skalbimo mašina, Su baldais, Šaldytuvas, Šildomos grindys

I want each ID on its own row, paired with one of its text values (the result is going into a database). Because the CSV file is really big, I'm only showing a fraction of it so you can see what I want:

| ID             | Features   
+----------------+-------------
| 1              | Šildomos grindys
| 2              | Šildomos grindys
| 2              | Rekuperacinė sistema
| 3              | null
| 4              | Skalbimo mašina
| 4              | Su baldais
| 4              | Šaldytuvas
| 4              | Šildomos grindys

How can I do that in Python? Thanks!

Give examples of 1/ the CSV file as it is 2/ the plain result that you want – hpchavaz, Commented May 21, 2022 at 14:10

constantstranger · Accepted Answer · 2022-05-21 15:09:33Z

Here is a way to do what you've asked:

# Read each line, split it on commas and strip surrounding whitespace
with open('infoo.txt', 'r', encoding="utf-8") as f:
    rows = [[x.strip() for x in line.split(',')] for line in f]

# Emit one (id, feature) pair per feature; an empty feature becomes 'null'
records = []
for row in rows:
    for feature in row[1:]:
        records.append([row[0], feature if feature else 'null'])

with open('outfoo.txt', 'w', encoding="utf-8") as g:
    g.write('ID,Features\n')
    for record in records:
        g.write(','.join(record) + '\n')

# check the output file:
with open('outfoo.txt', 'r', encoding="utf-8") as f:
    print('contents of output file:')
    for line in f:
        print(line.rstrip('\n'))

Output:

contents of output file:
ID,Features
1,Šildomos grindys
2,Šildomos grindys
2,Rekuperacinė sistema
3,null
4,Skalbimo mašina
4,Su baldais
4,Šaldytuvas
4,Šildomos grindys
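Because the input file is described as really big, a variant that streams one line at a time, instead of building the whole `records` list in memory, may be preferable. The sketch below is not part of the answer above: it swaps the manual `split(',')` for the standard `csv` module, so a feature that is quoted and contains a comma would also be handled. The helper name `explode_csv` is hypothetical, and the demo uses in-memory streams in place of `infoo.txt`/`outfoo.txt`.

```python
import csv
import io

def explode_csv(src, dst):
    """Write one 'ID,Features' row per feature, streaming line by line."""
    writer = csv.writer(dst, lineterminator='\n')
    writer.writerow(['ID', 'Features'])
    for row in csv.reader(src):
        if not row:  # ignore blank lines
            continue
        for field in row[1:]:
            # empty feature (e.g. "3, ") is written as the string 'null'
            writer.writerow([row[0].strip(), field.strip() or 'null'])

# Usage with real files (hypothetical paths, same names as above):
# with open('infoo.txt', encoding='utf-8', newline='') as f, \
#      open('outfoo.txt', 'w', encoding='utf-8', newline='') as g:
#     explode_csv(f, g)

# Self-contained demo using in-memory text streams
src = io.StringIO("1, Šildomos grindys\n2, Šildomos grindys, Rekuperacinė sistema\n3, \n")
dst = io.StringIO()
explode_csv(src, dst)
print(dst.getvalue())
```

Only one output row is held at a time, so memory use stays flat regardless of the input size.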

UPDATE:

An alternative approach is to use pandas. Pandas provides many powerful ways to work with tabular data, but it also has a bit of a learning curve:

import pandas as pd

# Parse each line into [id, list of features]
with open('infoo.txt', 'r', encoding="utf-8") as f:
    rows = [[x.strip() for x in line.split(',')] for line in f]
df = pd.DataFrame([[row[0], row[1:]] for row in rows], columns=['ID', 'Feature'])
print('Dataframe read from input file:')
print(df)

# explode() puts each list element on its own row, repeating the ID
df = df.explode('Feature').reset_index(drop=True)
print('Dataframe with one Feature per row:')
print(df)
df.to_csv('outfoo.txt', index=False)

# check the output file:
df2 = pd.read_csv('outfoo.txt')
print('Dataframe re-read from output file:')
print(df2)

Output:

Dataframe read from input file:
  ID                                            Feature
0  1                                 [Šildomos grindys]
1  2           [Šildomos grindys, Rekuperacinė sistema]
2  3                                                 []
3  4  [Skalbimo mašina, Su baldais, Šaldytuvas, Šild...
Dataframe with one Feature per row:
  ID               Feature
0  1      Šildomos grindys
1  2      Šildomos grindys
2  2  Rekuperacinė sistema
3  3
4  4       Skalbimo mašina
5  4            Su baldais
6  4            Šaldytuvas
7  4      Šildomos grindys
Dataframe re-read from output file:
   ID               Feature
0   1      Šildomos grindys
1   2      Šildomos grindys
2   2  Rekuperacinė sistema
3   3                   NaN
4   4       Skalbimo mašina
5   4            Su baldais
6   4            Šaldytuvas
7   4      Šildomos grindys
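Note that the empty feature for ID 3 is written as an empty field and comes back as `NaN`. If you want the literal `null` from the question in the output file, one option (a sketch, not part of the answer above) is to turn empty strings into missing values with `Series.mask` and pass `na_rep` to `DataFrame.to_csv`. The inline `rows` list is an assumed stand-in for the parsed contents of `infoo.txt`.

```python
import pandas as pd

# Stand-in for the parsed rows of infoo.txt: [id, list of features]
rows = [['1', ['Šildomos grindys']],
        ['2', ['Šildomos grindys', 'Rekuperacinė sistema']],
        ['3', ['']],
        ['4', ['Skalbimo mašina', 'Su baldais']]]

df = pd.DataFrame(rows, columns=['ID', 'Feature'])
df = df.explode('Feature').reset_index(drop=True)

# Treat empty strings as missing, then write missing values as 'null'
df['Feature'] = df['Feature'].mask(df['Feature'] == '')
csv_text = df.to_csv(index=False, na_rep='null')  # no path, so a string is returned
print(csv_text)
```

Passing a file name such as `'outfoo.txt'` as the first argument to `to_csv` writes the same content to disk instead.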


If you find yourself doing a lot of this kind of processing, it could make sense to explore pandas. I have updated my answer to show a pandas-based solution as an alternative way to solve your problem.

arshovon · Accepted Answer · 2022-05-21 14:32:04Z

You can read the CSV with the csv module and append one [id, feature] pair per feature to a new list.

data.csv:

1, Šildomos grindys
2, Šildomos grindys, Rekuperacinė sistema
3,
4, Skalbimo mašina, Su baldais, Šaldytuvas, Šildomos grindys

Code:

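import csv

data = []
# newline='' is what the csv docs recommend; utf-8 keeps the Lithuanian characters intact
with open('data.csv', newline='', encoding='utf-8') as csvfile:
    reader = csv.reader(csvfile)
    for row in reader:
        # row[0] is the ID; every following field is one feature
        for i in range(1, len(row)):
            value = row[i].strip()
            if value == "":
                value = "null"  # empty feature, as in row 3
            data.append([int(row[0]), value])
print(data)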

Output:

[[1, 'Šildomos grindys'], [2, 'Šildomos grindys'], [2, 'Rekuperacinė sistema'], [3, 'null'], [4, 'Skalbimo mašina'], [4, 'Su baldais'], [4, 'Šaldytuvas'], [4, 'Šildomos grindys']]
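Since the pairs are headed for a database, a real SQL `NULL` is usually more useful than the string `"null"`. The sketch below is an assumption-laden illustration, not part of this answer: it uses Python's built-in `sqlite3` module with an in-memory database, and the table name `features`, its columns, and the helper `id_feature_pairs` are all hypothetical. Adapt them to your actual database and schema.

```python
import csv
import io
import sqlite3

# Hypothetical sample input, mirroring data.csv above
SAMPLE = """1, Šildomos grindys
2, Šildomos grindys, Rekuperacinė sistema
3,
4, Skalbimo mašina, Su baldais, Šaldytuvas, Šildomos grindys
"""

def id_feature_pairs(lines):
    """Yield (id, feature) tuples; an empty feature becomes None (SQL NULL)."""
    for row in csv.reader(lines):
        if not row:  # skip blank lines
            continue
        for field in row[1:]:
            value = field.strip()
            yield int(row[0]), (value or None)

conn = sqlite3.connect(':memory:')  # use a file path for a real database
conn.execute('CREATE TABLE features (id INTEGER, feature TEXT)')
# executemany accepts any iterable, so the generator is consumed lazily
conn.executemany('INSERT INTO features VALUES (?, ?)',
                 id_feature_pairs(io.StringIO(SAMPLE)))
conn.commit()

rows = conn.execute('SELECT id, feature FROM features ORDER BY rowid').fetchall()
for r in rows:
    print(r)
```

For a real file, pass `open('data.csv', newline='', encoding='utf-8')` to `id_feature_pairs` in place of the `StringIO` object.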

References:

Python documentation on CSV module
