I have a CSV file in which each row holds an ID followed by text values, as in the example below:

1, Šildomos grindys
2, Šildomos grindys, Rekuperacinė sistema
3, 
4, Skalbimo mašina, Su baldais, Šaldytuvas, Šildomos grindys

My desired output has one row per ID and text value, so that each ID is related to its text (it's for a database). Because the CSV file is really big, I am giving just a fraction of it to show what I want:

| ID             | Features   
+----------------+-------------
| 1              | Šildomos grindys
| 2              | Šildomos grindys
| 2              | Rekuperacinė sistema
| 3              | null
| 4              | Skalbimo mašina
| 4              | Su baldais
| 4              | Šaldytuvas
| 4              | Šildomos grindys

How can I do that in Python? Thanks!

  • What is the file format? Commented May 21, 2022 at 14:10
  • Give examples of (1) the CSV file as it is and (2) the plain result that you want. Commented May 21, 2022 at 14:10

2 Answers

Here is a way to do what you've asked:

records = []
# Read the input and build one [ID, feature] record per feature.
with open('infoo.txt', 'r', encoding="utf-8") as f:
    for line in f:
        row = [x.strip() for x in line.split(',')]
        for value in row[1:]:
            # an empty feature is stored as the string 'null'
            records.append([row[0], value if value else 'null'])

# Write the records out with a header row.
with open('outfoo.txt', 'w', encoding="utf-8") as g:
    g.write('ID,Features\n')
    for record in records:
        g.write(f'{",".join(record)}\n')

# check the output file:
with open('outfoo.txt', 'r', encoding="utf-8") as f:
    print('contents of output file:')
    for row in f:
        print(row.strip('\n'))

Output:

contents of output file:
ID,Features
1,Šildomos grindys
2,Šildomos grindys
2,Rekuperacinė sistema
3,null
4,Skalbimo mašina
4,Su baldais
4,Šaldytuvas
4,Šildomos grindys
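
Since the question mentions that the real file is large, here is a minimal sketch of a streaming variant that avoids holding every record in memory. It is not part of the code above: the names `explode_rows` and `convert` are made up for illustration, and it assumes that a row with no non-empty features should produce a single 'null' record (so an empty cell between two features is dropped rather than written as 'null').

```python
import csv

def explode_rows(reader):
    """Yield one (id, feature) pair per feature; 'null' when a row has none."""
    for row in reader:
        if not row:
            continue  # skip completely blank lines
        row_id = row[0].strip()
        # keep only non-empty cells after the ID (assumption, see above)
        features = [cell.strip() for cell in row[1:] if cell.strip()]
        if not features:
            yield row_id, 'null'
        for feature in features:
            yield row_id, feature

def convert(in_path, out_path):
    """Stream in_path to out_path one row at a time."""
    # newline='' is what the csv module documentation recommends for file objects
    with open(in_path, newline='', encoding='utf-8') as src, \
         open(out_path, 'w', newline='', encoding='utf-8') as dst:
        writer = csv.writer(dst)
        writer.writerow(['ID', 'Features'])
        writer.writerows(explode_rows(csv.reader(src)))
```

Under those assumptions, calling `convert('infoo.txt', 'outfoo.txt')` on the sample input should write the same rows as shown above, and `csv.writer` will quote any feature that happens to contain a comma or a quote character.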

UPDATE:

An alternative approach is to use pandas. Pandas provides many powerful ways to work with tabular data, but it also has a bit of a learning curve:

import pandas as pd

# Split each line into an ID and a list of features.
with open('infoo.txt', 'r', encoding="utf-8") as f:
    rows = [[x.strip() for x in row.split(',')] for row in f]
df = pd.DataFrame([[row[0], row[1:]] for row in rows], columns=['ID', 'Feature'])
print('Dataframe read from input file:')
print(df)

# explode() turns each element of the Feature list into its own row.
df = df.explode('Feature').reset_index(drop=True)
print('Dataframe with one Feature per row:')
print(df)
df.to_csv('outfoo.txt', index=False)

# check the output file:
df2 = pd.read_csv('outfoo.txt')
print('Dataframe re-read from output file:')
print(df2)

Output:

Dataframe read from input file:
  ID                                            Feature
0  1                                 [Šildomos grindys]
1  2           [Šildomos grindys, Rekuperacinė sistema]
2  3                                                 []
3  4  [Skalbimo mašina, Su baldais, Šaldytuvas, Šild...
Dataframe with one Feature per row:
  ID               Feature
0  1      Šildomos grindys
1  2      Šildomos grindys
2  2  Rekuperacinė sistema
3  3
4  4       Skalbimo mašina
5  4            Su baldais
6  4            Šaldytuvas
7  4      Šildomos grindys
Dataframe re-read from output file:
   ID               Feature
0   1      Šildomos grindys
1   2      Šildomos grindys
2   2  Rekuperacinė sistema
3   3                   NaN
4   4       Skalbimo mašina
5   4            Su baldais
6   4            Šaldytuvas
7   4      Šildomos grindys

2 Comments

If you find yourself doing a lot of this kind of processing, it could make sense to explore pandas. I have updated my answer to show a pandas-based solution as an alternative way to solve your problem.
Thank you. I am about to learn pandas soon!

You can read the CSV and then append each row to a new list.

data.csv:

1, Šildomos grindys
2, Šildomos grindys, Rekuperacinė sistema
3,
4, Skalbimo mašina, Su baldais, Šaldytuvas, Šildomos grindys

Code:

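import csv

data = []
# newline='' is recommended by the csv module docs; this assumes the file is UTF-8 encoded
with open('data.csv', newline='', encoding='utf-8') as csvfile:
    reader = csv.reader(csvfile)
    for row in reader:
        # every cell after the ID is a feature; an empty cell becomes "null"
        for i in range(1, len(row)):
            value = row[i].strip()
            if value == "":
                value = "null"
            data.append([int(row[0]), value])
print(data)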

Output:

[[1, 'Šildomos grindys'], [2, 'Šildomos grindys'], [2, 'Rekuperacinė sistema'], [3, 'null'], [4, 'Skalbimo mašina'], [4, 'Su baldais'], [4, 'Šaldytuvas'], [4, 'Šildomos grindys']]
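
The question says the result is meant for a database but does not say which one, so the following is only a sketch that assumes SQLite (via the standard-library `sqlite3` module). The table name `features`, its columns, and the helper `load_features` are hypothetical. It takes the list printed above and stores the placeholder string "null" as a real SQL NULL.

```python
import sqlite3

# Output of the code above, copied verbatim.
data = [[1, 'Šildomos grindys'], [2, 'Šildomos grindys'],
        [2, 'Rekuperacinė sistema'], [3, 'null'],
        [4, 'Skalbimo mašina'], [4, 'Su baldais'],
        [4, 'Šaldytuvas'], [4, 'Šildomos grindys']]

def load_features(conn, rows):
    """Insert [id, feature] pairs, storing the placeholder 'null' as SQL NULL."""
    conn.execute(
        'CREATE TABLE IF NOT EXISTS features (id INTEGER NOT NULL, feature TEXT)'
    )
    conn.executemany(
        'INSERT INTO features (id, feature) VALUES (?, ?)',
        [(row_id, None if feature == 'null' else feature) for row_id, feature in rows],
    )
    conn.commit()

conn = sqlite3.connect(':memory:')  # in-memory database, for illustration only
load_features(conn, data)
for row in conn.execute('SELECT id, feature FROM features ORDER BY rowid'):
    print(row)
```

The `?` placeholders let the driver handle quoting and the non-ASCII text, which is generally safer than building the INSERT statement with string formatting.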
