Python pandas csv multirow values into arrays

Question

I have a csv that has values spread over multiple rows like this (real data has about 70 columns):

id | name | alias
 1 |  Amy | Potato
   |      | Fortress
 2 | Bill | Gyroscope
...

Now I want to import this into a dataframe. The tricky part is reading the aliases into an array when there is more than one of them. From the example above we should get Amy [Potato, Fortress] and Bill [Gyroscope].

I can do it with row-by-row processing, but I was wondering if there is some smarter built-in way.

UPD: clarified the requirement

I don't think you can do it row-by-row. Have you heard of read_csv of pandas?
– user17693816, Commented Jan 13, 2022 at 13:46
Can you copy/paste a sample of your csv in a raw format please?
– Corralien, Commented Jan 13, 2022 at 13:49
Neither, yes I have heard of read_csv but it will not construct the arrays for me.
– Rince, Commented Jan 14, 2022 at 12:39

Corralien · Accepted Answer · 2022-01-15 20:00:11Z

2

If your csv file looks like:

id,name,alias
1,Amy,Potato
,,Fortress
2,Bill,Gyroscope

You can use ffill:

import pandas as pd

df = pd.read_csv('data.csv', dtype=str).ffill()
print(df)

# Output
  id  name      alias
0  1   Amy     Potato
1  1   Amy   Fortress
2  2  Bill  Gyroscope

Update

Thanks for the response! I am looking for something that will combine row 1 into the row 0 rather than create a new row. So that we get Amy [Potato, Fortress]

Use agg:

# Forward-fill the blank id/name cells, then collect each id's aliases into a list
df = (
    pd.read_csv('data.csv', dtype=str).ffill()
      .groupby('id', as_index=False)
      .agg({'name': 'first', 'alias': list})
)
print(df)

# Output
  id  name               alias
0  1   Amy  [Potato, Fortress]
1  2  Bill         [Gyroscope]
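
A note on the "more multi-line columns" case mentioned in the comments: a blanket ffill() also copies the previous row's value into cells that are blank only because that column has fewer entries, so those values would show up twice in the lists. A minimal sketch of one way around this, assuming a blank id always marks a continuation row. The phone column, the inline sample, and the collect helper are made up for illustration and are not part of the original data:

```python
import io

import pandas as pd

# Hypothetical sample: 'alias' and 'phone' are both multi-line columns
raw = """id,name,alias,phone
1,Amy,Potato,555-0100
,,Fortress,
2,Bill,Gyroscope,555-0101
,,,555-0102
"""

key_cols = ["id", "name"]        # one value per record
list_cols = ["alias", "phone"]   # may continue on the following rows

df = pd.read_csv(io.StringIO(raw), dtype=str)

# Fill only the key columns, so blanks in the list columns stay NaN
df[key_cols] = df[key_cols].ffill()


def collect(values):
    """Return the non-missing values of one group as a plain list."""
    return values.dropna().tolist()


# 'first' for the remaining key columns, a list for every multi-line column
agg_spec = {col: "first" for col in key_cols if col != "id"}
agg_spec.update({col: collect for col in list_cols})

# sort=False keeps the records in file order (ids are strings here)
result = df.groupby("id", as_index=False, sort=False).agg(agg_spec)
print(result)
```

With about 70 columns, key_cols and list_cols would have to be filled in by hand or derived from the header; which columns belong in which group is something only the data owner can decide.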

edited Jan 15, 2022 at 20:00

answered Jan 13, 2022 at 13:51

Corralien


4 Comments

Rince Over a year ago

Thanks for the response! I am looking for something that will combine row 1 into the row 0 rather than create a new row. So that we get Amy [Potato, Fortress]

Corralien Over a year ago

Does your csv look like mine?

Rince Over a year ago

yes, pretty much. There are more multi-line columns but basically it is the same

Corralien Over a year ago

@Rince. I updated my answer according to your comment. Can you check it please?

Mazhar · Answer · 2022-01-15 11:16:01Z

0

If your data looks like the sample below, as you describe:

id | name | alias
 1 |  Amy | Potato
   |      | Fortress
 2 | Bill | Gyroscope

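Suppose this data is saved in a data.txt file; then you can simply do:

import pandas as pd
import numpy as np

# Read every field as text; the padding around '|' ends up inside the values
df = pd.read_csv('data.txt', sep='|', dtype=str)

# Strip that padding from the column names and from every cell
df.columns = df.columns.str.strip()
df = df.apply(lambda col: col.str.strip())

# Blank cells become NaN so that ffill can carry the previous id/name down
df = df.replace('', np.nan).ffill()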

Output:
    id  name    alias
0   1   Amy     Potato
1   1   Amy     Fortress
2   2   Bill    Gyroscope
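
To get from this forward-filled frame to the lists asked for in the question, the rows can then be grouped by id. A rough sketch, assuming the pipe-delimited sample above; it reads the sample from an inline string instead of data.txt so that it runs on its own:

```python
import io

import numpy as np
import pandas as pd

# Inline copy of the pipe-delimited sample (stands in for data.txt)
raw = """id | name | alias
 1 |  Amy | Potato
   |      | Fortress
 2 | Bill | Gyroscope
"""

df = pd.read_csv(io.StringIO(raw), sep="|", dtype=str)

# The padding around '|' is part of every header and value, so strip it
df.columns = df.columns.str.strip()
df = df.apply(lambda col: col.str.strip())

# Blank cells become NaN, then the previous id/name is carried down
df = df.replace("", np.nan).ffill()

# One row per id, with the aliases gathered into a list
result = df.groupby("id", as_index=False, sort=False).agg(
    name=("name", "first"),
    alias=("alias", list),
)
print(result)
```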

edited Jan 15, 2022 at 11:16

answered Jan 13, 2022 at 14:14

Mazhar


2 Comments

Rince Over a year ago

I was not completely clear in my requirements, edited

Mazhar Over a year ago

I have updated my answer; please have a look.
