
I have a CSV file in which some values are spread over multiple rows, like this (the real data has about 70 columns):

id | name | alias
 1 |  Amy | Potato
   |      | Fortress
 2 | Bill | Gyroscope
...

Now I want to import this into a DataFrame. The tricky part is reading the aliases into an array when there are several of them. From the example above, I should get Amy [Potato, Fortress] and Bill [Gyroscope].

I can do it with row-by-row processing, but I was wondering if there is a smarter built-in way.

UPD: clarified the requirement

  • I don't think you can do it row-by-row. Have you heard of pandas' read_csv? Commented Jan 13, 2022 at 13:46
  • Can you copy/paste a sample of your csv in a raw format please? Commented Jan 13, 2022 at 13:49
  • Neither, yes I have heard of read_csv but it will not construct the arrays for me. Commented Jan 14, 2022 at 12:39

2 Answers


If your CSV file looks like this:

id,name,alias
1,Amy,Potato
,,Fortress
2,Bill,Gyroscope

You can use ffill to propagate the last valid id and name down into the continuation rows:

import pandas as pd

df = pd.read_csv('data.csv', dtype=str).ffill()
print(df)

# Output
  id  name      alias
0  1   Amy     Potato
1  1   Amy   Fortress
2  2  Bill  Gyroscope
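To try the ffill step without creating data.csv, here is a minimal self-contained sketch that reads the same sample from an in-memory string (io.StringIO is only a stand-in for the file). It also illustrates why dtype=str is useful here: it keeps the ids as text instead of letting the blank cell turn the column into floats.

```python
import io
import pandas as pd

# Inline copy of the sample file so the snippet runs without data.csv
csv_text = """id,name,alias
1,Amy,Potato
,,Fortress
2,Bill,Gyroscope
"""

# dtype=str keeps "1" as text; without it, the blank id on row 1 would
# make pandas parse the column as float (1.0, NaN, 2.0)
df = pd.read_csv(io.StringIO(csv_text), dtype=str).ffill()
print(df)
```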

Update

Thanks for the response! I am looking for something that will combine row 1 into the row 0 rather than create a new row. So that we get Amy [Potato, Fortress]

Forward-fill as before, then group by id and use agg to collect each group's aliases into a list:

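import pandas as pd

# id is the group key; as_index=False puts it back as a column,
# and sort=False keeps the ids in file order
df = pd.read_csv('data.csv', dtype=str).ffill() \
       .groupby('id', as_index=False, sort=False) \
       .agg({'name': 'first', 'alias': list})
print(df)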

# Output
  id  name               alias
0  1   Amy  [Potato, Fortress]
1  2  Bill         [Gyroscope]
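The question mentions about 70 columns, several of which can span multiple rows. One way to scale the idea is sketched below, under two assumptions: you know which columns are multi-line (the hypothetical multi_cols list), and "phone" is an invented second multi-line column added only for illustration. It forward-fills only the id column, so a blank cell in one multi-line column is not filled with a copy of the previous row's value, then builds the agg dictionary programmatically instead of typing one entry per column.

```python
import io
import pandas as pd

# Hypothetical sample: "phone" is an invented second multi-line column
csv_text = """id,name,alias,phone
1,Amy,Potato,555-0100
,,Fortress,
2,Bill,Gyroscope,555-0199
,,,555-0142
"""

# Assumption: the multi-line columns are known up front
multi_cols = ["alias", "phone"]

df = pd.read_csv(io.StringIO(csv_text), dtype=str)

# Forward-fill only the key, so blanks in other columns stay NaN
df["id"] = df["id"].ffill()

# Every other non-key column is treated as single-valued
single_cols = [c for c in df.columns if c != "id" and c not in multi_cols]

# "first" takes the first non-null value in each group;
# multi-line columns become lists with the blanks dropped
agg_spec = {c: "first" for c in single_cols}
agg_spec.update({c: (lambda s: s.dropna().tolist()) for c in multi_cols})

result = df.groupby("id", sort=False, as_index=False).agg(agg_spec)
print(result)
```

Filling every column (as in the one-liner above) would turn Amy's phone into ['555-0100', '555-0100'], because the blank phone on the Fortress row would inherit the previous value.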

4 Comments

Thanks for the response! I am looking for something that will combine row 1 into the row 0 rather than create a new row. So that we get Amy [Potato, Fortress]
Does your CSV look like mine?
Yes, pretty much. There are more multi-line columns, but basically it is the same.
@Rince, I updated my answer according to your comment. Can you check it, please?

If your data looks like this, as you describe:

id | name | alias
 1 |  Amy | Potato
   |      | Fortress
 2 | Bill | Gyroscope

Suppose this data is saved in a file called data.txt; then you can simply do:

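import pandas as pd
import numpy as np

# Read every field as text; splitting on "|" leaves the space padding in place
df = pd.read_csv('data.txt', sep='|', dtype=str)

# Strip the padding from the column names and from every cell
df.columns = df.columns.str.strip()
df = df.apply(lambda col: col.str.strip())

# Cells that held only spaces are now '' (or already NaN): mark them missing, then forward-fill
df = df.replace('', np.nan).ffill()
print(df)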
Output:
    id  name    alias
0   1   Amy     Potato
1   1   Amy     Fortress
2   2   Bill    Gyroscope
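An alternative sketch for the same pipe-separated layout lets read_csv remove the padding itself by using a regular-expression separator (regex separators require engine="python"), and then collapses the rows into lists as the question asks. The inline string stands in for data.txt; treat this as one possible approach rather than the answer's own code.

```python
import io
import pandas as pd

# Inline copy of data.txt: pipe-separated and padded with spaces
raw = """id | name | alias
 1 |  Amy | Potato
   |      | Fortress
 2 | Bill | Gyroscope
"""

# The regex swallows the spaces around each "|"; empty fields are read as NaN
df = pd.read_csv(io.StringIO(raw), sep=r"\s*\|\s*", engine="python", dtype=str)

# Carry id and name down into the continuation row
df = df.ffill()

# One row per id, with the aliases gathered into a list
result = (
    df.groupby("id", sort=False, as_index=False)
      .agg({"name": "first", "alias": list})
)
print(result)
```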

2 Comments

I was not completely clear in my requirements; I have edited the question.
I have updated my answer; have a look.
