
So I have a dataframe with NaN values, and I transform each row of that dataframe into a list, which is then appended to another list.

Index   1   2   3   4   5   6   7   8   9   10  ... 71  72  73  74  75  76  77  78  79  80
orderid                                                                                 
20000765    624380  nan nan nan nan nan nan nan nan nan ... nan nan nan nan nan nan nan nan nan nan
20000766    624380  nan nan nan nan nan nan nan nan nan ... nan nan nan nan nan nan nan nan nan nan
20000768    1305984 1305985 1305983 1306021 nan nan nan nan nan nan ... nan nan nan nan nan nan nan nan nan nan
records = []
for i in range(0, 60550):
    records.append([str(dfpivot.values[i,j]) for j in range(0, 10)])

However, a lot of rows contain NaN values, which I want to delete from each row's list before appending it to the list of lists. Where do I need to insert that code, and how do I do this?

I thought that this code would do the trick, but I guess it only looks at the top-level elements of the list of lists:

records = [x for x in records if str(x) != 'nan']

I'm new to Python, so I'm still figuring out the basics.

  • Do you want to delete the whole row that contains a NaN or do you just want to retrieve a list from every row, where every list does not contain any nans? Commented Dec 18, 2019 at 10:43
  • The second. I want a list of every row, except the NaN values. Commented Dec 18, 2019 at 10:44

3 Answers


One way is to take advantage of the fact that stack removes NaNs to generate the nested list:

df.stack().groupby(level=0).apply(list).values.tolist()
# [[624380.0], [624380.0], [1305984.0, 1305985.0, 1305983.0, 1306021.0]]
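A minimal runnable sketch of this approach, using a small frame reconstructed from the question's sample data. One caveat worth hedging: in recent pandas versions the new `stack` implementation retains NaNs instead of dropping them, so chaining an explicit `.dropna()` keeps the result stable across versions:

```python
import numpy as np
import pandas as pd

# Reconstruction of a few rows from the question's pivoted frame.
df = pd.DataFrame(
    [[624380.0, np.nan, np.nan, np.nan],
     [624380.0, np.nan, np.nan, np.nan],
     [1305984.0, 1305985.0, 1305983.0, 1306021.0]],
    index=pd.Index([20000765, 20000766, 20000768], name="orderid"),
    columns=[1, 2, 3, 4],
)

# stack() flattens the frame into a Series; dropna() makes the NaN
# removal explicit, then each original row's surviving values are
# collected into a list.
nested = df.stack().dropna().groupby(level=0).apply(list).values.tolist()
print(nested)
# [[624380.0], [624380.0], [1305984.0, 1305985.0, 1305983.0, 1306021.0]]
```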



If you want to keep rows that contain NaNs and only drop the columns that are entirely NaN, you can do it like this:

In [5457]: df.T.dropna(how='all').T                                                                                                                                                            
Out[5457]: 
         Index           1           2           3           4
0 20000765.000  624380.000         nan         nan         nan
1 20000766.000  624380.000         nan         nan         nan
2 20000768.000 1305984.000 1305985.000 1305983.000 1306021.000

If you don't want any columns with NaNs, you can drop them like this:

In [5458]: df.T.dropna().T                                                                                                                                                                     
Out[5458]: 
         Index           1
0 20000765.000  624380.000
1 20000766.000  624380.000
2 20000768.000 1305984.000

To create the array:

In [5464]: df.T.apply(lambda x: x.dropna().tolist()).tolist()                                                                                                                                  
Out[5464]: 
[[20000765.0, 624380.0],
 [20000766.0, 624380.0],
 [20000768.0, 1305984.0, 1305985.0, 1305983.0, 1306021.0]]

or

df.T[1:].apply(lambda x: x.dropna().tolist()).tolist()                                                                                                                              

Out[5471]: [[624380.0], [624380.0], [1305984.0, 1305985.0, 1305983.0, 1306021.0]]

depending on how you want the array.
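For reference, a self-contained sketch of the transpose-and-dropna approach. The miniature frame below is an assumption about the question's layout, with `Index` as an ordinary first column (which is why the `[1:]` slice skips it):

```python
import numpy as np
import pandas as pd

# Assumed miniature of the question's data, with "Index" as a column.
df = pd.DataFrame({
    "Index": [20000765, 20000766, 20000768],
    1: [624380.0, 624380.0, 1305984.0],
    2: [np.nan, np.nan, 1305985.0],
    3: [np.nan, np.nan, 1305983.0],
})

# Transpose, skip the "Index" row, then collect each column's
# (i.e. each original row's) non-NaN values into a list.
rows = df.T[1:].apply(lambda x: x.dropna().tolist()).tolist()
print(rows)
# [[624380.0], [624380.0], [1305984.0, 1305985.0, 1305983.0]]
```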

1 Comment

Yes, thank you! The last one did the trick. However, I had to use df.T[0:].apply...

One way to do this would be with a nested list comprehension:

[[j for j in i if not pd.isna(j)] for i in dfpivot.values] 

EDIT: it looks like you want strings, in which case:

[[str(j) for j in i if not pd.isna(j)] for i in dfpivot.values] 
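As a runnable illustration, using a hypothetical two-row stand-in for `dfpivot`:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the question's dfpivot.
dfpivot = pd.DataFrame(
    [[624380.0, np.nan, np.nan],
     [1305984.0, 1305985.0, np.nan]],
    index=[20000765, 20000768],
)

# Keep only the non-NaN cells of each row, converted to strings.
records = [[str(j) for j in i if not pd.isna(j)] for i in dfpivot.values]
print(records)
# [['624380.0'], ['1305984.0', '1305985.0']]
```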

