Slice pandas DataFrame based on csv of lists

Question

I have a text file in the following format.

[1,2]
[3]
[4,5,6,7,10]

And I have a pandas DataFrame like the following:

import pandas as pd

df = pd.DataFrame({'id' : [1,2,3,4,5,6,7],
                'path'  : ["p1,p2,p3,p4","p1,p2,p1","p1,p5,p5,p7","p1,p2,p3,p3","p1,p2","p1","p2,p3,p4"]})

output:

   id         path
0   1  p1,p2,p3,p4
1   2     p1,p2,p1
2   3  p1,p5,p5,p7
3   4  p1,p2,p3,p3
4   5        p1,p2
5   6           p1
6   7     p2,p3,p4

I want to slice the DataFrame based on the text file. What is wrong with the following? It produces empty DataFrames.

for line in lines:
    print line
    print df[df['id'].isin(line)]

But it works fine with the following:

for line in lines:
    print df[df['id'].isin([1,2])]

unutbu · Accepted Answer · 2014-04-07 10:25:09Z

line is a string (for example "[1,2]"), whereas [1,2] is a list. To convert the string to a list, you could use ast.literal_eval:

import ast
line = ast.literal_eval(line)

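import ast
for line in lines:
    print(line)
    # Parse the text "[1,2]" into the actual list [1, 2]
    line = ast.literal_eval(line)
    # Keep only the rows whose id appears in the parsed list
    print(df.loc[df['id'].isin(line)])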
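For anyone who wants to try this end to end, here is a minimal self-contained sketch. It assumes the file holds one list per line, and it uses io.StringIO as a stand-in for the real file, since the question does not show how lines is created. The names ids, subset and selected are only illustrative.

```python
import ast
import io

import pandas as pd

df = pd.DataFrame({
    "id": [1, 2, 3, 4, 5, 6, 7],
    "path": ["p1,p2,p3,p4", "p1,p2,p1", "p1,p5,p5,p7",
             "p1,p2,p3,p3", "p1,p2", "p1", "p2,p3,p4"],
})

# Stand-in for the text file from the question (assumption: one list per line).
lines = io.StringIO("[1,2]\n[3]\n[4,5,6,7,10]\n")

selected = []
for line in lines:
    line = line.strip()
    if not line:
        continue  # skip blank lines, which literal_eval cannot parse
    ids = ast.literal_eval(line)  # the str "[1,2]" becomes the list [1, 2]
    subset = df.loc[df["id"].isin(ids)]
    selected.append(subset)
    print(subset)
```

Ids that are listed in the file but missing from the DataFrame (10 in the third line) are simply ignored by isin.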

PS. Although df[boolean_mask] works, I think df.loc[boolean_mask] is better because it does not require the reader to know the type of values in boolean_mask to understand which way the df is being sub-selected (by row or by column). df.loc is more explicit, and a tad faster.
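To illustrate the readability point, here is a small sketch on a toy DataFrame (not the one from the question). It only shows that the same square brackets select rows or columns depending on what is passed, while df.loc always takes the row selector first. It does not attempt to measure speed.

```python
import pandas as pd

df = pd.DataFrame({"id": [1, 2, 3], "path": ["p1", "p2", "p3"]})

mask = df["id"].isin([1, 2])   # boolean Series

by_brackets = df[mask]         # boolean mask: selects rows
by_loc = df.loc[mask]          # same rows, but the intent is explicit

columns = df[["id"]]           # list of labels: selects columns instead

# With .loc the row selector comes first and the column selector second.
rows_and_columns = df.loc[mask, ["path"]]
```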

edited Apr 7, 2014 at 10:25

answered Apr 7, 2014 at 10:17

1 Comment

Nilani Algiriyage Over a year ago

Excellent..wasted lot of time on this. Thanks :)
