I have a text file in the following format.

[1,2]
[3]
[4,5,6,7,10]

And I have a pandas DataFrame like the following.

df = pd.DataFrame({'id' : [1,2,3,4,5,6,7],
                'path'  : ["p1,p2,p3,p4","p1,p2,p1","p1,p5,p5,p7","p1,p2,p3,p3","p1,p2","p1","p2,p3,p4"]})

output:

   id         path
0   1  p1,p2,p3,p4
1   2     p1,p2,p1
2   3  p1,p5,p5,p7
3   4  p1,p2,p3,p3
4   5        p1,p2
5   6           p1
6   7     p2,p3,p4

I want to slice the DataFrame based on the text file: for each line, select the rows whose id appears in that line. What is wrong with the following? It produces empty DataFrames.

for line in lines:
    print line
    print df[df['id'].isin(line)]

But it works fine with the following.

for line in lines:
    print df[df['id'].isin([1,2])]

1 Answer

line is a string. [1,2] is a list. To convert the string to a list, you could use ast.literal_eval:

import ast
line = ast.literal_eval(line)

Applied to your loop:

import ast
for line in lines:
    print line
    line = ast.literal_eval(line)
    print df.loc[df['id'].isin(line)]
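
For reference, here is a self-contained sketch of the same idea written for Python 3 (the code above uses the Python 2 print statement). It assumes the text file holds one bracketed list per line; the io.StringIO object is only a stand-in for the real file, and the names ids, subset and selected are made up for illustration.

```python
import ast
import io

import pandas as pd

df = pd.DataFrame({
    "id": [1, 2, 3, 4, 5, 6, 7],
    "path": ["p1,p2,p3,p4", "p1,p2,p1", "p1,p5,p5,p7",
             "p1,p2,p3,p3", "p1,p2", "p1", "p2,p3,p4"],
})

# Stand-in for the text file from the question (assumption: one list per line).
lines = io.StringIO("[1,2]\n[3]\n[4,5,6,7,10]\n")

selected = []
for line in lines:
    line = line.strip()
    if not line:
        continue  # skip blank lines
    # Each line is still a str here; literal_eval turns "[1,2]" into [1, 2].
    ids = ast.literal_eval(line)
    # Keep only the rows whose id is in the parsed list.
    subset = df.loc[df["id"].isin(ids)]
    selected.append(subset)
    print(ids)
    print(subset)
```

Ids that do not occur in the DataFrame (such as 10 in the last line) simply match nothing, so the last subset contains ids 4 through 7 only.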

PS. Although df[boolean_mask] works, I think df.loc[boolean_mask] is better because it does not require the reader to know the type of values in boolean_mask to understand which way the df is being sub-selected (by row or by column). df.loc is more explicit, and a tad faster.
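
To illustrate the readability point, here is a small sketch on a cut-down, made-up DataFrame (Python 3). The variable names are hypothetical. It only shows that square brackets select rows or columns depending on what is passed, while df.loc with a mask always selects rows; it does not measure speed.

```python
import pandas as pd

# Cut-down example data, not the DataFrame from the question.
df = pd.DataFrame({"id": [1, 2, 3], "path": ["p1,p2", "p1", "p2,p3"]})

mask = df["id"].isin([1, 2])  # boolean Series, one entry per row

by_rows_implicit = df[mask]      # boolean mask: selects rows
by_rows_explicit = df.loc[mask]  # same rows, but the intent is spelled out
by_columns = df[["id"]]          # list of labels: selects columns instead

# Both row selections give the same result.
assert by_rows_implicit.equals(by_rows_explicit)
```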

1 Comment

Excellent..wasted lot of time on this. Thanks :)
