A quick disclosure: I come from an R background and am switching to pandas (running on Python 3.3.3).
I would like to select rows from a DataFrame by matching the text in one of its entries. It's an elementary operation, but I could not figure out the syntax.
For example, with this DataFrame (sorry for the line split but I want to make the example clearer):
import pandas
films = pandas.DataFrame({'title': ["The Godfather",
                                    "Pulp Fiction",
                                    "The Godfather: Part II",
                                    "Fight Club"],
                          'director': ["Coppola, Francis Ford",
                                       "Tarantino, Quentin",
                                       "Coppola, Francis Ford",
                                       "Fincher, David"]})
If I want to select all the films by the director in the first row, which would be "Coppola, Francis Ford", the commands I am using are:
In [1]: director = films.iloc[[0]]["director"]
In [2]: director
0    Coppola, Francis Ford
Name: director, dtype: object
In [3]: a = films[ films["director"] == director ]
ValueError: Series lengths must match to compare
If I do this:
In [4]: a = films[ films["director"] == str(director) ]
I get an empty DataFrame. What's going on here? Seems like I'm missing something.
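A minimal sketch of what may be going on, assuming the issue is that `iloc` with a list returns a one-element Series rather than a plain string. The names `as_series`, `as_scalar`, `repr_text`, and `coppola_films` are made up for illustration, and the behaviour is described for a reasonably recent pandas:

```python
import pandas

films = pandas.DataFrame({
    "title": ["The Godfather", "Pulp Fiction",
              "The Godfather: Part II", "Fight Club"],
    "director": ["Coppola, Francis Ford", "Tarantino, Quentin",
                 "Coppola, Francis Ford", "Fincher, David"],
})

# iloc with a list ([[0]]) keeps the result as a one-element Series.
as_series = films.iloc[[0]]["director"]
print(type(as_series))

# iloc with a plain integer position returns the cell value itself.
as_scalar = films["director"].iloc[0]
print(type(as_scalar))

# str() on a Series gives its printed representation (index, value,
# Name/dtype footer), which is not equal to any cell in the column.
repr_text = str(as_series)
print(repr(repr_text))

# Comparing the column against the scalar yields a boolean mask with one
# entry per row, which can then be used to select rows.
coppola_films = films[films["director"] == as_scalar]
print(coppola_films)
```

If that assumption holds, the empty result in `In [4]` would come from comparing each director against the multi-line printed form of the Series, and the `ValueError` in `In [3]` from comparing a four-row column against a one-row Series.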