
A quick disclosure: I come from an R background and am switching to pandas (running on Python 3.3.3).

I would like to select rows from a DataFrame by using the text of one of its entries. It's an elementary operation, but I could not get my head around the syntax.

For example, with this DataFrame (sorry for the line split but I want to make the example clearer):

import pandas

films = pandas.DataFrame({'$title':    ["The Godfather",
                                        "Pulp Fiction",
                                        "The Godfather: Part II",
                                        "Fight Club"],
                          '$director': ["Coppola, Francis Ford",
                                        "Tarantino, Quentin",
                                        "Coppola, Francis Ford",
                                        "Fincher, David"]})

If I want to select all the films created by the first director, which would be "Coppola, Francis Ford", the command I am using is:

In [1]: director = films.iloc[[1]]["director"]

In [2]: director

        1    Coppola, Francis Ford
        Name: director, dtype: object

In [3]: a = films[ films["director"] == director ]

        ValueError: Series lengths must match to compare

If I do this:

In [4]: a = films[ films["director"] == str(director) ]

I get an empty DataFrame. What's going on here? Seems like I'm missing something.

As kermit666 explained, there are quite a few errors in this question, and using the dot notation made things a lot clearer. Commented Jun 24, 2014 at 6:54

2 Answers


OK, first of all I see you made a couple of style/semantics mistakes which are common for R-to-Python converts:

  • you don't need the $ signs in your column names; dropping them actually makes column selection nicer, because you can write films.director if the name is just 'director' (the name has to be a valid Python identifier for this attribute-access shortcut to work)
  • indexing in Python starts at 0, not 1, so you select the first director as films.director[0]
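Both points can be seen in a short sketch. It assumes a recent pandas and uses DataFrame.rename with a function to strip the "$" prefix from the existing column names (rather than redefining the frame); it also uses .iloc[0] instead of films.director[0], which is explicitly positional and therefore still correct if the index is not the default 0..n-1.

```python
import pandas as pd

# The question's DataFrame, with "$"-prefixed column names (an R habit).
films = pd.DataFrame({
    "$title": ["The Godfather", "Pulp Fiction",
               "The Godfather: Part II", "Fight Club"],
    "$director": ["Coppola, Francis Ford", "Tarantino, Quentin",
                  "Coppola, Francis Ford", "Fincher, David"],
})

# Strip the leading "$" so the names become valid Python identifiers.
films = films.rename(columns=lambda name: name.lstrip("$"))

# Attribute access now works, and position 0 is the first row.
first_director = films.director.iloc[0]
print(first_director)  # Coppola, Francis Ford

coppola_films = films[films.director == first_director]
print(coppola_films.title.tolist())
# ['The Godfather', 'The Godfather: Part II']
```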

Assuming you removed the $ signs from your DataFrame definition, you can select the movies as:

In [16]: films[films['director'] == films['director'][0]]
Out[16]:
                director                   title
0  Coppola, Francis Ford           The Godfather
2  Coppola, Francis Ford  The Godfather: Part II

or even cleaner as films[films.director == films.director[0]].

Using your original DataFrame you can perform your query with:

director = films.iloc[[1]]['$director'][1]
films[films['$director'] == director]

One error was that you first defined the table with '$director' and then queried it with 'director' as the column name.
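A minimal sketch of that mismatch, assuming a recent pandas: asking for 'director' on a frame whose column is actually named '$director' raises a KeyError, while the exact name works. Note that position 1 is Pulp Fiction's row, so the match is the Tarantino film.

```python
import pandas as pd

films = pd.DataFrame({
    "$title": ["The Godfather", "Pulp Fiction",
               "The Godfather: Part II", "Fight Club"],
    "$director": ["Coppola, Francis Ford", "Tarantino, Quentin",
                  "Coppola, Francis Ford", "Fincher, David"],
})

# There is no column called "director"; the real name is "$director".
try:
    films["director"]
    missing_column_raised = False
except KeyError:
    missing_column_raised = True

print(missing_column_raised)  # True

# Using the exact column name works.
director = films.iloc[1]["$director"]
matches = films[films["$director"] == director]
print(matches["$title"].tolist())  # ['Pulp Fiction']
```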

The [1] at the end is necessary because you indexed the DataFrame with a list [1] instead of a scalar 1: films.iloc[[1]] returns a one-row DataFrame, so selecting '$director' from it gives you a one-element Series rather than a string, as CT Zhu already noticed. List indexing is meant for selecting several arbitrary rows, such as films.iloc[[1, 3]]. In your case it is clearer to write

director = films.iloc[1]['$director']

Also note that position 1 is still Tarantino, not Coppola; Coppola is at position 0.
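The difference between list and scalar indexing can be checked directly. This is a small sketch on the original $-named frame, assuming a recent pandas; it only inspects the types and values that each form returns.

```python
import pandas as pd

films = pd.DataFrame({
    "$title": ["The Godfather", "Pulp Fiction",
               "The Godfather: Part II", "Fight Club"],
    "$director": ["Coppola, Francis Ford", "Tarantino, Quentin",
                  "Coppola, Francis Ford", "Fincher, David"],
})

# A list inside iloc keeps the result two-dimensional: a one-row DataFrame.
one_row_frame = films.iloc[[1]]
# Selecting a column from it gives a one-element Series, not a string.
as_series = one_row_frame["$director"]

# A scalar inside iloc returns the row itself as a Series,
# so selecting the column gives back the plain string.
as_string = films.iloc[1]["$director"]

print(type(one_row_frame).__name__)  # DataFrame
print(type(as_series).__name__)      # Series
print(as_string)                     # Tarantino, Quentin
print(films.iloc[0]["$director"])    # Coppola, Francis Ford
```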


I think films[films["director"] == films.loc[0, "director"]] will suffice.

films.iloc[[1]]["director"] doesn't work because it returns a Series, not a string.

If you want to use iloc, write films.iloc[1]["director"] instead of films.iloc[[1]]["director"].

Also:

In [241]:

str(films.iloc[[1]]["director"])
Out[241]:
'1    Tarantino, Quentin\nName: director, dtype: object'

So films[films["director"] == str(director)] won't match anything and returns an empty DataFrame.
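A short sketch of why str() does not help, assuming a recent pandas and the $-free column names: str() renders the whole Series (index label, value, name and dtype), so no cell equals it. Pulling the scalar out with .iloc[0] gives a value that does match.

```python
import pandas as pd

films = pd.DataFrame({
    "title": ["The Godfather", "Pulp Fiction",
              "The Godfather: Part II", "Fight Club"],
    "director": ["Coppola, Francis Ford", "Tarantino, Quentin",
                 "Coppola, Francis Ford", "Fincher, David"],
})

director = films.iloc[[1]]["director"]  # one-element Series

# str() renders the whole Series, not just the value inside it.
rendered = str(director)
print(repr(rendered))

# No cell equals that multi-line text, so the boolean mask is all False.
empty_result = films[films["director"] == rendered]
print(empty_result.empty)  # True

# Extract the scalar first, then compare.
name = director.iloc[0]
print(films[films["director"] == name]["title"].tolist())  # ['Pulp Fiction']
```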


Thanks, I will have a look at the answer when I get back. I believe iloc[1] won't give me the first element if I sort the DataFrame.
Also, note that indexing in Python starts at 0, not 1 as in R and MATLAB. Cheers!
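The sorting concern comes down to positional versus label-based lookup. A minimal sketch, assuming a recent pandas (sort_values did not exist in the pandas of the Python 3.3 era) and the $-free column names: after sorting, .iloc[0] is the first row in the new order, while .loc[0] is still the row originally labelled 0.

```python
import pandas as pd

films = pd.DataFrame({
    "title": ["The Godfather", "Pulp Fiction",
              "The Godfather: Part II", "Fight Club"],
    "director": ["Coppola, Francis Ford", "Tarantino, Quentin",
                 "Coppola, Francis Ford", "Fincher, David"],
})

# Sorting reorders the rows but keeps each row's original index label.
by_title = films.sort_values("title")

# iloc is positional: row 0 of the sorted frame (alphabetically first title).
first_by_position = by_title.iloc[0]["title"]

# loc is label-based: the row that was labelled 0 before sorting.
label_zero = by_title.loc[0, "title"]

print(first_by_position)  # Fight Club
print(label_zero)         # The Godfather
```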
