
I am trying to get a value from the third column of a pandas dataframe, given known values in the first two columns that identify the row. I do not know the row index, only the values to look for in the first two columns. The combination of values in the first two columns is unique, so I expect a single row rather than a subset of the dataframe. I do not have column names and would like to avoid using them.

Consider the dataframe df:

a    1    bla 
b    2    tra
b    3    foo
b    1    bar
c    3    cra

I would like to get tra from the second row, based on the b and 2 combination that I know beforehand. I've tried subsetting with

df = df.loc['b', :]

which returns all the rows with b on the same column (provided I've read the data with index_col = 0) but I am not able to pass multiple conditions on it without crashing or knowing the index of the row of interest. I tried both df.loc and df.iloc.

In other words, ideally I would like to get tra without even using row indexes, by doing something like:

df[(df[,0] == 'b' & df[,1] == `2`)][2]

Any suggestions? Probably it is something simple enough, but I have the tendency to use the same syntax as in R, which apparently is not compatible.

Thank you in advance

  • df.loc[df.index=='b',:] ? Commented Apr 6, 2021 at 17:35
  • @anky that returns all the rows with the index b right? Equivalent to df = df.loc['b', :] as I wrote in the question. I would need a double condition with the second column being 2 Commented Apr 6, 2021 at 17:38
  • can you update the question and the expected output with this usecase? Commented Apr 6, 2021 at 17:42
  • Try this: read the file with index_col=[0,1], then do df.iloc[df.index.get_loc(("b",2)):] ? Commented Apr 6, 2021 at 17:52
  • sorry @anky, it was another thing that led to that error. multiple row indexing works. i can get that value with df = pd.read_csv(path, sep='\t', index_col=[0, 1]), df = df.iloc[df.index.get_loc(("b", 2)):], and df.iloc[0] i will post an answer mentioning your help. thank you. not that straight forward tho :) Commented Apr 6, 2021 at 19:01

2 Answers


As @anky suggested, one way to do this without knowing the column names or the row index of the value of interest is to read the file into a pandas dataframe with the first two columns as a multi-level index.

For the provided example, knowing the column indexes at least, that would be:

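# header=None because the file has no column-name row; columns 0 and 1 become a two-level index
df = pd.read_csv(path, sep='\t', header=None, index_col=[0, 1])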

then, you can use:

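# for a unique index, get_loc returns the integer position of the ("b", 2) row; slice from there
df = df.iloc[df.index.get_loc(("b", 2)):]
df.iloc[0]  # first row of the slice, i.e. the ("b", 2) row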

to get the value of interest.
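Because the combination is unique, the same lookup can probably be written more directly with .loc on the two-level index. This is only a minimal sketch: `raw` is a hypothetical in-memory stand-in for the tab-separated file, and `header=None` is an assumption based on the file having no column names.

```python
import io

import pandas as pd

# Hypothetical stand-in for the tab-separated file from the question
raw = "a\t1\tbla\nb\t2\ttra\nb\t3\tfoo\nb\t1\tbar\nc\t3\tcra\n"

# No header row; the first two columns become a two-level index
df = pd.read_csv(io.StringIO(raw), sep="\t", header=None, index_col=[0, 1])

# Select the single row whose index is ("b", 2); the trailing ":" keeps all columns
row = df.loc[("b", 2), :]

# The row has one remaining column, so take its first element by position
value = row.iloc[0]
print(value)
```

Writing the key as `("b", 2), :` rather than a bare tuple keeps it clear that both values address the row index and not a column label.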

Thanks again @anky for your help. If you found this useful, please upvote @anky's comment on the question.

I'd probably use DataFrame.query for that:

import pandas as pd

df = pd.DataFrame(index=['a', 'b', 'b', 'b', 'c'], data={"col1": [1, 2, 3, 1, 3], "col2": ['bla', 'tra', 'foo', 'bar', 'cra']})
df

   col1 col2
a   1   bla
b   2   tra
b   3   foo
b   1   bar
c   3   cra

df.query('col1 == 2 and col2 == "tra"')
    col1 col2
b   2   tra

2 Comments

thanks for your suggestion. but that implies using column names and knowing beforehand the value I am looking for (e.g., tra). however, I've tried it, but I get TypeError: 'method' object is not subscriptable
In my case I somewhat knew the column names, but they changed based on predetermined data. I could construct the query string from the dataframe based on this predetermined dataset. Thanks for adding.
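For the case raised in these comments, where column names should be avoided altogether, a positional boolean mask is one option. This is a minimal sketch, assuming the data was loaded without a header and without an index column, so all three columns are addressed by position; the frame is built inline here for illustration.

```python
import pandas as pd

# Illustrative frame with default integer column labels (no names)
df = pd.DataFrame(
    [["a", 1, "bla"], ["b", 2, "tra"], ["b", 3, "foo"], ["b", 1, "bar"], ["c", 3, "cra"]]
)

# Each comparison needs its own parentheses because & binds tighter than ==
mask = (df.iloc[:, 0] == "b") & (df.iloc[:, 1] == 2)

# Keep the matching row, then take row 0 and column 2 by position
value = df.loc[mask].iloc[0, 2]
print(value)
```

This is roughly the pandas counterpart of the R-style expression in the question, with `df.iloc[:, n]` in place of `df[, n]`.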
