
I have a pandas DataFrame with multiple rows and columns of types and values, all stored as strings. I want to write a function that filters on three conditions: 1) the type to search (column 1), 2) a first value (column 2), and 3) a second value that immediately follows it (in the next row of column 2).

I managed to write a function that searches for one value of one type, shown below, but how do I add the second value? I think df.shift(axis=0) might help, but I do not know how to combine it with a conditional search.

import pandas as pd

d = {'type': ['wordclass', 'wordclass', 'wordclass', 'wordclass', 'wordclass', 'wordclass',
 'english', 'english', 'english', 'english', 'english', 'english'],
 'values': ['dem', 'noun', 'cop', 'det', 'dem', 'noun', 'this', 'tree', 'is', 'a', 'good', 'tree']}
df = pd.DataFrame(data=d)
print(df)

tiername = 'wordclass'
v1 = 'dem'
v2 = 'noun'

def search_single_tier(tiername, v1):
    searchoutput = df[df['type'].str.contains(tiername) & df['values'].str.match(v1)]
    return searchoutput

x = search_single_tier(tiername, v1)
print(x)

2 Comments
  • To make it clearer: you want a function that receives three arguments x, y, z and returns a row where x is the value in column 1, y is the value in column 2, and z is the value in the next row that also has x in column 1? Commented May 12, 2020 at 9:37
  • Yes, that's what I want. Commented May 12, 2020 at 9:44

1 Answer


You don't need to create a function for doing this. Instead, try this:

In [422]: tiername = 'wordclass'

## This keeps the rows whose `type` column equals `tiername`.
## `.iloc[0:2]` then takes the first 2 of those rows.

In [423]: df[df.type.eq(tiername)].iloc[0:2]
Out[423]: 
        type values
0  wordclass    dem
1  wordclass   noun

After OP's comment:

Find all consecutive rows like this:
tiername = 'wordclass'
v1 = 'dem'

In [455]: ix_list = df[df.type.eq(tiername) & df['values'].eq(v1)].index.tolist()

## For every match, take the matching row and the row right after it
In [464]: pd.concat([df.iloc[i: i+2] for i in ix_list])
Out[464]: 
        type values
0  wordclass    dem
1  wordclass   noun
4  wordclass    dem
5  wordclass   noun
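
For the case raised in the comments, where the second value is also chosen by the caller, one possible sketch uses `shift(-1)` as the question suggests. The name `search_pair` and its signature are made up here for illustration and are not part of the answer above. It also assumes that "next row" means the next row within the same `type`, so a pair never spans two tiers.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'type': ['wordclass'] * 6 + ['english'] * 6,
    'values': ['dem', 'noun', 'cop', 'det', 'dem', 'noun',
               'this', 'tree', 'is', 'a', 'good', 'tree'],
})

def search_pair(df, tiername, v1, v2):
    """Hypothetical helper: rows where v1 is directly followed by v2 in one tier."""
    # Restrict to one tier first so "next row" never crosses into another tier.
    tier = df[df['type'].eq(tiername)]
    # shift(-1) lines each row up with the value of the row that follows it.
    next_value = tier['values'].shift(-1)
    is_first = tier['values'].eq(v1) & next_value.eq(v2)
    # Positions of the first row of every pair, plus the row right after each.
    pos = np.flatnonzero(is_first.to_numpy())
    return tier.iloc[np.unique(np.concatenate([pos, pos + 1]))]

print(search_pair(df, 'wordclass', 'dem', 'noun'))
```

With the sample data this selects rows 0, 1, 4 and 5. A pair that never occurs, such as `'dem'` followed by `'cop'`, gives an empty DataFrame. Note that `eq` compares exact strings, whereas `str.contains` and `str.match` in the question treat the pattern as a regular expression.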

5 Comments

Yes, that works, but I want to be able to change the values I am looking for, and it doesn't give me all consecutive pairs of dem and noun (there are two such combinations in the MWE).
@Eline Please check 2nd part of my answer. Updated.
If I understand this update correctly I cannot alter the second value, though, it just gives me the value following v1. I want to be able to find out if my v1 can be followed by a v2 that I set myself, say, if 'dem' can be followed by 'cop' or 'det' as well in my database.
Basically, my update does this: for a particular tiername and v1, it returns each matching row together with the row that follows it. I thought this was the question. Now you have me confused.
To put it in normal language: I am interested in which combinations of words and wordclasses are found (and not found) in my database. For example: can the word 'this' follow the word 'tree'? Can a wordclass 'det' precede a wordclass 'noun'? I am also interested in the frequencies. To use Roim's words under the original post: I want a function that receives three arguments: x, y, z and returns all rows where x is the value for column1, y is the value for column2 and z is the value of the next row (which also has x in column1).
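
The frequency part of that last comment could be approached by counting every consecutive pair within a tier in one pass. This is only a sketch: `pair_counts` is a hypothetical name, and it again assumes that pairs are formed only between neighbouring rows of the same `type`.

```python
import pandas as pd

df = pd.DataFrame({
    'type': ['wordclass'] * 6 + ['english'] * 6,
    'values': ['dem', 'noun', 'cop', 'det', 'dem', 'noun',
               'this', 'tree', 'is', 'a', 'good', 'tree'],
})

def pair_counts(df, tiername):
    """Hypothetical helper: how often each (first, second) pair occurs in a tier."""
    tier = df.loc[df['type'].eq(tiername), 'values']
    # Pair every value with its successor; the last row has none and is dropped.
    pairs = pd.DataFrame({'first': tier, 'second': tier.shift(-1)}).dropna()
    # Result is a Series indexed by (first, second) holding the counts.
    return pairs.groupby(['first', 'second']).size()

counts = pair_counts(df, 'wordclass')
print(counts)
```

A combination that is absent from the index, such as `('dem', 'cop')` in the sample data, never occurs in that tier.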
