
I have the DataFrame shown below:

  Name      X      Y
0    A  False   True
1    B   True   True
2    C   True  False

I want to create a function that behaves like this:

example_function("A") = "A is in Y"
example_function("B") = "B is in X and Y"
example_function("C") = "C is in X"

This is my current code (it is incorrect and doesn't look very efficient):

def example_function(name):
    for name in df['Name']:
        if df['X'][name] == True and df['Y'][name] == False:
            print(str(name) + "is in X")
        elif df['X'][name] == False and df['Y'][name] == True:
            print(str(name) + "is in Y")
        else:
            print(str(name) + "is in X and Y")

I eventually want to add more Boolean columns, so it needs to be scalable. How can I do this? Would it be better to use a dictionary rather than a DataFrame?

Thanks!

  • What's the output supposed to look like? Commented Mar 24, 2022 at 12:19
  • @timgeb I have edited it; an example of the output is "A is in Y" Commented Mar 24, 2022 at 12:21

2 Answers


If you really want a function, you could do:

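def example_function(label):
    # select the row for this name as a Boolean Series indexed by column label
    s = df.set_index('Name').loc[label]
    # keep only the column labels whose value is True
    cols = s[s].index.to_list()
    return f'{label} is in {" and ".join(cols)}'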

example_function('A')
'A is in Y'

example_function('B')
'B is in X and Y'
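To try this end to end, here is a self-contained sketch that builds the sample DataFrame from the question and passes it in explicitly instead of relying on a global `df`. The extra column `Z` is a hypothetical addition, included only to show that the function picks up new Boolean columns without any changes.

```python
import pandas as pd

# Sample data from the question, plus a hypothetical extra Boolean column 'Z'
df = pd.DataFrame({
    'Name': ['A', 'B', 'C'],
    'X': [False, True, True],
    'Y': [True, True, False],
    'Z': [True, False, False],
})

def example_function(label, df):
    # Boolean Series for this name, indexed by the remaining column labels
    s = df.set_index('Name').loc[label]
    # Keep only the labels whose value is True
    cols = s[s].index.to_list()
    return f'{label} is in {" and ".join(cols)}'

print(example_function('A', df))  # A is in Y and Z
print(example_function('B', df))  # B is in X and Y
print(example_function('C', df))  # C is in X
```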

You can also compute all the results at once as a dictionary:

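import pandas as pd
s = (df.set_index('Name')
       .replace({False: pd.NA})       # False -> missing value...
       .stack()                       # ...which stack() drops (dropna), keeping only True cells
       .reset_index(level=0)['Name']  # Series: index = column label, value = name
     )
out = s.index.groupby(s)              # {name: Index of columns where it is True}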

output (the dictionary values are pandas Index objects, which print like lists):

{'A': ['Y'], 'B': ['X', 'Y'], 'C': ['X']}
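The `replace`/`stack` version above depends on `stack()` dropping missing values, and that behavior has changed across pandas versions (see the `future_stack` option added in pandas 2.1). A minimal alternative sketch, assuming every column except `Name` is Boolean, builds the same mapping with plain Python lists by filtering each row directly. `iterrows` is slow on very large frames, but for a lookup table like this it keeps the logic easy to follow.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['A', 'B', 'C'],
    'X': [False, True, True],
    'Y': [True, True, False],
})

flags = df.set_index('Name')  # Boolean columns only, one row per name
# For each row, keep the column labels whose value is True
out = {name: row[row].index.to_list() for name, row in flags.iterrows()}
print(out)  # {'A': ['Y'], 'B': ['X', 'Y'], 'C': ['X']}
```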

2 Comments

Thank you very much, this works perfectly. However, if I add more columns, how do I make it say "A is in W, X, Y and Z" instead of "A is in W and X and Y and Z"?
You need to check the length of the list and handle the last element differently from the rest. There may even be libraries that do this automatically.
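Following that suggestion, a minimal sketch of the length check might look like this; `natural_join` is a hypothetical helper name, not part of pandas. Inside `example_function` you would then return `f'{label} is in {natural_join(cols)}'` instead of using `" and ".join(...)`.

```python
def natural_join(items, last_sep=' and '):
    # Join strings as "W, X, Y and Z": commas everywhere except before the last item
    items = list(items)
    if not items:
        return ''
    if len(items) == 1:
        return items[0]
    return ', '.join(items[:-1]) + last_sep + items[-1]

print(natural_join(['Y']))                 # Y
print(natural_join(['X', 'Y']))            # X and Y
print(natural_join(['W', 'X', 'Y', 'Z']))  # W, X, Y and Z
```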

I think you can stay with a DataFrame; the same output can be obtained with a function like this:

import numpy as np

def func(name, df):
    # some checks to verify that the name is actually in the df
    occurrences_name = np.sum(df['Name'] == name)
    if occurrences_name == 0:
        raise ValueError('Name not found')
    elif occurrences_name > 1:
        raise ValueError('More than one name found')

    # get the index label corresponding to the name you're looking for
    # and select the corresponding row (without the 'Name' column)
    index = df[df['Name'] == name].index[0]
    row = df.drop(['Name'], axis=1).loc[index]
    outstring = '{} is in '.format(name)
    for i in range(len(row)):
        # .iloc gives positional access; plain row[i] on a label-indexed Series is deprecated
        if row.iloc[i]:
            if i != 0: outstring += ', '
            outstring += '{}'.format(row.index[i])
    return outstring

Of course, you can adapt this to the specific shape of your df; I'm assuming that the column containing the names is called 'Name'.

1 Comment

Thanks, this works, but there is a small problem: if a name is False for X but True for Y, the output is "A is in , Y". Otherwise it is good.
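The stray ", " comes from the `if i != 0` check: the separator is tied to the column's position rather than to whether a label has already been written, so it is added before "Y" even when "X" was skipped. One way around it, sketched below, is to collect the True labels first and join them afterwards. This version also swaps `np.sum` for the mask's own `.sum()`, which is an equivalent choice rather than anything the answer requires.

```python
import pandas as pd

def func(name, df):
    # Same checks as the answer: exactly one row must match the name
    mask = df['Name'] == name
    if mask.sum() == 0:
        raise ValueError('Name not found')
    elif mask.sum() > 1:
        raise ValueError('More than one name found')

    # Boolean values for the matching row, without the 'Name' column
    row = df.loc[mask].drop(columns='Name').iloc[0]
    # Collect the True labels first, then join, so no separator precedes the first one
    cols = [label for label, value in row.items() if value]
    return '{} is in {}'.format(name, ', '.join(cols))

df = pd.DataFrame({
    'Name': ['A', 'B', 'C'],
    'X': [False, True, True],
    'Y': [True, True, False],
})
print(func('A', df))  # A is in Y
print(func('B', df))  # B is in X, Y
print(func('C', df))  # C is in X
```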
