
I'm trying to build a dataframe from the following lists:

A = ['item 1', 'item 2', 'item 3', 'item 4', 'item 5']

B = ['item 2','item 4']

C = ['item 1', 'item 5']

I want the list name (or some representation of that name) to become a column whose value is True for each item in that list, like this:

dA = [{'item':x, 'A':True} for x in A]
dB = [{'item':x, 'B':True} for x in B]
dC = [{'item':x, 'C':True} for x in C]

Currently, I'm building my dataframe in a rather clunky way, and I'd love a best-practice solution:

import pandas as pd

dfA = pd.DataFrame.from_records(dA)
dfB = pd.DataFrame.from_records(dB)
dfC = pd.DataFrame.from_records(dC)

df = pd.merge(dfA, dfB, how='outer').merge(dfC, how='outer').fillna(False)

# Result:
    item    A   B   C
0   item 1  True    False   True
1   item 2  True    True    False
2   item 3  True    False   False
3   item 4  True    True    False
4   item 5  True    False   True
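For comparison with the answers below, here is a minimal sketch of the same record-based idea without the chained merges: concatenate the three record lists, let missing keys become NaN, convert to booleans with notna(), and collapse to one row per item. The names `records` and `flags` are illustrative, and this is one possible approach rather than the asker's exact method.

```python
import pandas as pd

A = ['item 1', 'item 2', 'item 3', 'item 4', 'item 5']
B = ['item 2', 'item 4']
C = ['item 1', 'item 5']

# One record per (item, list) membership, same shape as the question's dA/dB/dC
records = (
    [{'item': x, 'A': True} for x in A]
    + [{'item': x, 'B': True} for x in B]
    + [{'item': x, 'C': True} for x in C]
)

# Keys missing from a record become NaN; notna() turns every cell into a plain bool
flags = pd.DataFrame.from_records(records).set_index('item').notna()

# Collapse the per-list rows into one row per item (groupby sorts the items)
df = flags.groupby(level=0).any().reset_index()
print(df)
```

Because notna() produces real booleans before grouping, no fillna(False) step is needed at the end.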

3 Answers


Another way to do this, without merge:

import pandas as pd

# list all unique items (in case not all of them are present in A);
# a set has no guaranteed order, so the row order below is arbitrary
all_items = list(set(A + B + C))
# create a dataframe with only the item column
df = pd.DataFrame({'item': all_items})
# add one boolean membership column per list
df['A'] = df['item'].isin(A)
df['B'] = df['item'].isin(B)
df['C'] = df['item'].isin(C)

#   item    A   B   C
#0  item 4  True    True    False
#1  item 3  True    False   False
#2  item 2  True    True    False
#3  item 1  True    False   True
#4  item 5  True    False   True

If you want something tidier, or you have more columns to create, you could also use a dictionary:

dict_list = {'A': A, 'B': B, 'C': C}
for col, items in dict_list.items():
    df[col] = df['item'].isin(items)
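A self-contained sketch of the dictionary variant, assuming you want a reproducible row order: sorted() replaces the bare set, and all membership columns are added in a single assign() call. The name `lists` is hypothetical; the technique is still Series.isin, as in the answer above.

```python
import pandas as pd

A = ['item 1', 'item 2', 'item 3', 'item 4', 'item 5']
B = ['item 2', 'item 4']
C = ['item 1', 'item 5']

# Map each output column name to the list it should be tested against
lists = {'A': A, 'B': B, 'C': C}

# sorted() gives a stable row order; a plain set's iteration order is arbitrary
all_items = sorted(set().union(*lists.values()))
df = pd.DataFrame({'item': all_items})

# One boolean membership column per list, added in one assign() call
df = df.assign(**{name: df['item'].isin(items) for name, items in lists.items()})
print(df)
```

Adding a new list only means adding one entry to `lists`; the rest of the code is unchanged.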

1 Comment

Clever. I didn't realize you could add sets like that.

You could use pandas.get_dummies:

import pandas as pd

A = ['item 1', 'item 2', 'item 3', 'item 4', 'item 5']
B = ['item 2', 'item 4']
C = ['item 1', 'item 5']

# generate series
s = pd.Series({'A': A, 'B': B, 'C': C})

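# one-hot encode each item, total per list name (index level 0), then transpose;
# groupby(level=0).sum() replaces sum(level=0), which pandas 2.0 removed
result = pd.get_dummies(s.apply(pd.Series).stack()).groupby(level=0).sum().T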

print(result)

Output

        A  B  C
item 1  1  0  1
item 2  1  1  0
item 3  1  0  0
item 4  1  1  0
item 5  1  0  1

If you need boolean values, you could do this instead:

result = pd.get_dummies(s.apply(pd.Series).stack()).groupby(level=0).sum().T.astype(bool)
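A variant of the same get_dummies idea, swapping in Series.explode for the apply(pd.Series).stack() step and groupby(...).any() for sum-then-astype, so booleans come out directly. This is a sketch of an alternative, not the answerer's original code; the name `exploded` is illustrative.

```python
import pandas as pd

A = ['item 1', 'item 2', 'item 3', 'item 4', 'item 5']
B = ['item 2', 'item 4']
C = ['item 1', 'item 5']

s = pd.Series({'A': A, 'B': B, 'C': C})

# explode() yields one row per list element; the index keeps the list name
exploded = s.explode()

# One indicator column per distinct item, still indexed by 'A', 'B', 'C'
dummies = pd.get_dummies(exploded)

# any() per list name gives booleans directly; transpose so items become rows
result = dummies.groupby(level=0).any().T
print(result)
```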

1 Comment

Really clever approach.

Try pd.crosstab:

import numpy as np
import pandas as pd

arr = np.concatenate([A, B, C])
col_arr = np.repeat(['A', 'B', 'C'], [len(A), len(B), len(C)])
pd.crosstab(index=arr, columns=col_arr)

Out[106]:
col_0   A  B  C
row_0
item 1  1  0  1
item 2  1  1  0
item 3  1  0  0
item 4  1  1  0
item 5  1  0  1

If you want True/False, just chain an additional eq(1). This assumes each item appears at most once per list; gt(0) also handles repeated items.

pd.crosstab(index=arr, columns=col_arr).eq(1)

Out[108]:
col_0      A      B      C
row_0
item 1  True  False   True
item 2  True   True  False
item 3  True  False  False
item 4  True   True  False
item 5  True  False   True
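A minimal sketch that drives the crosstab from a dictionary of lists, so the column names and lengths are not repeated by hand. It uses gt(0) rather than eq(1) so an item listed twice in the same list still maps to True, and crosstab's rownames/colnames parameters to replace the default row_0/col_0 labels. The names `lists`, `table`, and `flags`, and the labels 'item'/'list', are illustrative choices.

```python
import numpy as np
import pandas as pd

A = ['item 1', 'item 2', 'item 3', 'item 4', 'item 5']
B = ['item 2', 'item 4']
C = ['item 1', 'item 5']

lists = {'A': A, 'B': B, 'C': C}

# Flatten all items, and repeat each list name once per item it contributes
arr = np.concatenate(list(lists.values()))
col_arr = np.repeat(list(lists.keys()), [len(v) for v in lists.values()])

# rownames/colnames replace the default row_0 / col_0 axis labels
table = pd.crosstab(index=arr, columns=col_arr, rownames=['item'], colnames=['list'])

# Counts > 0 mean "present in this list"
flags = table.gt(0)
print(flags)
```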

3 Comments

What does crosstab do?
@YaakovBressler: crosstab builds a cross-tabulation table showing how often each combination of groups occurs in the data. It accepts array-likes, Series, or lists of arrays/Series. Your data are lists, so crosstab has the advantage of working directly on them without first constructing a pandas DataFrame or Series. For more info, you may read its docs: pandas.pydata.org/pandas-docs/stable/reference/api/…
Very helpful. I'll look into that.
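Because crosstab counts occurrences, a repeated item produces a count above 1. A small illustration with hypothetical input where 'item 2' appears twice in B shows why eq(1) and gt(0) can disagree:

```python
import numpy as np
import pandas as pd

# Hypothetical input: 'item 2' is listed twice in B
A = ['item 1', 'item 2', 'item 3', 'item 4', 'item 5']
B = ['item 2', 'item 2', 'item 4']

arr = np.concatenate([A, B])
col_arr = np.repeat(['A', 'B'], [len(A), len(B)])
counts = pd.crosstab(index=arr, columns=col_arr)

print(counts.loc['item 2', 'B'])         # 2: crosstab counts occurrences
print(counts.eq(1).loc['item 2', 'B'])   # False: eq(1) misses the duplicate
print(counts.gt(0).loc['item 2', 'B'])   # True: gt(0) means "present at least once"
```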
