I am building a co-occurrence matrix with pandas as follows.
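import numpy as np
import pandas as pd

lst = [
    ['a', 'b'],
    ['b', 'c', 'd', 'e'],
    ['a', 'd'],
    ['b', 'e']
]
# One-hot encode every cell, then merge duplicate item columns.
u = (pd.get_dummies(pd.DataFrame(lst), prefix='', prefix_sep='')
       .groupby(level=0, axis=1)
       .sum())
# Item-by-item co-occurrence counts.
v = u.T.dot(u)
# Zero out the diagonal (self co-occurrence).
v.values[(np.r_[:len(v)], ) * 2] = 0
print(v)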
My output is as follows.
   a  b  c  d  e
a  0  1  0  1  0
b  1  0  1  1  2
c  0  1  0  1  1
d  1  1  1  0  1
e  0  2  1  1  0
Using the above matrix, I want to get how many times e appears with d (i.e. 1) and divide it by the total count of co-occurrences (i.e. 9; since the matrix is symmetric, I only summed the upper part of the matrix to get the total).
So my output should be:
co-occurrence count of e and d: 1
total co-occurrence count: 9 (upper part only, since the matrix is symmetric)
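To make the arithmetic concrete, here is a rough sketch of the calculation I have in mind. It assumes the matrix is rebuilt by hand from the printed output, and the names labels, pair_count, total and ratio are only placeholders for illustration. There may well be a cleaner way.

```python
import numpy as np
import pandas as pd

labels = ['a', 'b', 'c', 'd', 'e']

# Co-occurrence matrix copied by hand from the printed output
# (assumption: it matches v from the code above).
v = pd.DataFrame(
    [[0, 1, 0, 1, 0],
     [1, 0, 1, 1, 2],
     [0, 1, 0, 1, 1],
     [1, 1, 1, 0, 1],
     [0, 2, 1, 1, 0]],
    index=labels, columns=labels)

# Co-occurrence count of 'e' with 'd': a single label-based lookup.
pair_count = int(v.loc['e', 'd'])

# Total over the strict upper triangle; k=1 skips the diagonal,
# so each symmetric pair is counted once.
total = int(np.triu(v.to_numpy(), k=1).sum())

ratio = pair_count / total
print(pair_count, total, ratio)
```

Because the matrix is symmetric with a zero diagonal, half of the sum over all cells should give the same total.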
I would like to know whether it is possible to do this in pandas.
I am happy to provide more details if needed.

