
I am computing a co-occurrence matrix with pandas as follows.

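import numpy as np
import pandas as pd

lst = [
    ['a', 'b'],
    ['b', 'c', 'd', 'e'],
    ['a', 'd'],
    ['b', 'e']
]

# One-hot encode each row, then merge duplicate item columns by summing
# (transposing avoids the deprecated groupby(..., axis=1))
u = (pd.get_dummies(pd.DataFrame(lst), prefix='', prefix_sep='')
       .T.groupby(level=0).sum().T)

# Item-by-item co-occurrence counts, with the diagonal set to 0
v = u.T.dot(u)
v = v.mask(np.eye(len(v), dtype=bool), 0)

print(v)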

My output is as follows.

   a  b  c  d  e
a  0  1  0  1  0
b  1  0  1  1  2
c  0  1  0  1  1
d  1  1  1  0  1
e  0  2  1  1  0
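A possibly shorter way to build the same matrix is to join each row into one string and one-hot encode it with Series.str.get_dummies. This is a sketch, assuming that no item label contains the '|' character (the default separator of str.get_dummies):

```python
import numpy as np
import pandas as pd

lst = [
    ['a', 'b'],
    ['b', 'c', 'd', 'e'],
    ['a', 'd'],
    ['b', 'e']
]

# Join each list into "a|b" form, then one-hot encode on the '|' separator
# (assumes no item label contains '|'); columns come out sorted a..e
u = pd.Series(lst).str.join('|').str.get_dummies()

# Co-occurrence counts: item x item, with self-pairs on the diagonal zeroed
v = u.T.dot(u)
v = v.mask(np.eye(len(v), dtype=bool), 0)

print(v)
```

Each input row becomes one indicator row, so u.T.dot(u) counts, for every pair of items, the number of rows containing both.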

I want to take how many times e appears with d in the above matrix (i.e. 1) and divide it by the total count of co-occurrences (i.e. 9; since the matrix is symmetric, I only sum the upper triangle of the matrix to get the total).

So my output should be:

The co-occurrence count of e and d is 1.


The total co-occurrence count should be 9, since the matrix is symmetric and each pair is counted only once (upper triangle).
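Because the diagonal is zero, the upper-triangle sum equals half of the full-matrix sum. The sketch below checks this with np.triu; to stay self-contained it rebuilds v by hand from the printed matrix rather than from lst:

```python
import numpy as np
import pandas as pd

# The co-occurrence matrix printed above, entered by hand
labels = list('abcde')
v = pd.DataFrame(
    [[0, 1, 0, 1, 0],
     [1, 0, 1, 1, 2],
     [0, 1, 0, 1, 1],
     [1, 1, 1, 0, 1],
     [0, 2, 1, 1, 0]],
    index=labels, columns=labels)

# Sum only the entries strictly above the diagonal (k=1 excludes it)
upper_total = np.triu(v.to_numpy(), k=1).sum()

# With a zero diagonal, this equals half of the full-matrix sum
half_total = v.to_numpy().sum() / 2

# Share of all co-occurrences that are the (d, e) pair
ratio = v.loc['d', 'e'] / upper_total

print(upper_total, half_total, ratio)
```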


I would like to know if it is possible to do this in pandas.

I am happy to provide more details if needed.

2 Answers

Will this work for you?

a = v.loc['e', 'b']       # count at row 'e', column 'b'
b = v.values.sum() / 2    # total co-occurrences (matrix is symmetric)
print(a / b)

Inside loc, the first value is the row label and the second value is the column label; change them as needed.
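To pass the two item names directly, the same arithmetic can be wrapped in a small function. The name cooccurrence_ratio is hypothetical, and the sketch assumes v is a symmetric count matrix with a zero diagonal (rebuilt here by hand from the question's output):

```python
import pandas as pd


def cooccurrence_ratio(v: pd.DataFrame, row: str, col: str) -> float:
    """Hypothetical helper: share of all co-occurrences that are (row, col).

    Assumes v is symmetric with a zero diagonal, so the total number of
    co-occurrences is half of the sum of all entries.
    """
    pair_count = v.loc[row, col]        # row label first, then column label
    total = v.to_numpy().sum() / 2      # upper triangle only
    return float(pair_count / total)


labels = list('abcde')
v = pd.DataFrame(
    [[0, 1, 0, 1, 0],
     [1, 0, 1, 1, 2],
     [0, 1, 0, 1, 1],
     [1, 1, 1, 0, 1],
     [0, 2, 1, 1, 0]],
    index=labels, columns=labels)

print(cooccurrence_ratio(v, 'e', 'd'))
print(cooccurrence_ratio(v, 'e', 'b'))
```

Since the matrix is symmetric, swapping the row and column arguments gives the same result.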


I believe you need to divide by the sum of the upper triangle only, which (with a zero diagonal) is the sum of all values divided by 2:

v = v / (v.values.sum() / 2)
print(v)
          a         b         c         d         e
a  0.000000  0.111111  0.000000  0.111111  0.000000
b  0.111111  0.000000  0.111111  0.111111  0.222222
c  0.000000  0.111111  0.000000  0.111111  0.111111
d  0.111111  0.111111  0.111111  0.000000  0.111111
e  0.000000  0.222222  0.111111  0.111111  0.000000
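As a sanity check, keeping only the upper triangle of the normalized matrix should leave entries that sum to 1. A sketch, rebuilding the counts by hand from the question's output:

```python
import numpy as np
import pandas as pd

labels = list('abcde')
counts = pd.DataFrame(
    [[0, 1, 0, 1, 0],
     [1, 0, 1, 1, 2],
     [0, 1, 0, 1, 1],
     [1, 1, 1, 0, 1],
     [0, 2, 1, 1, 0]],
    index=labels, columns=labels)

# Normalize by the number of co-occurrences (half of the full sum)
v = counts / (counts.to_numpy().sum() / 2)

# Keep only entries strictly above the diagonal; everything else becomes NaN
upper = v.where(np.triu(np.ones(v.shape, dtype=bool), k=1))

# DataFrame.sum skips NaN, so this adds up the upper triangle only
total = upper.sum().sum()
print(total)
```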

For one value:

print(v.loc['d','e'] / (v.values.sum() / 2))
0.1111111111111111

If you need to assign only one value back:

v['e'] = v['e'].astype(float)  # make column 'e' float before storing a ratio in it
v.loc['d','e'] = v.loc['d','e'] / (v.values.sum() / 2)
print(v)

   a  b  c  d         e
a  0  1  0  1  0.000000
b  1  0  1  1  2.000000
c  0  1  0  1  1.000000
d  1  1  1  0  0.111111
e  0  2  1  1  0.000000

Comments

Thanks a lot for the answer. I actually want to do it by specifying the row and column names, i.e. I give e and d and get their co-occurrence count, 1. Next I get the total co-occurrence count separately from the matrix (i.e. 9), and then I divide (i.e. 1/9 = 0.111111111). Is there a way to do this in pandas? :)
@EmJ - I think I understand now: you need a single scalar value? I edited the answer.
@jezrael I think the expected output is just one value. The problem is to find the value at a particular location (for example, the intersection of row 'd' and column 'e' is 1, and the intersection of row 'e' and column 'b' is 2) and then divide this number by half the sum of the whole dataframe. Here the sum of the whole dataframe is 18, so half of it is 9. I have provided a solution; maybe you can provide an even better one.
OK, got it. I only saw the update now. But the answer seems wrong: 1/9 should be 0.1111, not 0.027777777777777776. I double-checked with the calculator on my computer.
@mohanys - You are right, I was missing the parentheses. It works now.