
Is this a correct way of creating a DataFrame of tuples? (Assume that the tuples are created inside the code fragment.)

import pandas as pd
import numpy as np
import random

row = ['a','b','c']
col = ['A','B','C','D']

# use numpy for creating a ZEROS matrix
st = np.zeros((len(row),len(col))) 
df2 = pd.DataFrame(st, index=row, columns=col)

# CONVERT each cell to an OBJECT for inserting tuples
for c in col:
    df2[c] = df2[c].astype(object)

print(df2)

for i in row:
    for j in col:
        # .at sets a single cell; set_value was removed in pandas 1.0
        df2.at[i, j] = (i+j, np.round(random.uniform(0, 1), 4))

print(df2)

As you can see, I first created a zeros(3, 4) array in NumPy and then converted each column to object dtype in pandas so that I can insert tuples. Is this the correct way to do it, or is there a better way to add/retrieve tuples in a matrix?

Results are fine:

   A  B  C  D
a  0  0  0  0
b  0  0  0  0
c  0  0  0  0


              A             B             C             D
a  (aA, 0.7134)   (aB, 0.006)  (aC, 0.1948)  (aD, 0.2158)
b  (bA, 0.2937)  (bB, 0.8083)  (bC, 0.3597)   (bD, 0.324)
c  (cA, 0.9534)  (cB, 0.9666)  (cC, 0.7489)  (cD, 0.8599)
  • DataFrames are really designed to store a scalar value within each cell. Why do you want to store tuples? Commented May 7, 2016 at 18:38
  • I am designing an HMM/Viterbi class so I have to store the probability and the previous state that created that probability so later I can retrieve the best backward path. Commented May 7, 2016 at 18:43
  • Why not store these in separate columns? Commented May 7, 2016 at 18:45
  • Could you elaborate a little more on your question? For example, how can I retrieve the content at the intersection of b-C with your idea? (Right now I can set/get ('bC', 0.36).) Commented May 7, 2016 at 18:50
  • Is the first value in the tuple always "equal" to the cell's row index plus the column index? Commented May 7, 2016 at 19:00

1 Answer


First, to answer your literal question: You can construct DataFrames from a list of lists. The values in the list of lists can themselves be tuples:

import numpy as np
import pandas as pd
np.random.seed(2016)

row = ['a','b','c']
col = ['A','B','C','D']

data = [[(i+j, round(np.random.uniform(0, 1), 4)) for j in col] for i in row]
df = pd.DataFrame(data, index=row, columns=col)
print(df)

yields

              A             B             C             D
a  (aA, 0.8967)  (aB, 0.7302)  (aC, 0.7833)  (aD, 0.7417)
b  (bA, 0.4621)  (bB, 0.6426)  (bC, 0.2249)  (bD, 0.7085)
c  (cA, 0.7471)  (cB, 0.6251)    (cC, 0.58)  (cD, 0.2426)
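
To get at the cell where row b meets column C, one option is the scalar accessor .at. The sketch below is only illustrative: it uses made-up, deterministic probabilities (k/16) instead of the random values above so that the result is predictable.

```python
import pandas as pd

row = ['a', 'b', 'c']
col = ['A', 'B', 'C', 'D']

# Hypothetical stand-in values: k/16 for k = 0..11, not the random numbers above.
data = [[(i + j, (r * len(col) + c) / 16) for c, j in enumerate(col)]
        for r, i in enumerate(row)]
df = pd.DataFrame(data, index=row, columns=col)

# Read one cell: .at returns the stored tuple, which unpacks as usual.
state, p = df.at['b', 'C']
print(state, p)

# Overwrite the same cell with a new tuple.
df.at['b', 'C'] = ('bC', 0.5)
print(df.at['b', 'C'])
```

This works because every column already has object dtype, so a single cell can hold an arbitrary Python object.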

Having said that, beware that storing tuples in DataFrames dooms you to Python-speed loops. To take advantage of fast Pandas/NumPy routines, you need native NumPy dtypes such as np.float64, whereas tuples force the "object" dtype.
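
A small sketch of the dtype difference, using two tiny frames with made-up values (the names tuples and floats are just for illustration):

```python
import numpy as np
import pandas as pd

idx, cols = ['a', 'b'], ['A', 'B']

# Same numbers, once wrapped in tuples and once as plain floats.
tuples = pd.DataFrame([[('aA', 0.25), ('aB', 0.5)],
                       [('bA', 0.75), ('bB', 0.125)]], index=idx, columns=cols)
floats = pd.DataFrame([[0.25, 0.5],
                       [0.75, 0.125]], index=idx, columns=cols)

print(tuples.dtypes)  # every column is object
print(floats.dtypes)  # every column is float64

# Arithmetic and reductions run in compiled code on the float frame.
halved = floats * 0.5
col_max = floats.max()
print(col_max)
```

On the tuple frame, the same kind of work would need a Python-level function applied to each cell.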

So perhaps a better solution for your purpose is to use two separate DataFrames, one for the strings and one for the numbers:

import numpy as np
import pandas as pd
np.random.seed(2016)

row=['a','b','c']
col=['A','B','C','D']

prevstate = pd.DataFrame([[i+j for j in col] for i in row], index=row, columns=col)
prob = pd.DataFrame(np.random.uniform(0, 1, size=(len(row), len(col))).round(4), 
                    index=row, columns=col)
print(prevstate)
#     A   B   C   D
# a  aA  aB  aC  aD
# b  bA  bB  bC  bD
# c  cA  cB  cC  cD

print(prob)
#         A       B       C       D
# a  0.8967  0.7302  0.7833  0.7417
# b  0.4621  0.6426  0.2249  0.7085
# c  0.7471  0.6251  0.5800  0.2426
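
If you already have a DataFrame of tuples, it can be split into two such frames after the fact. This is a minimal sketch with made-up values; tup is a hypothetical name for the existing tuple frame, and the per-cell extraction still runs at Python speed, but only once.

```python
import pandas as pd

row = ['a', 'b']
col = ['A', 'B']

# Hypothetical existing frame of (prevstate, probability) tuples.
tup = pd.DataFrame([[('aA', 0.25), ('aB', 0.5)],
                    [('bA', 0.75), ('bB', 0.125)]], index=row, columns=col)

# Pull out the first and second tuple element column by column.
prevstate = tup.apply(lambda s: s.map(lambda t: t[0]))
prob = tup.apply(lambda s: s.map(lambda t: t[1])).astype(float)

print(prevstate)
print(prob)
```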

To loop through the columns, find the row with maximum probability and retrieve the corresponding prevstate, you could use .idxmax and .loc:

for c in prob.columns:  # use `c` so the `col` list defined above is not overwritten
    idx = prob[c].idxmax()
    print('{}: {}'.format(prevstate.loc[idx, c], prob.loc[idx, c]))

yields

aA: 0.8967
aB: 0.7302
aC: 0.7833
aD: 0.7417
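
The same lookup can also be done without an explicit loop over prob, since DataFrame.idxmax and DataFrame.max work column-wise by default. A sketch, with the probabilities printed above hard-coded so it does not depend on the random seed (best_row, best_state and summary are names chosen here for illustration):

```python
import pandas as pd

row = ['a', 'b', 'c']
col = ['A', 'B', 'C', 'D']

prevstate = pd.DataFrame([[i + j for j in col] for i in row],
                         index=row, columns=col)
# Values copied from the printed prob frame above.
prob = pd.DataFrame([[0.8967, 0.7302, 0.7833, 0.7417],
                     [0.4621, 0.6426, 0.2249, 0.7085],
                     [0.7471, 0.6251, 0.5800, 0.2426]],
                    index=row, columns=col)

best_row = prob.idxmax()   # column label -> row label of the maximum
best_prob = prob.max()     # column label -> maximum probability

# One scalar lookup per column to fetch the matching prevstate.
best_state = pd.Series({c: prevstate.at[r, c] for c, r in best_row.items()})

summary = pd.DataFrame({'row': best_row,
                        'prevstate': best_state,
                        'prob': best_prob})
print(summary)
```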

1 Comment

A very neat and deep understanding. tnx
