I'm trying to come up with a DataFrame to do some data analysis and would really benefit from having a data frame that can handle regular indexing and MultiIndexing together in one data frame.
For each patient, I have 6 slices of various types of data (T1avg, T2avg, etc...). Let's call this dataframe1 (from an ipython notebook):
import pandas
dat0 = numpy.zeros([6])
dat1 = numpy.zeros([6])
pat0=(['NecS3Hs05']*6)
pat1=(['NecS3Hs06']*6)
slc = (['Slice ' + str(x) for x in xrange(dat0.shape[-1])])
ind = zip(*[pat0+pat1,slc+slc])
named_ind = pandas.MultiIndex.from_tuples(ind, names = ['Patients','Slices'])
ser = pandas.Series(numpy.append(dat0,dat1),index = named_ind)
df = pandas.DataFrame(data=ser, columns=['T1avg'])
Image of output: df1
I also have, for each patient, various strings of information (tumour type, number of imaging sessions, treatment type):
pats = ['NecS3Hs05','NecS3Hs05']
tx = ['Control','Treated']
Ttype = ['subcutaneous','orthotopic']
NSessions = ['2','3']
cols = ['Tx Group', 'Tumour Type', 'Imaging Sessions']
dat = numpy.array([tx,Ttype,NSessions]).T
df2 = pandas.DataFrame(dat, index=pats,columns=cols)
[I'd like to post a picture here as well, but I need at least 10 reputation to do so]
Ideally, I want to have a dataframe that looks as follows (sketched it out in an image editor sorry)
Image of desired output: df-desired
But when I try to use the append command,
com = df.append(df2)
I get something undesired, the MultiIndex that I set up in df is now gone, replaced with a simple index of type tuples ('NecS3Hs05, Slice 0' etc...). The indices from df2 remain the same 'NecS3Hs05'.
Is this possible to do with PANDAS, or am I barking up the wrong tree here? Also, is this even a recommended way of storing Patient attributes in a dataframe (i.e. is this unpandas)? I think what I would really like is to keep everything a simple index, but instead store N-d arrays inside the elements of the data frame.
For instance, if I try something like:
com['NecS3Hs05','T1avg']
I want to get an array/tuple of shape/len 6
and when I try to get the tumour type:
com['NecS3Hs05','Tumour Type']
I get the string 'subcutaneous'. Obviously I also want to retain the cool features of data frames as well, it looks like PANDAS is the right way to go here, I just need to understand a bit more about how to set up my dataframe
I hope this is a sensible question, if not, I'd be happy to re-form it.