I have a DataFrame with the following columns:

>>> print(df.columns)

Index(['iteration0', 'iteration1', 'iteration2', 'iteration3', 'iteration4',
   'iteration5', 'iteration6', 'iteration7', 'iteration8', 'iteration9',
   'iteration10', 'iteration11', 'iteration12', 'iteration13',
   'iteration14', 'iteration15', 'iteration16', 'iteration17',
   'iteration18', 'iteration19', 'iteration20', 'iteration21',
   'iteration22', 'iteration23', 'iteration24', 'iteration25',
   'iteration26', 'iteration27', 'iteration28', 'iteration29',
   'iteration30', 'iteration31', 'iteration32', 'iteration33',
   'iteration34', 'iteration35', 'iteration36', 'iteration37',
   'iteration38', 'iteration39', 'iteration40', 'iteration41',
   'iteration42', 'iteration43', 'iteration44', 'iteration45',
   'iteration46', 'iteration47', 'iteration48', 'iteration49',
   'iteration50', 'iteration51', 'iteration52', 'iteration53',
   'iteration54', 'iteration55', 'iteration56', 'iteration57',
   'iteration58', 'iteration59', 'iteration60', 'iteration61',
   'iteration62', 'iteration63', 'iteration64', 'iteration65',
   'iteration66', 'iteration67', 'iteration68', 'iteration69',
   'iteration70', 'iteration71', 'iteration72', 'iteration73',
   'iteration74', 'iteration75', 'iteration76', 'iteration77',
   'iteration78', 'iteration79', 'iteration80', 'iteration81',
   'iteration82', 'iteration83', 'iteration84', 'iteration85',
   'iteration86', 'iteration87', 'iteration88', 'iteration89',
   'iteration90', 'iteration91', 'iteration92', 'iteration93',
   'iteration94', 'iteration95', 'iteration96', 'iteration97',
   'iteration98', 'iteration99'],
  dtype='object')

Each row of the DataFrame is also indexed by its date:

print(df.index)
Index(['05/12/2009', '05/13/2009', '05/14/2009', '05/15/2009', '05/18/2009',
   '05/19/2009', '05/20/2009', '05/21/2009', '05/22/2009', '05/25/2009',
   ...
   '10/23/2009', '10/26/2009', '10/27/2009', '10/28/2009', '10/29/2009',
   '10/30/2009', '11/02/2009', '11/03/2009', '11/04/2009', '11/05/2009'],
  dtype='object', name='Date', length=127)

So I have a DataFrame with 127 rows and 100 columns. Each value in this dataset is 0, 1 or 2.

What I want to do is simply get the mode of each row, i.e. the most frequent value for each Date. Here is what I did:

most_frequent=df.mode(axis=1)

Then I want to store the mode of each row as a new column in another DataFrame:

local_df['ensemble'] = most_frequent 

But when I run the code, I get this error:

 File "/usr/local/lib/python3.5/dist-packages/pandas/core/frame.py", line 3370, in __setitem__
    self._set_item(key, value)
  File "/usr/local/lib/python3.5/dist-packages/pandas/core/frame.py", line 3446, in _set_item
    NDFrame._set_item(self, key, value)
  File "/usr/local/lib/python3.5/dist-packages/pandas/core/generic.py", line 3172, in _set_item
    self._data.set(key, value)
  File "/usr/local/lib/python3.5/dist-packages/pandas/core/internals/managers.py", line 1056, in set
    self.insert(len(self.items), item, value)
  File "/usr/local/lib/python3.5/dist-packages/pandas/core/internals/managers.py", line 1158, in insert
    placement=slice(loc, loc + 1))
  File "/usr/local/lib/python3.5/dist-packages/pandas/core/internals/blocks.py", line 3095, in make_block
    return klass(values, ndim=ndim, placement=placement)
  File "/usr/local/lib/python3.5/dist-packages/pandas/core/internals/blocks.py", line 87, in __init__
    '{mgr}'.format(val=len(self.values), mgr=len(self.mgr_locs)))
ValueError: Wrong number of items passed 2, placement implies 1

Printing the most_frequent DataFrame shows some very odd behavior:

09/25/2009  0.0 NaN
09/28/2009  0.0 NaN
09/29/2009  0.0 NaN
09/30/2009  1.0 NaN
10/01/2009  0.0 NaN
10/02/2009  0.0 NaN
10/05/2009  0.0 NaN
10/06/2009  1.0 NaN
10/07/2009  0.0 NaN
10/08/2009  0.0 NaN
10/09/2009  0.0 NaN
10/12/2009  0.0 NaN
10/13/2009  1.0 NaN
10/14/2009  0.0 NaN
10/15/2009  0.0 NaN
10/16/2009  0.0 NaN
10/19/2009  0.0 NaN
10/20/2009  0.0 NaN
10/21/2009  0.0 NaN
10/22/2009  0.0 NaN
10/23/2009  0.0 NaN
10/26/2009  0.0 NaN
10/27/2009  0.0 NaN

In other words, the result has an extra column.

I don't know whether that is what caused the problem. In any case, what was my mistake here?

1 Answer


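There is no mistake. The mode method can return more than one value, here per row, when several values tie for most frequent. That is why the result has a second column, filled with NaN for the rows that have a single mode.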
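To make the tie behavior concrete, here is a minimal sketch on a made-up 3-row frame (the data and the four-column layout are assumptions for illustration, not the asker's real data). The second row has two equally frequent values, so the result gets a second column:

```python
import pandas as pd

# Hypothetical toy data with values in {0, 1, 2}; not the asker's dataset.
df = pd.DataFrame(
    {
        "iteration0": [0, 1, 0],
        "iteration1": [0, 1, 1],
        "iteration2": [1, 2, 1],
        "iteration3": [2, 2, 2],
    },
    index=pd.Index(["05/12/2009", "05/13/2009", "05/14/2009"], name="Date"),
)

# Row-wise mode. The second row (1, 1, 2, 2) has a tie between 1 and 2,
# so the result needs two columns; rows with a single mode get NaN in the second.
most_frequent = df.mode(axis=1)
print(most_frequent)
print(most_frequent.shape)
```

Because the result is a DataFrame with two columns, assigning it to a single column label is what triggers "Wrong number of items passed 2, placement implies 1".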

So try selecting the first column by position with DataFrame.iloc:

local_df['ensemble'] = df.mode(axis=1).iloc[:, 0]
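As a rough end-to-end check, the sketch below applies that line to a made-up 3-row frame (the data and the empty local_df are assumptions, not the asker's real objects). Since the modes come back in sorted order, taking the first column resolves a tie in favor of the smallest value, which may or may not be the tie-break you want. The optional astype(int) undoes the float conversion caused by the NaN padding.

```python
import pandas as pd

# Hypothetical toy data with values in {0, 1, 2}; not the asker's dataset.
df = pd.DataFrame(
    {
        "iteration0": [0, 1, 0],
        "iteration1": [0, 1, 1],
        "iteration2": [1, 2, 1],
        "iteration3": [2, 2, 2],
    },
    index=pd.Index(["05/12/2009", "05/13/2009", "05/14/2009"], name="Date"),
)

# Stand-in for the asker's local_df: an empty frame sharing the same Date index.
local_df = pd.DataFrame(index=df.index)

# Keep only the first mode per row, then cast back to integers.
local_df["ensemble"] = df.mode(axis=1).iloc[:, 0].astype(int)
print(local_df)
```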

3 Comments

I want a row-wise mode, so why select the first column by position?
@mad - I think because df.mode(axis=1) returns a DataFrame with the same rows as the original but possibly more than 1 column; from your printout there are 2 columns.
Thank you so much. I will test your solution.
