How to combine/merge Columns within the same Dataframe in Pandas?

Question

I have a data frame similar to this:

       0    1   2   3           4   5
0   1001    1   176 REMAINING   US  SOUTH
1   1002    1   176 REMAINING   US  SOUTH

What I would like to do is combine columns 3, 4, and 5 to create one column that has all of the data in columns 3, 4, and 5.

Desired output:

       0    1   2   3           
0   1001    1   176 REMAINING US SOUTH
1   1002    1   176 REMAINING US SOUTH

I've already tried

hbadef['6'] = hbadef[['3', '4', '5']].apply(lambda x: ''.join(x), axis=1)

and that didn't work out.

Here is the stacktrace when I implement

 hbadef['3'] = hbadef['3'] + ' ' +  hbadef['4'] + ' ' + hbadef['5']

Stacktrace:

TypeError                                 Traceback (most recent call last)
pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.Int64HashTable.get_item()

TypeError: an integer is required

During handling of the above exception, another exception occurred:

KeyError                                  Traceback (most recent call last)
~\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
   2524             try:
-> 2525                 return self._engine.get_loc(key)
   2526             except KeyError:

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

KeyError: '3'

During handling of the above exception, another exception occurred:

TypeError                                 Traceback (most recent call last)
pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.Int64HashTable.get_item()

TypeError: an integer is required

During handling of the above exception, another exception occurred:

KeyError                                  Traceback (most recent call last)
<ipython-input-62-2da6c35d6e89> in <module>()
----> 1 hbadef['3'] = hbadef['3'] + ' ' +  hbadef['4'] + ' ' + hbadef['5']
      2 # hbadef.drop(['4', '5'], axis=1)
      3 # hbadef.columns = ['MKTcode', 'Region']
      4 
      5 # pd.concat(

~\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\frame.py in __getitem__(self, key)
   2137             return self._getitem_multilevel(key)
   2138         else:
-> 2139             return self._getitem_column(key)
   2140 
   2141     def _getitem_column(self, key):

~\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\frame.py in _getitem_column(self, key)
   2144         # get column
   2145         if self.columns.is_unique:
-> 2146             return self._get_item_cache(key)
   2147 
   2148         # duplicate columns & possible reduce dimensionality

~\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\generic.py in _get_item_cache(self, item)
   1840         res = cache.get(item)
   1841         if res is None:
-> 1842             values = self._data.get(item)
   1843             res = self._box_item_values(item, values)
   1844             cache[item] = res

~\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\internals.py in get(self, item, fastpath)
   3841 
   3842             if not isna(item):
-> 3843                 loc = self.items.get_loc(item)
   3844             else:
   3845                 indexer = np.arange(len(self.items))[isna(self.items)]

~\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
   2525                 return self._engine.get_loc(key)
   2526             except KeyError:
-> 2527                 return self._engine.get_loc(self._maybe_cast_indexer(key))
   2528 
   2529         indexer = self.get_indexer([key], method=method, tolerance=tolerance)

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

KeyError: '3'

I've tried removing the NaN values, but I get a similar result. I am perplexed as to why such a simple function is not working properly.

I'll be accepting an answer so that we can sorta "close" this question. Both of the answers are acceptable and solve the problem, the problem that I'm running into is likely an application error that I will have to solve independently from this question.

Thank you! I have tried both answers, but unfortunately I'm still getting errors. I'm currently troubleshooting it, but it's slow going as I am new to Pandas and I don't particularly understand what the error message is trying to tell me. Man, I wish I were as experienced as you and Ami. So I suppose that both of your answers probably work and that the error is on my end, if that's so then should I just mark both as correct?
– CharlesD, Commented May 2, 2018 at 18:27
You can only accept one, so accept the one that's most performant/cleanest/smells nicest... if you STILL can't decide, flip a coin ;)
– cs95, Commented May 2, 2018 at 18:28
@CharlesD Maybe try df.columns = [str(c) for c in df.columns] and then continue? Also, consider accepting COLDSPEED's answer.
– Ami Tavory, Commented May 2, 2018 at 19:53
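
The KeyError: '3' in the stack trace is consistent with the column labels being integers rather than strings, which is what the suggestion above works around. A minimal sketch of that diagnosis, assuming a frame built to resemble the one in the question (the sample data and the variable names df, col3 and col3_str are illustrative, not the asker's actual data):

```python
import pandas as pd

# Sample frame modeled on the question; no column names are given,
# so pandas assigns the integer labels 0..5.
df = pd.DataFrame([
    [1001, 1, 176, "REMAINING", "US", "SOUTH"],
    [1002, 1, 176, "REMAINING", "US", "SOUTH"],
])

print(df.columns)  # shows integer labels, not strings

# Looking up the string '3' fails because only the integer 3 exists.
try:
    df["3"]
except KeyError as exc:
    print("KeyError:", exc)

# Option 1: index with the integer label.
col3 = df[3]

# Option 2: convert every label to a string, as suggested in the comment,
# after which the string lookups from the question work.
df.columns = [str(c) for c in df.columns]
col3_str = df["3"]
```

Checking df.columns first shows which of the two forms, df[3] or df['3'], applies to a given frame.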

cs95 · Accepted Answer · 2018-05-02 16:14:06Z

1

Use concat + agg

pd.concat(
    [df.iloc[:, :3], df.iloc[:, 3:].agg(' '.join, axis=1)], 
    axis=1, 
    ignore_index=True
)

      0  1    2                   3
0  1001  1  176  REMAINING US SOUTH
1  1002  1  176  REMAINING US SOUTH
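
For anyone who wants to run this end to end, here is a self-contained sketch of the same concat + agg approach. The sample frame is an assumption modeled on the question, and the intermediate names joined and result are only for illustration. Because it selects by position with iloc, it does not depend on whether the column labels are integers or strings.

```python
import pandas as pd

# Sample frame modeled on the question (illustrative data).
df = pd.DataFrame([
    [1001, 1, 176, "REMAINING", "US", "SOUTH"],
    [1002, 1, 176, "REMAINING", "US", "SOUTH"],
])

# Join the text columns (positions 3 onward) row by row with a space.
# ' '.join is called once per row because axis=1.
joined = df.iloc[:, 3:].agg(" ".join, axis=1)

# Put the first three columns and the joined column side by side.
# ignore_index=True relabels the resulting columns 0..3.
result = pd.concat([df.iloc[:, :3], joined], axis=1, ignore_index=True)

print(result)
```

Note that ' '.join expects every value in the selected columns to be a string, so this sketch assumes there are no missing values in columns 3 to 5.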

answered May 2, 2018 at 16:14

cs95


2 Comments

Ami Tavory Over a year ago

I really like the use of ' '.join as a function.

cs95 Over a year ago

@AmiTavory I value good sportsmanship more than anything else, I've returned the favour since your answer most certainly addresses the question as well. Good luck :)

Ami Tavory · Accepted Answer · 2018-05-02 17:57:34Z

1

You can simply add

hbadef['3'] += ' ' +  hbadef['4'] + ' ' + hbadef['5']

then drop the unneeded columns

hbadef.drop(['4', '5'], axis=1, inplace=True)
>>> hbadef
    0   1   2   3
0   1001    1   176 REMAINING US SOUTH
1   1002    1   176 REMAINING US SOUTH

Note: If your column labels are integers, use this instead

hbadef.loc[:, 3] += ' ' + hbadef.loc[:, 4] + ' ' + hbadef.loc[:, 5]
hbadef.drop([4, 5], axis=1, inplace=True)
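
A runnable sketch of the integer-label variant, assuming a frame shaped like the one in the question (the sample data is illustrative). It reassigns the result of drop instead of using inplace=True, which is a stylistic choice here rather than a requirement.

```python
import pandas as pd

# Sample frame modeled on the question; columns get integer labels 0..5.
hbadef = pd.DataFrame([
    [1001, 1, 176, "REMAINING", "US", "SOUTH"],
    [1002, 1, 176, "REMAINING", "US", "SOUTH"],
])

# Element-wise string concatenation, using integer labels.
hbadef[3] = hbadef[3] + " " + hbadef[4] + " " + hbadef[5]

# Remove the columns that have been folded into column 3.
hbadef = hbadef.drop(columns=[4, 5])

print(hbadef)
```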

edited May 2, 2018 at 17:57

answered May 2, 2018 at 16:13

Ami Tavory


8 Comments

CharlesD Over a year ago

Thanks for your speedy response! When I try to implement this solution I get multiple type and key errors. It says that an integer is required, so I suppose the header is a character?

Ami Tavory Over a year ago

Can you paste hbadef.columns?

CharlesD Over a year ago

Here is the output when I use list(df): [0, 1, 2, 3, 4, 5]

Ami Tavory Over a year ago

@CharlesD I updated the answer. LMK if this doesn't help.

CharlesD Over a year ago

Perhaps it's also due to the fact that there are a few NaN values in the columns 4 and 5.
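
On the NaN point: missing values would not explain the KeyError, but they do affect the result once the lookup works, because adding strings with + yields NaN for any row where one of the parts is missing. One possible way to handle that, sketched under the assumption that missing parts should simply be skipped (the sample data and the helper name join_present are hypothetical):

```python
import numpy as np
import pandas as pd

# Sample frame with missing values in columns 4 and 5 of the second row.
df = pd.DataFrame([
    [1001, 1, 176, "REMAINING", "US", "SOUTH"],
    [1002, 1, 176, "REMAINING", np.nan, np.nan],
])

# Plain concatenation: the second row becomes NaN as a whole.
plain = df[3] + " " + df[4] + " " + df[5]

def join_present(row):
    """Join only the non-missing values of a row with single spaces."""
    return " ".join(str(v) for v in row if pd.notna(v))

# Row-wise join that skips missing parts instead of propagating NaN.
combined = df[[3, 4, 5]].apply(join_present, axis=1)

print(plain)
print(combined)
```

Whether skipping missing parts is the right behavior depends on the data; filling them with a placeholder via fillna before joining is another option.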
