
I have a data frame similar to this:

       0    1   2   3           4   5
0   1001    1   176 REMAINING   US  SOUTH
1   1002    1   176 REMAINING   US  SOUTH

What I would like to do is combine columns 3, 4, and 5 into one column that holds all of the data from those three columns.

Desired output:

       0    1   2   3           
0   1001    1   176 REMAINING US SOUTH
1   1002    1   176 REMAINING US SOUTH

I've already tried

hbadef['6'] = hbadef[['3', '4', '5']].apply(lambda x: ''.join(x), axis=1)

and that didn't work out.

Here is the stacktrace when I implement

 hbadef['3'] = hbadef['3'] + ' ' +  hbadef['4'] + ' ' + hbadef['5']

Stacktrace:

TypeError                                 Traceback (most recent call last)
pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.Int64HashTable.get_item()

TypeError: an integer is required

During handling of the above exception, another exception occurred:

KeyError                                  Traceback (most recent call last)
~\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
   2524             try:
-> 2525                 return self._engine.get_loc(key)
   2526             except KeyError:

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

KeyError: '3'

During handling of the above exception, another exception occurred:

TypeError                                 Traceback (most recent call last)
pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.Int64HashTable.get_item()

TypeError: an integer is required

During handling of the above exception, another exception occurred:

KeyError                                  Traceback (most recent call last)
<ipython-input-62-2da6c35d6e89> in <module>()
----> 1 hbadef['3'] = hbadef['3'] + ' ' +  hbadef['4'] + ' ' + hbadef['5']
      2 # hbadef.drop(['4', '5'], axis=1)
      3 # hbadef.columns = ['MKTcode', 'Region']
      4 
      5 # pd.concat(

~\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\frame.py in __getitem__(self, key)
   2137             return self._getitem_multilevel(key)
   2138         else:
-> 2139             return self._getitem_column(key)
   2140 
   2141     def _getitem_column(self, key):

~\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\frame.py in _getitem_column(self, key)
   2144         # get column
   2145         if self.columns.is_unique:
-> 2146             return self._get_item_cache(key)
   2147 
   2148         # duplicate columns & possible reduce dimensionality

~\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\generic.py in _get_item_cache(self, item)
   1840         res = cache.get(item)
   1841         if res is None:
-> 1842             values = self._data.get(item)
   1843             res = self._box_item_values(item, values)
   1844             cache[item] = res

~\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\internals.py in get(self, item, fastpath)
   3841 
   3842             if not isna(item):
-> 3843                 loc = self.items.get_loc(item)
   3844             else:
   3845                 indexer = np.arange(len(self.items))[isna(self.items)]

~\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
   2525                 return self._engine.get_loc(key)
   2526             except KeyError:
-> 2527                 return self._engine.get_loc(self._maybe_cast_indexer(key))
   2528 
   2529         indexer = self.get_indexer([key], method=method, tolerance=tolerance)

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

KeyError: '3'

I've tried removing the NaN values, but I get a similar result. I am perplexed as to why such a simple function is not working properly.

I'll be accepting an answer so that we can sorta "close" this question. Both of the answers are acceptable and solve the problem. The problem that I'm running into is likely an application error that I will have to solve independently of this question.

  • Thank you! I have tried both answers, but unfortunately I'm still getting errors. I'm currently troubleshooting it, but it's slow going as I am new to Pandas and I don't particularly understand what the error message is trying to tell me. Man, I wish I were as experienced as you and Ami. So I suppose that both of your answers probably work and that the error is on my end, if that's so then should I just mark both as correct? Commented May 2, 2018 at 18:27
  • You can only accept one, so accept the one that's most performant/cleanest/smells nicest... if you STILL can't decide, flip a coin ;) Commented May 2, 2018 at 18:28
  • @CharlesD Maybe try df.columns = [str(c) for c in df.columns] and then continue? Also, consider accepting COLDSPEED's answer. Commented May 2, 2018 at 19:53

2 Answers


Use concat + agg

pd.concat(
    [df.iloc[:, :3], df.iloc[:, 3:].agg(' '.join, axis=1)], 
    axis=1, 
    ignore_index=True
)

      0  1    2                   3
0  1001  1  176  REMAINING US SOUTH
1  1002  1  176  REMAINING US SOUTH
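
The comments on this page mention that columns 4 and 5 contain a few NaN values. With missing values present, ' '.join can raise a TypeError because NaN is a float, not a string. A minimal sketch of one way around that, assuming the default integer column labels and a made-up sample frame (the helper name join_non_missing is hypothetical, not part of pandas):

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: default integer column labels, one missing value.
df = pd.DataFrame([
    [1001, 1, 176, "REMAINING", "US", "SOUTH"],
    [1002, 1, 176, "REMAINING", np.nan, "SOUTH"],
])

def join_non_missing(row):
    """Join the non-missing cells of a row with single spaces."""
    return " ".join(str(value) for value in row if pd.notna(value))

# Combine everything from position 3 onward, skipping missing cells.
combined = df.iloc[:, 3:].apply(join_non_missing, axis=1)

# Glue the first three columns and the combined column back together.
result = pd.concat([df.iloc[:, :3], combined], axis=1, ignore_index=True)
print(result)
```

Skipping missing cells avoids both the TypeError and a stray double space where a value is absent. If you would rather keep a placeholder, something like fillna('') before joining is another option.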

2 Comments

I really like the use of ' '.join as a function.
@AmiTavory I value good sportsmanship more than anything else, I've returned the favour since your answer most certainly addresses the question as well. Good luck :)
1

You can simply add the columns together

hbadef['3'] += ' ' +  hbadef['4'] + ' ' + hbadef['5']

then drop the unneeded columns

hbadef.drop(['4', '5'], axis=1, inplace=True)
>>> hbadef
    0   1   2   3
0   1001    1   176 REMAINING US SOUTH
1   1002    1   176 REMAINING US SOUTH

Note: If your column labels are integers rather than strings, use this instead

hbadef.loc[:, 3] += ' ' + hbadef.loc[:, 4] + ' ' + hbadef.loc[:, 5]
hbadef.drop([4, 5], axis=1, inplace=True)
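
The KeyError: '3' in the question is consistent with the column labels being the integers 0 to 5 (as list(df) shows in the comments) rather than the strings '0' to '5'. A small sketch that checks this before combining, assuming a frame built without column names so that pandas assigns integer labels (the sample data is made up to match the question):

```python
import pandas as pd

# Hypothetical sample data; no column names are given, so pandas
# assigns the integer labels 0..5.
hbadef = pd.DataFrame([
    [1001, 1, 176, "REMAINING", "US", "SOUTH"],
    [1002, 1, 176, "REMAINING", "US", "SOUTH"],
])

labels = list(hbadef.columns)
has_string_label = "3" in hbadef.columns  # the lookup that fails in the question
has_int_label = 3 in hbadef.columns

print(labels, has_string_label, has_int_label)

# Use the integer labels to combine, then drop the leftover columns.
hbadef[3] = hbadef[3] + " " + hbadef[4] + " " + hbadef[5]
hbadef = hbadef.drop(columns=[4, 5])
print(hbadef)
```

If you prefer string labels, converting them first with hbadef.columns = [str(c) for c in hbadef.columns], as suggested in a comment on the question, should make the hbadef['3'] version work too.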

8 Comments

Thanks for your speedy response! When I try to implement this solution I get multiple type and key errors. It says that an integer is required, so I suppose the header is a character?
Can you paste the output of hbadef.columns?
Here is the output when I use list(df): [0, 1, 2, 3, 4, 5]
@CharlesD I updated the answer. LMK if this doesn't help.
Perhaps it's also due to the fact that there are a few NaN values in the columns 4 and 5.