I have a data frame similar to this:
      0  1    2          3   4      5
0  1001  1  176  REMAINING  US  SOUTH
1  1002  1  176  REMAINING  US  SOUTH
What I would like to do is combine columns 3, 4, and 5 into one column that holds all of the data from those three columns.
Desired output:
      0  1    2                   3
0  1001  1  176  REMAINING US SOUTH
1  1002  1  176  REMAINING US SOUTH
I've already tried
hbadef['6'] = hbadef[['3', '4', '5']].apply(lambda x: ''.join(x), axis=1)
and that didn't work out.
Here is the stack trace I get when I run
hbadef['3'] = hbadef['3'] + ' ' + hbadef['4'] + ' ' + hbadef['5']
Stacktrace:
TypeError Traceback (most recent call last)
pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.Int64HashTable.get_item()
TypeError: an integer is required
During handling of the above exception, another exception occurred:
KeyError Traceback (most recent call last)
~\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
2524 try:
-> 2525 return self._engine.get_loc(key)
2526 except KeyError:
pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()
KeyError: '3'
During handling of the above exception, another exception occurred:
TypeError Traceback (most recent call last)
pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.Int64HashTable.get_item()
TypeError: an integer is required
During handling of the above exception, another exception occurred:
KeyError Traceback (most recent call last)
<ipython-input-62-2da6c35d6e89> in <module>()
----> 1 hbadef['3'] = hbadef['3'] + ' ' + hbadef['4'] + ' ' + hbadef['5']
2 # hbadef.drop(['4', '5'], axis=1)
3 # hbadef.columns = ['MKTcode', 'Region']
4
5 # pd.concat(
~\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\frame.py in __getitem__(self, key)
2137 return self._getitem_multilevel(key)
2138 else:
-> 2139 return self._getitem_column(key)
2140
2141 def _getitem_column(self, key):
~\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\frame.py in _getitem_column(self, key)
2144 # get column
2145 if self.columns.is_unique:
-> 2146 return self._get_item_cache(key)
2147
2148 # duplicate columns & possible reduce dimensionality
~\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\generic.py in _get_item_cache(self, item)
1840 res = cache.get(item)
1841 if res is None:
-> 1842 values = self._data.get(item)
1843 res = self._box_item_values(item, values)
1844 cache[item] = res
~\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\internals.py in get(self, item, fastpath)
3841
3842 if not isna(item):
-> 3843 loc = self.items.get_loc(item)
3844 else:
3845 indexer = np.arange(len(self.items))[isna(self.items)]
~\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
2525 return self._engine.get_loc(key)
2526 except KeyError:
-> 2527 return self._engine.get_loc(self._maybe_cast_indexer(key))
2528
2529 indexer = self.get_indexer([key], method=method, tolerance=tolerance)
pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()
KeyError: '3'
I've tried removing the NaN values, but I get a similar result. I am perplexed as to why such a simple function is not working properly.
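The `Int64HashTable` frames and the final `KeyError: '3'` in the traceback above suggest that the column labels are integers rather than strings, so a lookup with `'3'` finds nothing. A minimal sketch to check this, assuming the frame was read without a header (the data below is a hypothetical reconstruction of the two sample rows, not the real `hbadef`):

```python
import pandas as pd

# Hypothetical reconstruction of the sample rows. With no column names
# given, pandas assigns the integer labels 0..5.
hbadef = pd.DataFrame(
    [
        [1001, 1, 176, "REMAINING", "US", "SOUTH"],
        [1002, 1, 176, "REMAINING", "US", "SOUTH"],
    ]
)

# Inspect the labels and compare string vs. integer lookups.
column_labels = list(hbadef.columns)
has_str_label = "3" in hbadef.columns  # string label, as used in the failing code
has_int_label = 3 in hbadef.columns    # integer label

print(column_labels)
print("'3' present:", has_str_label)
print("3 present:", has_int_label)
```

If the string lookup reports absent and the integer lookup reports present, indexing with `hbadef[3]` instead of `hbadef['3']` should avoid the `KeyError`.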
I'll be accepting an answer so that we can sorta "close" this question. Both of the answers are acceptable and solve the problem as asked. The problem that I'm still running into is likely an application error that I will have to solve independently of this question.
What if you run df.columns = [str(c) for c in df.columns] and then continue? Also, consider accepting COLDSPEED's answer.
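The suggestion in the comment above can be sketched as follows. This assumes the column labels start out as integers (which the traceback hints at) and that columns 3, 4, and 5 hold strings with no missing values. The frame `df` is a hypothetical stand-in for the asker's data.

```python
import pandas as pd

# Hypothetical stand-in for the asker's frame; labels default to 0..5.
df = pd.DataFrame(
    [
        [1001, 1, 176, "REMAINING", "US", "SOUTH"],
        [1002, 1, 176, "REMAINING", "US", "SOUTH"],
    ]
)

# Convert every column label to a string so that '3', '4', '5' resolve.
df.columns = [str(c) for c in df.columns]

# Concatenate the three text columns with single spaces in between.
df["3"] = df["3"] + " " + df["4"] + " " + df["5"]

# Drop the columns that have been folded into column '3'.
df = df.drop(columns=["4", "5"])

print(df)
```

If any of the three columns can contain NaN, the `+` concatenation yields NaN for that row, so filling missing values first (for example with `fillna('')`) may be needed.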