I have a dataframe in pandas that looks like this:

   100  200  300  400
0    1    1    0    1
1    1    1    1    0

What I want to do is select specific columns from this dataframe. But when I try the following code (df_matrix being the dataframe shown above):

intermediary_df = df_matrix["100"]

It does not work, and from what I can tell that is because the column label is an integer. I tried to force it with str(100), but that gave the same error as before:

File "pandas\_libs\hashtable_class_helper.pxi", line 958, in pandas._libs.hashtable.Int64HashTable.get_item
TypeError: an integer is required

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "A:\python project\venv\lib\site-packages\pandas\core\indexes\base.py", line 3078, in get_loc
    return self._engine.get_loc(key)
  File "pandas\_libs\index.pyx", line 140, in pandas._libs.index.IndexEngine.get_loc
  File "pandas\_libs\index.pyx", line 164, in pandas._libs.index.IndexEngine.get_loc
KeyError: '100'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "pandas\_libs\index.pyx", line 162, in pandas._libs.index.IndexEngine.get_loc
  File "pandas\_libs\hashtable_class_helper.pxi", line 958, in pandas._libs.hashtable.Int64HashTable.get_item
TypeError: an integer is required

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "A:/python project/testing/testing4.py", line 42, in <module>
    intermediary_df = df_matrix["100"]
  File "A:\python project\venv\lib\site-packages\pandas\core\frame.py", line 2688, in __getitem__
    return self._getitem_column(key)
  File "A:\python project\venv\lib\site-packages\pandas\core\frame.py", line 2695, in _getitem_column
    return self._get_item_cache(key)
  File "A:\python project\venv\lib\site-packages\pandas\core\generic.py", line 2489, in _get_item_cache
    values = self._data.get(item)
  File "A:\python project\venv\lib\site-packages\pandas\core\internals.py", line 4115, in get
    loc = self.items.get_loc(item)
  File "A:\python project\venv\lib\site-packages\pandas\core\indexes\base.py", line 3080, in get_loc
    return self._engine.get_loc(self._maybe_cast_indexer(key))
  File "pandas\_libs\index.pyx", line 140, in pandas._libs.index.IndexEngine.get_loc
  File "pandas\_libs\index.pyx", line 164, in pandas._libs.index.IndexEngine.get_loc
KeyError: '100'

Does anyone know how to get around this? Thanks!

EDIT 1:

After trying intermediary_df = df_matrix[100], it worked as expected. By the way, if someone else is facing this problem and wants to select multiple columns at the same time, you can use:

intermediary_df = df_matrix[[100, 300]]

and the output will be:

   100  300
0    1    0
1    1    1

  • Print your columns with df_matrix.columns and check whether they are int or str. Commented Jan 23, 2019 at 18:38
  • Try intermediary_df = df_matrix[100]? Commented Jan 23, 2019 at 18:38
  • @W-B it seems to be integers. Output from the console: Int64Index([100, 200, 300, 400], dtype='int64') Commented Jan 23, 2019 at 18:39
  • So try df_matrix[100]. Commented Jan 23, 2019 at 18:39
  • @harvpan it worked now. Thank you! Commented Jan 23, 2019 at 18:40
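
The check suggested in these comments can be sketched as follows. This is a minimal reconstruction that assumes the frame from the question was built with integer column labels; the constructor call is an assumption, since the question does not show how df_matrix was created.

```python
import pandas as pd

# Rebuild the frame from the question (assumed construction; the integer labels are the key detail).
df_matrix = pd.DataFrame(
    [[1, 1, 0, 1], [1, 1, 1, 0]],
    columns=[100, 200, 300, 400],
)

# True when the column labels are stored with an integer dtype.
labels_are_int = pd.api.types.is_integer_dtype(df_matrix.columns)

# Membership tests show which key type will work with df_matrix[...].
has_int_label = 100 in df_matrix.columns
has_str_label = "100" in df_matrix.columns
```

If has_int_label is True and has_str_label is False, df_matrix[100] is the lookup to use and df_matrix["100"] will raise a KeyError.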

2 Answers

Since your column labels are integers in this case, simply use:

intermediary_df = df_matrix[100]

If you want to access your columns by str labels instead, use:

df_matrix.columns = [str(x) for x in df_matrix.columns]

and then

df_matrix['100']

Output

0    1
1    1
Name: 100, dtype: int64
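
For comparison, here is a small sketch of both lookups side by side. It assumes the frame from the question was built with integer labels, and it converts a copy (the name df_str is hypothetical) so the original integer labels stay usable.

```python
import pandas as pd

# Assumed reconstruction of the frame from the question.
df_matrix = pd.DataFrame(
    [[1, 1, 0, 1], [1, 1, 1, 0]],
    columns=[100, 200, 300, 400],
)

# Convert the labels on a copy so df_matrix keeps its integer labels.
df_str = df_matrix.copy()
df_str.columns = [str(x) for x in df_str.columns]

by_int = df_matrix[100]    # integer key matches the integer labels
by_str = df_str["100"]     # string key matches the converted labels
```

Both lookups return the same values; only the key type has to match the label type.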

3 Comments

Do you know how I can apply filters on columns with integer headers? I have tried df_matrix[df_matrix.100 == 1], but that results in an error...
Just use df_matrix.loc[df_matrix[100] == 1]
No problem, @Adrian. Good luck!
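
The filter from these comments can be sketched as below, assuming the same integer-labelled frame as in the question. Attribute access such as df_matrix.100 fails before pandas is even involved, because 100 is not a valid Python identifier, so bracket indexing is needed to build the mask.

```python
import pandas as pd

# Assumed reconstruction of the frame from the question.
df_matrix = pd.DataFrame(
    [[1, 1, 0, 1], [1, 1, 1, 0]],
    columns=[100, 200, 300, 400],
)

# Build a boolean mask with bracket indexing, then keep the rows where it is True.
mask = df_matrix[300] == 1
filtered = df_matrix.loc[mask]
```

Column 300 is used here instead of 100 only because it differs between the two rows, which makes the effect of the filter visible.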

I think your column labels are integers. If so, you can select the column by label with DataFrame.loc; if not, you can select it by position with DataFrame.iloc.

Example:

intermediary_df = df_matrix.loc[:,100]

or

intermediary_df = df_matrix.iloc[:,0]
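
A short sketch of the difference between the two accessors, assuming the integer-labelled frame from the question: loc looks the column up by its label, while iloc uses its position and therefore does not depend on the label type.

```python
import pandas as pd

# Assumed reconstruction of the frame from the question.
df_matrix = pd.DataFrame(
    [[1, 1, 0, 1], [1, 1, 1, 0]],
    columns=[100, 200, 300, 400],
)

by_label = df_matrix.loc[:, 100]     # label-based: needs the integer label 100
by_position = df_matrix.iloc[:, 0]   # position-based: first column, whatever its label

# Both accessors also accept lists to select several columns at once.
several_by_label = df_matrix.loc[:, [100, 300]]
several_by_position = df_matrix.iloc[:, [0, 2]]
```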

1 Comment

The code did not work, but using intermediary_df = df_matrix[100] makes it work.
