I have a Pandas DataFrame on which I would like to do some manipulations. First I sort my dataframe on the entropy using this code:
entropy_dataframe.sort_values(by='entropy',inplace=True,ascending=False)
This gives me the following dataframe (<class 'pandas.core.frame.DataFrame'>):
      entropy    identifier
486  1.000000  3.955030e+09
584  1.000000  8.526030e+09
397  1.000000  5.623020e+09
819  0.999700  1.678030e+09
..        ...           ...
179  0.000000  3.724020e+09
766  0.000000  6.163020e+09
770  0.000000  6.163020e+09
462  0.000000  7.005020e+09
135  0.000000  3.069001e+09
Now I would like to select the 10 rows with the highest entropy and return a list of the corresponding 10 identifiers (as integers). I have tried selecting the top 10 identifiers by either using:
entropy_top10 = entropy_dataframe.head(10)['identifier']
And:
entropy_top10 = entropy_dataframe[:10]
entropy_top10 = entropy_top10['identifier']
Which both give the following result (<class 'pandas.core.series.Series'>):
397 2.623020e+09
823 8.678030e+09
584 2.526030e+09
486 7.955030e+09
396 2.623020e+09
555 9.768020e+09
492 7.955030e+09
850 9.606020e+09
159 2.785020e+09
745 4.609030e+09
Name: identifier, dtype: float64
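For reference, a minimal sketch of the two selections on a small made-up DataFrame. The name df and its values are placeholders chosen for illustration, not my real data. It only checks that head(n) and [:n] pick the same rows after sorting:

```python
import pandas as pd

# Hypothetical stand-in for entropy_dataframe; the values are made up.
df = pd.DataFrame(
    {
        'entropy': [0.2, 1.0, 0.5, 0.9],
        'identifier': [3069001000.0, 7955032207.0, 2623020357.0, 2526030291.0],
    },
    index=[135, 486, 397, 584],
)

# Same sort as in the question: highest entropy first.
df.sort_values(by='entropy', inplace=True, ascending=False)

# Both spellings take the first two rows of the sorted frame.
top_a = df.head(2)['identifier']
top_b = df[:2]['identifier']

# True when index, values and dtype all match.
same = top_a.equals(top_b)
```

Both variants are positional, so they depend on the sort having been applied first.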
Even though both work, the pain starts after this operation as I now would like to change this Pandas Series with dtype float64 to a list of integers.
I have tried the following:
entropy_top10= np.array(entropy_top10,dtype=pd.Series)
entropy_top10= entropy_top10.astype(np.int64)
entropy_top10= entropy_top10.tolist()
Which results in (<type 'list'>):
[7955032207L, 8613030044L, 2623057011L, 2526030291L, 7951030016L, 2623020357L, 9768028572L, 9606023013L, 2785021210L, 9768023351L]
Which is a list of longs (while I'm looking for integers).
Anyone that can help me out here? Thanks in advance!
--- EDIT ---
The problem lies in the last step. When I remove entropy_top10= entropy_top10.tolist(), the result is a <type 'numpy.ndarray'> whose elements have dtype numpy.int64. When I add the line back, I get a <type 'list'> whose elements are long.
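A minimal sketch of that behaviour, assuming a two-element float64 Series as a stand-in for entropy_top10 (the values are taken from the list above, the index labels are illustrative). As far as I understand, tolist() converts every NumPy scalar into a native Python integer. On my Python 2 build those native integers are long whenever the value exceeds sys.maxint, while on Python 3 there is only int:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for entropy_top10 (float64, as in the question).
entropy_top10 = pd.Series(
    [7955032207.0, 2623020357.0], index=[486, 397], name='identifier'
)

# Cast on the Series directly; the result is still a Series, dtype int64.
as_int64 = entropy_top10.astype(np.int64)

# The underlying array holds numpy.int64 scalars.
values = as_int64.values

# tolist() returns native Python integers instead of NumPy scalars.
as_list = as_int64.tolist()

# Names of the element types before and after tolist().
element_types = (type(values[0]).__name__, type(as_list[0]).__name__)
```

This also skips the np.array(..., dtype=pd.Series) step, since astype() can be called on the Series itself.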
... sys.maxint I get 2147483647. I'm fairly sure that all identifiers have a maximum of 10 characters. If I try python -V in my command line it gives me Python 2.7.11 :: Anaconda 4.0.0 (64-bit).
... sys.maxint, you run 32bit python. And even if numbers in your list have a maximum length of 10 digits, they may be larger than your maxint. Already the first value in your list, 7955032207, does not fit into a 32bit integer. Thus, you will have to use long, as python already did.
... int64 datatype. In this specific list Python is able to store these identifiers (also values larger than the sys.maxint) as integer instead of long. Any idea why it is possible there?
... entropy_top10= entropy_top10.tolist() from the method in my original question I get a numpy.ndarray which does contain elements of the datatype numpy.int64. Hence, when transforming this numpy.ndarray into a list, the elements are also transformed from a numpy.int64 to a long. Is it still clear or should I adjust my original question?
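The size argument from the comments can be checked with plain arithmetic. The sketch below uses only the reported sys.maxint value and the ten identifiers from the list in the question. The names MAXINT_32, too_large and max_digits are made up for illustration. Note that in Python 2 a sys.maxint of 2147483647 means the C long behind int is 32 bits wide, which is also the case for 64-bit Python 2 builds on Windows, so it does not by itself prove a 32-bit interpreter:

```python
# The sys.maxint value reported in the comments, i.e. 2**31 - 1.
MAXINT_32 = 2**31 - 1

# The ten identifiers from the list shown in the question.
identifiers = [
    7955032207, 8613030044, 2623057011, 2526030291, 7951030016,
    2623020357, 9768028572, 9606023013, 2785021210, 9768023351,
]

# Identifiers that do not fit into a signed 32-bit integer.
too_large = [i for i in identifiers if i > MAXINT_32]

# Longest identifier, counted in decimal digits.
max_digits = max(len(str(i)) for i in identifiers)
```

Every identifier here is above the limit even though none has more than 10 digits, which is consistent with Python 2 reporting them as long.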