I have a Pandas DataFrame on which I would like to do some manipulations. First I sort my dataframe on the entropy using this code:

entropy_dataframe.sort_values(by='entropy',inplace=True,ascending=False)

This gives me the following dataframe (<class 'pandas.core.frame.DataFrame'>):

      entropy    identifier
486  1.000000  3.955030e+09
584  1.000000  8.526030e+09
397  1.000000  5.623020e+09
819  0.999700  1.678030e+09
..        ...           ...
179  0.000000  3.724020e+09
766  0.000000  6.163020e+09
770  0.000000  6.163020e+09
462  0.000000  7.005020e+09
135  0.000000  3.069001e+09

Now I would like to select the 10 rows with the highest entropy and return a list of the corresponding 10 identifiers (as integers). I have tried selecting them with either:

entropy_top10 = entropy_dataframe.head(10)['identifier']

And:

entropy_top10 = entropy_dataframe[:10]
entropy_top10 = entropy_top10['identifier']

Which both give the following result (<class 'pandas.core.series.Series'>):

397    2.623020e+09
823    8.678030e+09
584    2.526030e+09
486    7.955030e+09
396    2.623020e+09
555    9.768020e+09
492    7.955030e+09
850    9.606020e+09
159    2.785020e+09
745    4.609030e+09
Name: identifier, dtype: float64

Even though both work, the pain starts after this operation as I now would like to change this Pandas Series with dtype float64 to a list of integers.

I have tried the following:

entropy_top10= np.array(entropy_top10,dtype=pd.Series)
entropy_top10= entropy_top10.astype(np.int64)
entropy_top10= entropy_top10.tolist()

Which results in (<type 'list'>):

[7955032207L, 8613030044L, 2623057011L, 2526030291L, 7951030016L, 2623020357L, 9768028572L, 9606023013L, 2785021210L, 9768023351L]

Which is a list of longs (while I'm looking for integers).

Anyone that can help me out here? Thanks in advance!

--- EDIT ---

The problem lies in the final step. When I remove entropy_top10= entropy_top10.tolist(), the result is a <type 'numpy.ndarray'> with elements of dtype numpy.int64. When I add that line back, I get a <type 'list'> with elements of type long.

  • What version of python are you using? Are you sure that regular integers would be large enough to hold the values? See stackoverflow.com/questions/7604966/… -- in 32bit python, the maximum integer value should be 2147483647 Commented Jul 12, 2016 at 9:08
  • If I do sys.maxint I get 2147483647. I'm fairly sure that all identifiers have a maximum of 10 characters. If I try python -V in my command line it gives me Python 2.7.11 :: Anaconda 4.0.0 (64-bit). Commented Jul 12, 2016 at 9:22
  • According to your sys.maxint, you run 32bit python. And even if numbers in your list have a maximum length of 10 digits, they may be larger than your maxint. Already the first value in your list 7955032207 does not fit into a 32bit integer. Thus, you will have to use long, as python already did. Commented Jul 12, 2016 at 9:33
  • Okay, that makes sense. One remark though. In another part of my code I also have a list consisting of values which have numpy's int64 datatype. In this specific list Python is able to store these identifiers (also values larger than the sys.maxint) as integer instead of long. Any idea why it is possible there? Commented Jul 12, 2016 at 9:40
  • In fact, when I remove entropy_top10= entropy_top10.tolist() from the method in my original question I get a numpy.ndarray which contains elements of the datatype numpy.int64. Hence, when transforming this numpy.ndarray into a list, the elements are also transformed from numpy.int64 to long. Is it still clear or should I adjust my original question? Commented Jul 12, 2016 at 9:48

1 Answer

Since users may not skim through all of the comments on your original question, I'll condense our results into a single answer.

  • According to sys.maxint (2**31 - 1), Python's native int is 32 bits wide in this installation. That is the case for 32-bit Python 2, and also for 64-bit Python 2 on Windows, where the underlying C long is 32 bits. Since some list elements are larger than sys.maxint, they are stored as long values.

  • The transformation entropy_top10.astype(np.int64) creates a numpy.ndarray of 64-bit integers in numpy's own data type. numpy ships a 64-bit integer data type even when Python's native int is only 32 bits wide; numpy.int64 is not a native Python type.

  • The transformation entropy_top10.tolist() converts the numpy data type back to Python's native data type. Since the native int is 32 bits wide here, the int64 values can only be converted to the long type.

  • On a build where the native int is 64 bits wide (for example 64-bit Python 2 on Linux), the tolist() transformation would most likely result in native int values, because the values would fit into the regular integer (maximum 2**63 - 1).

The reason your list contains long items is the translation between numpy data types and the native data types of your installed Python version. numpy's own data types are consistent regardless of the Python version used to run the code.
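The widths discussed above can be inspected directly. The following is only a rough sketch, written to run on Python 3 as well (where sys.maxint no longer exists); the variable names are made up for illustration. It compares the interpreter's pointer width, the width of the platform's C long (which Python 2's int is based on), and the width of numpy.int64:

```python
import struct

import numpy as np

# Pointer size tells whether the interpreter is a 32-bit or 64-bit build.
pointer_bits = struct.calcsize("P") * 8

# Width of the platform's C long; Python 2's int (and sys.maxint) follows it.
c_long_bits = struct.calcsize("l") * 8

# numpy.int64 is 64 bits wide regardless of the two values above.
int64_bits = np.dtype(np.int64).itemsize * 8
int64_max = np.iinfo(np.int64).max

print("interpreter pointer width:", pointer_bits)
print("C long width:", c_long_bits)
print("numpy int64 width:", int64_bits)
print("numpy int64 max:", int64_max)
print("7955032207 fits into 32 bits:", 7955032207 <= 2**31 - 1)
```

If the C long width comes out as 32 while the pointer width is 64, the interpreter is a 64-bit build whose Python 2 int would still be limited to 2**31 - 1.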

Edit

To make the difference between the list's type and the items' types clearer, see this code example:

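# On Python 2, this makes print() with several arguments produce the output shown below
from __future__ import print_function

import numpy as np

a = np.array([3123123123, 1512451234], dtype=np.int64)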
print('ALL NUMPY')
print('  List items', a)
print('  List type', type(a))
print('  Item type', type(a[0]))

l = a.tolist()
print('ALL PYTHON NATIVE')
print('  List items', l)
print('  List type', type(l))
print('  Item type', type(l[0]))

c = [i for i in a]
print('NATIVE LIST, NUMPY TYPE')
print('  List items', c)
print('  List type', type(c))
print('  Item type', type(c[0]))

It gives the following output:

ALL NUMPY
  List items [3123123123 1512451234]
  List type <type 'numpy.ndarray'>
  Item type <type 'numpy.int64'>
ALL PYTHON NATIVE
  List items [3123123123L, 1512451234L]
  List type <type 'list'>
  Item type <type 'long'>
NATIVE LIST, NUMPY TYPE
  List items [3123123123, 1512451234]
  List type <type 'list'>
  Item type <type 'numpy.int64'>

From this output we can learn that numpy's tolist() function does not only convert the container from numpy.ndarray to list, but also transforms all items' types from numpy.int64 to long. Manually transforming the array into a native list (using a comprehension here) yields a native Python list whose elements are still of type numpy.int64.
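For comparison, here is a minimal sketch of the same float64-to-list conversion on Python 3, where int and long have been merged into a single int type, so the distinction disappears. The Series below is a hypothetical stand-in for the identifier column, using three of the values from the question:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the float64 'identifier' column from the question.
identifiers = pd.Series([7955032207.0, 8613030044.0, 2623057011.0], name="identifier")

# astype() produces numpy.int64 values; tolist() turns them into native Python ints.
as_list = identifiers.astype(np.int64).tolist()

print(as_list)
print(type(as_list[0]))
```

On Python 2 with a 32-bit native int, the same calls give long items, as shown above. A long can be used anywhere an integer is expected, so the trailing L in the printed list is cosmetic.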

2 Comments

In some other parts of my code I was able to produce lists with integers from the same identifiers. See for example the following: [3368030009, 6191090062, 8486030004, 7859030003, 4562030005, 8343090057, 2959090000, 7155090021, 9615030065, 6513030004] Type of object: <type 'list'> Type of elements of list: <type 'numpy.int64'>
The difference is that the elements of your list are numpy.int64, the 64-bit integer type that numpy ships. They are not native Python integers, since the native int in your installation is only 32 bits wide.
