
I'm looking at the answers to an earlier question I asked, "numpy.unique with order preserved". They work great, but with one example I have problems.

b
['Aug-09' 'Aug-09' 'Aug-09' ..., 'Jan-13' 'Jan-13' 'Jan-13']
b.shape
(83761,)
b.dtype
|S6
bi, idxb = np.unique(b, return_index=True)
months = bi[np.argsort(idxb)]
months
['Feb-10' 'Aug-10' 'Nov-10' 'Oct-12' 'Oct-11' 'Jul-10' 'Feb-12' 'Sep-11'
 'Jan-10' 'Apr-10' 'May-10' 'Sep-09' 'Mar-11' 'Jun-12' 'Jul-12' 'Dec-09'
 'Aug-09' 'Nov-12' 'Dec-12' 'Apr-12' 'Jun-11' 'Jan-11' 'Jul-11' 'Sep-10'
 'Jan-12' 'Dec-10' 'Oct-09' 'Nov-11' 'Oct-10' 'Mar-12' 'Jan-13' 'Nov-09'
 'May-11' 'Mar-10' 'Jun-10' 'Dec-11' 'May-12' 'Feb-11' 'Aug-11' 'Sep-12'
 'Apr-11' 'Aug-12']

Why does months start with Feb-10 instead of Aug-09? With smaller datasets I get the expected behavior, i.e. months starts with Aug-09, but on this one every answer to the previous question gives me Feb-10 first.


This plain Python loop does work:

months = []
for bi in b:
    if bi not in months:
        months.append(bi) 
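The loop above is quadratic, because "bi not in months" scans the whole list on every iteration. A minimal sketch of the same order-preserving idea in linear time, assuming Python 3.7 or later (where dicts keep insertion order); the helper name first_occurrences is hypothetical and it does not use np.unique at all:

```python
def first_occurrences(values):
    """Unique items in order of first appearance (assumes Python 3.7+)."""
    # dict.fromkeys keeps the first insertion of each key; later duplicates
    # are ignored, and dicts preserve insertion order in Python 3.7+
    return list(dict.fromkeys(values))


b = ['Aug-09', 'Aug-09', 'Sep-09', 'Aug-09', 'Oct-09', 'Sep-09']
months = first_occurrences(b)
print(months)  # ['Aug-09', 'Sep-09', 'Oct-09']
```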

Here is my dataset, try it yourself: http://www.uploadmb.com/dw.php?id=1364341573

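import numpy as np

# Read one month label per line, dropping the trailing newline
with open('test.txt', 'r') as f:
    res = [line.strip() for line in f]

a = np.array(res)
# idx holds the index of the first occurrence of each unique value
_, idx = np.unique(a, return_index=True)
# Sorting those indices puts the values back in order of first appearance
print(a[np.sort(idx)])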
  • Maybe it's sorting by the string hashes? Commented Mar 26, 2013 at 23:14
  • This is a NumPy bug that was fixed in version 1.6.2; see my edited answer. Commented Mar 27, 2013 at 19:36

1 Answer


Update:

I believe the problem is actually a NumPy bug, reported in this ticket. What version of NumPy are you running?

http://projects.scipy.org/numpy/ticket/2063

I was able to reproduce your problem because the NumPy installation on the Ubuntu machine I tested was version 1.6.1; the bug is fixed in 1.6.2 and later.
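To check whether an installed NumPy is one of the affected versions, here is a small sketch that compares np.__version__ against 1.6.2, the release with the fix. The helper names version_tuple and unique_index_may_be_buggy are hypothetical, and the parsing only reads the leading numeric components, so suffixes such as "rc1" are ignored (an approximation):

```python
import re

import numpy as np

# Release in which the np.unique(..., return_index=True) bug was fixed,
# per the ticket discussed in this answer.
FIXED_IN = (1, 6, 2)


def version_tuple(version):
    """Return (major, minor, micro) parsed from the start of a version string.

    Only the leading numeric components are read; suffixes such as 'rc1'
    or '.dev0' are ignored, so this is an approximation.
    """
    match = re.match(r'(\d+)\.(\d+)(?:\.(\d+))?', version)
    if match is None:
        raise ValueError('unrecognised version string: %r' % (version,))
    major, minor, micro = match.groups()
    return int(major), int(minor), int(micro or 0)


def unique_index_may_be_buggy(version=np.__version__):
    """True if this NumPy version predates the fix."""
    return version_tuple(version) < FIXED_IN


print(np.__version__, unique_index_may_be_buggy())
```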

Upgrade NumPy and try again; that fixed it on my Ubuntu machine.
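If upgrading is not an option, one workaround is not to depend on return_index at all and find the first occurrences yourself. The sketch below does this with a stable sort: np.argsort(..., kind='mergesort') keeps equal values in their original relative order, so the first element of each run of equal values in the sorted array is that value's first occurrence. The function name unique_in_order is hypothetical, and the sketch assumes a 1-D input:

```python
import numpy as np


def unique_in_order(a):
    """Unique values of a 1-D array, in order of first appearance.

    Uses a stable (mergesort) argsort instead of
    np.unique(..., return_index=True).
    """
    a = np.asarray(a)
    if a.size == 0:
        return a.copy()
    order = np.argsort(a, kind='mergesort')  # stable: ties keep original order
    sorted_a = a[order]
    # True at the start of each run of equal values in the sorted array
    is_first = np.empty(a.shape, dtype=bool)
    is_first[0] = True
    is_first[1:] = sorted_a[1:] != sorted_a[:-1]
    first_idx = order[is_first]  # original index of each value's first occurrence
    return a[np.sort(first_idx)]  # restore the original order
```

With the lines array from the transcripts, unique_in_order(lines) should start with 'Aug-09', since that is the first element of the data.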


In these lines:

bi, idxb = np.unique(b, return_index=True)
months = bi[np.argsort(idxb)]

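Strictly speaking, these lines are not wrong: bi holds the unique values in sorted order and idxb the index of each one's first occurrence, so bi[np.argsort(idxb)] reorders the unique values by first appearance. Indexing the original array with the sorted indices, b[np.sort(idxb)], gives exactly the same result and is a little more direct.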

This should work:

bi, idxb = np.unique(b, return_index=True)
months = b[np.sort(idxb)]

Yes, it does, using your data set with Python 2.7 and NumPy 1.7 on 64-bit Mac OS X 10.6:

Python 2.7.3 (default, Oct 23 2012, 13:06:50) 

IPython 0.13.1 -- An enhanced Interactive Python.

In [1]: import numpy as np

In [2]: np.__version__
Out[2]: '1.7.0'

In [3]: from platform import architecture

In [4]: architecture()
Out[4]: ('64bit', '')

In [5]: f = open('test.txt','r')

In [6]: lines = np.array([line.strip() for line in f.readlines()])

In [7]: _, ilines = np.unique(lines, return_index = True)

In [8]: months = lines[np.sort(ilines)]

In [9]: months
Out[9]: 
array(['Aug-09', 'Sep-09', 'Oct-09', 'Nov-09', 'Dec-09', 'Jan-10',
       'Feb-10', 'Mar-10', 'Apr-10', 'May-10', 'Jun-10', 'Jul-10',
       'Aug-10', 'Sep-10', 'Oct-10', 'Nov-10', 'Dec-10', 'Jan-11',
       'Feb-11', 'Mar-11', 'Apr-11', 'May-11', 'Jun-11', 'Jul-11',
       'Aug-11', 'Sep-11', 'Oct-11', 'Nov-11', 'Dec-11', 'Jan-12',
       'Feb-12', 'Mar-12', 'Apr-12', 'May-12', 'Jun-12', 'Jul-12',
       'Aug-12', 'Sep-12', 'Oct-12', 'Nov-12', 'Dec-12', 'Jan-13'], 
      dtype='|S6')
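For comparison, here is a small self-contained check of what the two indexing forms return on a toy array (not the real dataset), assuming a NumPy where return_index reports first occurrences, i.e. 1.6.2 or later:

```python
import numpy as np

# Toy stand-in for the real data (not the actual test.txt contents)
b = np.array(['Aug-09', 'Aug-09', 'Sep-09', 'Aug-09', 'Oct-09', 'Sep-09'])

bi, idxb = np.unique(b, return_index=True)
# bi is sorted alphabetically:       ['Aug-09' 'Oct-09' 'Sep-09']
# idxb holds each first occurrence:  [0 4 2]

via_argsort = bi[np.argsort(idxb)]  # reorder the unique values by first index
via_sort = b[np.sort(idxb)]         # read b at the first indices, in order

print(via_argsort)  # ['Aug-09' 'Sep-09' 'Oct-09']
print(via_sort)     # ['Aug-09' 'Sep-09' 'Oct-09']
```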

OK, I can finally reproduce your problem too, on 64-bit Ubuntu:

Python 2.7.3 (default, Aug  1 2012, 05:14:39) 

IPython 0.12.1 -- An enhanced Interactive Python.

In [1]: import numpy as np

In [2]: np.__version__
Out[2]: '1.6.1'

In [3]: from platform import architecture

In [4]: architecture()
Out[4]: ('64bit', 'ELF')

In [5]: f = open('test.txt','r')

In [6]: lines = np.array([line.strip() for line in f.readlines()])

In [7]: _, ilines = np.unique(lines, return_index=True)

In [8]: months = lines[np.sort(ilines)]

In [9]: months
Out[9]: 
array(['Feb-10', 'Aug-10', 'Nov-10', 'Oct-12', 'Oct-11', 'Jul-10',
       'Feb-12', 'Sep-11', 'Jan-10', 'Apr-10', 'May-10', 'Sep-09',
       'Mar-11', 'Jun-12', 'Jul-12', 'Dec-09', 'Aug-09', 'Nov-12',
       'Dec-12', 'Apr-12', 'Jun-11', 'Jan-11', 'Jul-11', 'Sep-10',
       'Jan-12', 'Dec-10', 'Oct-09', 'Nov-11', 'Oct-10', 'Mar-12',
       'Jan-13', 'Nov-09', 'May-11', 'Mar-10', 'Jun-10', 'Dec-11',
       'May-12', 'Feb-11', 'Aug-11', 'Sep-12', 'Apr-11', 'Aug-12'], 
      dtype='|S6')

It works on Ubuntu after upgrading NumPy:

Python 2.7.3 (default, Aug  1 2012, 05:14:39) 

IPython 0.12.1 -- An enhanced Interactive Python.

In [1]: import numpy as np

In [2]: np.__version__
Out[2]: '1.7.0'

In [3]: f = open('test.txt','r')

In [4]: lines = np.array([line.strip() for line in f.readlines()])

In [5]: _, ilines = np.unique(lines, return_index=True)

In [6]: months = lines[np.sort(ilines)]

In [7]: months
Out[7]: 
array(['Aug-09', 'Sep-09', 'Oct-09', 'Nov-09', 'Dec-09', 'Jan-10',
       'Feb-10', 'Mar-10', 'Apr-10', 'May-10', 'Jun-10', 'Jul-10',
       'Aug-10', 'Sep-10', 'Oct-10', 'Nov-10', 'Dec-10', 'Jan-11',
       'Feb-11', 'Mar-11', 'Apr-11', 'May-11', 'Jun-11', 'Jul-11',
       'Aug-11', 'Sep-11', 'Oct-11', 'Nov-11', 'Dec-11', 'Jan-12',
       'Feb-12', 'Mar-12', 'Apr-12', 'May-12', 'Jun-12', 'Jul-12',
       'Aug-12', 'Sep-12', 'Oct-12', 'Nov-12', 'Dec-12', 'Jan-13'], 
      dtype='|S6')

9 Comments

No, I need them in their original order. In the example the first item is Aug-09, so it should come first in the order-preserving unique list.
that gives ['Aug-09' 'Aug-09' 'Sep-09' 'Sep-09' 'Sep-09' 'Aug-09' 'Aug-09' 'Oct-09' 'Aug-09' 'Aug-09' 'Sep-09' 'Sep-09' 'Sep-09' 'Sep-09' 'Sep-09' 'Aug-09' 'Aug-09' 'Sep-09' 'Aug-09' 'Aug-09' 'Sep-09' 'Aug-09' 'Sep-09' 'Sep-09' 'Aug-09' 'Aug-09' 'Sep-09' 'Sep-09' 'Sep-09' 'Sep-09' 'Aug-09' 'Sep-09' 'Sep-09' 'Sep-09' 'Sep-09' 'Aug-09' 'Sep-09' 'Aug-09' 'Aug-09' 'Oct-09' 'Aug-09' 'Aug-09']
@bizso09 Aha. Use sort on the indices, not argsort. See edit again, hopefully the last :P
Yes, that is supposed to work, but on my dataset it doesn't and I don't know why. You can download the data from the link. EDIT: OK wait, let me try.
Well, I tried your code on two computers and got this both times: ['Feb-10' 'Aug-10' 'Nov-10' 'Oct-12' 'Oct-11' 'Jul-10' 'Feb-12' 'Sep-11' 'Jan-10' 'Apr-10' 'May-10' 'Sep-09' 'Mar-11' 'Jun-12' 'Jul-12' 'Dec-09' 'Aug-09' 'Nov-12' 'Dec-12' 'Apr-12' 'Jun-11' 'Jan-11' 'Jul-11' 'Sep-10' 'Jan-12' 'Dec-10' 'Oct-09' 'Nov-11' 'Oct-10' 'Mar-12' 'Jan-13' 'Nov-09' 'May-11' 'Mar-10' 'Jun-10' 'Dec-11' 'May-12' 'Feb-11' 'Aug-11' 'Sep-12' 'Apr-11' 'Aug-12']