Split numpy array into sub-arrays based on conditions

Question

I have two numpy arrays of daily values and time steps:

A = [[ 0.1   0.05  0.05  0.05  0.1   0.1   0.1   0.1   0.1   0.1   0.1   0.1 .......]]

T = [['19730101' '19730102' '19730103' '19730104' '19730105' '19730106' ....... '19931231']]

and want to split A into sub-arrays for each month such as:

s = numpy.split(A,condition) # condition is when there is a change in month index in T

I am not clear on how to track a change in the month digits of T. Any suggestions would be appreciated.

T increases day by day within each month, spans multiple years, and always keeps the same format, %Y%m%d, such as 19730101.
– Ibe, Commented Nov 28, 2014 at 7:25
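The question's own idea, calling `numpy.split` at the points where the month changes, can be sketched as follows. This is a minimal sketch assuming T is in chronological order (as 19730101 ... 19931231 suggests). It keys on year+month (the first six characters) so that the same month in different years stays separate. The sample values are hypothetical.

```python
import numpy as np

# Hypothetical sample data in the question's layout: 2-D arrays with one row.
A = np.array([[0.1, 0.05, 0.05, 0.1, 0.2, 0.3]])
T = np.array([['19730130', '19730131', '19730201', '19730202', '19740101', '19740102']])

# Flatten to 1-D so each element is a single value / date string.
a = A.ravel()
t = T.ravel()

# Year+month key for every day, e.g. '197301'.
keys = np.array([s[:6] for s in t])

# Positions where the key differs from the previous element: the split points.
split_points = np.flatnonzero(keys[1:] != keys[:-1]) + 1

# numpy.split cuts `a` right before each split point: one sub-array per month.
sub_arrays = np.split(a, split_points)

# The key of the first element of each sub-array labels that month.
month_keys = keys[np.r_[0, split_points]]

for key, sub in zip(month_keys, sub_arrays):
    print(key, sub)
```

Comparing each key with its predecessor is the "change in month index" the question describes; `np.split` then does the slicing without a Python-level loop over the values.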

Hannes Ovrén · Accepted Answer · 2014-11-28 08:41:21Z

2

I think this should do it. There is probably a faster/neater way to do it with numpy, but I think this is pretty straightforward.

A = [0.1, 0.05, 0.05, 0.05, 0.1, 0.1, 0.1]
T = ['19730101', '19730102', '19730103', '19730104', '19730105', '19730106', '19931231']

# Pair each value with its timestamp and sort on the timestamp;
# 'YYYYMMDD' strings sort in chronological order.
combined = sorted(zip(A, T), key=lambda x: x[1])

splits = []
current_month = None
for a, t in combined:
    month = t[4:6]  # the 'MM' part of 'YYYYMMDD'
    if month != current_month:
        splits.append([a])    # month changed: start a new split
        current_month = month
    else:
        splits[-1].append(a)  # same month: add to the current split
print(splits)
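The loop above collects runs of consecutive items that share a month. As a sketch of a neater alternative (a standard-library swap-in, not part of the original answer), `itertools.groupby` does the same bookkeeping. It reuses the answer's sample lists and its month-only key `t[4:6]`.

```python
from itertools import groupby

A = [0.1, 0.05, 0.05, 0.05, 0.1, 0.1, 0.1]
T = ['19730101', '19730102', '19730103', '19730104', '19730105', '19730106', '19931231']

# Sort (value, timestamp) pairs on the timestamp, as in the loop above.
combined = sorted(zip(A, T), key=lambda pair: pair[1])

# groupby bundles *consecutive* pairs with the same key, which is what the
# explicit current_month tracking does by hand.
splits = [
    [a for a, _ in group]
    for _, group in groupby(combined, key=lambda pair: pair[1][4:6])
]
print(splits)
```

Changing the key to `pair[1][:6]` groups by year and month instead of month alone.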

edited Nov 28, 2014 at 8:41

answered Nov 28, 2014 at 7:50

Hannes Ovrén



5 Comments

Ibe Over a year ago

It is still a single array. No split.

Hannes Ovrén Over a year ago

I forgot to change current_month when the month changes. It works now, as long as A and T are one-dimensional arrays of the same length.

Ibe Over a year ago

Weird, still a single array.

Hannes Ovrén Over a year ago

See the updated code with your example data. Note that since your original data was not valid Python code, I have converted A and T to one-dimensional lists.

Hannes Ovrén Over a year ago

Also, I guess instead of checking just for month you want the year+month concatenation? Currently '197310' and '197410' would end up in the same list. In that case change month=t[4:6] to yearmonth=t[:6].
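A quick check with hypothetical dates that share a month but not a year shows the difference between the two keys:

```python
# Hypothetical dates: October 1973 and October 1974.
T = ['19731030', '19731031', '19741001', '19741002']

month_keys = {t[4:6] for t in T}     # {'10'}: both Octobers collapse into one key
yearmonth_keys = {t[:6] for t in T}  # {'197310', '197410'}: kept apart
```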

aquavitae · Accepted Answer · 2014-11-28 07:51:01Z

2

You could do it quite easily using pandas:

>>> import pandas
>>> T = ['20140101', '20140102', '20140201', '20140202']
>>> A = [0.1, 0.2, 0.3, 0.4]
>>> s = pandas.Series(A, T)
>>> groups = s.groupby(lambda i: i[:6])
>>> for month, group in groups:
...     print(month)
...     print(group)
...
201401
20140101    0.1
20140102    0.2
dtype: float64
201402
20140201    0.3
20140202    0.4
dtype: float64

Or you could use pure Python, although it is probably less efficient:

>>> groups = {}
>>> for t, a in zip(T, A):
...     month = t[:6]
...     groups.setdefault(month, []).append(a)
...
>>> for month, group in groups.items():
...     print(month)
...     print(group)
...
201401
[0.1, 0.2]
201402
[0.3, 0.4]

answered Nov 28, 2014 at 7:51

aquavitae


2 Comments

Ibe Over a year ago

Received an error with the pure Python way: TypeError: unhashable type: 'numpy.ndarray'

aquavitae Over a year ago

I don't get an error. How are you creating the arrays? Are they 1D or 2D? The [[..]] in your question suggests 2D, although the data is only 1D. If you have 2D arrays, that would be the problem.
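A minimal sketch of why 2-D input triggers that error, assuming the arrays really are 2-D with a single row as the [[..]] output suggests (the sample values are hypothetical). Iterating a 2-D array yields whole rows, so `t[:6]` is an array slice rather than a string; flattening with `ravel()` first restores the 1-D behaviour the pure Python code expects.

```python
import numpy as np

# Hypothetical sample data shaped like the question's [[...]] output:
# 2-D arrays with a single row.
A = np.array([[0.1, 0.2, 0.3, 0.4]])
T = np.array([['20140101', '20140102', '20140201', '20140202']])

# Iterating over a (1, n) array yields its rows, so zip produces one pair
# of whole rows instead of one pair per day.
rows = list(zip(T, A))
print(len(rows))  # 1

# Slicing a row gives another ndarray, and ndarrays cannot be dict keys.
t_row = rows[0][0]
try:
    {}.setdefault(t_row[:6], [])
except TypeError as err:
    print(err)  # unhashable type: 'numpy.ndarray'

# Flattening to 1-D restores one date string and one value per element.
groups = {}
for t, a in zip(T.ravel(), A.ravel()):
    groups.setdefault(t[:6], []).append(a)
print(groups)
```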
