1

I have two numpy arrays of daily values and time steps:

A = [[ 0.1   0.05  0.05  0.05  0.1   0.1   0.1   0.1   0.1   0.1   0.1   0.1 .......]]

T = [['19730101' '19730102' '19730103' '19730104' '19730105' '19730106' ....... '19931231']]

and want to split A into sub-arrays for each month such as:

s = numpy.split(A,condition) # condition is when there is a change in month index in T

I am not clear on how to detect where the month changes in T. Any suggestions would be appreciated.

2
  • Is the value of T changing, or is it increasing? Commented Nov 28, 2014 at 7:22
  • It increases within a single month (day by day), and the month and year change as well, but the format stays the same: four-digit year, two-digit month, two-digit day, such as 19730101. Commented Nov 28, 2014 at 7:25

2 Answers

2

I think this should do it. There is probably a faster or neater way to do it with numpy, but I think this is pretty straightforward.

A = [0.1,   0.05,  0.05,  0.05,  0.1,   0.1,   0.1]
T = ['19730101', '19730102', '19730103', '19730104', '19730105', '19730106', '19931231']

combined = zip(A, T)
combined = sorted(combined, key=lambda x: x[1])  # Sort on timestamp

splits = []
current_month = None
for a, t in combined:
    month = t[4:6]  # Two-digit month taken from the YYYYMMDD string
    print(month)
    if month != current_month:
        splits.append([a])  # Start a new split
        current_month = month
    else:
        splits[-1].append(a)  # Add to the current split
print(splits)
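
For comparison, here is a rough sketch of the same idea done with numpy.split directly, as the question asked for. It assumes T is already in chronological order and that both arrays can be flattened to one dimension; the sample data and the names keys, boundaries and monthly are made up for illustration. It groups on year+month (the first six characters) rather than on the month alone.

```python
import numpy as np

# Illustrative data (not from the original post).
A = np.array([0.1, 0.05, 0.05, 0.1, 0.2, 0.3])
T = np.array(['19730101', '19730102', '19730131',
              '19730201', '19730202', '19740201'])

# Flatten in case the arrays are 2-D with a single row.
A = A.ravel()
T = T.ravel()

# Year+month key for each entry, e.g. '197301'.
keys = np.array([t[:6] for t in T])

# Indices where the key differs from the previous entry.
boundaries = np.flatnonzero(keys[1:] != keys[:-1]) + 1

# One sub-array per consecutive run of the same year+month.
monthly = np.split(A, boundaries)
print(monthly)
```

Because this only looks for changes between neighbouring entries, unsorted timestamps would need to be sorted first (for example with np.argsort on T).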

5 Comments

It is still a single array. No split.
I forgot to change current_month when the month changes. It works now, as long as A and T are one-dimensional arrays of the same length.
Weird, it is still a single array.
See the updated code with your example data. Note that since your original data was not valid Python code, I have converted A and T to one-dimensional lists.
Also, I guess instead of checking just for month you want the year+month concatenation? Currently '197310' and '197410' would end up in the same list. In that case change month=t[4:6] to yearmonth=t[:6].
2

You could do it quite easily using pandas:

>>> import pandas
>>> T = ['20140101', '20140102', '20140201', '20140202']
>>> A = [0.1, 0.2, 0.3, 0.4]
>>> s = pandas.Series(A, T)
>>> groups = s.groupby(lambda i: i[:6])
>>> for month, group in groups:
...     print(month)
...     print(group)
201401
20140101    0.1
20140102    0.2
dtype: float64
201402
20140201    0.3
20140202    0.4
dtype: float64

Or you could use pure python, although it is probably less efficient:

>>> groups = {}
>>> for t, a in zip(T, A):
...     month = t[:6]
...     groups.setdefault(month, []).append((t, a))
>>> for month, group in groups.items():
...     print(month)
...     print(group)
201401
[('20140101', 0.1), ('20140102', 0.2)]
201402
[('20140201', 0.3), ('20140202', 0.4)]

2 Comments

I received an error with the pure Python approach: TypeError: unhashable type: 'numpy.ndarray'
I don't get an error. How are you creating the arrays? Are they 1D or 2D? The [[..]] in your question suggests 2D, although the data is only 1D. If you have 2D arrays, that would be the problem.
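
If the arrays really are 2D with a single row, one possible workaround is to flatten them before grouping. The sketch below assumes inputs of shape (1, n); the sample data is hypothetical and only mimics the [[..]] layout from the question.

```python
import numpy as np

# Hypothetical 2-D inputs shaped (1, n).
A = np.array([[0.1, 0.2, 0.3, 0.4]])
T = np.array([['20140101', '20140102', '20140201', '20140202']])

groups = {}
# ravel() flattens to 1-D, so each t is a single string rather than a
# whole row (a row is an ndarray, which cannot be used as a dict key).
for t, a in zip(T.ravel(), A.ravel()):
    groups.setdefault(str(t[:6]), []).append(float(a))

print(groups)
```

Without the ravel() calls, zip would iterate over rows, and slicing a row gives another array, which is what triggers the unhashable type error.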
