
I have a big dataset in array form, arranged like this:

Rainfall amount arranged in array form

The average (mean) for each latitude and longitude, along axis=0, is computed like this:

Lat=data[:,0]
Lon=data[:,1]
rain1=data[:,2]
rain2=data[:,3]
--
rain44=data[:,44]

rainT=[rain1,rain2,rain3,rain4,....rain44]
mean=np.mean(rainT)

The result was awesome, but the computation takes time, and I would like to use a for loop to simplify the calculation. At the moment, the script I am using looks like this:

mean=[]
lat=data[:,0]
lon=data[:,1]
for x in range(2,46):
    rainT=data[:,x]
mean=np.mean(rainT,axis=0)
print mean

But this gives a weird result. Can anyone help?

  • Can you explain what the weird result is? Is it an error, or does it not match the expected output? Commented Nov 25, 2017 at 4:47
  • "and I look forward to use For Loop to ease the calculation. " This is the opposite of the mentality numpy uses. You should not have for loops, otherwise you may as well do it in regular Python. That said, I'm not actually sure what you're trying to do exactly. Commented Nov 25, 2017 at 5:23
  • It is giving me an average for each file rather than a horizontal average for each latitude and longitude. Commented Nov 25, 2017 at 5:28

2 Answers

First, you probably meant for the loop to add up the subarrays rather than keep replacing rainT with another slice of the array. Only the last assignment matters, so the code averages just that one subarray, rainT=data[:,45]. It also doesn't have the correct number of original elements to divide by to compute an average. Both of these mistakes contribute to the weird result.

Second, numpy should be able to average elements faster than a Python for loop can do it since that's just the kind of thing that numpy is designed to do in optimized native code.

Third, your original code copies a bunch of subarrays into a Python list, then asks numpy to average that. You should get much faster results by asking numpy to average the relevant subarray without making a copy, something like this:

rainT = data[:,2:] # this gets a view onto data[], not a copy
mean = np.mean(rainT)

That computes an average over all the rainfall values, like your original code.
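A small sketch of both points, using a made-up 3x5 array (the values and shape are illustrative assumptions, not taken from the question). It shows that the loop keeps only the last column, while the slice averages every rainfall value without copying:

```python
import numpy as np

# Hypothetical data: columns are lat, lon, then three rainfall columns.
data = np.array([
    [1.0, 100.0, 1.0, 2.0, 3.0],
    [2.0, 101.0, 4.0, 5.0, 6.0],
    [3.0, 102.0, 7.0, 8.0, 9.0],
])

# The loop from the question: each pass overwrites rainT,
# so only the last rainfall column survives.
for x in range(2, data.shape[1]):
    rainT = data[:, x]
last_column_mean = np.mean(rainT, axis=0)

# Slicing instead: a view onto every rainfall column, no copy made.
rain = data[:, 2:]
overall_mean = np.mean(rain)

print(last_column_mean)              # 6.0 (mean of 3, 6, 9)
print(overall_mean)                  # 5.0 (mean of 1..9)
print(np.shares_memory(rain, data))  # True
```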

If you want an average for each latitude or some such, you'll need to do it differently. You can average over an array axis, but latitude and longitude aren't axes in your data[].
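As a rough sketch of what "differently" could look like, assuming each row of data[] is one (lat, lon) point: averaging along axis=1 gives one value per row, while pooling rows that share a latitude needs an explicit grouping step. The array below is invented for illustration, and grouping with np.unique and np.bincount is just one possible approach:

```python
import numpy as np

# Hypothetical data: lat, lon, then two rainfall columns.
data = np.array([
    [1.0, 100.0, 2.0, 4.0],
    [1.0, 101.0, 6.0, 8.0],
    [2.0, 100.0, 1.0, 3.0],
])

rain = data[:, 2:]

# One mean per row, i.e. per (lat, lon) point.
row_means = rain.mean(axis=1)

# One mean per distinct latitude, pooling all rows that share it.
lats, inverse = np.unique(data[:, 0], return_inverse=True)
sums = np.bincount(inverse, weights=rain.sum(axis=1))
counts = np.bincount(inverse) * rain.shape[1]
lat_means = sums / counts

print(row_means)  # [3. 7. 2.]
print(lats)       # [1. 2.]
print(lat_means)  # [5. 2.]
```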


Thanks friends, you are giving me such inspiration. Here is the working script, based on the idea from @Jerry101 just now, but I decided NOT to use a Python loop. The new version looks like this:

lat1=data[:,0]
lon1=data[:,1]
rainT=data[:,2:46]  # this is the step I was missing earlier
mean=np.mean(rainT,axis=1)*24  # average daily rainfall for each lat and lon
mean2=np.array([lat1,lon1,mean])
mean2=mean2.T
np.savetxt('average-daily-rainfall.dat2',mean2,fmt='%9.3f')

Finally, the result is exactly the same as that of the program written in Fortran.
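For anyone who wants to try the same steps without the original file, here is a minimal self-contained sketch. The two-row array is made up, the factor 24 is simply copied from the script, and an in-memory buffer stands in for the output file; np.column_stack is used as an equivalent of building the array and transposing it:

```python
import io
import numpy as np

# Hypothetical data: lat, lon, then three rainfall columns.
data = np.array([
    [1.0, 100.0, 0.1, 0.2, 0.3],
    [2.0, 101.0, 0.0, 0.5, 1.0],
])

# Row-wise mean over the rainfall columns; the factor 24 is
# taken from the script, not derived here.
daily = data[:, 2:].mean(axis=1) * 24

# Stack lat, lon and the result as columns
# (equivalent to np.array([lat, lon, daily]).T).
out = np.column_stack([data[:, 0], data[:, 1], daily])

buf = io.StringIO()  # stand-in for a file name such as 'average-daily-rainfall.dat2'
np.savetxt(buf, out, fmt='%9.3f')
print(buf.getvalue())
```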

1 Comment

data[:,2:46] selects only the rainfall columns, leaving out the lat and lon columns.
