Using linear-indices and bincount -
lidx = np.ravel_multi_index(P, X.shape)
X += np.bincount(lidx, minlength=X.size).reshape(X.shape)
Benchmarking
For the case when indices are not repeated, advanced indexing based approach as suggested in @user2357112's post seems to be very efficient.
For repeated ones case, we have np.add.at and np.bincount and the performance numbers seem to be dependent on the size of indices array relative to the size of input array.
Approaches -
def app0(X,P): # @user2357112's soln1
np.add.at(X, (P[0], P[1]), 1)
def app1(X, P): # Proposed in this ppst
lidx = np.ravel_multi_index(P, X.shape)
X += np.bincount(lidx, minlength=X.size).reshape(X.shape)
Here's few timing tests to suggest that -
Case #1 :
In [141]: X = np.random.randint(0,9,(100,100))
...: P = np.random.randint(0,100,(2,1000))
...:
In [142]: %timeit app0(X, P)
...: %timeit app1(X, P)
...:
10000 loops, best of 3: 68.9 µs per loop
100000 loops, best of 3: 15.1 µs per loop
Case #2 :
In [143]: X = np.random.randint(0,9,(1000,1000))
...: P = np.random.randint(0,1000,(2,10000))
...:
In [144]: %timeit app0(X, P)
...: %timeit app1(X, P)
...:
1000 loops, best of 3: 687 µs per loop
1000 loops, best of 3: 1.48 ms per loop
Case #3 :
In [145]: X = np.random.randint(0,9,(1000,1000))
...: P = np.random.randint(0,1000,(2,100000))
...:
In [146]: %timeit app0(X, P)
...: %timeit app1(X, P)
...:
100 loops, best of 3: 11.3 ms per loop
100 loops, best of 3: 2.51 ms per loop
for p in range(P.shape[1]):probably?