Other alternatives for summing the columns are
numpy.einsum('ij->j', a)
and
numpy.dot(a.T, numpy.ones(a.shape[0]))
If the number of rows and columns is in the same order of magnitude, all of the possibilities are roughly equally fast:

If there are only a few columns, however, both the einsum and the dot solution significantly outperform numpy's sum (note the log-scale):

Code to reproduce the plots:
import numpy
import perfplot
def numpy_sum(a):
return numpy.sum(a, axis=1)
def einsum(a):
return numpy.einsum('ij->i', a)
def dot_ones(a):
return numpy.dot(a, numpy.ones(a.shape[1]))
perfplot.save(
"out1.png",
# setup=lambda n: numpy.random.rand(n, n),
setup=lambda n: numpy.random.rand(n, 3),
n_range=[2**k for k in range(15)],
kernels=[numpy_sum, einsum, dot_ones],
logx=True,
logy=True,
xlabel='len(a)',
)