While Isaac's answer seems promising, as it removes those two nested for loops, you are having to create an intermediate array M which is n times the size of your original V array. Python for loops are not cheap, but memory access ain't free either:
n = 10
p = 20000
V = np.random.rand(n, p)
F = np.random.rand(n, n)
def op_code(V, F):
n, p = V.shape
A = np.zeros(p)
for i in xrange(n):
for j in xrange(i+1):
A += F[i,j] * V[i,:] * V[j,:]
return A
def isaac_code(V, F):
n, p = V.shape
F = F.copy()
F[np.triu_indices(n, 1)] = 0
M = (V.reshape(n, 1, p) * V.reshape(1, n, p)) * F.reshape(n, n, 1)
return M.sum((0, 1))
If you now take both for a test ride:
In [20]: np.allclose(isaac_code(V, F), op_code(V, F))
Out[20]: True
In [21]: %timeit op_code(V, F)
100 loops, best of 3: 3.18 ms per loop
In [22]: %timeit isaac_code(V, F)
10 loops, best of 3: 24.3 ms per loop
So removing the for loops is costing you an 8x slowdown. Not a very good thing... At this point you may even want to consider whether a function taking about 3ms to evaluate requires any further optimization. IN case you do, there's a small improvement which can be had by using np.einsum:
def einsum_code(V, F):
n, p = V.shape
F = F.copy()
F[np.triu_indices(n, 1)] = 0
return np.einsum('ij,ik,jk->k', F, V, V)
And now:
In [23]: np.allclose(einsum_code(V, F), op_code(V, F))
Out[23]: True
In [24]: %timeit einsum_code(V, F)
100 loops, best of 3: 2.53 ms per loop
So that's roughly a 20% speed up that introduces code that may very well not be as readable as your for loops. I would say not worth it...
F? Is it(n, n)or(n, n, p)?