
I recently started Andrew Ng's ML course, and this is the formula Andrew lays out for calculating gradient descent on a linear model.

$$ \theta_j = \theta_j - \alpha \frac{1}{m} \sum_{i=1}^m \left( h_\theta(x^{(i)}) - y^{(i)}\right)x_j^{(i)} \qquad \text{simultaneously update } \theta_j \text{ for all } j$$

As we can see, the formula asks us to sum over all the rows in the data.

However, the code below doesn't work if I apply np.sum():

def gradientDescent(X, y, theta, alpha, num_iters):

    # Initialize some useful values
    m = y.shape[0]  # number of training examples

    # make a copy of theta, to avoid changing the original array, since numpy arrays
    # are passed by reference to functions
    theta = theta.copy()

    J_history = [] # Use a python list to save cost in every iteration

    for i in range(num_iters):
        temp = np.dot(X, theta) - y
        temp = np.dot(X.T, temp)
        theta = theta - ((alpha / m) * np.sum(temp))
        # save the cost J in every iteration
        J_history.append(computeCost(X, y, theta))

    return theta, J_history

On the other hand, if I get rid of the np.sum(), it works perfectly:

def gradientDescent(X, y, theta, alpha, num_iters):

    # Initialize some useful values
    m = y.shape[0]  # number of training examples

    # make a copy of theta, to avoid changing the original array, since numpy arrays
    # are passed by reference to functions
    theta = theta.copy()

    J_history = [] # Use a python list to save cost in every iteration

    for i in range(num_iters):
        temp = np.dot(X, theta) - y
        temp = np.dot(X.T, temp)
        theta = theta - ((alpha / m) * temp)
        # save the cost J in every iteration
        J_history.append(computeCost(X, y, theta))

    return theta, J_history

Can someone please explain this?

  • Could it be that the dot product is doing the relevant summing? – Commented Sep 8, 2019 at 4:24
  • I don't think so. – Commented Sep 8, 2019 at 7:29

2 Answers


Your goal is to compute the gradient for the whole theta vector of size $p$ (the number of variables). Your temp is also a vector of size $p$; it contains the gradient of the cost function with respect to each component of theta.

Therefore, you want to subtract the two vectors element-wise (scaled by the learning rate $\alpha$) to make the update, so there is no reason to sum the vector.
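To see this concretely, here is a small sketch with made-up data: component $j$ of the matrix product `X.T @ residual` already equals the sum $\sum_i (h_\theta(x^{(i)}) - y^{(i)})\,x_j^{(i)}$ from the formula, so the summation over examples happens inside the dot product.

```python
import numpy as np

# Toy data (made up for illustration): m = 3 examples, p = 2 parameters
X = np.array([[1.0, 2.0],
              [1.0, 3.0],
              [1.0, 4.0]])
theta = np.array([0.5, -1.0])
y = np.array([1.0, 2.0, 3.0])

residual = X @ theta - y          # h_theta(x^(i)) - y^(i), shape (m,)
grad_vectorized = X.T @ residual  # shape (p,): one already-summed gradient per theta_j

# The same quantity written as the explicit sum over i from the formula:
grad_loop = np.array([sum(residual[i] * X[i, j] for i in range(len(X)))
                      for j in range(X.shape[1])])

assert np.allclose(grad_vectorized, grad_loop)
```

Calling `np.sum()` on top of this collapses the $p$ per-parameter gradients into a single scalar, which is why the first version of the code goes wrong.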

  • I still don't understand why the formula then takes the sum over $i$. Even for the cost function, we have a sum component and we do take the sum: J = np.sum(np.square((np.dot(X, theta) - y))) / (2 * m). – Commented Sep 7, 2019 at 18:06
  • @ManasTripathi If you refer to the $i$ in the formula, it's just the sum over the training examples. The dot product already handles this (that's why we say your code is "vectorized"). – Commented Sep 9, 2019 at 8:03

The commenters are correct: you are confusing vector and scalar operations.

The formula is a scalar one, and here is how you could implement it literally:

for n in range(num_iters):

    # compute every gradient from the current theta before applying any update,
    # so the update is simultaneous for all theta_j, as the formula requires
    theta_new = theta.copy()

    for j in range(len(theta)):

        sum_j = 0
        for i in range(len(X)):
            # the hypothesis uses the full dot product x^(i) . theta,
            # not just the single term X[i, j] * theta[j]
            temp = np.dot(X[i], theta) - y[i]
            sum_j += temp * X[i, j]

        theta_new[j] = theta[j] - (alpha / m) * sum_j

    theta = theta_new
    J_history.append(computeCost(X, y, theta))

But you are plugging vectors into the scalar formula, and that is what causes the confusion.
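As a sanity check on toy data (made up here for illustration), one step of the scalar loop above agrees with one step of the vectorized update from the question, confirming that `X.T @ residual` performs the inner sum:

```python
import numpy as np

X = np.array([[1.0, 2.0], [1.0, 3.0], [1.0, 4.0]])
y = np.array([1.0, 2.0, 3.0])
theta = np.array([0.0, 0.0])
alpha, m = 0.1, len(X)

# Scalar version: accumulate the sum over i for each j separately,
# updating all theta_j simultaneously from the old theta
theta_scalar = theta.copy()
new_theta = theta_scalar.copy()
for j in range(len(theta)):
    sum_j = 0.0
    for i in range(m):
        sum_j += (np.dot(X[i], theta_scalar) - y[i]) * X[i, j]
    new_theta[j] = theta_scalar[j] - (alpha / m) * sum_j
theta_scalar = new_theta

# Vectorized version: the matrix product does the sum over i in one shot
theta_vec = theta - (alpha / m) * (X.T @ (X @ theta - y))

assert np.allclose(theta_scalar, theta_vec)
```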

