The current `stochastic_gradient_descent` implementation contains two fundamental math and logic errors:
1. Epoch Definition: The loop `for i in range(epochs):` performs only one update per epoch, i.e. it treats a single sample as one epoch. For a dataset of size N, one epoch must consist of N iterations (one update per sample).
2. Gradient Scaling: The code scales the gradient by `(2/total_samples)`. That is the formula for batch GD. In SGD the gradient is computed from a single point, so it should not be divided by the total number of samples (see the gradient comparison below).
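
For reference, the standard MSE gradients for a linear model $\hat{y}_i = w x_i + b$ (a standard derivation, not taken from the repo's code) differ only in the $1/N$ averaging:

$$
\text{Batch GD:}\ \ \frac{\partial L}{\partial w} = -\frac{2}{N}\sum_{i=1}^{N} x_i\,(y_i - \hat{y}_i),
\qquad
\text{SGD, single sample } i:\ \ \frac{\partial L_i}{\partial w} = -2\,x_i\,(y_i - \hat{y}_i)
$$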
Suggested Fix

```python
# Corrected loop to cover all samples per epoch
for i in range(epochs * total_samples):
    random_index = random.randint(0, total_samples - 1)
    # ... code ...
    # Correct SGD gradient (remove 1/total_samples)
    w_grad = -2 * (sample_x.T * (sample_y - y_predicted))
    b_grad = -2 * (sample_y - y_predicted)
```
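
For illustration, here is a minimal self-contained sketch with both fixes applied: N updates per epoch, and a per-sample gradient with no 1/N factor. The function name `sgd_linear_regression` and its signature are assumptions for this example, not the repo's actual API:

```python
import random
import numpy as np

def sgd_linear_regression(X, y, epochs=100, lr=0.01):
    # Hypothetical sketch, not the repo's code: SGD for y ~= X @ w + b.
    n_samples, n_features = X.shape
    w = np.zeros(n_features)
    b = 0.0
    for _ in range(epochs * n_samples):        # N updates == one epoch
        i = random.randint(0, n_samples - 1)   # pick one random sample
        y_pred = np.dot(X[i], w) + b
        error = y[i] - y_pred
        w_grad = -2 * X[i] * error             # per-sample gradient: no 1/N
        b_grad = -2 * error
        w -= lr * w_grad
        b -= lr * b_grad
    return w, b

# Toy check: recover y = 3x + 2
rng = np.random.default_rng(0)
X = rng.random((100, 1))
y = 3 * X[:, 0] + 2
w, b = sgd_linear_regression(X, y, epochs=200, lr=0.1)
print(w, b)  # should approach [3.] and 2.0
```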
Please reply :).