The current `stochastic_gradient_descent` implementation contains two fundamental math and logic errors:
1. Epoch Definition: The loop `for i in range(epochs):` performs only one update per epoch, i.e. it treats a single sample as one epoch. For a dataset of size N, one epoch must consist of N iterations (one update per sample).
2. Gradient Scaling: The code scales the gradient by `(2/total_samples)`. That is the formula for batch GD. In SGD the gradient is computed from a single point, so it should not be divided by the total number of samples (see the gradient comparison below).
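
For reference, the standard MSE gradients for a linear model $\hat{y}_i = w x_i + b$ (a standard derivation, not taken from the repo's code) differ only in the $1/N$ averaging:

$$
\text{Batch GD:}\ \ \frac{\partial L}{\partial w} = -\frac{2}{N}\sum_{i=1}^{N} x_i\,(y_i - \hat{y}_i),
\qquad
\text{SGD, single sample } i:\ \ \frac{\partial L_i}{\partial w} = -2\,x_i\,(y_i - \hat{y}_i)
$$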
Suggested Fix

```python
# Corrected loop to cover all samples per epoch
for i in range(epochs * total_samples):
    random_index = random.randint(0, total_samples - 1)
    # ... code ...
    # Correct SGD gradient (remove 1/total_samples)
    w_grad = -2 * (sample_x.T * (sample_y - y_predicted))
    b_grad = -2 * (sample_y - y_predicted)
```
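
For illustration, here is a minimal self-contained sketch with both fixes applied: N updates per epoch, and a per-sample gradient with no 1/N factor. The function name `sgd_linear_regression` and its signature are assumptions for this example, not the repo's actual API:

```python
import random
import numpy as np

def sgd_linear_regression(X, y, epochs=100, lr=0.01):
    # Hypothetical sketch, not the repo's code: SGD for y ~= X @ w + b.
    n_samples, n_features = X.shape
    w = np.zeros(n_features)
    b = 0.0
    for _ in range(epochs * n_samples):        # N updates == one epoch
        i = random.randint(0, n_samples - 1)   # pick one random sample
        y_pred = np.dot(X[i], w) + b
        error = y[i] - y_pred
        w_grad = -2 * X[i] * error             # per-sample gradient: no 1/N
        b_grad = -2 * error
        w -= lr * w_grad
        b -= lr * b_grad
    return w, b

# Toy check: recover y = 3x + 2
rng = np.random.default_rng(0)
X = rng.random((100, 1))
y = 3 * X[:, 0] + 2
w, b = sgd_linear_regression(X, y, epochs=200, lr=0.1)
print(w, b)  # should approach [3.] and 2.0
```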
Please reply :).