Logic error in SGD: Incorrect epoch definition and gradient scaling #34

@harshgupta-23

Description

The current stochastic_gradient_descent implementation contains two fundamental errors in its math and loop logic:
1. Epoch definition: The loop for i in range(epochs): performs only one update per iteration, so it treats a single sample as an entire epoch. For a dataset of size N, one epoch should consist of N single-sample updates.
2. Gradient scaling: The gradient uses a (2/total_samples) factor, which is the averaged formula for batch gradient descent. In SGD the gradient comes from the loss of one sample, (sample_y - y_predicted)**2; its derivative with respect to w is -2 * sample_x * (sample_y - y_predicted), with no sum over samples, so there is nothing to divide by total_samples.

Suggested Fix

    # One epoch = total_samples updates, so run epochs * total_samples iterations
    for _ in range(epochs * total_samples):
        random_index = random.randint(0, total_samples - 1)
        # ... code ...

        # Per-sample SGD gradient: no 1/total_samples averaging factor
        w_grad = -2 * sample_x.T * (sample_y - y_predicted)
        b_grad = -2 * (sample_y - y_predicted)
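
For concreteness, here is a minimal self-contained sketch of the corrected function under the assumptions implied by the snippet above: a linear model y = w·x + b trained on squared error, with X a NumPy array of shape (N, n_features) and y of length N. The names stochastic_gradient_descent, total_samples, sample_x, sample_y, and y_predicted follow the issue's snippet; the rest (learning_rate default, zero initialization, the rng) is illustrative, not the repo's actual code.

    import numpy as np

    def stochastic_gradient_descent(X, y, epochs=100, learning_rate=0.01):
        total_samples, n_features = X.shape
        w = np.zeros(n_features)
        b = 0.0
        rng = np.random.default_rng()

        # One epoch = total_samples single-sample updates
        for _ in range(epochs * total_samples):
            random_index = rng.integers(total_samples)
            sample_x = X[random_index]
            sample_y = y[random_index]

            y_predicted = np.dot(w, sample_x) + b

            # Gradient of the single-sample loss (sample_y - y_predicted)**2
            # -- no division by total_samples, unlike batch GD
            w_grad = -2 * sample_x * (sample_y - y_predicted)
            b_grad = -2 * (sample_y - y_predicted)

            w -= learning_rate * w_grad
            b -= learning_rate * b_grad

        return w, b

Sampling with replacement, as above, is the simplest fix; shuffling the indices once per epoch and sweeping them in order would also give exactly N updates per epoch and is a common alternative.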

Please let me know what you think. :)
