Cost Function for Neural Networks
The cost function for neural networks generalizes the logistic regression cost to multiple output units and includes regularization over all weights in the network. This post breaks down the cost function, explaining the double and triple summations, and provides intuition for how it works.
Notation
We define:
- $L$ = total number of layers in the network
- $s_l$ = number of units (excluding the bias unit) in layer $l$
- $K$ = number of output units (classes)
For multiclass problems, the output is a $K$-dimensional vector, one component per class:

$$h_\Theta(x) \in \mathbb{R}^K, \qquad (h_\Theta(x))_k = \text{the } k\text{-th output}$$
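For concreteness, here is a minimal sketch of how an integer class label is turned into such a $K$-dimensional target vector (the helper name `one_hot` and the class count below are illustrative, not from the original notes):

```python
import numpy as np

def one_hot(label, num_classes):
    """Return the K-dimensional one-hot target vector for an integer class label."""
    y = np.zeros(num_classes)
    y[label] = 1.0
    return y

# Example: class 2 out of K = 4 classes (0-indexed)
print(one_hot(2, 4))  # [0. 0. 1. 0.]
```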
Recall: Logistic Regression Cost Function
The regularized logistic regression cost is:

$$J(\theta) = -\frac{1}{m}\sum_{i=1}^{m}\Big[\, y^{(i)} \log h_\theta(x^{(i)}) + (1-y^{(i)}) \log\big(1-h_\theta(x^{(i)})\big) \Big] + \frac{\lambda}{2m}\sum_{j=1}^{n}\theta_j^2$$
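A minimal NumPy sketch of this cost (the function names and the convention that `theta[0]` is the unregularized bias term are assumptions of the sketch, not part of the original notes):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def logistic_cost(theta, X, y, lam):
    """Regularized logistic regression cost.

    X: (m, n+1) design matrix with a leading column of ones.
    y: (m,) vector of 0/1 labels.
    Note: the bias term theta[0] is NOT regularized.
    """
    m = len(y)
    h = sigmoid(X @ theta)
    loss = -(y @ np.log(h) + (1 - y) @ np.log(1 - h)) / m
    reg = lam / (2 * m) * np.sum(theta[1:] ** 2)
    return loss + reg
```

With `theta = 0`, every prediction is 0.5, so the unregularized cost is exactly $\log 2 \approx 0.693$, a useful sanity check.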
Neural Network Cost Function
For neural networks, the cost function generalizes to:

$$J(\Theta) = -\frac{1}{m}\sum_{i=1}^{m}\sum_{k=1}^{K}\Big[\, y_k^{(i)} \log\big(h_\Theta(x^{(i)})\big)_k + (1-y_k^{(i)}) \log\Big(1-\big(h_\Theta(x^{(i)})\big)_k\Big) \Big] + \frac{\lambda}{2m}\sum_{l=1}^{L-1}\sum_{i=1}^{s_l}\sum_{j=1}^{s_{l+1}}\big(\Theta_{ji}^{(l)}\big)^2$$
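Assuming the network outputs have already been computed by forward propagation, the whole cost can be sketched as follows (the matrix names `H` and `Y` and the bias-in-column-0 layout of each weight matrix are assumptions of this sketch):

```python
import numpy as np

def nn_cost(H, Y, Thetas, lam):
    """Neural network cost J(Theta).

    H: (m, K) matrix of network outputs, H[i, k] = (h_Theta(x^(i)))_k.
    Y: (m, K) matrix of one-hot labels.
    Thetas: list of weight matrices Theta^(l); column 0 holds the bias weights.
    """
    m = Y.shape[0]
    # Double sum: logistic loss over all examples and all output units.
    loss = -np.sum(Y * np.log(H) + (1 - Y) * np.log(1 - H)) / m
    # Triple sum: squared non-bias weights over all layers (skip column 0).
    reg = lam / (2 * m) * sum(np.sum(T[:, 1:] ** 2) for T in Thetas)
    return loss + reg
```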
Key Ideas
Double Sum
The term

$$\sum_{i=1}^{m}\sum_{k=1}^{K}\Big[\, y_k^{(i)} \log\big(h_\Theta(x^{(i)})\big)_k + (1-y_k^{(i)}) \log\Big(1-\big(h_\Theta(x^{(i)})\big)_k\Big) \Big]$$

means:
- Loop over all training examples ($i = 1, \dots, m$)
- Loop over all output units ($k = 1, \dots, K$)
- Compute logistic loss for each output
- Sum them together
This is simply the total loss across all output neurons.
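The steps above can be written as literal nested loops (in practice this would be vectorized; `H` and `Y` are assumed to hold the network outputs and one-hot labels, as in a typical NumPy implementation):

```python
import numpy as np

def double_sum_loss(H, Y):
    """Average logistic loss summed over every example i and output unit k.

    H: (m, K) network outputs, Y: (m, K) one-hot labels.
    """
    m, K = Y.shape
    total = 0.0
    for i in range(m):          # loop over training examples
        for k in range(K):      # loop over output units
            total += -(Y[i, k] * np.log(H[i, k])
                       + (1 - Y[i, k]) * np.log(1 - H[i, k]))
    return total / m
```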
Triple Sum (Regularization)
The term

$$\frac{\lambda}{2m}\sum_{l=1}^{L-1}\sum_{i=1}^{s_l}\sum_{j=1}^{s_{l+1}}\big(\Theta_{ji}^{(l)}\big)^2$$

means:
- Loop over all layers ($l = 1, \dots, L-1$)
- Loop over all non-bias weights in each layer (indices $i$ and $j$)
- Square every weight
- Add them all together
Important:
- Bias weights are not regularized.
- The index $i$ here ranges over units in layer $l$, not over training examples.
- This term regularizes every non-bias weight in the entire network.
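The bullet points above can be expressed as a literal triple loop (the bias-in-column-0 layout of each `Theta` matrix is an assumption of this sketch, matching a common NumPy convention):

```python
import numpy as np

def regularization_term(Thetas, lam, m):
    """Triple sum: squared non-bias weights over all layers.

    Thetas: list of weight matrices; Theta[j, i] maps unit i of layer l
    to unit j of layer l+1, with column i = 0 holding the bias weights.
    """
    total = 0.0
    for Theta in Thetas:              # loop over layers l = 1, ..., L-1
        rows, cols = Theta.shape
        for j in range(rows):         # units in layer l+1
            for i in range(1, cols):  # units in layer l, skipping bias (i = 0)
                total += Theta[j, i] ** 2
    return lam / (2 * m) * total
```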
Intuition
Neural network cost function =
- Logistic regression loss applied to every output unit
- Plus regularization over all weights in the network
