Key Concepts

  • Neuron (Perceptron)
    A neuron is the fundamental unit of the network. Each neuron computes a weighted sum of its inputs plus a bias, then applies an activation function (e.g., step, sigmoid).
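
    As a concrete illustration, here is a minimal sketch of a single neuron in Python (assuming NumPy is available; the weights, input, and step activation below are illustrative choices, not values from this text):

      import numpy as np

      def step(z):
          # Classic perceptron activation: 1 if z >= 0, else 0.
          return 1 if z >= 0 else 0

      def neuron(x, w, b):
          # Weighted sum of the inputs plus the bias, then the activation.
          z = np.dot(w, x) + b
          return step(z)

      x = np.array([1.0, 2.0])   # illustrative input
      w = np.array([0.4, -0.6])  # illustrative weights
      b = 0.3                    # illustrative bias
      print(neuron(x, w, b))     # z = 0.4*1.0 - 0.6*2.0 + 0.3 = -0.5, so the output is 0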

  • Input Layer
    The input layer accepts the raw data features. Each input is multiplied by an associated weight and passed to the neuron. It is often represented by a vector $\mathbf{x} \in \mathbb{R}^n$.

  • Weights and Bias
    • Weights represent the importance of each input feature; they are often collected in a vector $\mathbf{w} \in \mathbb{R}^n$.
    • Bias allows shifting the decision boundary away from the origin.

  • Linear Combination
    The neuron computes:

    \[z = \mathbf{w}^\top \mathbf{x} + b\]

    where $\mathbf{w}$ is the weight vector, $\mathbf{x}$ the input vector, and $b$ the bias term.
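
    Written out element-wise, this is the equivalent expression

    \[z = \sum_{i=1}^{n} w_i x_i + b\]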

  • Activation Function
    The activation function introduces non-linearity (in the classic perceptron it is typically a step function). It determines the output class or value. Without it, the network is just a linear model.
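
    A minimal sketch of the two activations mentioned above (assuming NumPy; the sample inputs are illustrative):

      import numpy as np

      def step(z):
          # Hard threshold used by the classic perceptron.
          return np.where(z >= 0, 1, 0)

      def sigmoid(z):
          # Smooth activation mapping any real z into (0, 1).
          return 1.0 / (1.0 + np.exp(-z))

      z = np.array([-2.0, 0.0, 2.0])
      print(step(z))     # [0 1 1]
      print(sigmoid(z))  # approximately [0.119 0.5 0.881]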

  • Output Layer
    It provides the final prediction. In a single-layer network, there is only one set of weights between the input and output (no hidden layer).

  • Decision Boundary
    The hyperplane separating classes in the input space. In a single-layer network, this boundary is always linear.
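
    For example, with two features ($n = 2$) the boundary $w_1 x_1 + w_2 x_2 + b = 0$ is a straight line; assuming $w_2 \neq 0$ it can be written as

    \[x_2 = -\frac{w_1 x_1 + b}{w_2}\]

    so a single-layer network can only separate classes that are linearly separable.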

  • Ground Truth
    The true label of the data point, denoted as $y_{\text{true}} \in \{0,1\}$. It represents the actual class assigned in the dataset.

  • Prediction
    The model produces a predicted label $\hat{y}$, derived from the probability $p$. Typically, $\hat{y} = 1$ if $p \geq \tau$ (a decision threshold, e.g., 0.5), else $\hat{y} = 0$.

  • Loss Function (Cross-Entropy)
    To train the model, the predicted probability $p$ is compared with the ground truth $y_{\text{true}}$ using the binary cross-entropy loss:

    \[\mathcal{L}(y_{\text{true}}, p) = - \big[ y_{\text{true}} \cdot \log(p) + (1 - y_{\text{true}}) \cdot \log(1 - p) \big]\]

    This loss penalises large differences between the predicted probability and the actual label. Minimising $\mathcal{L}$ adjusts the weights $\mathbf{w}$ and bias $b$ to improve classification performance.
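
    A minimal sketch of this loss in Python (assuming NumPy; the small epsilon that keeps the logarithms finite is an illustrative safeguard, and the sample values are made up):

      import numpy as np

      def binary_cross_entropy(y_true, p, eps=1e-12):
          # Clip p away from 0 and 1 so log(p) and log(1 - p) stay finite.
          p = np.clip(p, eps, 1 - eps)
          return -(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))

      print(binary_cross_entropy(1, 0.9))  # ~0.105: confident and correct, small loss
      print(binary_cross_entropy(1, 0.1))  # ~2.303: confident and wrong, large loss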

  • Backpropagation
    Gradients of the loss with respect to the weights and bias are computed via the chain rule and propagated back to update them.
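
    For the single sigmoid neuron with the cross-entropy loss above, the chain rule gives a particularly simple result:

    \[\frac{\partial \mathcal{L}}{\partial z} = p - y_{\text{true}}, \qquad \frac{\partial \mathcal{L}}{\partial \mathbf{w}} = (p - y_{\text{true}})\,\mathbf{x}, \qquad \frac{\partial \mathcal{L}}{\partial b} = p - y_{\text{true}}\]

    These are the derivatives used in the gradient descent update below.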

  • Gradient Descent
    The parameters are updated iteratively to minimise the loss:

    \[\mathbf{w} := \mathbf{w} - \eta \frac{\partial \mathcal{L}}{\partial \mathbf{w}}, \quad b := b - \eta \frac{\partial \mathcal{L}}{\partial b}\]

    where $\eta > 0$ is the learning rate controlling the step size.
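
    Putting the pieces above together, here is a minimal training-loop sketch in Python (assuming NumPy; the toy AND-style dataset, learning rate, and iteration count are illustrative choices):

      import numpy as np

      def sigmoid(z):
          return 1.0 / (1.0 + np.exp(-z))

      # Toy dataset: 4 samples with 2 features each, labels following the AND pattern.
      X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
      y = np.array([0, 0, 0, 1])

      w = np.zeros(2)  # weight vector
      b = 0.0          # bias
      eta = 0.5        # learning rate

      for _ in range(2000):
          z = X @ w + b                       # linear combination for every sample
          p = sigmoid(z)                      # predicted probabilities
          grad_z = p - y                      # dL/dz for sigmoid + cross-entropy
          w -= eta * (X.T @ grad_z) / len(y)  # gradient descent step on w
          b -= eta * grad_z.mean()            # gradient descent step on b

      print((sigmoid(X @ w + b) >= 0.5).astype(int))  # should print [0 0 0 1]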

Architecture of a Single-Layer Neural Network

[Figure: single_layer_nn_matrix.png]

This diagram illustrates a single-layer neural network for binary classification. The input is represented as a feature vector $\mathbf{x} \in \mathbb{R}^n$, and the parameters of the model are a weight vector $\mathbf{w} \in \mathbb{R}^n$ and a bias term $b \in \mathbb{R}$. The linear combination is computed as $z = \mathbf{w}^\top \mathbf{x} + b$. This value is then passed through a sigmoid activation function $\sigma(z) = \frac{1}{1+e^{-z}}$, which outputs a probability $p = \Pr(y=1 \mid \mathbf{x}) = \sigma(z)$, representing the likelihood that the class label $y$ equals 1. Finally, the probability is compared to a threshold $\tau$ (e.g., 0.5) to produce the predicted class label $\hat{y} \in \{0,1\}$. The decision boundary of this model is defined by $\mathbf{w}^\top \mathbf{x} + b = 0$.
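
The forward pass described above can be sketched in a few lines of Python (assuming NumPy; the parameter values are illustrative, and the threshold follows the $\tau = 0.5$ example in the text):

    import numpy as np

    def predict(x, w, b, tau=0.5):
        z = np.dot(w, x) + b          # linear combination z = w^T x + b
        p = 1.0 / (1.0 + np.exp(-z))  # sigmoid activation: p = Pr(y = 1 | x)
        return int(p >= tau), p       # threshold p to obtain the class label

    w = np.array([2.0, -1.0])  # illustrative weights
    b = -0.5                   # illustrative bias
    x = np.array([1.0, 0.5])   # illustrative input
    print(predict(x, w, b))    # z = 2.0 - 0.5 - 0.5 = 1.0, p ≈ 0.731, so class 1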