Differentiation

Understanding derivatives, their properties, and applications in machine learning optimization.

Differentiation is a fundamental concept in calculus that measures rates of change and is essential for optimization in machine learning algorithms.

Fundamentals of Derivatives

Basic Concepts

  • Definition: f'(x) = \lim_{h \to 0} \frac{f(x + h) - f(x)}{h}
  • Geometric interpretation: Slope of tangent line
  • Notation variations:
    • Leibniz: \frac{d}{dx}f(x)
    • Newton: \dot{y} or \ddot{y}
    • Prime: f'(x), f''(x)
  • Continuity and differentiability conditions
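
For instance, applying the limit definition to f(x) = x^2:

f'(x) = \lim_{h \to 0} \frac{(x+h)^2 - x^2}{h} = \lim_{h \to 0} \frac{2xh + h^2}{h} = \lim_{h \to 0} (2x + h) = 2x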

Rules of Differentiation

  1. Power Rule: \frac{d}{dx}x^n = nx^{n-1}
  2. Product Rule: \frac{d}{dx}[f(x)g(x)] = f'(x)g(x) + f(x)g'(x)
  3. Quotient Rule: \frac{d}{dx}\left[\frac{f(x)}{g(x)}\right] = \frac{f'(x)g(x) - f(x)g'(x)}{[g(x)]^2}
  4. Chain Rule: \frac{d}{dx}f(g(x)) = f'(g(x))g'(x)

import numpy as np

# Numerical differentiation via a forward difference:
# f'(x) ≈ (f(x + h) - f(x)) / h for a small step h
def numerical_derivative(f, x, h=1e-7):
    return (f(x + h) - f(x)) / h

# Example usage: f(x) = x^2 + sin(x), so f'(x) = 2x + cos(x)
def f(x): return x**2 + np.sin(x)
x0 = 2.0
df_dx = numerical_derivative(f, x0)  # ≈ 2*2.0 + cos(2.0) ≈ 3.5839
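
The differentiation rules above can also be sanity-checked symbolically; a minimal sketch using SymPy (assumed available, not part of the original example):

import sympy as sp

x = sp.symbols('x')
f, g = x**3, sp.sin(x)

# Product rule: d/dx[f*g] == f'*g + f*g'
assert sp.simplify(sp.diff(f * g, x) - (sp.diff(f, x) * g + f * sp.diff(g, x))) == 0

# Chain rule: d/dx sin(x^2) == 2x*cos(x^2)
assert sp.simplify(sp.diff(sp.sin(x**2), x) - 2 * x * sp.cos(x**2)) == 0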

Partial Derivatives

Multivariate Functions

  • Definition: \frac{\partial f}{\partial x_i} = \lim_{h \to 0} \frac{f(x_1,...,x_i+h,...,x_n) - f(x_1,...,x_i,...,x_n)}{h}
  • Mixed partials: \frac{\partial^2 f}{\partial x \partial y} = \frac{\partial}{\partial x}\left(\frac{\partial f}{\partial y}\right)
  • Applications in ML:
    # Forward-difference approximation of each partial derivative at `point`
    # (`point` should be a NumPy array so that `point + h` works elementwise)
    def partial_derivatives(f, point, eps=1e-7):
        n = len(point)
        grads = np.zeros(n)
        for i in range(n):
            h = np.zeros(n)
            h[i] = eps  # perturb only the i-th coordinate
            grads[i] = (f(point + h) - f(point)) / eps
        return grads
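
As a usage sketch of partial_derivatives above, take f(x, y) = x^2 + 3y, whose gradient is [2x, 3] (the test point is illustrative):

import numpy as np

def g(p):
    return p[0]**2 + 3 * p[1]

grads = partial_derivatives(g, np.array([1.0, 2.0]))  # approximately [2.0, 3.0]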
    

Gradient Vector

  • Definition: \nabla f = \left[\frac{\partial f}{\partial x_1}, ..., \frac{\partial f}{\partial x_n}\right]^T
  • Properties:
    • Direction of steepest increase
    • Perpendicular to level curves/surfaces
  • Gradient descent implementation:
    # Basic gradient descent: repeatedly step against the gradient
    # (`x0` should be a NumPy array; `lr` is the learning rate / step size)
    def gradient_descent(f, grad_f, x0, lr=0.01, n_iter=1000):
        x = x0
        history = [x0]
        for _ in range(n_iter):
            grad = grad_f(x)
            x = x - lr * grad  # move in the direction of steepest decrease
            history.append(x)
        return x, history
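
As a usage sketch of gradient_descent above on f(x) = x_1^2 + x_2^2, whose unique minimum is at the origin (the starting point and learning rate are illustrative):

import numpy as np

f = lambda x: np.sum(x**2)
grad_f = lambda x: 2 * x

x_min, history = gradient_descent(f, grad_f, np.array([3.0, -2.0]), lr=0.1, n_iter=100)
# x_min is close to [0.0, 0.0]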
    

Higher-Order Derivatives

Second Derivatives

  • Definition: f''(x) = \frac{d}{dx}\left(f'(x)\right)
  • Hessian matrix: H_{ij} = \frac{\partial^2 f}{\partial x_i \partial x_j}
# Forward-difference approximation of the Hessian.
# The denominator is eps**2, so eps should not be too small
# (roughly 1e-4 to 1e-5 in double precision) or round-off error dominates.
def hessian(f, x, eps=1e-5):
    n = len(x)
    H = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            h_i = np.zeros(n)
            h_j = np.zeros(n)
            h_i[i] = eps
            h_j[j] = eps
            H[i, j] = (f(x + h_i + h_j) - f(x + h_i) - f(x + h_j) + f(x)) / (eps**2)
    return H
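
For a quadratic such as f(x) = x_1^2 + 3 x_1 x_2 + 2 x_2^2 the exact Hessian is constant, [[2, 3], [3, 4]], which makes a convenient check of the finite-difference helper above (the test point is illustrative):

import numpy as np

def q(x):
    return x[0]**2 + 3 * x[0] * x[1] + 2 * x[1]**2

H = hessian(q, np.array([1.0, -1.0]))  # approximately [[2, 3], [3, 4]]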

Applications in ML

  1. Neural Network Training

    # Gradient of a single-layer network under a squared-error-style loss:
    # dL/dW = X^T [(a - y) * activation'(z)]
    def backpropagation_example(X, y, weights, activation):
        # Forward pass
        z = np.dot(X, weights)
        a = activation(z)

        # Backward pass: propagate the output error through the activation
        error = a - y
        gradient = np.dot(X.T, error * activation(z, derivative=True))
        return gradient
    
  2. Optimization Algorithms

    # Newton's method: x <- x - H^{-1} ∇f(x); solving H p = ∇f
    # avoids explicitly inverting the Hessian
    def newton_method(f, grad_f, hess_f, x0, n_iter=100):
        x = x0
        for _ in range(n_iter):
            grad = grad_f(x)
            hess = hess_f(x)
            x = x - np.linalg.solve(hess, grad)
        return x
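
For a strictly convex quadratic, Newton's method reaches the minimizer in a single step, which gives a simple check of newton_method above (the quadratic below is illustrative):

import numpy as np

A = np.array([[3.0, 1.0], [1.0, 2.0]])  # symmetric positive definite
b = np.array([1.0, -1.0])

f = lambda x: 0.5 * x @ A @ x - b @ x
grad_f = lambda x: A @ x - b
hess_f = lambda x: A

x_star = newton_method(f, grad_f, hess_f, np.array([10.0, 10.0]), n_iter=1)
# x_star satisfies A @ x_star = b, i.e. x_star ≈ np.linalg.solve(A, b)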
    

Common Derivatives in ML

Activation Functions

class ActivationFunctions:
    @staticmethod
    def sigmoid(x, derivative=False):
        s = 1 / (1 + np.exp(-x))
        return s * (1 - s) if derivative else s
    
    @staticmethod
    def relu(x, derivative=False):
        if derivative:
            return np.where(x > 0, 1, 0)
        return np.maximum(0, x)
    
    @staticmethod
    def tanh(x, derivative=False):
        t = np.tanh(x)
        return 1 - t**2 if derivative else t
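
The analytic derivatives above can be cross-checked against a central difference (the test point and tolerance are illustrative):

import numpy as np

x0 = 0.7
numeric = (ActivationFunctions.sigmoid(x0 + 1e-6) - ActivationFunctions.sigmoid(x0 - 1e-6)) / 2e-6
analytic = ActivationFunctions.sigmoid(x0, derivative=True)
assert abs(numeric - analytic) < 1e-6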

Loss Functions

class LossFunctions:
    @staticmethod
    def mse(y_true, y_pred, derivative=False):
        if derivative:
            # Gradient of mean((y_true - y_pred)**2); the 1/n factor comes from the mean
            return 2 * (y_pred - y_true) / y_true.size
        return np.mean((y_true - y_pred)**2)
    
    @staticmethod
    def cross_entropy(y_true, y_pred, derivative=False):
        eps = 1e-15  # guards against log(0) and division by zero
        if derivative:
            return -y_true / (y_pred + eps)
        return -np.sum(y_true * np.log(y_pred + eps))
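
The MSE gradient can be verified elementwise with the same finite-difference idea (the vectors below are illustrative):

import numpy as np

y_true = np.array([1.0, 0.0, 2.0])
y_pred = np.array([0.8, 0.1, 1.5])

h = np.array([1e-6, 0.0, 0.0])  # perturb only the first prediction
numeric = (LossFunctions.mse(y_true, y_pred + h) - LossFunctions.mse(y_true, y_pred - h)) / 2e-6
analytic = LossFunctions.mse(y_true, y_pred, derivative=True)[0]
assert abs(numeric - analytic) < 1e-6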

Practical Considerations

Numerical Methods

def finite_difference_methods():
    # Forward difference: O(h) truncation error
    def forward(f, x, h=1e-7):
        return (f(x + h) - f(x)) / h
    
    # Central difference: O(h^2) truncation error, usually more accurate
    def central(f, x, h=1e-7):
        return (f(x + h) - f(x - h)) / (2 * h)
    
    # Second derivative: divides by h**2, so h should not be too small
    def second_order(f, x, h=1e-5):
        return (f(x + h) - 2*f(x) + f(x - h)) / h**2
    
    return forward, central, second_order
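
On a smooth function with a known derivative, the central difference is typically far more accurate than the forward difference at the same step size (a small sketch; the test function is illustrative):

import numpy as np

forward, central, second_order = finite_difference_methods()

f = np.exp  # f'(x) = exp(x)
x0 = 1.0
err_forward = abs(forward(f, x0, h=1e-5) - np.exp(x0))  # on the order of 1e-5
err_central = abs(central(f, x0, h=1e-5) - np.exp(x0))  # several orders of magnitude smaller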

Implementation Issues

  • Gradient vanishing/exploding when backpropagating through deep networks
  • Numerical stability (e.g., overflow in exponentials, cancellation in finite differences)
  • Computational efficiency (analytic or automatic differentiation scales better than finite differences)
  • Choice of step size (both the finite-difference step h and the learning rate)
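
One common mitigation for exploding gradients is to clip the gradient norm before each update; a minimal sketch (the threshold is illustrative and not tied to any specific framework):

import numpy as np

def clip_gradient(grad, max_norm=1.0):
    # Rescale the gradient if its Euclidean norm exceeds max_norm
    norm = np.linalg.norm(grad)
    if norm > max_norm:
        grad = grad * (max_norm / norm)
    return grad

# e.g. inside a training loop: x = x - lr * clip_gradient(grad_f(x))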