Differentiation

Understanding derivatives, their properties, and applications in machine learning optimization.

Differentiation is a fundamental concept in calculus that measures rates of change and is essential for optimization in machine learning algorithms.

Fundamentals of Derivatives

Basic Concepts

  • Definition: f'(x) = \lim_{h \to 0} \frac{f(x + h) - f(x)}{h}
  • Geometric interpretation: Slope of tangent line
  • Notation variations:
    • Leibniz: \frac{d}{dx}f(x)
    • Newton: \dot{y} or \ddot{y}
    • Prime: f'(x), f''(x)
  • Continuity and differentiability conditions
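
For instance, applying the limit definition to f(x) = x^2:

f'(x) = \lim_{h \to 0} \frac{(x+h)^2 - x^2}{h} = \lim_{h \to 0} \frac{2xh + h^2}{h} = \lim_{h \to 0} (2x + h) = 2x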

Rules of Differentiation

  1. Power Rule: \frac{d}{dx}x^n = nx^{n-1}
  2. Product Rule: \frac{d}{dx}[f(x)g(x)] = f'(x)g(x) + f(x)g'(x)
  3. Quotient Rule: \frac{d}{dx}\left[\frac{f(x)}{g(x)}\right] = \frac{f'(x)g(x) - f(x)g'(x)}{[g(x)]^2}
  4. Chain Rule: \frac{d}{dx}f(g(x)) = f'(g(x))g'(x)

import numpy as np

# Numerical differentiation via a forward difference:
# f'(x) ≈ (f(x + h) - f(x)) / h for a small step h
def numerical_derivative(f, x, h=1e-7):
    return (f(x + h) - f(x)) / h

# Example usage: f(x) = x^2 + sin(x), so f'(x) = 2x + cos(x)
def f(x): return x**2 + np.sin(x)
x0 = 2.0
df_dx = numerical_derivative(f, x0)  # ≈ 2*2.0 + cos(2.0) ≈ 3.5839
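
The differentiation rules above can also be sanity-checked symbolically; a minimal sketch using SymPy (assumed available, not part of the original example):

import sympy as sp

x = sp.symbols('x')
f, g = x**3, sp.sin(x)

# Product rule: d/dx[f*g] == f'*g + f*g'
assert sp.simplify(sp.diff(f * g, x) - (sp.diff(f, x) * g + f * sp.diff(g, x))) == 0

# Chain rule: d/dx sin(x^2) == 2x*cos(x^2)
assert sp.simplify(sp.diff(sp.sin(x**2), x) - 2 * x * sp.cos(x**2)) == 0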

Partial Derivatives

Multivariate Functions

  • Definition: \frac{\partial f}{\partial x_i} = \lim_{h \to 0} \frac{f(x_1,...,x_i+h,...,x_n) - f(x_1,...,x_i,...,x_n)}{h}
  • Mixed partials: \frac{\partial^2 f}{\partial x \partial y} = \frac{\partial}{\partial x}\left(\frac{\partial f}{\partial y}\right)
  • Applications in ML:
    # Forward-difference approximation of each partial derivative at `point`
    # (`point` should be a NumPy array so that `point + h` works elementwise)
    def partial_derivatives(f, point, eps=1e-7):
        n = len(point)
        grads = np.zeros(n)
        for i in range(n):
            h = np.zeros(n)
            h[i] = eps  # perturb only the i-th coordinate
            grads[i] = (f(point + h) - f(point)) / eps
        return grads
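
As a usage sketch of partial_derivatives above, take f(x, y) = x^2 + 3y, whose gradient is [2x, 3] (the test point is illustrative):

import numpy as np

def g(p):
    return p[0]**2 + 3 * p[1]

grads = partial_derivatives(g, np.array([1.0, 2.0]))  # approximately [2.0, 3.0]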
    

Gradient Vector

  • Definition: \nabla f = \left[\frac{\partial f}{\partial x_1}, ..., \frac{\partial f}{\partial x_n}\right]^T
  • Properties:
    • Direction of steepest increase
    • Perpendicular to level curves/surfaces
  • Gradient descent implementation:
    # Basic gradient descent: repeatedly step against the gradient
    # (`x0` should be a NumPy array; `lr` is the learning rate / step size)
    def gradient_descent(f, grad_f, x0, lr=0.01, n_iter=1000):
        x = x0
        history = [x0]
        for _ in range(n_iter):
            grad = grad_f(x)
            x = x - lr * grad  # move in the direction of steepest decrease
            history.append(x)
        return x, history
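
As a usage sketch of gradient_descent above on f(x) = x_1^2 + x_2^2, whose unique minimum is at the origin (the starting point and learning rate are illustrative):

import numpy as np

f = lambda x: np.sum(x**2)
grad_f = lambda x: 2 * x

x_min, history = gradient_descent(f, grad_f, np.array([3.0, -2.0]), lr=0.1, n_iter=100)
# x_min is close to [0.0, 0.0]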
    

Higher-Order Derivatives

Second Derivatives

  • Definition: f''(x) = \frac{d}{dx}\left(f'(x)\right)
  • Hessian matrix: H_{ij} = \frac{\partial^2 f}{\partial x_i \partial x_j}
# Forward-difference approximation of the Hessian.
# The denominator is eps**2, so eps should not be too small
# (roughly 1e-4 to 1e-5 in double precision) or round-off error dominates.
def hessian(f, x, eps=1e-5):
    n = len(x)
    H = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            h_i = np.zeros(n)
            h_j = np.zeros(n)
            h_i[i] = eps
            h_j[j] = eps
            H[i, j] = (f(x + h_i + h_j) - f(x + h_i) - f(x + h_j) + f(x)) / (eps**2)
    return H
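
For a quadratic such as f(x) = x_1^2 + 3 x_1 x_2 + 2 x_2^2 the exact Hessian is constant, [[2, 3], [3, 4]], which makes a convenient check of the finite-difference helper above (the test point is illustrative):

import numpy as np

def q(x):
    return x[0]**2 + 3 * x[0] * x[1] + 2 * x[1]**2

H = hessian(q, np.array([1.0, -1.0]))  # approximately [[2, 3], [3, 4]]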

Applications in ML

  1. Neural Network Training

    # Gradient of a single-layer network under a squared-error-style loss:
    # dL/dW = X^T [(a - y) * activation'(z)]
    def backpropagation_example(X, y, weights, activation):
        # Forward pass
        z = np.dot(X, weights)
        a = activation(z)

        # Backward pass: propagate the output error through the activation
        error = a - y
        gradient = np.dot(X.T, error * activation(z, derivative=True))
        return gradient
    
  2. Optimization Algorithms

    # Newton's method: x <- x - H^{-1} ∇f(x); solving H p = ∇f
    # avoids explicitly inverting the Hessian
    def newton_method(f, grad_f, hess_f, x0, n_iter=100):
        x = x0
        for _ in range(n_iter):
            grad = grad_f(x)
            hess = hess_f(x)
            x = x - np.linalg.solve(hess, grad)
        return x
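
For a strictly convex quadratic, Newton's method reaches the minimizer in a single step, which gives a simple check of newton_method above (the quadratic below is illustrative):

import numpy as np

A = np.array([[3.0, 1.0], [1.0, 2.0]])  # symmetric positive definite
b = np.array([1.0, -1.0])

f = lambda x: 0.5 * x @ A @ x - b @ x
grad_f = lambda x: A @ x - b
hess_f = lambda x: A

x_star = newton_method(f, grad_f, hess_f, np.array([10.0, 10.0]), n_iter=1)
# x_star satisfies A @ x_star = b, i.e. x_star ≈ np.linalg.solve(A, b)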
    

Common Derivatives in ML

Activation Functions

class ActivationFunctions:
    @staticmethod
    def sigmoid(x, derivative=False):
        s = 1 / (1 + np.exp(-x))
        return s * (1 - s) if derivative else s
    
    @staticmethod
    def relu(x, derivative=False):
        if derivative:
            return np.where(x > 0, 1, 0)
        return np.maximum(0, x)
    
    @staticmethod
    def tanh(x, derivative=False):
        t = np.tanh(x)
        return 1 - t**2 if derivative else t
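
The analytic derivatives above can be cross-checked against a central difference (the test point and tolerance are illustrative):

import numpy as np

x0 = 0.7
numeric = (ActivationFunctions.sigmoid(x0 + 1e-6) - ActivationFunctions.sigmoid(x0 - 1e-6)) / 2e-6
analytic = ActivationFunctions.sigmoid(x0, derivative=True)
assert abs(numeric - analytic) < 1e-6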

Loss Functions

class LossFunctions:
    @staticmethod
    def mse(y_true, y_pred, derivative=False):
        if derivative:
            # Gradient of mean((y_true - y_pred)**2); the 1/n factor comes from the mean
            return 2 * (y_pred - y_true) / y_true.size
        return np.mean((y_true - y_pred)**2)
    
    @staticmethod
    def cross_entropy(y_true, y_pred, derivative=False):
        eps = 1e-15  # guards against log(0) and division by zero
        if derivative:
            return -y_true / (y_pred + eps)
        return -np.sum(y_true * np.log(y_pred + eps))
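
The MSE gradient can be verified elementwise with the same finite-difference idea (the vectors below are illustrative):

import numpy as np

y_true = np.array([1.0, 0.0, 2.0])
y_pred = np.array([0.8, 0.1, 1.5])

h = np.array([1e-6, 0.0, 0.0])  # perturb only the first prediction
numeric = (LossFunctions.mse(y_true, y_pred + h) - LossFunctions.mse(y_true, y_pred - h)) / 2e-6
analytic = LossFunctions.mse(y_true, y_pred, derivative=True)[0]
assert abs(numeric - analytic) < 1e-6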

Practical Considerations

Numerical Methods

def finite_difference_methods():
    # Forward difference: O(h) truncation error
    def forward(f, x, h=1e-7):
        return (f(x + h) - f(x)) / h
    
    # Central difference: O(h^2) truncation error, usually more accurate
    def central(f, x, h=1e-7):
        return (f(x + h) - f(x - h)) / (2 * h)
    
    # Second derivative: divides by h**2, so h should not be too small
    def second_order(f, x, h=1e-5):
        return (f(x + h) - 2*f(x) + f(x - h)) / h**2
    
    return forward, central, second_order
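
On a smooth function with a known derivative, the central difference is typically far more accurate than the forward difference at the same step size (a small sketch; the test function is illustrative):

import numpy as np

forward, central, second_order = finite_difference_methods()

f = np.exp  # f'(x) = exp(x)
x0 = 1.0
err_forward = abs(forward(f, x0, h=1e-5) - np.exp(x0))  # on the order of 1e-5
err_central = abs(central(f, x0, h=1e-5) - np.exp(x0))  # several orders of magnitude smaller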

Implementation Issues

  • Gradient vanishing/exploding when backpropagating through deep networks
  • Numerical stability (e.g., overflow in exponentials, cancellation in finite differences)
  • Computational efficiency (analytic or automatic differentiation scales better than finite differences)
  • Choice of step size (both the finite-difference step h and the learning rate)
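
One common mitigation for exploding gradients is to clip the gradient norm before each update; a minimal sketch (the threshold is illustrative and not tied to any specific framework):

import numpy as np

def clip_gradient(grad, max_norm=1.0):
    # Rescale the gradient if its Euclidean norm exceeds max_norm
    norm = np.linalg.norm(grad)
    if norm > max_norm:
        grad = grad * (max_norm / norm)
    return grad

# e.g. inside a training loop: x = x - lr * clip_gradient(grad_f(x))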