Differentiation
Understanding derivatives, their properties, and applications in machine learning optimization.
Differentiation is a fundamental concept in calculus that measures rates of change and is essential for optimization in machine learning algorithms.
Fundamentals of Derivatives
Basic Concepts
- Definition: $f'(x) = \lim_{h \to 0} \frac{f(x+h) - f(x)}{h}$, provided the limit exists
- Geometric interpretation: slope of the tangent line to the graph of $f$ at $x$
- Notation variations:
  - Leibniz: $\frac{dy}{dx}$, $\frac{df}{dx}$
  - Newton: $\dot{y}$ or $\ddot{y}$ (mostly used for time derivatives)
  - Prime (Lagrange): $f'(x)$, $f''(x)$
- Continuity and differentiability conditions: differentiability at a point implies continuity there, but not conversely (e.g., $|x|$ is continuous yet not differentiable at $0$)
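As a quick worked example of the limit definition, take $f(x) = x^2$: $f'(x) = \lim_{h \to 0} \frac{(x+h)^2 - x^2}{h} = \lim_{h \to 0} (2x + h) = 2x$.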
Rules of Differentiation
- Power Rule: $\frac{d}{dx} x^n = n x^{n-1}$
- Product Rule: $(fg)'(x) = f'(x)g(x) + f(x)g'(x)$
- Quotient Rule: $\left(\frac{f}{g}\right)'(x) = \frac{f'(x)g(x) - f(x)g'(x)}{g(x)^2}$
- Chain Rule: $\frac{d}{dx} f(g(x)) = f'(g(x))\,g'(x)$
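For example, combining the power and chain rules on $\sin(x^2)$ gives $\frac{d}{dx}\sin(x^2) = 2x\cos(x^2)$. Results like this can be checked numerically with a simple difference quotient, as in the snippet below.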
import numpy as np

# Numerical differentiation via a forward difference quotient
def numerical_derivative(f, x, h=1e-7):
    return (f(x + h) - f(x)) / h

# Example usage
def f(x):
    return x**2 + np.sin(x)

x0 = 2.0
df_dx = numerical_derivative(f, x0)
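As a sanity check, the estimate above can be compared with the analytic derivative $2x + \cos(x)$ (the tolerance below is an arbitrary choice):

analytic = 2 * x0 + np.cos(x0)       # exact derivative of x**2 + sin(x) at x0
print(abs(df_dx - analytic) < 1e-5)  # True: the forward difference agrees to roughly 1e-7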
Partial Derivatives
Multivariate Functions
- Definition: $\frac{\partial f}{\partial x_i} = \lim_{h \to 0} \frac{f(x_1, \ldots, x_i + h, \ldots, x_n) - f(x_1, \ldots, x_n)}{h}$
- Mixed partials: $\frac{\partial^2 f}{\partial x_i \partial x_j} = \frac{\partial^2 f}{\partial x_j \partial x_i}$ when both are continuous (Schwarz's theorem)
- Applications in ML: the gradient of a loss function collects its partial derivatives with respect to every model parameter; a finite-difference version is sketched below
def partial_derivatives(f, point, eps=1e-7):
    # Approximate each partial derivative with a forward difference
    n = len(point)
    grads = np.zeros(n)
    for i in range(n):
        h = np.zeros(n)
        h[i] = eps
        grads[i] = (f(point + h) - f(point)) / eps
    return grads
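A brief usage sketch (the function and evaluation point are made up for illustration): for $f(x, y) = x^2 + 3xy$ the analytic gradient is $(2x + 3y,\; 3x)$.

def g(v):
    x, y = v
    return x**2 + 3 * x * y

print(partial_derivatives(g, np.array([1.0, 2.0])))   # approximately [8.0, 3.0]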
Gradient Vector
- Definition: $\nabla f(x) = \left[ \frac{\partial f}{\partial x_1}, \ldots, \frac{\partial f}{\partial x_n} \right]^T$, the vector of all partial derivatives
- Properties:
  - Direction of steepest increase (so $-\nabla f$ is the direction of steepest decrease)
  - Perpendicular to level curves/surfaces
- Gradient descent implementation:
def gradient_descent(f, grad_f, x0, lr=0.01, n_iter=1000):
    # Repeatedly step against the gradient to minimize f
    x = x0
    history = [x0]
    for _ in range(n_iter):
        grad = grad_f(x)
        x = x - lr * grad
        history.append(x)
    return x, history
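A minimal usage sketch on a one-dimensional quadratic with a known minimum (the function, learning rate, and iteration count are chosen purely for illustration):

f_quad = lambda x: (x - 3.0)**2          # minimized at x = 3
grad_quad = lambda x: 2.0 * (x - 3.0)

x_min, history = gradient_descent(f_quad, grad_quad, x0=0.0, lr=0.1, n_iter=200)
print(x_min)                             # very close to 3.0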
Higher-Order Derivatives
Second Derivatives
- Definition: $f''(x) = \frac{d}{dx}\left(\frac{df}{dx}\right) = \frac{d^2 f}{dx^2}$, which measures local curvature
- Hessian matrix: $H_{ij} = \frac{\partial^2 f}{\partial x_i \partial x_j}$, the square matrix of all second-order partial derivatives
def hessian(f, x, eps=1e-5):
    # Finite-difference approximation of H[i, j] = d^2 f / (dx_i dx_j).
    # A relatively large step (1e-5 rather than 1e-7) limits round-off error,
    # since the numerator below is divided by eps**2.
    n = len(x)
    H = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            h_i = np.zeros(n)
            h_j = np.zeros(n)
            h_i[i] = eps
            h_j[j] = eps
            H[i, j] = (f(x + h_i + h_j) - f(x + h_i) - f(x + h_j) + f(x)) / (eps**2)
    return H
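A quick check on a function with a known Hessian (the quadratic below is chosen for illustration): for $f(x, y) = x^2 + 3xy + y^2$, the exact Hessian is $\begin{pmatrix} 2 & 3 \\ 3 & 2 \end{pmatrix}$.

def q(v):
    x, y = v
    return x**2 + 3 * x * y + y**2

print(hessian(q, np.array([1.0, -1.0])))   # approximately [[2, 3], [3, 2]]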
Applications in ML
- Neural Network Training: backpropagation applies the chain rule layer by layer to obtain the gradient of the loss with respect to the weights
def backpropagation_example(X, y, weights, activation):
    # Forward pass
    z = np.dot(X, weights)
    a = activation(z)
    # Backward pass
    error = a - y
    gradient = np.dot(X.T, error * activation(z, derivative=True))
    return gradient
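A usage sketch for a single sigmoid unit (the data and shapes are invented for illustration; the sigmoid helper matches the ActivationFunctions.sigmoid defined later in this section):

def sigmoid(x, derivative=False):
    s = 1 / (1 + np.exp(-x))
    return s * (1 - s) if derivative else s

X = np.array([[0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])   # 3 samples, 2 features
y = np.array([[0.0], [1.0], [1.0]])                  # 3 targets
weights = np.zeros((2, 1))

grad = backpropagation_example(X, y, weights, sigmoid)
print(grad.shape)   # (2, 1): one gradient entry per weight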
- Optimization Algorithms: second-order methods such as Newton's method use the Hessian as well as the gradient
def newton_method(f, grad_f, hess_f, x0, n_iter=100):
    x = x0
    for _ in range(n_iter):
        grad = grad_f(x)
        hess = hess_f(x)
        x = x - np.linalg.solve(hess, grad)
    return x
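A usage sketch (the quadratic, its gradient, and its Hessian below are made up for illustration; on a quadratic, Newton's method lands on the exact minimizer in one step):

f_q = lambda v: (v[0] - 1.0)**2 + 2.0 * (v[1] + 2.0)**2
grad_q = lambda v: np.array([2.0 * (v[0] - 1.0), 4.0 * (v[1] + 2.0)])
hess_q = lambda v: np.array([[2.0, 0.0], [0.0, 4.0]])

print(newton_method(f_q, grad_q, hess_q, np.array([10.0, 10.0])))   # [1.0, -2.0]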
Common Derivatives in ML
Activation Functions
class ActivationFunctions:
    @staticmethod
    def sigmoid(x, derivative=False):
        s = 1 / (1 + np.exp(-x))
        return s * (1 - s) if derivative else s

    @staticmethod
    def relu(x, derivative=False):
        if derivative:
            return np.where(x > 0, 1, 0)
        return np.maximum(0, x)

    @staticmethod
    def tanh(x, derivative=False):
        t = np.tanh(x)
        return 1 - t**2 if derivative else t
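These closed-form derivatives can be cross-checked against the numerical_derivative helper defined earlier (the test point is arbitrary):

x_test = 0.5
print(ActivationFunctions.sigmoid(x_test, derivative=True))        # ~0.2350
print(numerical_derivative(ActivationFunctions.sigmoid, x_test))   # nearly the same value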
Loss Functions
class LossFunctions:
    @staticmethod
    def mse(y_true, y_pred, derivative=False):
        if derivative:
            # Gradient of the mean, hence the 1/N factor
            return 2 * (y_pred - y_true) / y_true.size
        return np.mean((y_true - y_pred)**2)

    @staticmethod
    def cross_entropy(y_true, y_pred, derivative=False):
        eps = 1e-15
        if derivative:
            return -y_true / (y_pred + eps)
        return -np.sum(y_true * np.log(y_pred + eps))
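A short usage sketch (the toy targets and predictions are invented for illustration), checking the analytic MSE gradient against the finite-difference partial_derivatives helper from earlier:

y_true = np.array([1.0, 0.0, 1.0])
y_pred = np.array([0.8, 0.2, 0.6])

analytic = LossFunctions.mse(y_true, y_pred, derivative=True)
numeric = partial_derivatives(lambda p: LossFunctions.mse(y_true, p), y_pred)
print(analytic)   # approximately [-0.133, 0.133, -0.267]
print(numeric)    # nearly identical values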
Practical Considerations
Numerical Methods
def finite_difference_methods():
    # Forward difference: error O(h)
    def forward(f, x, h=1e-7):
        return (f(x + h) - f(x)) / h

    # Central difference: error O(h**2)
    def central(f, x, h=1e-7):
        return (f(x + h) - f(x - h)) / (2 * h)

    # Second derivative (central): a larger default step limits round-off,
    # since the numerator is divided by h**2
    def second_order(f, x, h=1e-5):
        return (f(x + h) - 2*f(x) + f(x - h)) / h**2

    return forward, central, second_order
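A quick comparison on the f(x) = x**2 + np.sin(x) example from earlier in this section (the step size 1e-4 is an arbitrary choice): central differencing is markedly more accurate than the forward scheme for the same step.

forward, central, second_order = finite_difference_methods()

exact = 2 * x0 + np.cos(x0)                    # analytic derivative at x0 = 2.0
print(abs(forward(f, x0, h=1e-4) - exact))     # error on the order of 1e-5
print(abs(central(f, x0, h=1e-4) - exact))     # error on the order of 1e-9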
Implementation Issues
- Gradient vanishing/exploding in deep networks
- Numerical stability (e.g., overflow in np.exp, taking log of values near zero)
- Computational efficiency: analytic or automatic differentiation scales far better than finite differences in high dimensions
- Choice of step size: too large a finite-difference step causes truncation error, too small a step amplifies round-off, as illustrated in the sketch below; learning rates involve a similar trade-off between speed and stability
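A small demonstration of the step-size trade-off for the forward difference, using f and x0 from earlier (the list of steps is arbitrary):

exact = 2 * x0 + np.cos(x0)   # analytic derivative of f at x0 = 2.0
for h in [1e-2, 1e-5, 1e-8, 1e-11]:
    err = abs(numerical_derivative(f, x0, h=h) - exact)
    print(h, err)
# The error shrinks as h decreases, then grows again once round-off dominates.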