Chapter 4: Multivariable Calculus & Gradients
Why This Chapter Matters
In Chapters 1-3, we explored calculus for functions with one input and one output — like temperature changing with time, or position changing with time. But the real world is far more complex!
Consider these scenarios:
- Weather: Temperature depends on both your location (latitude, longitude) and time
- Machine Learning: Your model's performance depends on thousands of parameters simultaneously
- Physics: The electric field depends on your position in three-dimensional space
- Medicine: Drug effectiveness depends on dosage, patient weight, age, genetics, and more
When we have multiple inputs affecting an output, we need multivariable calculus. This chapter teaches you how to understand and optimize systems where many things are changing at once — the foundation of modern machine learning, physics simulations, and engineering optimization.
What you'll master:
- How to measure sensitivity when multiple factors are changing
- How to find the steepest direction to climb a mountain (or minimize a loss function)
- How gradient descent powers machine learning
- How force fields work in physics
- How to optimize complex systems with many variables
Functions of Multiple Variables: The Real World is Multi-Dimensional
🌡️ Temperature Example: Why One Variable Isn't Enough
Imagine you're a meteorologist trying to predict temperature. In our previous single-variable world, you might have said:
"Temperature depends only on time of day." But that's obviously incomplete! Temperature also depends on:
- Location: It's colder at the North Pole than in Hawaii
- Elevation: It's colder on top of a mountain
- Season: January vs July makes a huge difference
So really, temperature is a function of multiple variables:

T = f(latitude, longitude, elevation, time)
📐 Mathematical Representation
A multivariable function takes multiple inputs and produces an output: f(x₁, x₂, …, xₙ) = z, i.e. f: ℝⁿ → ℝ.
Examples:
Simple quadratic: f(x, y) = x² + y²
- Takes two inputs (x and y)
- Outputs one number
- Geometrically, this describes a paraboloid (like a bowl)
Distance function: f(x, y) = √(x² + y²)
- Distance from the origin to the point (x, y)
- Always non-negative
- Creates concentric circles of constant distance
Machine learning loss: L(θ₁, θ₂, …, θₙ)
- Takes model parameters as inputs
- Outputs how "wrong" the model is
- We want to minimize this function
🎯 Visualizing Multivariable Functions
For functions of two variables, we can visualize them as 3D surfaces:
```python
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D

# Create a grid of x and y values
x = np.linspace(-3, 3, 50)
y = np.linspace(-3, 3, 50)
X, Y = np.meshgrid(x, y)

# Define different functions
Z1 = X**2 + Y**2           # Paraboloid (bowl shape)
Z2 = np.sin(X) * np.cos(Y) # Wavy surface
Z3 = X**2 - Y**2           # Saddle point

# Create subplot with three surfaces
fig = plt.figure(figsize=(15, 5))

# Paraboloid
ax1 = fig.add_subplot(131, projection='3d')
ax1.plot_surface(X, Y, Z1, cmap='viridis', alpha=0.7)
ax1.set_title('f(x,y) = x² + y²\n(Paraboloid)')
ax1.set_xlabel('x')
ax1.set_ylabel('y')
ax1.set_zlabel('f(x,y)')

# Wavy surface
ax2 = fig.add_subplot(132, projection='3d')
ax2.plot_surface(X, Y, Z2, cmap='plasma', alpha=0.7)
ax2.set_title('f(x,y) = sin(x)cos(y)\n(Wavy Surface)')
ax2.set_xlabel('x')
ax2.set_ylabel('y')
ax2.set_zlabel('f(x,y)')

# Saddle point
ax3 = fig.add_subplot(133, projection='3d')
ax3.plot_surface(X, Y, Z3, cmap='coolwarm', alpha=0.7)
ax3.set_title('f(x,y) = x² - y²\n(Saddle Point)')
ax3.set_xlabel('x')
ax3.set_ylabel('y')
ax3.set_zlabel('f(x,y)')

plt.tight_layout()
plt.show()
```
🔍 Understanding the Shapes
Paraboloid (f(x, y) = x² + y²):
- Bowl shape with a clear minimum at (0, 0)
- As you move away from center in any direction, the function value increases
- This is like a "loss function" in ML - we want to find the bottom!
Saddle Point (f(x, y) = x² − y²):
- Horse-saddle shape: goes up in the x-direction, down in the y-direction
- At (0, 0), it's a minimum in one direction but a maximum in another
- These are critical points that are neither minima nor maxima
Wavy Surface (f(x, y) = sin(x)cos(y)):
- Complex landscape with many hills and valleys
- Shows how functions can have multiple local minima and maxima
- Common in real-world optimization problems
🎯 Why This Matters for Applications
Machine Learning: Your loss function might depend on thousands of parameters. Understanding the "shape" of this high-dimensional landscape helps you:
- Find good minima (train better models)
- Avoid getting stuck in bad local minima
- Choose appropriate optimization algorithms
Physics: Force fields, electric fields, gravitational fields - all depend on position in 3D space
Engineering: Optimizing designs often involves many variables simultaneously - material properties, dimensions, costs, performance metrics
Medicine: Drug interactions depend on multiple factors - dosages of different medications, patient characteristics, timing
Partial Derivatives: Measuring Change While Holding Things Constant
🏔️ The Mountain Hiking Analogy
Imagine you're standing on a mountainside. The elevation depends on both your east-west position (x) and your north-south position (y): elevation = h(x, y).
Now, suppose you want to know: "If I take a small step east, how much will my elevation change?"
To answer this, you need to:
- Hold your north-south position fixed (don't move north or south)
- Take a tiny step east and see how elevation changes
- Measure the rate of change in that direction only
This is exactly what a partial derivative does!
🧮 Mathematical Definition
The partial derivative of f with respect to x is:

∂f/∂x = lim(h→0) [f(x + h, y) − f(x, y)] / h

Key insight: Notice that y stays the same in both f(x + h, y) and f(x, y). We're only varying x.
🎯 Intuitive Understanding
Partial derivative with respect to x (∂f/∂x):
- "How fast does f change as I increase x, while keeping y fixed?"
- It's like taking a regular derivative, but treating y as a constant
Partial derivative with respect to y (∂f/∂y):
- "How fast does f change as I increase y, while keeping x fixed?"
- Treat x as a constant and take the derivative with respect to y
🧪 Step-by-Step Example
Let's compute partial derivatives for:

f(x, y) = x²y + 3xy²

Finding ∂f/∂x:
Step 1: Treat y as a constant (like the number 5 or π)
Step 2: Differentiate each term with respect to x: x²y → 2xy, and 3xy² → 3y²
Result: ∂f/∂x = 2xy + 3y²
Finding ∂f/∂y:
Step 1: Treat x as a constant
Step 2: Differentiate each term with respect to y: x²y → x², and 3xy² → 6xy
Result: ∂f/∂y = x² + 6xy
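These hand computations can be double-checked symbolically. A quick sketch, assuming SymPy is installed:

```python
import sympy as sp

x, y = sp.symbols('x y')
f = x**2 * y + 3 * x * y**2

df_dx = sp.diff(f, x)  # differentiate treating y as a constant
df_dy = sp.diff(f, y)  # differentiate treating x as a constant

print("∂f/∂x =", df_dx)  # matches 2xy + 3y²
print("∂f/∂y =", df_dy)  # matches x² + 6xy
```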
🎮 Interactive Understanding
Let's visualize how partial derivatives work:
```python
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D

# Define our function f(x,y) = x²y + 3xy²
def f(x, y):
    return x**2 * y + 3 * x * y**2

# Define partial derivatives
def df_dx(x, y):
    return 2*x*y + 3*y**2

def df_dy(x, y):
    return x**2 + 6*x*y

# Create a grid
x = np.linspace(-2, 2, 100)
y = np.linspace(-2, 2, 100)
X, Y = np.meshgrid(x, y)
Z = f(X, Y)

# Pick a specific point
x0, y0 = 1, 0.5
z0 = f(x0, y0)

# Create the visualization
fig = plt.figure(figsize=(15, 5))

# Main 3D surface
ax1 = fig.add_subplot(131, projection='3d')
ax1.plot_surface(X, Y, Z, alpha=0.6, cmap='viridis')
ax1.scatter([x0], [y0], [z0], color='red', s=100, label=f'Point ({x0}, {y0})')
ax1.set_title('f(x,y) = x²y + 3xy²')
ax1.set_xlabel('x')
ax1.set_ylabel('y')
ax1.set_zlabel('f(x,y)')

# Slice holding y constant (showing ∂f/∂x)
ax2 = fig.add_subplot(132)
x_slice = np.linspace(-2, 2, 100)
y_fixed = y0
z_slice = f(x_slice, y_fixed)
ax2.plot(x_slice, z_slice, 'b-', linewidth=2, label=f'f(x, {y_fixed})')
ax2.scatter([x0], [z0], color='red', s=100, zorder=5)

# Draw tangent line at the point
slope_x = df_dx(x0, y0)
tangent_x = z0 + slope_x * (x_slice - x0)
ax2.plot(x_slice, tangent_x, 'r--', alpha=0.7,
         label=f'Tangent (slope = ∂f/∂x = {slope_x:.2f})')
ax2.set_title(f'Cross-section: y = {y_fixed} (constant)')
ax2.set_xlabel('x')
ax2.set_ylabel('f(x,y)')
ax2.legend()
ax2.grid(True)

# Slice holding x constant (showing ∂f/∂y)
ax3 = fig.add_subplot(133)
y_slice = np.linspace(-2, 2, 100)
x_fixed = x0
z_slice = f(x_fixed, y_slice)
ax3.plot(y_slice, z_slice, 'g-', linewidth=2, label=f'f({x_fixed}, y)')
ax3.scatter([y0], [z0], color='red', s=100, zorder=5)

# Draw tangent line at the point
slope_y = df_dy(x0, y0)
tangent_y = z0 + slope_y * (y_slice - y0)
ax3.plot(y_slice, tangent_y, 'r--', alpha=0.7,
         label=f'Tangent (slope = ∂f/∂y = {slope_y:.2f})')
ax3.set_title(f'Cross-section: x = {x_fixed} (constant)')
ax3.set_xlabel('y')
ax3.set_ylabel('f(x,y)')
ax3.legend()
ax3.grid(True)

plt.tight_layout()
plt.show()

print(f"At point ({x0}, {y0}):")
print(f"∂f/∂x = {df_dx(x0, y0)} (slope in x-direction)")
print(f"∂f/∂y = {df_dy(x0, y0)} (slope in y-direction)")
```
🧠 Conceptual Insight
Why partial derivatives matter:
- Sensitivity analysis: Which variables have the biggest impact on your function?
- Optimization: Which direction should you move to increase/decrease the function?
- Approximation: How does the function behave near a specific point?
🔬 Real-World Applications
Machine Learning:
- If L(θ₁, θ₂) is your loss function, then:
- ∂L/∂θ₁ tells you how to adjust parameter θ₁
- ∂L/∂θ₂ tells you how to adjust parameter θ₂
Physics:
- Electric field: E = −∇V, where V is the electric potential
- Each component (−∂V/∂x, −∂V/∂y, −∂V/∂z) gives the force per unit charge in that direction
Economics:
- Production function P(L, K) depends on Labor (L) and Capital (K)
- ∂P/∂L = marginal productivity of labor
- ∂P/∂K = marginal productivity of capital
Medicine:
- Drug effectiveness E(d₁, d₂, w, a) depends on dose₁, dose₂, weight, age
- ∂E/∂d₁ shows how sensitive effectiveness is to the first drug's dosage
The Gradient: The "Steepest Uphill" Vector
🧭 The Mountain Climbing Analogy
You're standing on a mountainside in dense fog. You can't see very far, but you have a magical compass that always points in the direction you should walk to climb upward as quickly as possible.
This magical compass is the gradient!
Here's what it tells you:
- Direction: Which way to face to climb most steeply upward
- Magnitude: How steep the climb is in that direction
- Large gradient = very steep terrain
- Small gradient = gentle slope
- Zero gradient = you're at a peak, a valley, or a saddle point
📊 Mathematical Definition
The gradient of a function f is a vector made from all of its partial derivatives:

For 2D functions: ∇f(x, y) = (∂f/∂x, ∂f/∂y)

For 3D functions: ∇f(x, y, z) = (∂f/∂x, ∂f/∂y, ∂f/∂z)
🎯 Why Gradients Point "Uphill"
Let's understand this intuitively. Suppose you're at point (x₀, y₀) and you want to move a small distance in a unit direction u = (u₁, u₂).

The directional derivative (rate of change in that direction) is:

D_u f = ∇f · u = (∂f/∂x)u₁ + (∂f/∂y)u₂

This is the dot product of the gradient with your direction vector!

Key insight: For a fixed-length direction vector, the dot product is maximized when the two vectors point in the same direction. So ∇f points in the direction of maximum increase.
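A quick numeric check of this claim, sketched with f(x, y) = x² + y² (the bowl function used throughout this chapter): the rate of change ∇f · u is largest when the unit direction u is aligned with the gradient.

```python
import numpy as np

def grad_f(x, y):
    # Gradient of f(x,y) = x² + y²
    return np.array([2.0 * x, 2.0 * y])

point = np.array([1.0, 2.0])
g = grad_f(*point)

# Unit direction aligned with the gradient vs. the +x direction
u_aligned = g / np.linalg.norm(g)
u_other = np.array([1.0, 0.0])

d_aligned = g @ u_aligned  # directional derivative along ∇f: equals |∇f|
d_other = g @ u_other      # directional derivative along +x

print(d_aligned, d_other)  # the aligned direction gives the larger rate
```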
🧪 Step-by-Step Example
Let's compute the gradient of f(x, y) = x² + y²:

Step 1: Find partial derivatives

∂f/∂x = 2x,  ∂f/∂y = 2y

Step 2: Combine into gradient vector

∇f(x, y) = (2x, 2y)

Step 3: Interpret at specific points
- At (1, 0): ∇f = (2, 0) → points in the +x direction
- At (0, 1): ∇f = (0, 2) → points in the +y direction
- At (0, 0): ∇f = (0, 0) → no preferred direction (we're at the minimum!)
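The analytic gradient can also be verified numerically with central finite differences, varying one variable while holding the other fixed (a sketch, using the same f):

```python
def f(x, y):
    return x**2 + y**2

def numeric_grad(x, y, h=1e-6):
    # Central differences: vary one variable, hold the other fixed
    dfdx = (f(x + h, y) - f(x - h, y)) / (2 * h)
    dfdy = (f(x, y + h) - f(x, y - h)) / (2 * h)
    return dfdx, dfdy

print(numeric_grad(1, 0))  # close to the analytic (2, 0)
print(numeric_grad(0, 0))  # close to (0, 0) — the minimum
```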
🎨 Visualizing Gradient Fields
Let's create beautiful visualizations to understand gradients:
```python
import numpy as np
import matplotlib.pyplot as plt

# Create a grid of points
x = np.linspace(-3, 3, 20)
y = np.linspace(-3, 3, 20)
X, Y = np.meshgrid(x, y)

# Function: f(x,y) = x² + y²
Z = X**2 + Y**2

# Gradient components
dX = 2 * X  # ∂f/∂x = 2x
dY = 2 * Y  # ∂f/∂y = 2y

# Create the visualization
fig, axes = plt.subplots(2, 2, figsize=(12, 10))

# 1. Contour plot with gradient vectors
ax1 = axes[0, 0]
contour = ax1.contour(X, Y, Z, levels=10, colors='gray', alpha=0.5)
ax1.clabel(contour, inline=True, fontsize=8)
ax1.quiver(X, Y, dX, dY, color='red', alpha=0.8, scale=50)
ax1.set_title('Gradients on Contour Plot\nf(x,y) = x² + y²')
ax1.set_xlabel('x')
ax1.set_ylabel('y')
ax1.grid(True, alpha=0.3)
ax1.set_aspect('equal')

# 2. Gradient magnitude
ax2 = axes[0, 1]
magnitude = np.sqrt(dX**2 + dY**2)
im = ax2.imshow(magnitude, extent=[-3, 3, -3, 3], origin='lower', cmap='hot')
ax2.contour(X, Y, magnitude, colors='white', alpha=0.5)
plt.colorbar(im, ax=ax2, label='|∇f|')
ax2.set_title('Gradient Magnitude\n|∇f| = √((2x)² + (2y)²)')
ax2.set_xlabel('x')
ax2.set_ylabel('y')

# 3. Different function: f(x,y) = x² - y² (saddle point)
Z2 = X**2 - Y**2
dX2 = 2 * X
dY2 = -2 * Y
ax3 = axes[1, 0]
contour2 = ax3.contour(X, Y, Z2, levels=15, colors='gray', alpha=0.5)
ax3.quiver(X, Y, dX2, dY2, color='blue', alpha=0.8, scale=50)
ax3.set_title('Saddle Point Function\nf(x,y) = x² - y²')
ax3.set_xlabel('x')
ax3.set_ylabel('y')
ax3.grid(True, alpha=0.3)
ax3.set_aspect('equal')

# 4. Wavy function: f(x,y) = sin(x)cos(y)
Z3 = np.sin(X) * np.cos(Y)
dX3 = np.cos(X) * np.cos(Y)
dY3 = -np.sin(X) * np.sin(Y)
ax4 = axes[1, 1]
contour3 = ax4.contour(X, Y, Z3, levels=10, colors='gray', alpha=0.5)
ax4.quiver(X, Y, dX3, dY3, color='green', alpha=0.8, scale=20)
ax4.set_title('Complex Landscape\nf(x,y) = sin(x)cos(y)')
ax4.set_xlabel('x')
ax4.set_ylabel('y')
ax4.grid(True, alpha=0.3)
ax4.set_aspect('equal')

plt.tight_layout()
plt.show()
```
🔍 Key Insights from the Visualizations
Paraboloid (f(x, y) = x² + y²):
- Gradients always point away from the center (0,0)
- Magnitude increases as you move away from center
- This creates a "flow field" toward the minimum
Saddle Point (f(x, y) = x² − y²):
- Complex gradient pattern
- Some directions go "uphill", others "downhill"
- Center point (0,0) has zero gradient but is neither min nor max
Wavy Surface (f(x, y) = sin(x)cos(y)):
- Multiple local maxima and minima
- Gradients point toward nearby peaks
- Shows why optimization can be challenging
🎯 Gradient Properties
- Direction: Always points toward steepest increase
- Magnitude: Tells you how steep the increase is
- Zero Gradient: Critical points (peaks, valleys, saddle points)
- Perpendicular to Contours: Gradients always cross level curves at right angles
🧠 Intuitive Understanding: Why Perpendicular to Contours?
Think about contour lines on a topographic map:
- Contour lines connect points of equal elevation
- If you walk along a contour line, your elevation doesn't change
- The steepest uphill direction must be perpendicular to the contour
This is exactly what gradients do — they point perpendicular to contours, in the direction of steepest ascent!
🔬 Real-World Applications
Machine Learning - Gradient Descent:
- Loss function L(θ) depends on model parameters θ
- Gradient ∇L points toward increasing loss (bad direction)
- Move in the opposite direction: θ_new = θ_old − α∇L
- This minimizes loss and improves the model
Physics - Force Fields:
- Force is the negative gradient of potential energy: F = −∇U
- Particles naturally move toward lower potential energy
- Examples: gravity, electric fields, magnetic fields
Engineering - Heat Flow:
- Heat flows from hot to cold regions
- The temperature gradient ∇T points toward increasing temperature
- Heat flow is proportional to −∇T (Fourier's law)
Computer Graphics:
- Gradients compute surface normals for lighting calculations
- Edge detection uses gradients to find rapid changes in image intensity
Gradients in Physics: Force Fields and Natural Laws
⚡ Forces from Potential Energy
One of the most beautiful applications of gradients is in physics, where forces are related to potential energy through:

F = −∇U
Why the negative sign?
- Gradient points toward increasing potential energy
- Forces point toward decreasing potential energy (systems naturally move to lower energy states)
- Hence the negative sign
🌍 Gravitational Example
Gravitational potential energy near Earth's surface: U(h) = mgh

Gravitational force: F = −dU/dh = −mg

The negative sign indicates the force points downward (toward decreasing potential energy).
In 3D space, the gravitational potential energy of a mass m at distance r from a mass M is:

U(r) = −GMm/r

The gravitational force is:

F = −∇U = −(GMm/r²) r̂

This points toward the center of mass — exactly what we expect!
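Spelled out as a one-line derivation (for a function of r alone, the gradient reduces to the radial derivative, ∇U = (∂U/∂r) r̂):

```latex
U(r) = -\frac{G M m}{r}
\quad\Longrightarrow\quad
\mathbf{F} = -\nabla U
           = -\frac{\partial U}{\partial r}\,\hat{\mathbf{r}}
           = -\frac{G M m}{r^{2}}\,\hat{\mathbf{r}}
```

The minus sign on the final expression is what makes the force attractive: it points in the −r̂ direction, toward the mass M.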
🔌 Electric Fields
Electric potential V creates an electric field:

E = −∇V

Example: Point charge q at the origin
- Potential: V(r) = kq/r
- Electric field: E = −∇V = (kq/r²) r̂ (points radially outward for positive q)
🌡️ Heat Flow
Fourier's Law of heat conduction:

q = −k∇T

Where:
- q = heat flux (energy per unit area per time)
- k = thermal conductivity
- ∇T = temperature gradient
Physical meaning: Heat flows from hot to cold regions, proportional to the temperature gradient.
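In one dimension, Fourier's law reduces to q = −k dT/dx. A minimal numeric sketch, using illustrative (made-up) values for a wall with a hot side and a cold side:

```python
# 1D Fourier's law: q = -k * dT/dx
# Illustrative values (not from any specific material datasheet):
k = 0.8            # thermal conductivity, W/(m·K)
T_hot, T_cold = 20.0, 5.0   # temperatures on the two faces, °C
thickness = 0.1    # wall thickness, m

dT_dx = (T_cold - T_hot) / thickness  # temperature gradient, K/m (negative)
q = -k * dT_dx                        # heat flux, W/m²

print(q)  # positive: heat flows from the hot side toward the cold side
```

The gradient is negative (temperature drops across the wall), so the minus sign makes the flux positive in the hot-to-cold direction.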
Gradients in Machine Learning: The Engine of AI
🎯 The Optimization Problem
Machine learning is fundamentally an optimization problem:
- Define a loss function that measures how "wrong" your model is
- Find parameter values that minimize this loss
- Use gradients to guide your search for the minimum
🚀 Gradient Descent: Following the Steepest Path Downhill
Basic idea: If gradients point "uphill", then negative gradients point "downhill" toward minima.
Update rule:

θ_new = θ_old − α∇L(θ_old)

Where:
- α = learning rate (how big steps to take)
- ∇L = gradient of the loss function
- The minus sign = move in the opposite direction of the gradient (downhill)
🔍 Interactive Gradient Descent Visualization
Let's create a comprehensive visualization showing how gradient descent works:
```python
import numpy as np
import matplotlib.pyplot as plt

def gradient_descent_visualization():
    # Define different loss functions to explore
    def rosenbrock(x, y):
        """Rosenbrock function - challenging optimization landscape"""
        return (1 - x)**2 + 100 * (y - x**2)**2

    def rosenbrock_grad(x, y):
        dx = -2*(1 - x) - 400*x*(y - x**2)
        dy = 200*(y - x**2)
        return dx, dy

    def simple_quadratic(x, y):
        """Simple bowl-shaped function"""
        return x**2 + y**2

    def simple_grad(x, y):
        return 2*x, 2*y

    def saddle_point(x, y):
        """Saddle point function"""
        return x**2 - y**2

    def saddle_grad(x, y):
        return 2*x, -2*y

    # Choose function to optimize
    func = simple_quadratic
    grad_func = simple_grad
    x_range, y_range = (-3, 3), (-3, 3)

    # Create grid for contour plot
    x = np.linspace(x_range[0], x_range[1], 100)
    y = np.linspace(y_range[0], y_range[1], 100)
    X, Y = np.meshgrid(x, y)
    Z = func(X, Y)

    # Gradient descent with different learning rates
    def run_gradient_descent(start_point, lr, steps):
        path = [start_point]
        point = np.array(start_point, dtype=float)
        for _ in range(steps):
            grad = np.array(grad_func(point[0], point[1]))
            point = point - lr * grad
            path.append(point.copy())
            # Stop if gradient is very small (near minimum)
            if np.linalg.norm(grad) < 1e-6:
                break
        return np.array(path)

    # Create visualization
    fig, axes = plt.subplots(2, 2, figsize=(12, 10))

    # Different learning rates and starting points
    scenarios = [
        {'lr': 0.01, 'start': [2.5, 2.5], 'color': 'red', 'title': 'Small LR (0.01)'},
        {'lr': 0.1, 'start': [2.5, 2.5], 'color': 'blue', 'title': 'Good LR (0.1)'},
        {'lr': 0.5, 'start': [2.5, 2.5], 'color': 'green', 'title': 'Large LR (0.5)'},
        {'lr': 0.1, 'start': [-2, 1.5], 'color': 'purple', 'title': 'Different Start'}
    ]

    for i, scenario in enumerate(scenarios):
        ax = axes[i//2, i%2]
        # Plot contour
        contour = ax.contour(X, Y, Z, levels=20, colors='gray', alpha=0.4)
        ax.clabel(contour, inline=True, fontsize=8, fmt='%.1f')
        # Run gradient descent
        path = run_gradient_descent(scenario['start'], scenario['lr'], 100)
        # Plot path
        ax.plot(path[:, 0], path[:, 1], 'o-', color=scenario['color'],
                linewidth=2, markersize=4, alpha=0.8, label='GD Path')
        ax.plot(path[0, 0], path[0, 1], 'o', color=scenario['color'],
                markersize=10, label='Start')
        ax.plot(path[-1, 0], path[-1, 1], 's', color=scenario['color'],
                markersize=10, label='End')
        # Add gradient arrows at a few points
        if len(path) > 5:
            for j in range(0, min(len(path)-1, 20), 5):
                x_pt, y_pt = path[j]
                dx, dy = grad_func(x_pt, y_pt)
                # Normalize for visualization
                norm = np.sqrt(dx**2 + dy**2)
                if norm > 1e-6:
                    dx, dy = dx/norm * 0.2, dy/norm * 0.2
                    ax.arrow(x_pt, y_pt, -dx, -dy, head_width=0.1,
                             head_length=0.05, fc='black', ec='black', alpha=0.6)
        ax.set_xlim(x_range)
        ax.set_ylim(y_range)
        ax.set_xlabel('θ₁')
        ax.set_ylabel('θ₂')
        ax.set_title(f'{scenario["title"]}\nSteps: {len(path)-1}, '
                     f'Final loss: {func(path[-1, 0], path[-1, 1]):.3f}')
        ax.grid(True, alpha=0.3)
        ax.legend(fontsize=8)
        ax.set_aspect('equal')

    plt.tight_layout()
    plt.show()

    # Print analysis
    print("🎯 Gradient Descent Analysis:")
    print("=" * 50)
    for i, scenario in enumerate(scenarios):
        path = run_gradient_descent(scenario['start'], scenario['lr'], 100)
        print(f"{scenario['title']}: {len(path)-1} steps, "
              f"final loss = {func(path[-1, 0], path[-1, 1]):.6f}")

gradient_descent_visualization()
```
🧠 Key Insights from the Visualization
Learning Rate Effects:
- Too small (0.01): Very slow convergence, many steps needed
- Just right (0.1): Efficient convergence in reasonable steps
- Too large (0.5): May overshoot or oscillate
Starting Point: Different initial values can lead to different local minima in complex landscapes
🔬 Real ML Applications
Linear Regression:
- Loss: mean squared error, L(θ₀, θ₁) = (1/n) Σᵢ (yᵢ − (θ₀ + θ₁xᵢ))²
- Gradients tell us how to adjust the intercept θ₀ and slope θ₁
Neural Networks:
- Backpropagation computes gradients with respect to all weights and biases
- Chain rule connects output error to input layer gradients
Logistic Regression:
- Loss: cross-entropy, L = −(1/n) Σᵢ [yᵢ log ŷᵢ + (1 − yᵢ) log(1 − ŷᵢ)]
- Gradients guide classification boundary optimization
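To make the linear regression case concrete, here is a minimal sketch (on synthetic data with a known slope and intercept) of gradient descent on the mean squared error, using the partial derivatives ∂L/∂θ₀ and ∂L/∂θ₁:

```python
import numpy as np

# Synthetic data: true intercept 1, true slope 2, small noise
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, 100)
y = 2.0 * x + 1.0 + rng.normal(0, 0.1, 100)

theta0, theta1 = 0.0, 0.0  # start from zero parameters
lr = 0.5
for _ in range(200):
    resid = y - (theta0 + theta1 * x)
    # Partial derivatives of L = mean(resid²)
    grad0 = -2 * np.mean(resid)       # ∂L/∂θ₀
    grad1 = -2 * np.mean(resid * x)   # ∂L/∂θ₁
    # Step opposite the gradient
    theta0 -= lr * grad0
    theta1 -= lr * grad1

print(theta0, theta1)  # close to the true intercept 1 and slope 2
```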
🎯 Advanced Optimization Algorithms
Momentum: v_new = βv_old + ∇L(θ_old), then θ_new = θ_old − αv_new — a running average of gradients smooths the path
Adam: Combines momentum with adaptive learning rates
All based on gradients — they just use gradient information more cleverly!
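A minimal sketch of the momentum update on the bowl-shaped loss L(θ) = θ₁² + θ₂² (assuming the common formulation v ← βv + ∇L, θ ← θ − αv; other variants fold α into v):

```python
import numpy as np

def grad(theta):
    # Gradient of L(θ) = θ₁² + θ₂²
    return 2 * theta

theta = np.array([2.5, -1.5])
v = np.zeros(2)
alpha, beta = 0.1, 0.9  # learning rate, momentum coefficient

for _ in range(100):
    v = beta * v + grad(theta)  # accumulate a running direction
    theta = theta - alpha * v   # step along the accumulated direction

print(theta)  # near (0, 0), the minimum
```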
The Jacobian: When Outputs Are Vectors Too
🔄 From Single Output to Multiple Outputs
So far, we've studied functions with multiple inputs and a single output: f: ℝⁿ → ℝ.
But what about functions with multiple inputs AND multiple outputs?
Examples:
- Coordinate transformations: (x, y) → (r, θ) (Cartesian to polar)
- Neural network layers: Input vector → Output vector
- Physics: Position → Velocity vector
🧮 Mathematical Definition
For a vector-valued function f: ℝⁿ → ℝᵐ with components f₁, …, f_m:

The Jacobian matrix is the m × n matrix of all partial derivatives:

J = [∂fᵢ/∂xⱼ], where row i is (∂fᵢ/∂x₁, …, ∂fᵢ/∂xₙ)

Each row is the gradient of one output function.
🎯 Concrete Example: Coordinate Transformation
Cartesian to Polar coordinates: r = √(x² + y²), θ = arctan(y/x)

Step 1: Find partial derivatives of r

∂r/∂x = x/r,  ∂r/∂y = y/r

Step 2: Find partial derivatives of θ

∂θ/∂x = −y/(x² + y²),  ∂θ/∂y = x/(x² + y²)

Step 3: Assemble the Jacobian

J = [[x/r, y/r], [−y/r², x/r²]]
🔍 What Does the Jacobian Tell Us?
Linear approximation: Near a point a, the function behaves like f(x) ≈ f(a) + J(a)(x − a).
Geometric interpretation: The Jacobian tells us how small regions get stretched, rotated, and skewed by the transformation.
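The linear approximation is easy to check numerically. A sketch using the Cartesian-to-polar transformation: for a small step Δ, the exact change in (r, θ) should nearly match J(a)·Δ.

```python
import numpy as np

def to_polar(x, y):
    # (x, y) -> (r, θ)
    return np.array([np.hypot(x, y), np.arctan2(y, x)])

def jacobian(x, y):
    # Jacobian of the polar transformation at (x, y)
    r2 = x**2 + y**2
    r = np.sqrt(r2)
    return np.array([[x / r,   y / r],
                     [-y / r2, x / r2]])

a = np.array([1.0, 1.0])
delta = np.array([1e-4, -2e-4])  # a small step

exact = to_polar(*(a + delta)) - to_polar(*a)
linear = jacobian(*a) @ delta    # J(a)·Δ — the linear approximation

print(exact, linear)  # nearly identical for small steps
```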
🎨 Visualizing Jacobian Transformations
```python
import numpy as np
import matplotlib.pyplot as plt

def visualize_jacobian_transformation():
    # Define a transformation: (x,y) -> (x + y, x - y)
    def transform(x, y):
        return x + y, x - y

    def jacobian_transform(x, y):
        # J = [[1, 1], [1, -1]] — constant for this linear map
        return np.array([[1, 1], [1, -1]])

    # Create a grid of points (unit square)
    x = np.array([0, 1, 1, 0, 0])  # Square vertices + closing
    y = np.array([0, 0, 1, 1, 0])

    # Transform the points
    x_new, y_new = transform(x, y)

    # Create visualization
    fig, axes = plt.subplots(1, 3, figsize=(15, 5))

    # Original space
    ax1 = axes[0]
    ax1.plot(x, y, 'b-o', linewidth=2, markersize=8, label='Original Square')
    ax1.grid(True, alpha=0.3)
    ax1.set_xlim(-0.5, 2.5)
    ax1.set_ylim(-0.5, 2.5)
    ax1.set_xlabel('x')
    ax1.set_ylabel('y')
    ax1.set_title('Original Space')
    ax1.legend()
    ax1.set_aspect('equal')

    # Transformed space
    ax2 = axes[1]
    ax2.plot(x_new, y_new, 'r-o', linewidth=2, markersize=8, label='Transformed')
    ax2.grid(True, alpha=0.3)
    ax2.set_xlim(-0.5, 2.5)
    ax2.set_ylim(-1.5, 1.5)
    ax2.set_xlabel('u = x + y')
    ax2.set_ylabel('v = x - y')
    ax2.set_title('Transformed Space')
    ax2.legend()
    ax2.set_aspect('equal')

    # Show both together
    ax3 = axes[2]
    ax3.plot(x, y, 'b-o', linewidth=2, markersize=8, label='Original', alpha=0.7)
    ax3.plot(x_new, y_new, 'r-o', linewidth=2, markersize=8, label='Transformed', alpha=0.7)

    # Draw transformation arrows
    for i in range(len(x)-1):  # Skip the last point (closing the square)
        ax3.arrow(x[i], y[i], x_new[i]-x[i], y_new[i]-y[i],
                  head_width=0.1, head_length=0.05, fc='green', ec='green', alpha=0.6)
    ax3.grid(True, alpha=0.3)
    ax3.set_xlim(-0.5, 2.5)
    ax3.set_ylim(-1.5, 2.5)
    ax3.set_xlabel('x / u')
    ax3.set_ylabel('y / v')
    ax3.set_title('Transformation Visualization')
    ax3.legend()

    plt.tight_layout()
    plt.show()

    # Print the Jacobian
    J = jacobian_transform(0, 0)  # Constant in this case
    print("Jacobian Matrix:")
    print(J)
    print(f"Determinant: {np.linalg.det(J)}")
    print("This transformation has area scaling factor of", abs(np.linalg.det(J)))

visualize_jacobian_transformation()
```
🔬 Applications in Machine Learning
Neural Networks:
- Each layer is a function from one vector space to another (ℝⁿ → ℝᵐ)
- Backpropagation uses the chain rule with Jacobians
- Gradients flow backwards through network via Jacobian matrices
Generative Models:
- Transform simple noise to complex data
- Jacobian determinant appears in probability calculations
Optimization:
- Newton's method uses Jacobian for faster convergence
- Constrained optimization uses Jacobians of constraint functions
Chapter 4 Summary
🎯 Key Concepts Mastered
1. Multivariable Functions
- Why multiple variables: Real-world depends on many factors simultaneously
- Visualization: 3D surfaces, contour plots, complex landscapes
- Applications: Temperature fields, loss functions, force fields
2. Partial Derivatives
- Core idea: Rate of change while holding other variables constant
- Mountain analogy: Slope in one direction while staying on the same latitude/longitude
- Computation: Treat other variables as constants, differentiate normally
3. Gradients - The Steepest Direction
- Vector of partial derivatives: ∇f = (∂f/∂x, ∂f/∂y, …)
- Geometric meaning: Points toward steepest increase, perpendicular to contours
- Magnitude: How steep the steepest direction is
4. Physics Applications
- Force fields: F = −∇U (forces from potential energy)
- Heat flow: q = −k∇T (heat flows down temperature gradients)
- Electric fields: E = −∇V (electric field from potential)
5. Machine Learning Applications
- Gradient descent: θ_new = θ_old − α∇L
- Optimization: Following negative gradients to minimize loss
- Learning rates: Balance between speed and stability
6. Jacobian Matrices
- Multiple outputs: When functions return vectors, not just scalars
- Linear approximation: How transformations behave locally
- Applications: Neural networks, coordinate transformations, physics
🔗 Connections to Previous Chapters
- Chapter 1: Exponential/logarithmic functions appear in multivariable contexts
- Chapter 2: Partial derivatives extend single-variable derivative rules
- Chapter 3: Multiple integrals (coming in advanced topics) use gradients
🎯 Applications Preview
Coming in later chapters:
- Linear Algebra: Vectors and matrices provide the language for gradients and Jacobians
- Optimization: Advanced algorithms beyond basic gradient descent
- Machine Learning: Backpropagation, neural networks, deep learning
- Statistics: Maximum likelihood estimation uses gradients
🧮 Key Formulas to Remember
- Partial derivative: ∂f/∂x = lim(h→0) [f(x + h, y) − f(x, y)] / h
- Gradient: ∇f = (∂f/∂x, ∂f/∂y, …)
- Directional derivative: D_u f = ∇f · u
- Gradient descent: θ_new = θ_old − α∇L
- Force from potential energy: F = −∇U
- Jacobian: J = [∂fᵢ/∂xⱼ]
You now have the mathematical tools to understand and optimize complex systems where many variables interact — the foundation of modern AI and scientific computing! 🚀