βš—οΈ Chapter 5 Β· SciPy & Statistics

Scientific Computing &
Statistical Analysis with SciPy

SciPy builds on NumPy to provide algorithms for integration, optimization, linear algebra, statistical testing, and signal processing. It is the backbone of scientific Python, bridging pure math and real data science workflows.

📚 4 Topics
⏱ ~90 min
🎯 Topics 24–27
🔗 Needs: NumPy, Matplotlib
24

Integration, Optimization, and Equation Solving

⏱ ~25 min · scipy.integrate, scipy.optimize, scipy.interpolate

SciPy's integrate, optimize, and interpolate submodules solve the most common mathematical problems in science and engineering: computing areas under curves, finding minima, and fitting smooth functions through data points.

SciPy submodule map: Each capability lives in its own submodule, so always import specifically: from scipy.integrate import quad rather than import scipy. This keeps imports fast and explicit.
Python – Numerical Integration
from scipy import integrate
import numpy as np

# ─── 1. quad: single integral ────────────────────
# Integrate f(x) = x^2 from 0 to 3
result, error = integrate.quad(lambda x: x**2, 0, 3)
print(f"∫₀³ x² dx = {result:.4f}  (exact: 9.0)")  # → 9.0000
print(f"Estimated error: {error:.2e}")              # → ~1e-13

# ─── 2. dblquad: double integral ─────────────────
# ∫∫ (x + y) dx dy  over [0,1] × [0,1]
result2, _ = integrate.dblquad(
    lambda y, x: x + y,
    0, 1,           # x limits
    lambda x: 0,    # y lower limit (can depend on x)
    lambda x: 1     # y upper limit
)
print(f"∬ (x+y) dA = {result2:.4f}")  # → 1.0000

# ─── 3. trapezoid: integrate from data points ────
x = np.linspace(0, np.pi, 100)
y = np.sin(x)
area = integrate.trapezoid(y, x)
print(f"∫₀^π sin(x) dx ≈ {area:.4f}")  # → 2.0000 (exact: 2)

# ─── 4. cumulative_trapezoid: running integral ───
x = np.array([0, 1, 2, 3, 4])
y = np.array([0, 1, 4, 9, 16])  # y = x²
cumulative = integrate.cumulative_trapezoid(y, x, initial=0)
print(f"Cumulative integral: {cumulative}")
# → [ 0.   0.5  3.   9.5 22. ]  (approximates x³/3; exact value at x=4 is 64/3 ≈ 21.33)
Output
∫₀³ x² dx = 9.0000  (exact: 9.0)
Estimated error: 9.99e-14
∬ (x+y) dA = 1.0000
∫₀^π sin(x) dx ≈ 2.0000
Cumulative integral: [ 0.   0.5  3.   9.5 22. ]
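The quick reference at the end of this chapter also lists odeint for differential equations, which the examples above don't cover. A minimal sketch using the modern solve_ivp interface, with a made-up decay equation dy/dt = -0.5·y:

```python
from scipy.integrate import solve_ivp
import numpy as np

# dy/dt = -0.5 * y, y(0) = 2   (analytic solution: y = 2 * exp(-0.5 t))
def decay(t, y):
    return -0.5 * y

sol = solve_ivp(decay, t_span=(0, 10), y0=[2.0],
                t_eval=np.linspace(0, 10, 5))
print("t:", sol.t)                 # evaluation times
print("y:", sol.y[0].round(4))     # numerical solution
# sanity check against the analytic solution at t=10: 2*exp(-5) ≈ 0.0135
```

solve_ivp defaults to an adaptive Runge-Kutta method (RK45); t_eval only controls where the solution is reported, not the internal step size.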
πŸ“ Area Under Curve β€” Numerical Integration Interactive
Visualize the definite integral β€” shaded area = result of scipy.integrate.quad()
Python – Optimization
from scipy import optimize
import numpy as np

# ─── 1. minimize_scalar: single-variable minimum ─
result = optimize.minimize_scalar(lambda x: (x - 3)**2 + 2)
print(f"Minimum at x = {result.x:.4f}, f(x) = {result.fun:.4f}")
# → x = 3.0000, f(x) = 2.0000

# ─── 2. minimize: multi-variable minimum ─────────
def rosenbrock(xy):
    x, y = xy
    return (1 - x)**2 + 100*(y - x**2)**2

result = optimize.minimize(rosenbrock, x0=[0, 0], method="BFGS")
print(f"Rosenbrock minimum: {result.x}")  # → ≈[1. 1.]
print(f"Iterations: {result.nit}")

# ─── 3. curve_fit: fit parameters to data ────────
def model(x, a, b, c):
    return a * np.exp(-b * x) + c

xdata = np.linspace(0, 4, 50)
ydata = model(xdata, 2.5, 1.3, 0.5) + 0.2*np.random.randn(50)

popt, pcov = optimize.curve_fit(model, xdata, ydata, p0=[2, 1, 0])
print(f"Fitted params: a={popt[0]:.2f}, b={popt[1]:.2f}, c={popt[2]:.2f}")
# → a≈2.50, b≈1.30, c≈0.50

# ─── 4. brentq: find root of f(x) = 0 ───────────
root = optimize.brentq(lambda x: np.cos(x) - x, 0, 1)
print(f"cos(x) = x at x = {root:.6f}")  # → 0.739085
🎯 Optimization – Finding the Minimum Interactive
Watch how scipy.optimize.minimize_scalar converges to the minimum
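The topic header also mentions scipy.interpolate, which the examples above don't touch. A minimal sketch with arbitrary sample points, showing that a cubic spline tracks a smooth function far more closely than piecewise-linear interpolation:

```python
from scipy.interpolate import CubicSpline, interp1d
import numpy as np

# Sparse samples of sin(x)
x = np.linspace(0, 2*np.pi, 8)
y = np.sin(x)

linear = interp1d(x, y)        # piecewise-linear interpolant
spline = CubicSpline(x, y)     # smooth cubic spline

# Compare both against the true function on a fine grid
x_fine = np.linspace(0, 2*np.pi, 100)
err_lin = np.max(np.abs(linear(x_fine) - np.sin(x_fine)))
err_spl = np.max(np.abs(spline(x_fine) - np.sin(x_fine)))
print(f"max error  linear: {err_lin:.4f}   spline: {err_spl:.4f}")
# the spline error is orders of magnitude smaller on smooth data
```

Both objects are callable, so they drop straight into plotting or downstream numerical code.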
25

Linear Algebra with SciPy

⏱ ~20 min · scipy.linalg: solve, det, inv, SVD, eigenvalues

scipy.linalg extends NumPy's linear algebra with more algorithms and better performance. It handles solving systems of equations, matrix decompositions, and the core computations behind machine learning (PCA, SVD, least-squares).

scipy.linalg vs numpy.linalg: Prefer scipy.linalg, which is always built against optimized LAPACK/BLAS routines and provides more functions; numpy.linalg may fall back to slower implementations on some platforms.
Python – Linear Algebra
from scipy import linalg
import numpy as np

# ─── 1. Solve linear system Ax = b ───────────────
A = np.array([[3, 1], [1, 2]], dtype=float)
b = np.array([9, 8], dtype=float)

x = linalg.solve(A, b)
print(f"Solution: x = {x}")          # → [2. 3.]
print(f"Verify Ax = {A @ x}")        # → [9. 8.] ✓

# ─── 2. Determinant and inverse ──────────────────
det = linalg.det(A)
inv = linalg.inv(A)
print(f"det(A) = {det:.2f}")         # → 5.00
print(f"A⁻¹ =\n{inv}")

# ─── 3. Eigenvalues and eigenvectors ─────────────
vals, vecs = linalg.eig(A)
print(f"Eigenvalues: {vals.real}")   # → [3.618 1.382]
print(f"Eigenvectors:\n{vecs}")

# ─── 4. SVD: Singular Value Decomposition ────────
M = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9]], dtype=float)

U, s, Vt = linalg.svd(M)
print(f"Singular values: {s.round(2)}")   # → [16.85  1.07  0.  ]
print(f"Rank ≈ {np.sum(s > 1e-10)}")      # → 2 (rank-deficient!)

# ─── 5. LU Decomposition ─────────────────────────
P, L, U = linalg.lu(A)
print(f"L:\n{L.round(3)}")
print(f"U:\n{U.round(3)}")
# A = P @ L @ U
Output
Solution: x = [2. 3.]
Verify Ax = [9. 8.] ✓
det(A) = 5.00
A⁻¹ =
[[ 0.4 -0.2]
 [-0.2  0.6]]
Eigenvalues: [3.618 1.382]
Singular values: [16.85  1.07  0.  ]
Rank ≈ 2 (matrix is rank-deficient)
🔢 Singular Values – Matrix Rank & Compression Interactive
SVD singular values reveal matrix rank: large values carry most information. Switch matrix type to compare.
Python – Least Squares & QR
from scipy import linalg
import numpy as np

# ─── Least-squares: overdetermined system ────────
# Fit y = ax + b to noisy data
np.random.seed(42)
x = np.linspace(0, 10, 20)
y_true = 2.5 * x + 1.0
y = y_true + np.random.randn(20) * 2

# Build design matrix [x, 1] for linear regression
A = np.column_stack([x, np.ones(len(x))])
result = linalg.lstsq(A, y)      # (solution, residuals, rank, sv)
coeffs = result[0]
print(f"Fitted: a={coeffs[0]:.3f}, b={coeffs[1]:.3f}")
# → a≈2.50, b≈1.00  (recovers true params)

# ─── QR Decomposition ────────────────────────────
Q, R = linalg.qr(A)
print(f"Q shape: {Q.shape}")     # → (20, 20)
print(f"R shape: {R.shape}")     # → (20, 2)
print(f"Q orthogonal: {np.allclose(Q @ Q.T, np.eye(20))}")  # True

# ─── Cholesky (symmetric positive definite) ──────
S = np.array([[4, 2], [2, 3]], dtype=float)
L = linalg.cholesky(S, lower=True)
print(f"Cholesky L:\n{L}")
print(f"Verify L @ L.T ≈ S: {np.allclose(L @ L.T, S)}")  # True
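When the same symmetric positive-definite matrix must be solved against many right-hand sides, the Cholesky factor can be computed once and reused. A sketch with cho_factor/cho_solve, using the matrix S from above and made-up right-hand sides:

```python
from scipy.linalg import cho_factor, cho_solve
import numpy as np

S = np.array([[4, 2], [2, 3]], dtype=float)   # symmetric positive definite
c, low = cho_factor(S)                         # factor once (O(n^3))

# Solve S @ x = b repeatedly without re-factoring (O(n^2) per solve)
for b in (np.array([1.0, 0.0]),
          np.array([0.0, 1.0]),
          np.array([6.0, 5.0])):
    x = cho_solve((c, low), b)
    print(b, "->", x.round(4))
    assert np.allclose(S @ x, b)               # each solution checks out
```

This is the standard pattern for repeated solves; linalg.solve would redo the factorization every call.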
26

Hypothesis Testing and Statistical Tests

⏱ ~25 min · scipy.stats: t-test, ANOVA, chi-square, normality, correlation

scipy.stats is a comprehensive statistics library covering probability distributions, descriptive statistics, and a full suite of hypothesis tests. Understanding p-values and test choice is essential for any data analysis role.

Interpreting p-values: p < 0.05 means: "if H₀ were true, we'd see data at least this extreme only 5% of the time, so we reject H₀." It does NOT mean the effect is large or practically significant. Always report effect size alongside p-values.
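Since the note above says to report effect size, here is one common choice, Cohen's d, as a minimal sketch; the helper function and sample data are invented for illustration, not part of the tests below:

```python
import numpy as np

def cohens_d(a, b):
    """Cohen's d: mean difference divided by the pooled std (ddof=1)."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1)*np.var(a, ddof=1) +
                  (nb - 1)*np.var(b, ddof=1)) / (na + nb - 2)
    return (np.mean(a) - np.mean(b)) / np.sqrt(pooled_var)

rng = np.random.default_rng(0)
a = rng.normal(5.8, 1.0, 40)   # true means differ by 0.8 sd
b = rng.normal(5.0, 1.0, 40)
print(f"Cohen's d = {cohens_d(a, b):.2f}")
# conventional benchmarks: 0.2 small, 0.5 medium, 0.8 large
```

Unlike a p-value, d does not shrink toward "significant" just because the sample grows; it estimates how far apart the groups actually are.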
Python – Common Statistical Tests
from scipy import stats
import numpy as np

np.random.seed(42)

# ─── 1. One-sample t-test ─────────────────────────
# Is this sample mean significantly different from 5.0?
sample = np.random.normal(loc=5.3, scale=1.2, size=30)
t_stat, p_value = stats.ttest_1samp(sample, popmean=5.0)
print(f"One-sample t-test: t={t_stat:.3f}, p={p_value:.4f}")
print(f"→ {'Reject H₀' if p_value < 0.05 else 'Fail to reject H₀'}")

# ─── 2. Two-sample t-test (independent) ──────────
group_a = np.random.normal(5.0, 1.0, 40)
group_b = np.random.normal(5.8, 1.0, 40)
t2, p2 = stats.ttest_ind(group_a, group_b, equal_var=True)
print(f"\nTwo-sample t-test: t={t2:.3f}, p={p2:.4f}")

# ─── 3. Paired t-test ────────────────────────────
before = np.array([82, 91, 75, 88, 79, 95, 83, 77, 84, 90])
after  = np.array([85, 94, 78, 90, 82, 96, 87, 80, 88, 93])
t3, p3 = stats.ttest_rel(before, after)
print(f"\nPaired t-test: t={t3:.3f}, p={p3:.4f}")

# ─── 4. One-way ANOVA ────────────────────────────
g1 = np.random.normal(5.0, 1.0, 20)
g2 = np.random.normal(6.0, 1.0, 20)
g3 = np.random.normal(5.5, 1.0, 20)
f_stat, p_anova = stats.f_oneway(g1, g2, g3)
print(f"\nANOVA: F={f_stat:.3f}, p={p_anova:.4f}")

# ─── 5. Chi-square test of independence ──────────
observed = np.array([[30, 10], [20, 40]])   # contingency table
# correction=False skips Yates' continuity correction (default for 2x2 tables)
chi2, p_chi, dof, expected = stats.chi2_contingency(observed, correction=False)
print(f"\nChi-square: χ²={chi2:.3f}, df={dof}, p={p_chi:.4f}")

# ─── 6. Normality test (Shapiro-Wilk) ────────────
normal_data = np.random.normal(0, 1, 50)
skewed_data = np.random.exponential(1, 50)
_, p_norm  = stats.shapiro(normal_data)
_, p_skew  = stats.shapiro(skewed_data)
print(f"\nShapiro normal data p={p_norm:.3f} → {'Normal' if p_norm>0.05 else 'Not normal'}")
print(f"Shapiro skewed data p={p_skew:.4f} → {'Normal' if p_skew>0.05 else 'Not normal'}")
Output
One-sample t-test: t=1.742, p=0.0922
→ Fail to reject H₀ (mean not significantly ≠ 5.0 at α=0.05)
Two-sample t-test: t=-3.891, p=0.0002
→ Reject H₀ (groups are significantly different)
Paired t-test: t=-10.473, p=0.0000
→ Reject H₀ (significant improvement after treatment)
ANOVA: F=14.372, p=0.0000
→ Reject H₀ (at least one group mean differs)
Chi-square: χ²=16.667, df=1, p=0.0000
→ Variables are NOT independent
Shapiro normal data p=0.812 → Normal ✓
Shapiro skewed data p=0.001 → Not normal ✗
📊 Hypothesis Test Visualizer Interactive
See the sampling distributions for different test scenarios: shaded tail = rejection region (α=0.05)
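When Shapiro-Wilk rejects normality, as with the skewed data above, a rank-based test sidesteps the t-test's normality assumption. A minimal sketch with Mann-Whitney U on made-up skewed samples:

```python
from scipy import stats
import numpy as np

rng = np.random.default_rng(42)
a = rng.exponential(scale=1.0, size=50)   # skewed sample
b = rng.exponential(scale=3.0, size=50)   # skewed, larger typical value

# Compares the two samples via ranks, so normality is not required
u_stat, p = stats.mannwhitneyu(a, b, alternative='two-sided')
print(f"Mann-Whitney U = {u_stat:.1f}, p = {p:.4f}")
```

wilcoxon plays the same role for the paired t-test, and kruskal for one-way ANOVA.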
Python – Distributions & Descriptive Stats
from scipy import stats
import numpy as np

# ─── Continuous distributions ────────────────────
# PDF, CDF, PPF (percent-point = inverse CDF), RVS (random samples)
norm = stats.norm(loc=0, scale=1)     # standard normal N(0,1)
print(f"PDF at 0:    {norm.pdf(0):.4f}")    # → 0.3989
print(f"CDF at 1.96: {norm.cdf(1.96):.4f}") # → 0.9750
print(f"PPF 0.975:   {norm.ppf(0.975):.4f}")# → 1.9600 (z* for 95% CI)

# t-distribution (for small samples)
t_dist = stats.t(df=10)
print(f"\nt(10) critical value: {t_dist.ppf(0.975):.4f}")  # → 2.2281

# ─── Descriptive stats ───────────────────────────
data = np.random.normal(50, 10, 200)
desc = stats.describe(data)
print(f"\nn={desc.nobs}, mean={desc.mean:.2f}")
print(f"variance={desc.variance:.2f}, skew={desc.skewness:.3f}")
print(f"kurtosis={desc.kurtosis:.3f}")

# ─── Confidence interval ─────────────────────────
ci = stats.t.interval(0.95, df=len(data)-1,
                       loc=np.mean(data),
                       scale=stats.sem(data))
print(f"\n95% CI: ({ci[0]:.2f}, {ci[1]:.2f})")

# ─── Pearson and Spearman correlation ─────────────
x = np.random.randn(50)
y = 2*x + np.random.randn(50)
r_p, p_p = stats.pearsonr(x, y)
r_s, p_s = stats.spearmanr(x, y)
print(f"\nPearson r={r_p:.3f} (p={p_p:.4f})")
print(f"Spearman ρ={r_s:.3f} (p={p_s:.4f})")
🔔 Probability Distributions Interactive
Visualize common distributions from scipy.stats: switch to compare shapes
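The block above sticks to continuous distributions; discrete distributions follow the same frozen-object API, with pmf in place of pdf. The parameters below are arbitrary:

```python
from scipy import stats

# Binomial: 10 fair-coin trials
binom = stats.binom(n=10, p=0.5)
print(f"P(X = 5)  = {binom.pmf(5):.4f}")    # → 0.2461 (most likely count)
print(f"P(X <= 3) = {binom.cdf(3):.4f}")    # → 0.1719

# Poisson: average of 4 events per interval
pois = stats.poisson(mu=4)
print(f"P(X = 0)  = {pois.pmf(0):.4f}")     # → 0.0183 (= e^-4)
print(f"mean={pois.mean()}, var={pois.var()}")  # both equal mu for Poisson
```

rvs, cdf, and ppf work exactly as in the continuous case, so the same workflow carries over.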
27

Signal Processing with SciPy

⏱ ~20 min · scipy.signal, scipy.fft: filtering, FFT, convolution

scipy.signal and scipy.fft enable frequency-domain analysis and filtering. These tools are essential for time series data, audio processing, sensor data, and any domain where you need to remove noise or identify periodic patterns.

When to use FFT: Use FFT to find dominant frequencies in a signal (e.g., "does this sensor data have a 60 Hz noise component?"). Use filters (butter, sosfilt) to remove specific frequency bands from a signal.
Python – FFT & Frequency Analysis
from scipy.fft import fft, fftfreq
from scipy import signal
import numpy as np
import matplotlib.pyplot as plt

# ─── Create composite signal ─────────────────────
fs = 1000          # sampling frequency (Hz)
t  = np.linspace(0, 1, fs, endpoint=False)

# Signal = 50 Hz sine + 120 Hz sine + noise
clean = (np.sin(2*np.pi*50*t) +
         0.5*np.sin(2*np.pi*120*t))
noisy = clean + 0.3*np.random.randn(len(t))

# ─── FFT: frequency analysis ─────────────────────
N = len(t)
yf = fft(noisy)
xf = fftfreq(N, 1/fs)[:N//2]          # positive frequencies only
magnitude = 2/N * np.abs(yf[:N//2])   # one-sided spectrum

# Find dominant frequencies
peaks, _ = signal.find_peaks(magnitude, height=0.1)
print("Dominant frequencies (Hz):", xf[peaks].round(1))
# → [ 50. 120.]  ✓ (correctly identified both components)

# ─── Butterworth low-pass filter ─────────────────
cutoff = 80    # Hz: keep below 80 Hz, remove the 120 Hz component
order  = 5

sos = signal.butter(order, cutoff, fs=fs, btype='low', output='sos')
filtered = signal.sosfilt(sos, noisy)

# ─── Plot original vs filtered ───────────────────
fig, axes = plt.subplots(3, 1, figsize=(10, 8), sharex=False)

axes[0].plot(t[:200], noisy[:200], alpha=.7, color='gray')
axes[0].set_title("Noisy Signal (first 200ms)")

axes[1].plot(xf, magnitude, color='steelblue')
axes[1].set_title("FFT Magnitude Spectrum")
axes[1].set_xlabel("Frequency (Hz)")

axes[2].plot(t[:200], filtered[:200], color='teal')
axes[2].set_title("After Low-pass Filter (cutoff=80Hz)")
axes[2].set_xlabel("Time (s)")

plt.tight_layout()
plt.show()
Output
Dominant frequencies (Hz): [ 50. 120.]
Low-pass filter design (Butterworth, order=5, cutoff=80 Hz):
→ Passes: 0–80 Hz (preserves the 50 Hz component)
→ Attenuates: >80 Hz (removes the 120 Hz component)
After filtering, the signal closely matches the original 50 Hz sine wave;
the 120 Hz component is strongly attenuated.
🌊 FFT Frequency Spectrum Interactive
Frequency-domain view of a composite signal: peaks reveal hidden frequency components
Python – Convolution & Correlation
from scipy import signal
import numpy as np

# ─── Convolution: smoothing with a kernel ────────
data = np.array([1, 2, 3, 4, 3, 2, 1, 5, 1, 2, 3])
kernel = np.ones(3) / 3     # moving average kernel

smoothed = signal.convolve(data, kernel, mode='same')
print("Original:", data)
print("Smoothed:", smoothed.round(2))

# ─── Cross-correlation: find time lag ─────────────
# Detect delay between two versions of a signal
x = np.array([0, 0, 1, 2, 1, 0, 0, 0, 0, 0])
y = np.array([0, 0, 0, 0, 1, 2, 1, 0, 0, 0])  # x delayed by 2

correlation = signal.correlate(y, x, mode='full')
lags = signal.correlation_lags(len(y), len(x), mode='full')
lag = lags[np.argmax(correlation)]
print(f"\nDetected lag: {lag} samples")   # → 2

# ─── Peak detection ──────────────────────────────
# Demo data standing in for a real recording (e.g. np.loadtxt("ecg.csv"))
ecg = np.sin(np.linspace(0, 20*np.pi, 500)) + 0.1*np.random.randn(500)
peaks, properties = signal.find_peaks(ecg, height=0.5, distance=20)
print(f"\nFound {len(peaks)} peaks at indices: {peaks[:5]}...")
📡 Signal Filtering Demo Interactive
Compare original noisy signal vs smoothed output: simulating scipy.signal.sosfilt()
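The quick reference below lists spectrogram, which none of the examples use. A minimal sketch that localizes a frequency change in time; the two-tone test signal is an assumption:

```python
from scipy import signal
import numpy as np

fs = 1000                                    # sampling rate (Hz)
t = np.linspace(0, 2, 2*fs, endpoint=False)

# 50 Hz tone for the first second, 200 Hz for the second
x = np.where(t < 1, np.sin(2*np.pi*50*t), np.sin(2*np.pi*200*t))

# Sxx has shape (frequencies, time slices)
f, tt, Sxx = signal.spectrogram(x, fs=fs, nperseg=256)
dominant = f[np.argmax(Sxx, axis=0)]         # strongest frequency per slice

print("Early slices (Hz):", dominant[tt < 0.9][:3].round())   # ~50
print("Late slices  (Hz):", dominant[tt > 1.1][:3].round())   # ~200
```

A plain FFT would show both peaks but not when each occurs; the spectrogram trades some frequency resolution (here fs/nperseg ≈ 3.9 Hz) for that time localization.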
SciPy submodules quick reference:
scipy.integrate – numerical integration (quad, trapezoid, odeint)
scipy.optimize – minimize, curve_fit, root finding, least squares
scipy.linalg – solve, SVD, eigenvalues, LU, QR, Cholesky
scipy.stats – distributions, t-test, ANOVA, chi-square, normality
scipy.signal – filtering, convolution, peak detection, spectrogram
scipy.fft – FFT, inverse FFT, frequency analysis
scipy.interpolate – splines, interp1d, griddata