USAAIO Mathematical Foundations for AI - Linear Algebra, Probability, and Optimization - I
- 12 Sections
- 206 Lessons
- 14h Duration
Introduction to the Math Foundations for USAAIO
Google Colab + Markdown Programming
Linear Algebra - Vectors and Vector Spaces
- 📘 Vectors, Matrices, and Matrix Multiplication
- What is a Vector?
- Understanding Vector Addition in 2D
- Vector Space
- From Vectors to Linear Independence
- Subspaces and Direct Sums
- Span, Basis, Linear Independence, Dimension & Rank
- Example: The list (1, 2, −4), (7, −5, 6) is linearly independent in 𝐅³ but is not a basis of 𝐅³
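As an optional illustration of the example above (not part of the lesson list itself), here is a small NumPy sketch that checks the claim numerically: the matrix whose columns are the two vectors has rank 2, so they are linearly independent, but a basis of 𝐅³ would require rank 3.

```python
# Illustrative sketch only: verify the example with NumPy.
import numpy as np

v1 = np.array([1.0, 2.0, -4.0])
v2 = np.array([7.0, -5.0, 6.0])

A = np.column_stack([v1, v2])          # 3x2 matrix whose columns are the vectors
rank = np.linalg.matrix_rank(A)

print("rank =", rank)                  # 2 -> the columns are linearly independent
print("linearly independent:", rank == 2)
print("basis of F^3:", rank == 3)      # False: a basis of F^3 needs 3 independent vectors
```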
Linear Algebra - Linear Maps (Linear Transformations)
- Linear Maps (Linear Transformations) over Vector Spaces
- Dimension of a Vector Space V, dim(V)
- Null T, Range T, and Rank–Nullity Theorem
- Advanced: How to Prove the Rank–Nullity Theorem
- Example of dim(ker T)
- Gaussian Elimination, Row Echelon Form, and Finding a Basis of the Image/Kernel
- Injective and Surjective Linear Maps
- How to Check Whether a Matrix M (Representing a Linear Map T) Is Injective, Surjective, or Bijective
- Matrices from Linear Maps
- Definition: Rank of a Matrix
- Understanding Matrix and how to use it with example
- Transpose of a Matrix
- Column Rank, Row Rank, and Column–Row Factorization
- 🧭 Tutorial: Understanding Matrix Rank
- Invertible Linear Maps
- Finding the Inverse: REF Method (Finding the Matrix P^-1)
- Advanced topic: General Method for Finding P^-1
- Why Linear Maps Act Like Matrix Multiplication
- Identity Matrix & Inverse Matrix (Square Matrix)
- Matrix of a Composition of Linear Maps
- Change of Basis Formula
- Matrix of the Inverse of a Linear Map
- Isomorphisms and the Matrix Representation Map M
- Another Easy-to-Understand Proof that L(V, W) ≅ F^{m,n}
- Example: How to Check an Isomorphism
- Linear Maps Preserve Structure
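As a quick, optional companion to the linear-map lessons above, the sketch below uses NumPy to verify the Rank–Nullity Theorem for one concrete matrix; the example matrix and the numerical tolerance are my own choices.

```python
# Illustrative check of the Rank–Nullity Theorem:
# for T: F^n -> F^m given by a matrix M, dim(range T) + dim(null T) = n.
import numpy as np

M = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 6.0],     # row 2 = 2 * row 1, so M is not full rank
              [1.0, 0.0, 1.0]])

n = M.shape[1]
rank = np.linalg.matrix_rank(M)              # dim(range T)

# dim(null T) = number of (numerically) zero singular values
singular_values = np.linalg.svd(M, compute_uv=False)
nullity = int(np.sum(singular_values < 1e-10))

print("rank =", rank, "nullity =", nullity)
print("rank + nullity == n:", rank + nullity == n)
```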
Linear-algebra - Eigenvalues and Eigenvectors
- Linear Operators, Invariant Subspaces, Eigenvalues, and Eigenvectors
- Example: how to get the matrix of a linear operator T
- Recap: What must a “matrix of T” do?
- Example: How to Compute Eigenvalues and Eigenvectors
- Diagonalization Theorem A = P D P^{-1}
- Minimal Polynomial of T
- How to find the minimal polynomial of a linear operator T
- Eigenvalues are the zeros of the minimal polynomial
- Determinant and Invertibility of Matrix
- Advanced: Determinant — From Axioms to Computation Formulas
- Finding Eigenvalues and Eigenvectors
- Misc: T - aI = 0 vs (T - aI)v = 0
- T Not Invertible ⟺ the Constant Term of the Minimal Polynomial of T Is 0
- Misc: Operators on Odd-Dimensional Real Vector Spaces Have Eigenvalues
- T Has No Real Eigenvalues ⟺ T − λI Is Invertible for Every Real λ (Real Vector Space)
- Upper-Triangular Matrices and Linear Operators
- Misc: When the Matrix of T Is Upper Triangular
- Equation satisfied by operator with upper-triangular matrix
- Determination of eigenvalues from upper-triangular matrix
- Necessary and sufficient condition to have an upper-triangular matrix
- Advanced topic: Characteristic Polynomial vs Minimal Polynomial
- Every Linear Operator on a Complex Vector Space Has an Upper-Triangular Matrix
- Diagonal Matrices and Eigenspaces
- Conditions equivalent to diagonalizability
- Enough eigenvalues implies diagonalizability
- Example: how do we get the diagonal matrix of T in the eigenvector basis?
- example: Using Diagonalization to Compute Powers of a Linear Operator
- Necessary and Sufficient Condition for Diagonalizability
- Advanced: Why the Distinct Zeros of the Minimal Polynomial of T Determine Whether T Is Diagonalizable
- Advanced: dim(V), Number of Distinct Eigenvalues, rank(T), and the Degree of the Minimal Polynomial
- Matrix, Rank, Eigenvalues, Trace, and Determinant
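To complement the eigenvalue lessons above, here is a short illustrative NumPy check (my own example matrix, assumed diagonalizable with real eigenvalues) of A = P D P^{-1} and of computing a power of A through the diagonalization.

```python
# Illustrative sketch: diagonalization and powers of a matrix.
import numpy as np

A = np.array([[4.0, 1.0],
              [2.0, 3.0]])           # eigenvalues 5 and 2

eigenvalues, P = np.linalg.eig(A)    # columns of P are eigenvectors
D = np.diag(eigenvalues)

# Reconstruct A from its eigen-decomposition
A_reconstructed = P @ D @ np.linalg.inv(P)
print(np.allclose(A, A_reconstructed))                        # True

# A^5 = P D^5 P^{-1}: the power acts only on the diagonal entries
A_power5 = P @ np.diag(eigenvalues ** 5) @ np.linalg.inv(P)
print(np.allclose(A_power5, np.linalg.matrix_power(A, 5)))    # True
```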
Linear Algebra - Inner Product Spaces and Their Operators (Geometric Intuition)
- Commuting Operators in Linear Algebra
- Inner Product, Norms, Orthogonality, and Related Theorems
- Dot Product, Orthogonality, Transpose, Orthogonal Complement, and Least Squares
- Inner Product Space
- Orthonormal Bases, Gram–Schmidt, and Schur’s Theorem
- Examples of Gram-Schmidt orthogonalization
- Orthogonal Projection of a Vector onto a Subspace
- Linear Transformations Preserve Structure
- Example of a Linear Transformation for Rotate, Stretch, Shear, Flip
- Operators on Inner Product Spaces: adjoint operator
- Properties of Real Orthogonal Matrices
- Self-Adjoint Operators, Positive Operators
- Misc: Difference between T* and T-bar
- Example: How to Calculate S*
- Misc: ⟨Tv, w⟩ = ⟨v, T*w⟩ ⟺ ⟨T*w, v⟩ = ⟨w, Tv⟩
- Null Space and Range of T* (e.g., range(S)^⊥ = null(S*))
- Misc: dim(range(T*)) = dim(range(T))
- Misc: Proof that (TS)* = S*T*
- Advanced: How Many Operators Are Normal?
- Isometries and Unitary Operators
- Isometries and Unitary Operators Are Injective
- Example: Isometry and Unitary Operator
- Properties of ( T^*T )
- A Positive Operator Has Non-Negative Eigenvalues
- 🌟 What is the Perron–Frobenius Theorem?
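As an optional aside to the inner-product-space lessons above, the following is a minimal classical Gram–Schmidt sketch in NumPy (my own illustration; production code would normally use the modified version or a QR routine for numerical stability).

```python
# Illustrative classical Gram–Schmidt orthonormalization.
import numpy as np

def gram_schmidt(vectors):
    """Orthonormalize a list of linearly independent vectors."""
    basis = []
    for v in vectors:
        w = v.astype(float)
        for e in basis:
            w = w - np.dot(v, e) * e     # subtract the projection onto each earlier e
        basis.append(w / np.linalg.norm(w))
    return np.array(basis)

vs = [np.array([1.0, 1.0, 0.0]),
      np.array([1.0, 0.0, 1.0]),
      np.array([0.0, 1.0, 1.0])]

Q = gram_schmidt(vs)
print(np.round(Q @ Q.T, 6))              # identity matrix -> rows are orthonormal
```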
Linear-algebra - Factorization and SVD
- Fundamental Rank Factorization Theorem
- Spectral Theorem
- Unitary Matrices and QR Factorization
- QR Factorization
- Details on Proof of Uniqueness of QR
- Examples of Unitary Matrices and QR Factorization
- QR Factorization for Linearly Dependent Columns
- QR application: Solving (Ax=b) Using QR Factorization
- QR Application: Solving Least Squares Using QR Factorization
- QR Application: QR Algorithm for Eigenvalues (Symmetric)
- Symmetric and Positive Semi-Definite Matrices
- Symmetric Matrices, Eigenvalues, and Eigenvectors
- Cholesky factorization and its proof
- The Eigenvalues of a Symmetric Matrix Determine Positive Definiteness
- Singular Values of T
- ✅ SVD Theorem (Singular Value Decomposition)
- Matrix Version of the Singular Value Decomposition (SVD)
- Example For SVD
- Advanced topic: Modern SVD Algorithms – How SVD Is Really Computed
- Advanced: Trace (More Details / Properties)
- Advanced topic: Determinants
- Determinant Application: Proof that a Square Matrix A Is Invertible If and Only If det(A) ≠ 0
- Determinant Application: Mv = 0 Has a Nonzero Solution ⇔ M Is Singular (T Is Not Invertible)
- Determinant Application: det(A − λI) = 0 Gives the Eigenvalues of A
- Advanced topic: LU Factorization
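The factorizations in this section can also be explored numerically. Below is a small illustrative NumPy sketch (my own example matrix) showing QR, least squares via QR, and the SVD.

```python
# Illustrative sketch: QR factorization, least squares via QR, and SVD.
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])

# QR: A = Q R with Q having orthonormal columns, R upper triangular
Q, R = np.linalg.qr(A)
print(np.allclose(A, Q @ R), np.allclose(Q.T @ Q, np.eye(2)))

# Least squares via QR: minimize ||Ax - b|| by solving R x = Q^T b
b = np.array([1.0, 0.0, 1.0])
x_qr = np.linalg.solve(R, Q.T @ b)
print(np.allclose(x_qr, np.linalg.lstsq(A, b, rcond=None)[0]))

# SVD: A = U diag(s) Vt with singular values s in decreasing order
U, s, Vt = np.linalg.svd(A, full_matrices=False)
print(np.allclose(A, U @ np.diag(s) @ Vt))
```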
Linear Algebra Applications for AI/ML
- 🎓 PCA (Principal Component Analysis) via Eigenvalue Decomposition
- ⭐ K-means Clustering (Column-Vector Linear Algebra Version + Intuition)
- ⭐ How to Choose k for K-means Clustering
- 📘 Affine Transformations
- Difference Between Affine and Linear Transformations
- Does a matrix only handle 2D?
- matrix vs. tensor
- Linear algebra is NOT only about matrices
- From Matrices to Tensors
- Advanced topic: Tensor Algebra
- Advanced topic: Nonlinear transformations in math
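As an optional illustration of the PCA lesson listed in this section, here is a short sketch of PCA via eigen-decomposition of the covariance matrix; the synthetic data and variable names are my own.

```python
# Illustrative PCA via eigen-decomposition of the covariance matrix.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))                          # 200 samples, 3 features
X[:, 2] = 2 * X[:, 0] + 0.1 * rng.normal(size=200)     # make the features correlated

X_centered = X - X.mean(axis=0)
cov = np.cov(X_centered, rowvar=False)                 # 3x3 covariance matrix

eigenvalues, eigenvectors = np.linalg.eigh(cov)        # eigh: symmetric matrix
order = np.argsort(eigenvalues)[::-1]                  # sort by decreasing variance
components = eigenvectors[:, order[:2]]                # top-2 principal directions

X_projected = X_centered @ components                  # reduce 3 features to 2
print(X_projected.shape)                               # (200, 2)
print("explained variance ratio:",
      eigenvalues[order[:2]].sum() / eigenvalues.sum())
```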
Probability & Statistics
- Probability Book Reference and Homework
- 🎲 Introduction to Probability
- Sample Space, Random Variables, Events, and Simulation of Discrete Probability
- 🎓 Basic Rules of Probability
- Example Questions for Probability
- What is Conditional Probability?
- Advanced topic: Joint Probability P(A, B) or P(A && B)
- Joint and marginal probability
- 📘 Bayes’ Rule
- 📘 Understanding Bayes’ Theorem
- 🎯Advanced topic: Frequentist vs Bayesian Probability / Inference
- Probability Distributions (PMF, PDF, and CDF)
- 🎲 Uniform Distribution: Discrete vs. Continuous
- Bernoulli Distribution
- Sums of Random Variables (Discrete Case)
- Advanced topic: How Do We Compute the Probability of Sums of Multiple Random Variables?
- Binomial Distribution ( Sum of Bernoulli )
- Example of continuous distribution: Gaussian Distribution
- 📘 Gaussian Distribution & Conditional Distributions
- Advanced topic: Distributions of Functions of Random Variables
- 📊 Mean, Variance, and Expectation
- 📊 Advanced topic: Covariance and Correlation
- Markov’s Inequality
- Chebyshev’s Inequality
- Law of Large Numbers (LLN)
- Hoeffding’s Inequality
- Advanced topic: Hoeffding’s Inequality Proof
- Central Limit Theorem (CLT)
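As an optional simulation companion to the LLN and CLT lessons above, the sketch below flips Bernoulli(p) coins in NumPy; the parameters are arbitrary choices of mine.

```python
# Illustrative simulation of the Law of Large Numbers and the Central Limit Theorem.
import numpy as np

rng = np.random.default_rng(42)
p, n, trials = 0.3, 1000, 5000

flips = rng.random((trials, n)) < p      # trials x n Bernoulli(p) samples
sample_means = flips.mean(axis=1)

# LLN: the sample mean concentrates around the true mean p
print("mean of sample means:", sample_means.mean())           # close to 0.3

# CLT: sqrt(n) * (mean - p) is approximately Normal(0, p(1-p))
z = np.sqrt(n) * (sample_means - p)
print("empirical variance:", z.var(), "vs p(1-p) =", p * (1 - p))
```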
Derivatives in Multivariable Calculus
- Single-Variable Calculus
- Example: cost(x) = x^2 + 2x + 1 – Finding the Minimum Cost
- Partial Derivatives
- Gradient, Jacobian, and Hessian
- ML example: Gradient Descent
- 📘 Gradients and Jacobians
- Directional Derivatives
- Chain Rule: From Single-Variable to Vector Calculus
- Hands-On Chain Rule Tutorial: Loss Function Example
- How to Deduce the Logistic Regression Loss Function — Step by Step
- Advanced topic: Numerical Gradient, Analytical Gradient, and Automatic Differentiation
- Taylor Expansion in Multiple Variables
- Stationary Points (Critical Points)
- First- and Second-Order Conditions in Calculus
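As a tiny optional illustration of the single-variable example cost(x) = x^2 + 2x + 1 and of gradient descent from this section, here is a short sketch; the starting point and learning rate are my own choices.

```python
# Illustrative gradient descent on cost(x) = x^2 + 2x + 1,
# whose gradient is 2x + 2 and whose minimum is at x = -1.
def cost(x):
    return x**2 + 2*x + 1

def grad(x):
    return 2*x + 2

x, lr = 5.0, 0.1                  # starting point and learning rate
for _ in range(100):
    x -= lr * grad(x)             # step opposite to the gradient

print(round(x, 6), round(cost(x), 10))   # x -> -1, cost -> 0
```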
Convex optimization
- What Is Convex Optimization? A Foundation for Machine Learning
- Convex Sets & Convex Functions
- Convexity and Concavity for Vector Functions
- Why We Need a Convex Set for the Domain of x
- Gradient Descent & Projected Gradient Descent
- Newton’s Method (Single & Multiple Variables) and How It Differs from Gradient Descent
- Newton’s Method: Root-Finding and Optimization
- Maximize or Minimize a Function with Constraints - the Lagrange Multiplier Method
- What Is an Infimum?
- Duality and Lagrange Multipliers
- 🎯 Advanced: Duality in Optimization - real understanding
- Strong Duality in Optimization — Requirements and Intuition
- Optimality Conditions - KKT Conditions (Inequality + Equality)
- Convergence Guarantees
- Practical Algorithms for AI Competitions
- 📘 Practice – Code Gradient Descent for MSE Loss
- Gradient Descent and Newton’s Method for Learning β
- Implementation of fit (Using Gradient Descent and Newton's Method)
- USAAIO Authentic Problem: Proof that a Loss Function Is (Weakly) Convex
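As an optional sketch related to the two practice lessons above, the code below fits β for an MSE loss with gradient descent and with Newton's method on made-up data; it is an illustration under my own assumptions, not the course's reference implementation.

```python
# Illustrative fit of beta for an MSE loss: gradient descent vs. Newton's method.
import numpy as np

rng = np.random.default_rng(0)
X = np.column_stack([np.ones(100), rng.normal(size=100)])   # intercept + 1 feature
true_beta = np.array([2.0, -3.0])
y = X @ true_beta + 0.1 * rng.normal(size=100)

def mse_grad(beta):
    return 2 / len(y) * X.T @ (X @ beta - y)     # gradient of the mean squared error

# Gradient descent
beta_gd = np.zeros(2)
for _ in range(2000):
    beta_gd -= 0.1 * mse_grad(beta_gd)

# Newton's method: for a quadratic loss the Hessian is constant, so a single
# step beta <- beta - H^{-1} grad already lands on the minimizer
H = 2 / len(y) * X.T @ X
beta_newton = np.zeros(2) - np.linalg.solve(H, mse_grad(np.zeros(2)))

print(np.round(beta_gd, 3), np.round(beta_newton, 3))        # both near [2, -3]
```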
Python + AI Libraries
- 🧮 NumPy Tutorial: Arrays, Broadcasting, and Dot Product
- NumPy Shapes, Axes, Broadcasting, and Common Confusions
- 🧮 Matrix Inverse and Linear Regression
- Vectorized Fibonacci Computation
- Forward Function for the Matrix Form Wx + b
- How Does NumPy Automatically Batch over Leading Dimensions?
- 📘 Use NumPy to Compute Variance, Covariance Matrix, etc.
- 🧼 Pandas Tutorial: Data Cleaning and Loading Datasets
- 🐼 Tutorial: Getting Started with pandas – Data Cleaning & Loading
- 🧩 pandas Merge and Join Tutorial
- One-Hot Encoding
- 🎨 Data Visualization with matplotlib and seaborn
- What is a Scaler in scikit-learn?
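As a small optional illustration of the NumPy lessons above (the Wx + b forward function and broadcasting over leading dimensions), here is a sketch with my own variable names.

```python
# Illustrative forward function y = x W^T + b and batching via broadcasting.
import numpy as np

def forward(x, W, b):
    # x: (..., in_features), W: (out_features, in_features), b: (out_features,)
    return x @ W.T + b           # broadcasting adds b to every row of the result

W = np.arange(6.0).reshape(2, 3)     # 2 outputs, 3 inputs
b = np.array([1.0, -1.0])

single = forward(np.ones(3), W, b)             # shape (2,)
batch = forward(np.ones((4, 3)), W, b)         # shape (4, 2): batched over axis 0
stacked = forward(np.ones((5, 4, 3)), W, b)    # shape (5, 4, 2): any leading dims

print(single.shape, batch.shape, stacked.shape)
```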
Course Overview
This is a rigorous, math-focused course designed for students preparing for the USA Artificial Intelligence Olympiad (USAAIO) and for anyone who wants a deep theoretical understanding of how modern AI algorithms work.
The course covers the official USAAIO Mathematical Foundations for AI topics, emphasizing derivation, reasoning, and problem-solving rather than programming. Students will build strong mathematical intuition through structured lessons and extensive practice using a dedicated math homework system.
If you want to understand the algorithms behind machine learning—not just use tools or write code—this course lays the essential groundwork.
Core Topics
Mathematical Foundations for AI
- Linear Algebra
  - Vectors and vector spaces
  - Linear and affine transformations
  - Eigenvalues and eigenvectors
  - Matrix factorizations (e.g., QR, SVD)
- Probability & Statistics
  - Random variables and distributions
  - Expectation and variance
  - Bayes’ rule
  - Concentration inequalities (e.g., Hoeffding’s inequality)
- Multivariable Calculus
  - Partial derivatives and gradients
  - Geometric interpretation of derivatives
- Convex Optimization
  - Convex sets and functions
  - Gradient descent (conceptual and mathematical analysis)
  - Duality and optimization intuition in AI models
What You Will Learn
- Mathematical foundations used in modern machine learning
- How optimization algorithms are derived mathematically
- How probability theory supports learning guarantees
- How linear algebra structures AI models and data representations
- How to solve competition-style theoretical problems
Course Features
- Pure Math Focus: No programming required. Emphasis is on reasoning, derivations, and proofs.
- Practice-Driven Learning: Every topic includes targeted problem sets.
- Math Homework System: Auto-graded assignments for immediate feedback and mastery.
- Competition-Oriented: Problem difficulty and style aligned with USAAIO Round 1.
Ideal For
- Middle and high school students preparing for USAAIO Round 1
- Students strong in math who want to enter AI competitions
- Olympiad-oriented learners interested in theoretical AI foundations
- Anyone who wants to understand AI math without coding distractions
Course Duration
- The full course spans two semesters
- Each semester runs 12–14 weeks
- Follows the Austin RRISD schedule
- 1 session per week
- ~1 hour per session, plus homework practice
Prerequisites
- Solid algebra background
- Basic single-variable calculus (limits and derivatives)
Materials Included
- Lecture notes
- Worked examples
- Auto-graded math homework system
- Competition-style practice problems
Learning Outcomes
By the end of this course, students will:
- Master the mathematical foundations of AI
- Confidently solve USAAIO Round 1 math problems
- Be well-prepared for future AI, ML, and advanced mathematics study
- Develop strong analytical thinking and problem-solving skills
Ready to Start?
Join USAAIO Foundations and build the mathematical depth required for success in AI competitions and advanced AI study.