Intelligent Systems – Interactive Study Guide

🧠 What Are Intelligent Systems?

Intelligent systems solve problems autonomously — with reduced or no human intervention. Classical algorithm design fails when problems are too complex, when solution spaces are astronomically large, or when no known algorithm can solve them in acceptable time.

Key motivation: 1 cm³ of gas holds ~2.7×10¹⁹ molecules. When the candidate space is that large, testing all combinations is impossible, so intelligent systems explore it smartly.

🧬 Drug Discovery: find the right molecule from trillions of candidates.

🚗 Self-Driving Cars: navigate 3D space with real-time decisions.

💹 Portfolio Optimization: maximize returns subject to risk constraints.

🌿 Bioinspired Algorithms

🧠 Artificial Neural Networks: inspired by the human brain.

🧬 Genetic Algorithms / Genetic Programming (GP): inspired by Darwin's theory of evolution.

🐦 Particle Swarm Optimization: inspired by bird flocking and fish schooling.

🌡️ Simulated Annealing: inspired by metallurgical heat treatment.

📚 Topics Covered

Ch.1 Introduction

Ch.2 Optimization & Local Search

Ch.3 Genetic Algorithms

Ch.4 Particle Swarm Optimization

Tools mentioned: Weka (Java) & Scikit-learn (Python) — popular environments for comparing CI methods, directly motivated by the No Free Lunch Theorem.

🎯 Optimization Problems

An optimization problem is defined by a search space S and a fitness function f. The goal: find s* ∈ S that maximizes or minimizes f(s*).

Instance = (S, f)
S = search space (all feasible solutions)
f : S → ℝ = fitness function
Goal: s* = argmax f(s) or argmin f(s)

Combinatorial optimization: S is finite but enormous — e.g., 2¹⁰⁰ bit strings, or (n-1)!/2 TSP tours. Exhaustive search is impossible.

🎒 Knapsack Problem

Select items to maximize total value without exceeding a capacity of 15 kg.
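To see why exhaustive search breaks down, here is a minimal brute-force sketch in Python. The five (weight, value) items are illustrative assumptions, since the original item list is not recoverable. The function enumerates all 2ⁿ subsets: trivial for n = 5, hopeless for n = 100.

```python
from itertools import combinations

# Illustrative (weight_kg, value) pairs; assumed for this example.
ITEMS = [(12, 4), (2, 2), (1, 2), (1, 1), (4, 10)]
CAPACITY = 15  # kg

def knapsack_bruteforce(items, capacity):
    """Try every subset of items (2**n of them); return (best value, item indices)."""
    best_value, best_subset = 0, ()
    for r in range(len(items) + 1):
        for subset in combinations(range(len(items)), r):
            weight = sum(items[i][0] for i in subset)
            if weight <= capacity:  # feasibility check
                value = sum(items[i][1] for i in subset)
                if value > best_value:
                    best_value, best_subset = value, subset
    return best_value, best_subset

print(knapsack_bruteforce(ITEMS, CAPACITY))  # (15, (1, 2, 3, 4))
```

Each extra item doubles the work, which is exactly the growth that makes heuristics necessary.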

🗺️ Travelling Salesman Problem

Find the shortest tour that visits every city exactly once and returns to the starting city.
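A brute-force sketch that also checks the (n−1)!/2 count quoted above: it fixes city 0 as the start and skips each tour's mirror image, so every distinct tour of a symmetric instance is examined exactly once. The city coordinates are hypothetical and distances are assumed Euclidean.

```python
import math
from itertools import permutations

def tour_length(tour, cities):
    """Total Euclidean length of a closed tour (returns to the start city)."""
    return sum(math.dist(cities[tour[k]], cities[tour[(k + 1) % len(tour)]])
               for k in range(len(tour)))

def tsp_bruteforce(cities):
    """Return (best length, best tour, number of distinct tours examined)."""
    best_len, best_tour, count = math.inf, None, 0
    for perm in permutations(range(1, len(cities))):  # city 0 is fixed as start
        if perm[0] > perm[-1]:  # mirror image of a tour already counted
            continue
        tour = (0,) + perm
        count += 1
        length = tour_length(tour, cities)
        if length < best_len:
            best_len, best_tour = length, tour
    return best_len, best_tour, count

square = [(0, 0), (1, 0), (1, 1), (0, 1)]  # hypothetical cities
print(tsp_bruteforce(square))  # (4.0, (0, 1, 2, 3), 3)
```

For 4 cities there are 3!/2 = 3 tours; for 20 cities there are already about 6×10¹⁶.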

📊 Why Exhaustive Search Fails

Solution-space size grows exponentially (2ⁿ bit strings) or even factorially ((n−1)!/2 TSP tours), so heuristics are essential.

🍽️ No Free Lunch Theorem

Wolpert & Macready (1997): when averaged over all possible problems, any two algorithms perform identically. No universal "best" algorithm exists.

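∑_f P(d_m | f, m, A₁) = ∑_f P(d_m | f, m, A₂)
where d_m is the sequence of fitness values sampled by the algorithm after m evaluations, and the sum runs over all fitness functions f.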

→ Specialized algorithms beat general ones on specific problems
→ But lose on other problems

Implication 1: Always benchmark multiple algorithms on your specific problem domain before choosing.

Implication 2: Exploit domain knowledge. An algorithm that uses problem structure outperforms one that ignores it — on that class of problems.

📉 Algorithm Performance Trade-off

A wins on some problems, B wins on others — they balance out overall

💡 Consequences

Statement	True?
"Algorithm X is always best"	❌ False
"SA beats HC on rugged landscapes"	✅ True (on average)
"No single best algorithm exists"	✅ True (NFL theorem)
"Problem-specific algorithms can win"	✅ True
"Weka/Scikit-learn compare algorithms"	✅ Motivated by NFL

⛰️ Hill Climbing

The simplest local search algorithm. Start at a random solution, move to a better neighbor, repeat until no improvement is possible.

Algorithm 1: Hill Climbing
────────────────────────────────────────
1. Initialize random solution i_start
2. i := i_start
3. repeat
   3.1. Generate neighbor j from N(i)
   3.2. if f(j) ≥ f(i) then i := j
   until ∀j ∈ N(i): f(j) < f(i)
4. return i

Core flaw: HC always terminates at a local optimum — no guarantee it's the global one. "Blindly following fitness" is a losing strategy on rugged or deceptive landscapes.
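A minimal Python sketch of Algorithm 1 on bit strings with the 1-bit-flip (Hamming-1) neighborhood. The `max_iters` cap is an added assumption: with the ≥ acceptance rule, a plateau of equal-fitness neighbors would otherwise loop forever. OneMax and the deceptive `trap` function are hypothetical test functions chosen to contrast a smooth landscape with a deceptive one.

```python
import random

def hill_climb(f, n_bits, rng, max_iters=10_000):
    """Algorithm 1 (maximization) with the 1-bit-flip neighbourhood."""
    i = [rng.randint(0, 1) for _ in range(n_bits)]  # random i_start
    fi = f(i)
    for _ in range(max_iters):
        neighbours = [i[:k] + [1 - i[k]] + i[k + 1:] for k in range(n_bits)]
        if all(f(j) < fi for j in neighbours):  # strict local optimum reached
            return i
        j = rng.choice(neighbours)  # generate a neighbour j from N(i)
        fj = f(j)
        if fj >= fi:  # accept equal or better
            i, fi = j, fj
    return i  # iteration cap hit (e.g. wandering on a plateau)

def onemax(x):
    return sum(x)

def trap(x):
    # Deceptive: the global optimum is all ones, but fitness rewards removing 1s.
    return len(x) if all(x) else len(x) - 1 - sum(x)

rng = random.Random(0)
print(onemax(hill_climb(onemax, 20, rng)))
print(trap(hill_climb(trap, 20, rng)))
```

On OneMax every non-optimal string has a better neighbor, so the climber reaches the all-ones optimum (fitness 20). On the trap, unless the start already has 19 or more 1-bits, it follows the misleading gradient to all zeros (fitness 19) and stops there.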


✅ vs ❌ Strengths & Weaknesses

Property	Result
Simple to implement	✅ Very
Fast per iteration	✅ O(|N(i)|)
Works for any N	✅ General
Escapes local optima	❌ Never
Finds global optimum	❌ No guarantee
Deceptive problems	❌ Very poor

Neighborhood N: A mapping N: S → 2^S. For bit strings, the 1-bit flip (Hamming-1) neighborhood is standard. Larger N explores more but costs more per step.

🗺️ Fitness Landscapes

A Fitness Landscape (FL) visualizes the relationship between solution structure and fitness value. The shape of the landscape determines how hard a problem is for local search algorithms.

Why it matters: HC is an "uphill climber" — it stops at any peak. The landscape shape tells us whether that peak is the global optimum or a poor local trap.


🔎 Why We Can't Draw FLs

Vast search space: 2¹⁰⁰ solutions — impossible to enumerate
High dimensionality: Neighborhood of size k → k-dimensional FL
Dynamic problems: Some fitness functions change over time

Consequence: We can rarely know the landscape in advance — adaptive methods like SA are essential.

💡 Landscape → Algorithm Choice

🟢 Smooth (single peak): HC reliably finds the optimum. Gradient methods work.

🟠 Rugged: Need SA or GA, i.e., algorithms that escape local optima.

🔴 Deceptive: Fitness misleads the search. Population-based methods (GA) handle this better.

🔵 Neutral: Large plateaus; need exploration strategies.

🌡️ Simulated Annealing

SA (Kirkpatrick et al. 1983) extends HC using inspiration from metallurgical annealing — slowly cooling a material to reach its minimum energy state. The key idea: accept worse solutions with decreasing probability to escape local optima.

P(accept j) = 1, if f(j) ≥ f(i) [solution not worse]
P(accept j) = e^(−|f(j)−f(i)| / c), otherwise [worse solution]
where c = control parameter (like temperature; decreases over time)

Physics analogy: Boltzmann distribution, P ∝ e^(−ΔE / (k_B·T)). Replace energy with fitness and temperature with c.

📐 Acceptance Probability Example

With c = 1.0 and a fitness difference |Δf| = 1.0, a worse neighbor is accepted with probability e^(−1.0/1.0) = e^(−1) ≈ 36.8%.
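A minimal SA sketch in Python, applying the maximization acceptance rule above to bit strings with a random 1-bit-flip neighbor. The geometric cooling schedule c := α·c with α < 1, the chain length L (transitions per value of c), the stopping threshold `c_min` and all default values are assumptions for illustration; the text leaves the schedule problem-dependent.

```python
import math
import random

def accept_prob(f_i, f_j, c):
    """P(accept j) for maximization: 1 if j is not worse, else e^(-|f(j)-f(i)|/c)."""
    if f_j >= f_i:
        return 1.0
    return math.exp(-abs(f_j - f_i) / c)

def simulated_annealing(f, n_bits, rng, c0=3.0, alpha=0.95, L=50, c_min=1e-3):
    i = [rng.randint(0, 1) for _ in range(n_bits)]
    fi = f(i)
    best, f_best = i[:], fi
    c = c0
    while c > c_min:
        for _ in range(L):  # L transitions at a constant c
            j = i[:]
            k = rng.randrange(n_bits)
            j[k] = 1 - j[k]  # random 1-bit-flip neighbour
            fj = f(j)
            if rng.random() < accept_prob(fi, fj, c):
                i, fi = j, fj
                if fi > f_best:  # remember the best solution seen so far
                    best, f_best = i[:], fi
        c *= alpha  # cool down (assumed geometric schedule)
    return best, f_best

print(round(accept_prob(0.0, -1.0, 1.0), 3))  # 0.368, the 36.8% example
best, f_best = simulated_annealing(sum, 20, random.Random(0))
print(f_best)
```

Early on (c ≈ 3) a step that loses one unit of fitness is accepted about 72% of the time; near the end it is essentially never accepted, so the search behaves like hill climbing.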


📐 Theory: Asymptotic Convergence

Lemma 2.1 (Stationary probability): After many transitions with constant c, SA stabilizes on solution i with probability:

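P{X=i} = e^(f(i)/c) / Σ_{j∈S} e^(f(j)/c) [Boltzmann distribution, written for maximization to match the acceptance rule above; for minimization, use e^(−f(i)/c) and e^(−f(j)/c)]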

Theorem 2.3 (Asymptotic Convergence): As c → 0, SA stabilizes on global optima with probability 1, uniformly distributed over all global optima.

Key caveat: The theorem guarantees convergence in the limit (c → 0, infinite time). It says nothing about convergence speed. Parameter tuning is crucial and problem-dependent.
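A small numerical illustration of Lemma 2.1 and Theorem 2.3 on a hypothetical four-solution search space. It uses the maximization form e^(+f(i)/c), which matches the acceptance rule P(accept j) above; for minimization, negate the exponent. As c shrinks, the probability mass concentrates on the two global optima and is split evenly between them.

```python
import math

def stationary_distribution(fitness, c):
    """Boltzmann distribution P{X=i} proportional to e^(f(i)/c) (maximization form)."""
    f_max = max(fitness)
    # Subtracting f_max leaves the ratios unchanged but avoids overflow for small c.
    weights = [math.exp((f - f_max) / c) for f in fitness]
    total = sum(weights)
    return [w / total for w in weights]

fitness = [1.0, 3.0, 2.0, 3.0]  # hypothetical; solutions 1 and 3 are global optima
for c in (10.0, 1.0, 0.1, 0.01):
    print(c, [round(p, 3) for p in stationary_distribution(fitness, c)])
```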


⚖️ HC vs SA Comparison

Property	Hill Climbing	Simulated Annealing
Accepts worse solutions?	❌ Never	✅ With prob e^(−Δf/c)
Escapes local optima?	❌ No	✅ Yes
Global optimum guarantee?	❌ Local only	✅ Asymptotically
Neighbor selection	Best neighbor	Random neighbor
Parameters	✅ None	⚠️ c, L, schedule

🧬 Genetic Algorithms

GAs (Holland 1975, Goldberg 1989) are Evolutionary Computation methods inspired by Darwin's theory of natural selection. Unlike HC/SA, GAs work with a population of solutions and evolve them through selection, crossover, and mutation.

Darwin's 5 pillars: Reproduction · Adaptation · Inheritance · Variation · Competition

GA terminology: Individual = solution · Generation = iteration · Population = set of solutions · Chromosome = string representation

GA vs SA vs HC

Feature	HC	SA	GA
Solution count	1	1	Population N
Inspiration	—	Metallurgy	Evolution
Operators	Neighbor move	Random neighbor	Crossover + Mutation
Selection	—	—	Fitness-based

📋 The GA Algorithm

Algorithm 4: Standard Generational GA
────────────────────────────────────────────────
1. Create initial population P of N individuals (random)
2. repeat until termination condition:
   2.1. Calculate fitness of each individual in P
   2.2. Create empty offspring population P'
   2.3. repeat until P' has N individuals:
        2.3.1. Choose operator: crossover (prob p_c) or replication (1−p_c)
        2.3.2. Select 2 individuals from P using selection algorithm
        2.3.3. Apply chosen operator to selected individuals
        2.3.4. Apply mutation to offspring
        2.3.5. Insert offspring into P'
   2.4. P := P' (generational replacement)
3. return best individual in P

Parameters to set: Population size N · Crossover rate p_c (0.8–1.0) · Mutation rate p_m (0–0.2) · Max generations · Selection algorithm · Elitism size

Elitism: Guarantee the best k individuals survive to the next generation. Prevents loss of the best solution. Typical: k=1.

🎯 Selection Algorithms

Fitness Proportionate Selection: Each individual's selection probability is proportional to its fitness.

P(select i) = f_i / Σ_j f_j

Think of a roulette wheel on which fitter individuals occupy larger segments.

Weakness: Sensitive to fitness scale differences. One very fit individual can dominate. Negative fitness requires rescaling.
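Roulette-wheel selection can be sketched with a cumulative sum and a single uniform draw. This assumes non-negative fitness values with a positive total, which is exactly the rescaling caveat above; the population is hypothetical.

```python
import bisect
import itertools
import random

def roulette_select(fitness, rng):
    """Return index i with probability f_i / sum_j f_j (requires f_i >= 0, sum > 0)."""
    cumulative = list(itertools.accumulate(fitness))
    spin = rng.random() * cumulative[-1]  # uniform point on the wheel
    # Guard against float rounding pushing spin onto the wheel's end.
    return min(bisect.bisect_right(cumulative, spin), len(fitness) - 1)

fitness = [1.0, 2.0, 3.0, 4.0]  # hypothetical population
rng = random.Random(42)
counts = [0] * len(fitness)
for _ in range(100_000):
    counts[roulette_select(fitness, rng)] += 1
print([c / 100_000 for c in counts])  # close to [0.1, 0.2, 0.3, 0.4]
```

An individual with fitness 0 gets a zero-width segment and is never chosen.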

Ranking Selection: Sort individuals by fitness; selection probability is based on rank position, not raw fitness value.

Rank individuals 1 (worst) to N (best)
P(select rank r) = φ(r) / Σ_j φ(j)
φ can be linear, logarithmic, or exponential

Advantage over roulette wheel: Not affected by fitness scale. Rescaling fitness values, however drastically, leaves selection probabilities unchanged as long as the rank order is preserved.
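A sketch of ranking selection with the linear choice φ(r) = r, one of the options listed above (choosing linear φ is an assumption here). Only the sorted order of the fitness values enters the computation, which is why rescaling does not change the probabilities. Ties are broken by position (Python's sort is stable).

```python
import random

def rank_probabilities(fitness, phi=lambda r: r):
    """P(select i) = phi(rank_i) / sum_r phi(r), with rank 1 = worst ... N = best."""
    order = sorted(range(len(fitness)), key=lambda i: fitness[i])  # worst first
    ranks = [0] * len(fitness)
    for r, i in enumerate(order, start=1):
        ranks[i] = r
    weights = [phi(r) for r in ranks]
    total = sum(weights)
    return [w / total for w in weights]

def rank_select(fitness, rng, phi=lambda r: r):
    probs = rank_probabilities(fitness, phi)
    return rng.choices(range(len(fitness)), weights=probs, k=1)[0]

print(rank_probabilities([5.0, 1.0, 1000.0]))  # [1/3, 1/6, 1/2] as floats
print(rank_probabilities([5.0, 1.0, 6.0]))     # same result: only the order matters
```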

Tournament Selection: Pick k random individuals; select the best among them. Repeat for each parent needed.


Selection pressure: Small k → low pressure (any individual can win). Large k → high pressure (best usually wins). k is a tunable parameter!
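A sketch of tournament selection showing how k controls selection pressure. Contestants are drawn uniformly with replacement (an assumption; some variants draw without replacement). With replacement, the best of N individuals wins a tournament with probability 1 − (1 − 1/N)^k.

```python
import random

def tournament_select(fitness, k, rng):
    """Pick k individuals uniformly at random (with replacement); return the fittest."""
    contestants = [rng.randrange(len(fitness)) for _ in range(k)]
    return max(contestants, key=lambda i: fitness[i])

fitness = [1.0, 2.0, 3.0, 4.0, 5.0]  # hypothetical population; index 4 is best
rng = random.Random(0)
for k in (1, 3, 5):
    wins = sum(tournament_select(fitness, k, rng) == 4 for _ in range(10_000))
    print(k, wins / 10_000)  # share of tournaments won by the best individual
```

With N = 5, the theoretical win shares are 0.2, 0.488 and 0.672 for k = 1, 3 and 5.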

🔀 Genetic Operators

Standard (1-point) Crossover: Pick a random crossover point; swap substrings between two parents to create two children.

Conservation operator: Crossover recombines existing genetic material in new ways. Crossover rate p_c is typically high (0.8–1.0). If crossover doesn't happen, replication (copy parents) occurs instead.
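One-point crossover and the crossover-or-replication choice, sketched in Python. Chromosomes can be strings or lists (anything that supports slicing and concatenation) and must have length ≥ 2; the cut point is drawn strictly inside the string so both children mix the parents.

```python
import random

def one_point_crossover(p1, p2, rng):
    """Cut both parents at the same random point and swap the tails."""
    point = rng.randint(1, len(p1) - 1)  # cut strictly inside the string
    return p1[:point] + p2[point:], p2[:point] + p1[point:]

def crossover_or_replicate(p1, p2, p_c, rng):
    """With probability p_c apply crossover; otherwise copy the parents (replication)."""
    if rng.random() < p_c:
        return one_point_crossover(p1, p2, rng)
    return p1[:], p2[:]

rng = random.Random(3)
print(one_point_crossover("11111", "00000", rng))
```

Crossover only rearranges existing material: the two children together contain exactly the parents' genes.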

Standard Mutation: For each gene, replace with a random character with probability p_m (the mutation rate).


Innovation operator: Mutation introduces new genetic material. Rate p_m kept low (0–0.2) — as in nature, mutation is rare. High mutation destroys good solutions.
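Standard mutation as described above, sketched in Python: each gene is independently replaced by a random symbol from the alphabet with probability p_m.

```python
import random

def mutate(chromosome, p_m, alphabet, rng):
    """Standard mutation: each gene is replaced by a random symbol with probability p_m."""
    return [rng.choice(alphabet) if rng.random() < p_m else gene for gene in chromosome]

rng = random.Random(0)
parent = [0] * 20
child = mutate(parent, 0.15, [0, 1], rng)
print(child)
```

Because the replacement symbol is drawn from the whole alphabet, it can equal the old one: with the binary alphabet half of the "mutations" change nothing, so the effective flip rate is p_m/2. Implementations that flip the bit instead have an effective rate of p_m.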

🔬 GA Example: OneMax

Evolve a population to maximize f(x) = number of 1-bits in a binary string. Example settings: population size N = 10, crossover rate p_c = 0.8, mutation rate p_m = 0.1.
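A compact Python sketch of Algorithm 4 on OneMax with the example settings above. Several choices are assumptions for illustration: tournament selection (k = 3, distinct contestants), bit-flip mutation instead of random-symbol replacement, elitism of size 1 applied before offspring are generated, and termination when the optimum is found or after 300 generations.

```python
import random

def onemax(bits):
    """Fitness = number of 1-bits."""
    return sum(bits)

def tournament(P, fit, k, rng):
    """Best of k distinct, randomly chosen individuals (assumed selection scheme)."""
    return P[max(rng.sample(range(len(P)), k), key=lambda i: fit[i])]

def run_ga(n_bits=20, N=10, p_c=0.8, p_m=0.1, max_gens=300, k=3, elite=1, seed=0):
    rng = random.Random(seed)
    P = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(N)]  # 1. random init
    history = []  # best fitness per evaluated generation
    for _ in range(max_gens):
        fit = [onemax(x) for x in P]  # 2.1 evaluate
        history.append(max(fit))
        if history[-1] == n_bits:  # termination: optimum found
            break
        best_first = sorted(range(N), key=lambda i: fit[i], reverse=True)
        P_next = [P[i][:] for i in best_first[:elite]]  # elitism: copy best unchanged
        while len(P_next) < N:  # 2.3 fill offspring population
            a, b = tournament(P, fit, k, rng), tournament(P, fit, k, rng)
            if rng.random() < p_c:  # crossover with probability p_c ...
                cut = rng.randint(1, n_bits - 1)
                kids = [a[:cut] + b[cut:], b[:cut] + a[cut:]]
            else:  # ... otherwise replication
                kids = [a[:], b[:]]
            for kid in kids:  # 2.3.4 bit-flip mutation with probability p_m per gene
                P_next.append([1 - g if rng.random() < p_m else g for g in kid])
        P = P_next[:N]  # 2.4 generational replacement (a surplus child is dropped)
    return max(P, key=onemax), history

best, history = run_ga()
print(onemax(best), len(history))
```

Thanks to elitism, the best fitness per generation can never decrease, even with a fairly high mutation rate.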

📐 Schema Theorem

A schema H is a pattern over {0,1,*} where * is a wildcard. E.g., H = 1**0* matches any 5-bit string starting with 1 and having 0 in position 4.

m(H, t+1) ≥ m(H, t) · [f(H)/f̄] · [1 − p_c·δ(H)/(l−1)] · [1 − o(H)·p_m]

where:
m(H, t) = number of strings in the population matching H at generation t
f(H) = average fitness of strings matching H
f̄ = average population fitness
δ(H) = defining length (distance between outermost fixed positions)
o(H) = order (number of fixed positions, i.e., non-* symbols)
l = chromosome length

Building Block Hypothesis: GAs work by discovering, recombining and propagating short, low-order, above-average schemata — called building blocks.

What thrives: Short schemata (small δ), low order (small o), above-average fitness → exponentially increasing representation over generations.
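The schema quantities and the bound can be computed directly. For H = 1**0*: o(H) = 2 and δ(H) = 4 − 1 = 3. The numbers in the bound example (10 matching strings, f(H)/f̄ = 1.2, p_c = 0.8, p_m = 0.01) are hypothetical.

```python
def order(H):
    """o(H): number of fixed (non-*) positions."""
    return sum(ch != "*" for ch in H)

def defining_length(H):
    """delta(H): distance between the outermost fixed positions (0 if fewer than two)."""
    fixed = [i for i, ch in enumerate(H) if ch != "*"]
    return fixed[-1] - fixed[0] if fixed else 0

def matches(x, H):
    """True if string x is an instance of schema H."""
    return len(x) == len(H) and all(h == "*" or h == c for c, h in zip(x, H))

def schema_bound(m, fitness_ratio, H, p_c, p_m):
    """Lower bound on m(H, t+1); fitness_ratio = f(H) / f-bar."""
    l = len(H)
    return m * fitness_ratio * (1 - p_c * defining_length(H) / (l - 1)) * (1 - order(H) * p_m)

H = "1**0*"
print(order(H), defining_length(H), matches("10100", H))  # 2 3 True
print(schema_bound(10, 1.2, H, p_c=0.8, p_m=0.01))        # about 4.704
```

Here the long defining length lets crossover disrupt the schema 60% of the time, so even an above-average schema is not guaranteed to grow; a short schema such as 10*** (δ = 1) would lose only 20%.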

⚙️ Advanced GA Methods

Premature convergence occurs when the population loses diversity too quickly — all individuals become similar before finding a good solution.

Symptoms: Population entropy drops quickly. All individuals share the same building blocks. Genetic operators can no longer create improvements.

Measuring diversity: Entropy H(P) = −Σ F_j·log(F_j) where F_j is the fraction of individuals with a given genotype/fitness value. Variance can also be used.

Causes: Too high selection pressure · Small population · High crossover rate · Low mutation rate
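A sketch of the entropy diversity measure over genotypes. It uses the natural logarithm (an assumption; other bases only rescale H). A fully converged population scores 0, and N distinct genotypes score log N.

```python
import math
from collections import Counter

def population_entropy(population):
    """H(P) = -sum_j F_j * log(F_j), F_j = fraction of individuals with genotype j."""
    n = len(population)
    counts = Counter(tuple(ind) for ind in population)
    # F_j * log(1 / F_j) equals -F_j * log(F_j) and avoids printing -0.0.
    return sum((c / n) * math.log(n / c) for c in counts.values())

diverse = [[0, 0], [0, 1], [1, 0], [1, 1]]
converged = [[1, 1]] * 4
print(population_entropy(diverse), population_entropy(converged))  # log(4) ≈ 1.386, then 0.0
```

Tracking this value per generation makes premature convergence visible as a rapid fall toward 0.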

3.7 Genetic Algorithms for Continuous Optimization

Continuous optimization is a very frequent class of problems, e.g. optimizing the parameters of a device or of another algorithm. Standard GA operators designed for discrete alphabets are weak here: standard crossover cannot generate new allele values; it only swaps existing ones.

Discrete GA individuals: strings like 1011010 — alleles ∈ {0,1}.
Standard crossover swaps substrings → fine for discrete problems.

Continuous GA individuals: real-valued vectors [x₁, x₂, …, xₘ] where xᵢ ∈ [αᵢ, βᵢ] ⊂ ℝ.
Each individual = a point in m-dimensional Cartesian space.

Geometric Crossover

Given parents P₁ = [x₁,…,xₘ] and P₂ = [y₁,…,yₘ], one offspring is produced:

offspring[j] = rⱼ · xⱼ + (1 − rⱼ) · yⱼ, rⱼ ~ Uniform(0, 1) independently

Effect: the offspring lies inside the box spanned by the two parents.
Special case: if all rⱼ are equal, the offspring lies on the segment joining the parents. In this case, if fitness is the distance to the global optimum (lower is better), the offspring cannot be worse than the worse parent.

Geometric Mutation (Box Mutation)

Given an individual [x₁,…,xₘ], produces:

offspring[j] = xⱼ + rⱼ, rⱼ ~ Uniform(−ms, ms)
ms = mutation step (tunable parameter)

Effect: the offspring appears anywhere inside a "box" centred on the parent.
Key property: if fitness is the distance to the global optimum, it is always possible to move closer to it, so box mutation induces a unimodal fitness landscape (no local optima other than the global ones).

Why geometric crossover is powerful: It interpolates between parents in continuous space, generating entirely new allele values rather than just recombining existing ones. When all rⱼ are equal, the offspring's position on the segment between the parents gives it a provable fitness guarantee under distance-based fitness.

Limitation of standard crossover on continuous problems: Swapping substrings of a real-vector encoding only reshuffles existing allele values across dimensions. It cannot, for example, create a value of 3.7 from parents holding 2.1 and 5.9 in that dimension — geometric crossover can (any value in [2.1, 5.9]).
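Geometric crossover and box mutation, sketched in Python for real-valued vectors. The `same_r` flag (an illustrative parameter, not from the text) switches to the segment special case; the optimum `opt` and the parents are hypothetical 2D points, and distance to `opt` stands in for fitness.

```python
import math
import random

def geometric_crossover(x, y, rng, same_r=False):
    """offspring[j] = r_j * x_j + (1 - r_j) * y_j with r_j ~ U(0, 1).

    same_r=True uses one r for every j, placing the child on the segment x-y.
    """
    r_shared = rng.random()
    child = []
    for xj, yj in zip(x, y):
        r = r_shared if same_r else rng.random()
        child.append(r * xj + (1 - r) * yj)
    return child

def box_mutation(x, ms, rng):
    """offspring[j] = x_j + r_j with r_j ~ U(-ms, ms): a point in a box around x."""
    return [xj + rng.uniform(-ms, ms) for xj in x]

rng = random.Random(0)
p1, p2 = [2.1, 0.0], [5.9, 4.0]
print(geometric_crossover(p1, p2, rng))  # first coordinate somewhere in [2.1, 5.9]

# Segment case: with fitness = distance to an optimum (lower is better),
# the child is never farther away than the worse parent.
opt = [4.0, -3.0]
seg_child = geometric_crossover(p1, p2, rng, same_r=True)
print(math.dist(seg_child, opt) <= max(math.dist(p1, opt), math.dist(p2, opt)))  # True
print(box_mutation(p1, 0.5, rng))
```

With independent rⱼ the child can land in a box corner that is farther from the optimum than both parents, which is why the guarantee is stated only for the segment case.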


Continuous vs. Discrete GA — Key Differences

Representation: Real vectors vs. binary/integer strings

Crossover: Interpolation vs. substring swap

Mutation: Random perturbation (±ms) vs. bit flip

Search space: ℝᵐ (continuous) vs. {0,1}^ℓ (discrete)

Fitness landscape: Box mutation → unimodal. Standard GA landscapes can be highly multimodal.

Convergence: Geometric mutation step ms acts like a temperature — can be annealed over time for fine-tuning.

How to fairly compare algorithm configurations:

ABF (Average Best Fitness): Average the best fitness found at each generation over all independent runs. Gives a "typical run" view.

MBF (Median Best Fitness): More robust to outliers than ABF. Use with box plots at termination.

SR (Success Rate): SR = #successful runs / #total runs. A run is "successful" if the global optimum (or ε-approximation) was found.
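The three metrics can be computed from per-run histories. This sketch assumes each run is recorded as a list of best-so-far fitness values, one per generation, that all runs have the same length, and that the problem is maximization; the data are hypothetical.

```python
from statistics import mean, median

def abf(runs):
    """Average Best Fitness per generation; runs[r][g] = best-so-far fitness of run r at gen g."""
    return [mean(gen_values) for gen_values in zip(*runs)]

def mbf(runs):
    """Median Best Fitness at termination (last recorded generation of each run)."""
    return median(run[-1] for run in runs)

def success_rate(runs, optimum, eps=0.0):
    """Fraction of runs whose final best fitness is within eps of the optimum."""
    return sum(run[-1] >= optimum - eps for run in runs) / len(runs)

runs = [[1, 3, 5], [2, 4, 5], [1, 2, 3]]  # hypothetical: 3 runs x 3 generations
print(abf(runs), mbf(runs), success_rate(runs, optimum=5))
```

Here two of the three runs reach the optimum 5, so SR = 2/3, while the median (5) ignores the weak third run that pulls the final ABF down to 13/3.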

Comparing different population sizes: Don't compare against generations — compare against cumulative fitness evaluations! A pop-size-10 GA at gen 2 ≠ pop-size-20 GA at gen 2.

Fair comparison rule: If A uses population n₁ and B uses population n₂ > n₁, then B must run for (n₁/n₂) × gen_A generations to perform the same number of fitness evaluations.
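Worked example (hypothetical numbers): if A uses n₁ = 10 individuals for gen_A = 100 generations, it performs 10 × 100 = 1000 fitness evaluations. B with n₂ = 20 reaches the same budget after (10/20) × 100 = 50 generations, since 20 × 50 = 1000.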

🐦 Particle Swarm Optimization

PSO (Kennedy & Eberhart, 1995) is inspired by the social behavior of bird flocking and fish schooling. Particles move through the search space, guided by their own best position and the swarm's best position.

Unlike GAs: No selection, no crossover, no mutation. Instead, particles have velocity and memory. PSO belongs to Swarm Intelligence, not Evolutionary Computation.

Natural inspiration: A flock searching for a landing spot. Each bird remembers its own best location and is attracted by the flock's overall best location.

📋 PSO Algorithm & Equations

Algorithm 6: PSO (minimization)
────────────────────────────────────────────────
1. ∀i: initialize position x_i and velocity v_i (randomly)
2. ∀i: b_i := x_i (local best = initial position)
3. g := argmin f(x_i) (global best = best initial particle)
4. repeat until termination:
   4.1. for each particle i:
        4.1.1. x_i := x_i + v_i (update position)
        4.1.2. v_i := w·v_i + c₁·r₁∘(b_i−x_i) + c₂·r₂∘(g−x_i) (update velocity)
        4.1.3. if f(x_i) < f(b_i): b_i := x_i (update local best)
        4.1.4. if f(x_i) < f(g): g := x_i (update global best)
5. return g

where r₁, r₂ are vectors of uniform random numbers in [0, 1] and ∘ denotes the element-wise product.

Velocity components:
• w·v_i — inertia (keep moving in same direction)
• c₁·r₁∘(b_i−x_i) — cognitive term (attract to own best)
• c₂·r₂∘(g−x_i) — social term (attract to swarm best)

Parameters:
• w ≈ 1 (inertia weight)
• c₁ ≈ c₂ ≈ 2 (learning factors)
• Swarm size: 20–40 particles (fewer than GA pop)
• Iterations: thousands to millions (more than GA gens)
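A minimal Python sketch of Algorithm 6, keeping its update order (position, then velocity) and clamping velocities to β_j − α_j as in the parameter table. The defaults w = 0.7298 and c₁ = c₂ = 1.49618 are an assumption: they are a widely used convergent setting (the inertia-weight equivalent of Clerc and Kennedy's constriction coefficients), chosen here instead of the w ≈ 1, c ≈ 2 guideline so the short demo settles reliably. The sphere function is a standard test function, not taken from the text.

```python
import random

def sphere(x):
    """Test function to minimize; global minimum 0 at the origin."""
    return sum(xj * xj for xj in x)

def pso(f, dim, bounds, rng, n_particles=20, iters=300,
        w=0.7298, c1=1.49618, c2=1.49618):
    lo, hi = bounds
    v_max = hi - lo  # velocity clamp beta_j - alpha_j (same bounds in every dimension)
    # 1. random positions and velocities
    x = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    v = [[rng.uniform(-v_max, v_max) for _ in range(dim)] for _ in range(n_particles)]
    # 2. local bests start at the initial positions
    b = [xi[:] for xi in x]
    fb = [f(xi) for xi in x]
    # 3. global best = best initial particle
    gi = min(range(n_particles), key=lambda i: fb[i])
    g, fg = b[gi][:], fb[gi]
    for _ in range(iters):  # 4. fixed iteration budget as termination criterion
        for i in range(n_particles):
            x[i] = [xj + vj for xj, vj in zip(x[i], v[i])]  # 4.1.1 position
            for j in range(dim):  # 4.1.2 velocity, fresh r1, r2 per component
                r1, r2 = rng.random(), rng.random()
                vij = (w * v[i][j]
                       + c1 * r1 * (b[i][j] - x[i][j])    # cognitive term
                       + c2 * r2 * (g[j] - x[i][j]))      # social term
                v[i][j] = max(-v_max, min(v_max, vij))    # clamp to [-v_max, v_max]
            fx = f(x[i])
            if fx < fb[i]:  # 4.1.3 local best
                b[i], fb[i] = x[i][:], fx
            if fx < fg:  # 4.1.4 global best
                g, fg = x[i][:], fx
    return g, fg

g, fg = pso(sphere, dim=2, bounds=(-5.0, 5.0), rng=random.Random(0))
print(g, fg)
```

Each particle's update depends only on its own state plus g, which is what makes the per-particle work easy to parallelize.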


🔧 Parameter Setting & Variants

Parameter Guidelines

Parameter	Typical Range	Effect
Swarm size n	20–40	More particles → better coverage
Inertia w	≈ 1	Higher → more exploration
c₁, c₂	≈ 2	c₁ high → self-reliant; c₂ high → social
Max velocity	β_j − α_j	Prevents "jumping over" optima
Iterations	1000s – millions	More → better convergence

Variants

Multiple swarms: Run several independent swarms that occasionally share information. Improves diversity.

Hybrid PSO-GA: Combine PSO velocity updates with genetic crossover/mutation operators.

Discrete PSO: Map discrete search space to continuous domain, apply PSO, then map back. Or use modified velocity update for binary spaces.

GPU parallelism: PSO is highly parallelizable — position/velocity updates per particle are independent. Ideal for GPU implementation (millions of iterations feasible).


Lectures on Intelligent Systems