Lectures on Intelligent Systems

Vanneschi & Silva ยท Chapters 1โ€“4 ยท Interactive Study Guide

๐ŸŽฏ Optimization โ›ฐ๏ธ Hill Climbing ๐ŸŒก๏ธ Simulated Annealing ๐Ÿงฌ Genetic Algorithms ๐Ÿฆ Particle Swarm ๐Ÿง  Quiz

๐Ÿง  What Are Intelligent Systems?

Intelligent systems solve problems autonomously โ€” with reduced or no human intervention. Classical algorithm design fails when problems are too complex, when solution spaces are astronomically large, or when no known algorithm can solve them in acceptable time.

Key motivation: 1 cmยณ of molecules contains ~2.7ร—10ยนโน candidates. Testing all combinations is impossible โ€” intelligent systems explore smartly.
๐Ÿงฌ
Drug Discovery

Find the right molecule from trillions of candidates

๐Ÿš—
Self-Driving Cars

Navigate 3D space with real-time decisions

๐Ÿ’น
Portfolio Optimization

Maximize returns subject to risk constraints

๐ŸŒฟ Bioinspired Algorithms

๐Ÿง 
Artificial Neural Networks
Inspired by the human brain
๐Ÿงฌ
Genetic Algorithms / GP
Darwin's theory of evolution
๐Ÿฆ
Particle Swarm Optimization
Bird flocking & fish schooling
๐ŸŒก๏ธ
Simulated Annealing
Metallurgical heat treatment

๐Ÿ“š Topics Covered

Ch.1 Introductionpp. 1โ€“10
Ch.2 Optimization & Local Searchpp. 11โ€“50
Ch.3 Genetic Algorithmspp. 45โ€“103
Ch.4 Particle Swarm Optimizationpp. 105โ€“112
Tools mentioned: Weka (Java) & Scikit-learn (Python) โ€” popular environments for comparing CI methods, directly motivated by the No Free Lunch Theorem.

๐ŸŽฏ Optimization Problems

An optimization problem is defined by a search space S and a fitness function f. The goal: find s* โˆˆ S that maximizes or minimizes f(s*).

Instance = (S, f)
S = search space (all feasible solutions)
f : S โ†’ โ„ = fitness function
Goal: s* = argmax f(s) or argmin f(s)
Combinatorial optimization: S is finite but enormous โ€” e.g., 2ยนโฐโฐ bit strings, or (n-1)!/2 TSP tours. Exhaustive search is impossible.

๐ŸŽ’ Knapsack Problem Interactive

Select items to maximize value without exceeding 15 kg capacity.

โš–๏ธ Weight: 0/15 kg ๐Ÿ’ฐ Value: 0
Click items to add to knapsack.

๐Ÿ—บ๏ธ Travelling Salesman Problem Interactive

Click to place cities, then solve the shortest tour.

Click canvas to add cities.

๐Ÿ“Š Why Exhaustive Search Fails

Solution space size grows exponentially โ€” heuristics are essential

๐Ÿฝ๏ธ No Free Lunch Theorem

Wolpert & Macready (1997): when averaged over all possible problems, any two algorithms perform identically. No universal "best" algorithm exists.

โˆ‘_f P(d_m | f, m, Aโ‚) = โˆ‘_f P(d_m | f, m, Aโ‚‚)

โ†’ Specialized algorithms beat general ones on specific problems
โ†’ But lose on other problems
Implication 1: Always benchmark multiple algorithms on your specific problem domain before choosing.
Implication 2: Exploit domain knowledge. An algorithm that uses problem structure outperforms one that ignores it โ€” on that class of problems.

๐Ÿ“‰ Algorithm Performance Trade-off

A wins on some problems, B wins on others โ€” they balance out overall

๐Ÿ’ก Consequences

StatementTrue?
"Algorithm X is always best"โŒ False
"SA beats HC on rugged landscapes"โœ… True (on average)
"No single best algorithm exists"โœ… True (NFL theorem)
"Problem-specific algorithms can win"โœ… True
"Weka/Scikit-learn compare algorithms"โœ… Motivated by NFL

โ›ฐ๏ธ Hill Climbing

The simplest local search algorithm. Start at a random solution, move to a better neighbor, repeat until no improvement is possible.

Algorithm 1: Hill Climbing โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€ 1. Initialize random solution i_start 2. i := i_start 3. repeat 3.1. Generate neighbor j from N(i) 3.2. if f(j) โ‰ฅ f(i) then i := j until โˆ€j โˆˆ N(i): f(j) < f(i) 4. return i
Core flaw: HC always terminates at a local optimum โ€” no guarantee it's the global one. "Blindly following fitness" is a losing strategy on rugged or deceptive landscapes.

๐ŸŽฎ Hill Climbing Simulator Live

Press Step or Auto to run.

โœ… vs โŒ Strengths & Weaknesses

PropertyResult
Simple to implementโœ… Very
Fast per iterationโœ… O(|N(i)|)
Works for any Nโœ… General
Escapes local optimaโŒ Never
Finds global optimumโŒ No guarantee
Deceptive problemsโŒ Very poor
Neighborhood N: A mapping N: S โ†’ 2^S. For bit strings, the 1-bit flip (Hamming-1) neighborhood is standard. Larger N explores more but costs more per step.

๐Ÿ—บ๏ธ Fitness Landscapes

A Fitness Landscape (FL) visualizes the relationship between solution structure and fitness value. The shape of the landscape determines how hard a problem is for local search algorithms.

Why it matters: HC is an "uphill climber" โ€” it stops at any peak. The landscape shape tells us whether that peak is the global optimum or a poor local trap.

๐Ÿ“Š Landscape Explorer Interactive

๐Ÿ”Ž Why We Can't Draw FLs

  • Vast search space: 2ยนโฐโฐ solutions โ€” impossible to enumerate
  • High dimensionality: Neighborhood of size k โ†’ k-dimensional FL
  • Dynamic problems: Some fitness functions change over time
Consequence: We can rarely know the landscape in advance โ€” adaptive methods like SA are essential.

๐Ÿ’ก Landscape โ†’ Algorithm Choice

๐ŸŸข
Smooth: HC always succeeds. Gradient methods work.
๐ŸŸ 
Rugged: Need SA, GA โ€” algorithms that escape local optima.
๐Ÿ”ด
Deceptive: Fitness misleads search. Population-based methods (GA) handle better.
๐Ÿ”ต
Neutral: Large plateaus โ€” need exploration strategies.

๐ŸŒก๏ธ Simulated Annealing

SA (Kirkpatrick et al. 1983) extends HC using inspiration from metallurgical annealing โ€” slowly cooling a material to reach its minimum energy state. The key idea: accept worse solutions with decreasing probability to escape local optima.

P(accept j) = { 1, if f(j) โ‰ฅ f(i) [better solution] e^(โˆ’|f(j)โˆ’f(i)| / c), otherwise [worse solution] } c = control parameter (like temperature, decreases over time)
Physics analogy: Boltzmann distribution โ€” P โˆ e^(โˆ’ฮ”E / k_BยทT). Replace energy with fitness, temperature with c.

๐Ÿ“ Acceptance Probability Explorer Interactive

P(accept)
36.8%

๐ŸŽฎ SA Live Simulator Interactive

Press Start to run.

๐Ÿ“ Theory: Asymptotic Convergence

Lemma 2.1 (Stationary probability): After many transitions with constant c, SA stabilizes on solution i with probability:

P{X=i} = e^(โˆ’f(i)/c) / ฮฃ_{jโˆˆS} e^(โˆ’f(j)/c) [Boltzmann distribution]

Theorem 2.3 (Asymptotic Convergence): As c โ†’ 0, SA stabilizes on global optima with probability 1, uniformly distributed over all global optima.

Key caveat: The theorem guarantees convergence in the limit (c โ†’ 0, infinite time). It says nothing about convergence speed. Parameter tuning is crucial and problem-dependent.

โš–๏ธ HC vs SA Comparison

PropertyHill ClimbingSimulated Annealing
Accepts worse solutions?โŒ Neverโœ… With prob e^(โˆ’ฮ”f/c)
Escapes local optima?โŒ Noโœ… Yes
Global optimum guarantee?โŒ Local onlyโœ… Asymptotically
Neighbor selectionBest neighborRandom neighbor
Parametersโœ… Noneโš ๏ธ c, L, schedule

๐Ÿงฌ Genetic Algorithms

GAs (Holland 1975, Goldberg 1989) are Evolutionary Computation methods inspired by Darwin's theory of natural selection. Unlike HC/SA, GAs work with a population of solutions and evolve them through selection, crossover, and mutation.

Darwin's 5 pillars: Reproduction ยท Adaptation ยท Inheritance ยท Variation ยท Competition
GA terminology: Individual = solution ยท Generation = iteration ยท Population = set of solutions ยท Chromosome = string representation

GA vs SA vs HC

FeatureHCSAGA
Solution count11Population N
Inspirationโ€”MetallurgyEvolution
OperatorsNeighbor moveRandom neighborCrossover + Mutation
Selectionโ€”โ€”Fitness-based

๐Ÿ“‹ The GA Algorithm

Algorithm 4: Standard Generational GA โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€ 1. Create initial population P of N individuals (random) 2. repeat until termination condition: 2.1. Calculate fitness of each individual in P 2.2. Create empty offspring population P' 2.3. repeat until P' has N individuals: 2.3.1. Choose operator: crossover (prob p_c) or replication (1โˆ’p_c) 2.3.2. Select 2 individuals from P using selection algorithm 2.3.3. Apply chosen operator to selected individuals 2.3.4. Apply mutation to offspring 2.3.5. Insert offspring into P' 2.4. P := P' (generational replacement) 3. return best individual in P
Parameters to set: Population size N ยท Crossover rate p_c (0.8โ€“1.0) ยท Mutation rate p_m (0โ€“0.2) ยท Max generations ยท Selection algorithm ยท Elitism size
Elitism: Guarantee the best k individuals survive to the next generation. Prevents loss of the best solution. Typical: k=1.

๐ŸŽฏ Selection Algorithms Interactive

Fitness Proportionate Selection: Each individual's selection probability is proportional to its fitness.

P(select i) = f_i / ฮฃ_j f_j

Spin the roulette wheel! Fitter individuals occupy larger segments.

Press Spin to select an individual.

Population (fitness values)

Weakness: Sensitive to fitness scale differences. One very fit individual can dominate. Negative fitness requires rescaling.

Ranking Selection: Sort individuals by fitness; selection probability is based on rank position, not raw fitness value.

Rank individuals 1 (worst) to N (best)
P(select rank r) = ฯ†(r) / ฮฃ_j ฯ†(j)
ฯ† can be linear, logarithmic, or exponential
Advantage over roulette wheel: Not affected by fitness scale. Changing fitness values drastically doesn't change selection probabilities โ€” only the rank order matters.

Tournament Selection: Pick k random individuals; select the best among them. Repeat for each parent needed.

Set k and run a tournament.
Selection pressure: Small k โ†’ low pressure (any individual can win). Large k โ†’ high pressure (best usually wins). k is a tunable parameter!

๐Ÿ”€ Genetic Operators Interactive

Standard (1-point) Crossover: Pick a random crossover point; swap substrings between two parents to create two children.

Conservation operator: Crossover recombines existing genetic material in new ways. Crossover rate p_c is typically high (0.8โ€“1.0). If crossover doesn't happen, replication (copy parents) occurs instead.

Standard Mutation: For each gene, replace with a random character with probability p_m (the mutation rate).

Innovation operator: Mutation introduces new genetic material. Rate p_m kept low (0โ€“0.2) โ€” as in nature, mutation is rare. High mutation destroys good solutions.

๐Ÿ”ฌ GA Population Simulator Live

Watch a population evolve to maximize f(x) = number of 1-bits in a binary string.

Press Start to evolve.

๐Ÿ“ Schema Theorem

A schema H is a pattern over {0,1,*} where * is a wildcard. E.g., H = 1**0* matches any 5-bit string starting with 1 and having 0 in position 4.

m(H, t+1) โ‰ฅ m(H, t) ยท [f(H)/fฬ„] ยท [1 โˆ’ p_cยทฮด(H)/(lโˆ’1)] ยท [1 โˆ’ o(H)ยทp_m] where: f(H) = average fitness of strings matching H fฬ„ = average population fitness ฮด(H) = defining length (distance between outermost fixed positions) o(H) = order (number of fixed positions, i.e., non-* symbols) l = chromosome length
Building Block Hypothesis: GAs work by discovering, recombining and propagating short, low-order, above-average schemata โ€” called building blocks.
What thrives: Short schemata (small ฮด), low order (small o), above-average fitness โ†’ exponentially increasing representation over generations.

โš™๏ธ Advanced GA Methods

Premature convergence occurs when the population loses diversity too quickly โ€” all individuals become similar before finding a good solution.

Symptoms: High entropy drops quickly. All individuals share the same building blocks. Genetic operators can no longer create improvements.
Measuring diversity: Entropy H(P) = โˆ’ฮฃ F_jยทlog(F_j) where F_j is the fraction of individuals with a given genotype/fitness value. Variance can also be used.

Causes: Too high selection pressure ยท Small population ยท High crossover rate ยท Low mutation rate

Fitness Sharing: Reward diversity by reducing fitness of similar individuals. Individuals in dense clusters compete for a shared "fitness resource".

f_shared(x) = f(x) / ฮฃ_{yโˆˆP} sh(d(x,y)) sh(d) = 1 โˆ’ (d/ฯƒ_share)^ฮฑ if d < ฯƒ_share, else 0
Effect: Individuals in crowded regions get their fitness penalized. Diverse individuals in uncrowded regions receive a bonus. Promotes exploration of multiple peaks.

Other methods: Restricted mating (only mate similar individuals) ยท Diploid chromosomes (two copies, dominant/recessive) ยท Novelty search (replace fitness with novelty measure)

3.7 Genetic Algorithms for Continuous Optimization

Continuous optimization is a very frequent class of problems โ€” e.g. optimizing parameters of a device or another algorithm. Standard GA operators designed for discrete alphabets are weak here: bit-flip crossover cannot generate new allele values, it only swaps existing ones.

Discrete GA individuals: strings like 1011010 โ€” alleles โˆˆ {0,1}.
Standard crossover swaps substrings โ†’ fine for discrete problems.
Continuous GA individuals: real-valued vectors [xโ‚, xโ‚‚, โ€ฆ, xโ‚˜] where xแตข โˆˆ [ฮฑแตข, ฮฒแตข] โŠ‚ โ„.
Each individual = a point in m-dimensional Cartesian space.

Geometric Crossover

Given parents Pโ‚ = [xโ‚,โ€ฆ,xโ‚˜] and Pโ‚‚ = [yโ‚,โ€ฆ,yโ‚˜], one offspring is produced:

offspring[j] = rโฑผ ยท xโฑผ + (1 โˆ’ rโฑผ) ยท yโฑผ, rโฑผ ~ Uniform(0, 1) independently Special case: if all rโฑผ are equal โ†’ offspring lies on the segment joining the parents. Consequence: if fitness โˆ distance to global optimum โ†’ offspring cannot be worse than the worst parent.

Geometric Mutation (Box Mutation)

Given an individual [xโ‚,โ€ฆ,xโ‚˜], produces:

offspring[j] = xโฑผ + rโฑผ, rโฑผ ~ Uniform(โˆ’ms, ms) ms = mutation step (tunable parameter) Effect: offspring appears anywhere inside a "box" centred on the parent. Key property: always possible to be closer to the global optimum โ†’ induces unimodal fitness landscape (no local optima except the global ones).
Why geometric crossover is powerful: It interpolates between parents in continuous space โ€” generating entirely new allele values, not just recombinations of existing bits. The offspring's geometric position relative to parents gives it provable fitness guarantees under distance-proportional fitness.
Limitation of standard crossover on continuous problems: Swapping substrings of a real-vector encoding only reshuffles existing allele values across dimensions. It cannot, for example, create a value of 3.7 from parents holding 2.1 and 5.9 in that dimension โ€” geometric crossover can (any value in [2.1, 5.9]).

๐ŸŽฎ Interactive: Geometric Crossover & Mutation in 2D

Drag Parent 1 (blue) and Parent 2 (green) on the canvas. Use the buttons to generate offspring using geometric crossover or box mutation.

Click "Geometric Crossover" to generate an offspring between the two parents.

Continuous vs. Discrete GA โ€” Key Differences

Representation: Real vectors vs. binary/integer strings
Crossover: Interpolation vs. substring swap
Mutation: Random perturbation (ยฑms) vs. bit flip
Search space: โ„แต (continuous) vs. {0,1}โ„“ (discrete)
Fitness landscape: Box mutation โ†’ unimodal. Standard GA landscapes can be highly multimodal.
Convergence: Geometric mutation step ms acts like a temperature โ€” can be annealed over time for fine-tuning.

How to fairly compare algorithm configurations:

ABF (Average Best Fitness): Average the best fitness found at each generation over all independent runs. Gives a "typical run" view.
MBF (Median Best Fitness): More robust to outliers than ABF. Use with box plots at termination.
SR (Success Rate): SR = #successful runs / #total runs. A run is "successful" if the global optimum (or ฮต-approximation) was found.
Comparing different population sizes: Don't compare against generations โ€” compare against cumulative fitness evaluations! A pop-size-10 GA at gen 2 โ‰  pop-size-20 GA at gen 2.
Fair comparison rule: If A uses population nโ‚ and B uses population nโ‚‚ > nโ‚, then B must run for (nโ‚/nโ‚‚) ร— gen_A generations to perform the same number of fitness evaluations.

๐Ÿฆ Particle Swarm Optimization

PSO (Kennedy & Eberhart, 1995) is inspired by the social behavior of bird flocking and fish schooling. Particles move through the search space, guided by their own best position and the swarm's best position.

Unlike GAs: No selection, no crossover, no mutation. Instead, particles have velocity and memory. PSO belongs to Swarm Intelligence, not Evolutionary Computation.
Natural inspiration: A flock searching for a landing spot. Each bird remembers its own best location and is attracted by the flock's overall best location.

๐Ÿ“‹ PSO Algorithm & Equations

Algorithm 6: PSO (minimization) โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€ 1. โˆ€i: initialize position x_i and velocity v_i (randomly) 2. โˆ€i: b_i := x_i (local best = initial position) 3. g := argmin f(x_i) (global best = best initial particle) 4. repeat until termination: 4.1. for each particle i: 4.1.1. x_i := x_i + v_i (update position) 4.1.2. v_i := wยทv_i + cโ‚ยทrโ‚โˆ˜(b_iโˆ’x_i) + cโ‚‚ยทrโ‚‚โˆ˜(gโˆ’x_i) (update velocity) 4.1.3. if f(x_i) < f(b_i): b_i := x_i (update local best) 4.1.4. if f(x_i) < f(g): g := x_i (update global best) 5. return g
Velocity components:
โ€ข wยทv_i โ€” inertia (keep moving in same direction)
โ€ข cโ‚ยทrโ‚โˆ˜(b_iโˆ’x_i) โ€” cognitive term (attract to own best)
โ€ข cโ‚‚ยทrโ‚‚โˆ˜(gโˆ’x_i) โ€” social term (attract to swarm best)
Parameters:
โ€ข w โ‰ˆ 1 (inertia weight)
โ€ข cโ‚ โ‰ˆ cโ‚‚ โ‰ˆ 2 (learning factors)
โ€ข Swarm size: 20โ€“40 particles (fewer than GA pop)
โ€ข Iterations: thousands to millions (more than GA gens)

๐ŸŽฎ PSO 2D Simulator Live

Particles search for the minimum of a 2D surface. Watch them swarm toward the global optimum!

Press Start to run PSO.

๐Ÿ”ง Parameter Setting & Variants

Parameter Guidelines

ParameterTypical RangeEffect
Swarm size n20โ€“40More particles โ†’ better coverage
Inertia wโ‰ˆ 1Higher โ†’ more exploration
cโ‚, cโ‚‚โ‰ˆ 2cโ‚ high โ†’ self-reliant; cโ‚‚ high โ†’ social
Max velocityฮฒ_j โˆ’ ฮฑ_jPrevents "jumping over" optima
Iterations1000s โ€“ millionsMore โ†’ better convergence

Variants

Multiple swarms: Run several independent swarms that occasionally share information. Improves diversity.
Hybrid PSO-GA: Combine PSO velocity updates with genetic crossover/mutation operators.
Discrete PSO: Map discrete search space to continuous domain, apply PSO, then map back. Or use modified velocity update for binary spaces.
GPU parallelism: PSO is highly parallelizable โ€” position/velocity updates per particle are independent. Ideal for GPU implementation (millions of iterations feasible).

๐Ÿ“– Glossary of Key Terms

๐Ÿง  Self-Assessment Quiz

Test your understanding across all chapters.

Score
0/0