Artificial Intelligence Midterm

Artificial Intelligence
The study and design of intelligent agents, where an intelligent agent is a system that perceives its environment and takes actions that maximize its chances of success
Fours Schools of Thought
Thinking Humanly v. Thinking Rationally v. Acting Humanly v. Acting Rationally

ACTING RATIONALLY is the one we study

Rational Agent
An agent that acts so as to achieve the best outcome, or when there is uncertainty, the best expected outcome
Agent Equation
Agent = Architecture + Program
Perceives its environment through SENSORS and acts upon that environment through ACTUATORS

Perceive –> Think –> Act

Performance, Environment, Actuators, Sensors
Fully v. Partially Observable
an agent’s sensors give it access to complete state
Deterministic v. Stochastic
Next state is completely determined by current state, instead of random chance
Episodic vs. Sequential
Agent’s experience is divided into atomic “episodes,” choice depends only on the episode itself
Discrete v. Continuous
A limited number of distinct, clearly defined percepts and actions i.e. checkers
Simple Reflex Agents
select BASED ON CURRENT STATE ONLY – fully observable, simple but limited i.e. vacuum
Model-Based Reflex Agents
Agent needs some GOAL INFORMATION – combines goal information with environment model to choose actions that achieve the goal
Utility-Based Agents
Agent happiness is taken into consideration – UTILITY is the agent’s performance measure
Learning Agents
4 components: learning element, performance element, critic, problem generator
Goal-Based Agents
Agents that work toward a goal, consider the impact of actions on future states, job is to identify the action or series of actions that lead to the goal – formalized as a SEARCH through possible solutions
Search Problem Formulation
Initial State, States, Actions, Transition Model, Goal Test, Path Cost
State Space
A physical configuration
Search Space
An abstract configuration represented by a search tree or graph of possible solutions
Search Tree
Models the sequence of actions – root is the initial state, branches are the actions, nodes are results from actions
Search Space Regions
Explored, Frontier, Unexplored
Does it always find a solution if one exists?
Time Complexity
Number of nodes generated/expanded
Space Complexity
Maximum number of nodes in memory
Does it always find a least-cost solution?
maximum branching factor of the search tree
depth of the solution
maximum depth of the state space
Expand the shallowest node

Complete, O(b^d) time, O(b^d) space, optimal

Expand deepest first

Not complete! O(b^m) time, O(bm) space, not optimal

Depth-Limited Search
DFS with depth limit l, select some limit L in depth to explore with DFS, iterative deepening increases the limit l
Uniform Cost Search
We want the cheapest path, not the shallowest solution –> prioritize by cost, not depth, expand node n with the lowest path cost g(n)
Informed Search
Use a heuristic function that estimates how close a state is to the goal, allows us to know whether we’re actually getting closer to the goal
Greedy Search
h(n) is our heuristic function, estimates the cost from n to the closest goal
Common Heuristic Functions
straight-line distance, Manhattan distance
A* Search
minimize the total estimated solution cost – g(n) cost to reach node n, h(n) cost to get from n to goal, f(n) = g(n) + h(n)

Complete, exponential time, every node in memory, but optimal!

Admissible Heuristic
never overestimates the cost to reach the goal i.e. it is optimistic, for all node n, h(n) <= h*(n) or the real cost
Benefits of IDFS over BFS
Memory! For BFS, we have to store all the nodes of the depth in memory! For IDFS, we can look at only the number in our current subtree
Local Search
Iterative improvement algorithms, also pure optimization problems – the path doesn’t matter!

No need to maintain a search tree, use very little memory, often find good solutions

Hill climbing, simulated annealing, local beam, genetic algorithms

Hill Climbing
Greedy local search – looks only to immediate good neighbors and not beyond – search move uphill, terminates when it reaches a pick, can terminate with any maxima
Local Beam Search
maintains k states instead of just one, selects the k best successor, useful information is passed amont the states
Stochastic Beam Search
chooses k successors at random, helps alleviate the problem of states agglomerating around same part of state space
Generic Algorithms
variant of stochastic beam search, successor states generated by combining two parents than modifying a single state, process inspired by natural selection, have a fitness function
Association Rules Two Steps
Mining frequent patterns in D
Generating strong association rules

{Bread, Butter} frequent pattern
Bread –> Butter strong rule

an object belong to L = {x1, x2,…,xm}
any subset of L
itemset of cardinality k
itemset identified by a unique identifier tid
set of all transaction ids
a subset of T
Transaction Dataset
D = {(tid, Xtid) / tid in T, Xtid subset L}
|t(X)| = |{(tid, Xtid) in D / X subset Xtid}|
|t(X)| / |D|
Frequency itemset
X is frequent iff supp(x) >= MinSupp
Support Downward Closure
if an itemset is frequent, then all its subsets also are frequent
A-Priori Bottlenock
billions of transactions, tens of thousands of items, terabytes of data –> multiple scans of dataset, HUGE number of candidate sets
Probabilistic Interpretation
supp(A->C) = P(A ^ C)
conf(A->C) = P(A ^ C) / P(A)
Cons of Support/Confidence
Biased rules! Confidence cannot detect bias because it ignores support!

tea -> coffee – everyone buys coffee, not just because it has tea

P(A ^ C) / P(A) x P(C) = supp(A U C) / supp(A) x supp(C)
Multidimensional Rules
Rather than just single rules, construct k-predicatesets instead of k-itemsets
AR Post-Processing
Use evaluation measures, increase minimum support, increase minimum confidence, use rule templates
Quantitative AR
Infinite Search space, support-confidence tradeoff, unsupervised discretization
Adversarial Search
Games – multi-agent competitive environments, opponent we can’t control planning against us, optimal solution is a strategy – modeled as search problems, use heuristics
Deterministic, Perfect Info
chess, checkers, go, othello

Deterministic, fully observable environments, zero-sum, two agents act alternately

Zero-Sum Games
pure competition, agents have different values on the outcomes, one agent maximizes on single value, while the other minimizes it
Embedded Thinking
one agent is trying to figure out what to do, thinks about the consequences of the possible actions, needs to think about his opponent as well, each will imagine what would be the response from the opponent to their actions
Formal Game
Initial state, Player(s) defines which player has the move in state s, Actions(s) returns set of legal moves in s, transitin function S x A -> S, Terminal Test true when game is over, Utility(s,p) gives us utility for a game ending in terminal state s for player p
Compute each node’s minimax value – the best achievable utility against an optimal adversary – minimax value is the best achievable payoff against best play – DFS of game tree, compute utility of being in a state assuming both players play optimally, propagate minimax values up the tree
Solutions to Minimax Search
Replace terminal utilities with an evaluation function for non-terminal positions, use IDS, use pruning
alpha-beta pruning
Sometimes, minimax decisions are independent of values of certain subtrees, send alpha, beta values down during search, update by propagating upwards values of terminal nodes
Move Ordering
Can affect the effectiveness of alpha-beta pruning – examine first successors that are likely to be best – O(b^m/2) if we get an ideal orderinge
Real-Time Decisions
alpha-beta still has to go all the way to the leaves, which is tough – bound the depth of search (cut off search), replace utility(s) with eval(s), evaluation function to estimate value of current board configurations
Add a third level – Chance – if the player is chance, the value should be the expected value of the results of each of the options
K-Nearest Neighbors
Given an example xq to be classified, we take the sign of the average K-nearest neighbors
Common Forms of Loss
Classification Error, Least Square Loss, etc.
Avoid Overfitting
Reduce number of features, do a model selection, use regularization, do cross-validation
K-Fold Cross Validation
Method for estimating test error using training data, randomly partition data into k equal-sized subsets, use each subset as a test set once, using the other k-1 subsets as training
percentage of predictions that are correct
percentage of positive predictions that are correct
percentage of positive cases that were predicted as positive
percentage of negative cases that were predicted as negative
Tree Classifier
popular classification method, easy to understand, simple algorithmic approach, no assumption about linearity

no need for numerical features, model is a tree

p+ log p+ – p- log p-
Information Gain
measures the expected reduction in entropy caused by partitioning the examples according to the attributes