Phil/Psych 256, Week 3

3a Rules: Introduction

History of rule-based systems

1. Newell and Simon, GPS 1950s-60s

2. Expert systems, 1970s-90s. Most corporations.

3. ACT 1983. John Anderson.

4. Induction book, Holland et al., 1986

5. SOAR, Newell and his students, 1980s.

6. Prolog implementations; cf. logic.

Note both psychological and technological applications.

Example of a rule-based system

How to get from Waterloo to Toronto?

IF you want to get to Toronto, and you are in Waterloo, and you have no car, THEN take the bus.

IF you want to take the bus from Waterloo to Toronto, THEN go to the bus depot and buy a ticket.

IF you want to buy a ticket, THEN get some money.

IF you want to get some money, THEN go to the bank and withdraw it.

IF you want to get to Toronto and you have a car, THEN take highway 8 to the 401.

IF: conditions (antecedent)

THEN: action (consequent)
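The Waterloo-to-Toronto rules above can be sketched as data plus a small interpreter. This is only an illustrative sketch, not code from any of the systems listed earlier: the Rule class, the "want: " prefix used to mark a subgoal, and the plan function are all assumptions made for the example.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Rule:
    goal: str            # the "IF you want ..." part of the condition
    facts: frozenset     # further conditions that must hold (may be empty)
    steps: tuple         # the THEN part: actions, or "want: X" subgoals

# Hypothetical encoding of the five rules in the notes.
RULES = [
    Rule("get to Toronto", frozenset({"in Waterloo", "no car"}),
         ("want: take the bus",)),
    Rule("take the bus", frozenset(),
         ("go to the bus depot", "want: buy a ticket")),
    Rule("buy a ticket", frozenset(), ("want: get some money",)),
    Rule("get some money", frozenset(),
         ("go to the bank", "withdraw money")),
    Rule("get to Toronto", frozenset({"have a car"}),
         ("take highway 8 to the 401",)),
]

def plan(goal, facts):
    """Return the primitive actions for a goal, expanding subgoals depth-first."""
    for rule in RULES:
        # A rule fires when its goal matches and all its conditions are facts.
        if rule.goal == goal and rule.facts <= facts:
            actions = []
            for step in rule.steps:
                if step.startswith("want: "):
                    # The step is a subgoal: plan for it recursively.
                    actions.extend(plan(step[len("want: "):], facts))
                else:
                    actions.append(step)
            return actions
    raise ValueError(f"no rule applies to goal: {goal}")

print(plan("get to Toronto", {"in Waterloo", "no car"}))
print(plan("get to Toronto", {"in Waterloo", "have a car"}))
```

The first call chains through the bus, ticket, and money rules; the second fires only the car rule. The order of actions simply follows the rules as written, so the plan is only as sensible as the rules themselves.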

Another example: What rules do you use to plan weekend entertainment?

How different from logic?

1. Note less informational power than logic: rule-based system may not have full quantifiers and rules of inference.

2. But it can nevertheless be more computationally powerful, just because it focuses on the task to be accomplished.

3. Uses processes that are not inherently part of logic: subgoaling.

4. Can be tied in with other processes, such as spreading activation to model human memory (ACT, PI). Can be combined with other representations, such as concepts.

Strengths of rule-based systems.

1. Have been used in many commercial technical systems.

2. Modular, easy to add to.

3. Have modeled various kinds of psychological experiments.

4. Lots of human knowledge and ability naturally described in terms of rules.

5. Various techniques known for learning rules.

Note: 1 and 2 are questions of computational power, 3 and 4 are questions of psychological plausibility.

Weaknesses of rule-based systems

1. Inflexibility.

2. Overgenerality.

3. Control difficult.

4. Operation not always comprehensible.

5. Knowledge acquisition is difficult.

Key points in John Anderson, "Production Systems and the ACT-R Theory"

Class 3b: Evaluation of rule-based systems

Representational power of rules.

Condition-action rules can represent a lot of what we know that is general.

But they do not have everything found in full-blown logic. E.g. All horses have tails. Typically there are no existential quantifiers and no arbitrary combinations of ands, ors, and nots. But such restrictions can actually increase computational power.

On the other hand, some things like subgoals are naturally represented: If you want to do x .... Not first order logic.

Computational power of rules.

Problem solving

1. Forward chaining: do deductive reasoning forward by modus ponens: if p then q; p; so q. But use strategies to focus inference, e.g. use most specific rule.

Good for planning. Start with initial state, work forward to goal.

Reasoning is search through a space of states and operators.

2. Backward chaining: if p then q; q is the goal; so check whether p can be accomplished. Good for diagnosis (explanation) and planning. Start with goal, work backward to current state.

Bidirectional search: forward and backward.

3. Most of the successful commercial expert systems are rule-based.

4. Rule-based reasoning can be combined with other mechanisms such as spreading activation: ACT*.

5. Modularity of rules: just add more to the data base.
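Forward and backward chaining can be sketched over simple propositional rules. This is a minimal sketch under stated assumptions: the rule set and fact names are hypothetical, rules are plain (premises, conclusion) pairs, and no conflict-resolution strategy (such as preferring the most specific rule) is included.

```python
# Each rule is (set of premises, conclusion). All names are hypothetical.
RULES = [
    ({"have ticket", "at depot"}, "on bus"),
    ({"on bus"}, "in Toronto"),
    ({"have money", "at depot"}, "have ticket"),
]

def forward_chain(facts, rules):
    """Apply modus ponens repeatedly until no rule adds a new fact."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if premises <= known and conclusion not in known:
                known.add(conclusion)
                changed = True
    return known

def backward_chain(goal, facts, rules, seen=frozenset()):
    """Return True if the goal is a fact, or some rule concluding it
    has premises that can all be established in turn."""
    if goal in facts:
        return True
    if goal in seen:  # avoid looping on circular rules
        return False
    return any(
        all(backward_chain(p, facts, rules, seen | {goal}) for p in premises)
        for premises, conclusion in rules
        if conclusion == goal
    )

start = {"have money", "at depot"}
print(forward_chain(start, RULES))                 # every fact reachable from start
print(backward_chain("in Toronto", start, RULES))  # works back from the goal
```

Forward chaining derives everything that follows from the initial state, whether or not it is relevant; backward chaining touches only rules that bear on the goal. Adding a rule means appending one more pair to RULES, which is the modularity point above.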


1. Rules can be learned

(a) generalization from examples: experience, imitation

Fa, Ga -> all F are G. Induction.

(b) rule compilation, chunking

p -> q, q -> r  =>  p -> r.

Important for skill acquisition: chunk together several rules into one that can be quickly executed.

(c) learn by being told

2. Abductive inference uses rules to form hypotheses.

E.g. babies with ear infections cry.

Adam is crying.

So maybe Adam has an ear infection.

Invalid inference, but plausible use of backward chaining.
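Rule compilation and abduction can both be sketched with rules written as (antecedent, consequent) pairs. The functions compose and abduce are illustrative names, and the second rule about hunger is an assumption added here to show why the abductive step is not deductively valid.

```python
def compose(rule1, rule2):
    """Chunk p -> q and q -> r into the single rule p -> r."""
    (p, q1), (q2, r) = rule1, rule2
    if q1 != q2:
        raise ValueError("rules do not chain")
    return (p, r)

def abduce(observation, rules):
    """Run rules backward: return every antecedent that would explain the observation."""
    return [cause for cause, effect in rules if effect == observation]

RULES = [
    ("has an ear infection", "cries"),
    ("is hungry", "cries"),  # hypothetical extra rule, not in the notes
]

print(compose(("p", "q"), ("q", "r")))
print(abduce("cries", RULES))
```

With two rules that both conclude "cries", abduce returns two candidate explanations for Adam's crying. The ear infection is a hypothesis, not a conclusion, until the alternatives are ruled out.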

Psychological plausibility of rules

Production systems have been used to model performance on many tasks, e.g. chess, Tower of Hanoi.

Production systems model learning, as new rules are constructed and chunked.

Quantitative fit: power law of learning. Rate of learning slows down.

Applies to many kinds of skill acquisition.
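The power law of learning is commonly written as T = a * N^(-b), where T is the time to perform the task on trial N. A minimal numerical sketch, with the values of a and b chosen arbitrarily for illustration rather than fitted to any data:

```python
def practice_time(trial, first_trial_time=10.0, rate=0.5):
    """Power law of practice: time on trial N is a * N**(-b).

    first_trial_time (a) and rate (b) are hypothetical parameter values.
    """
    return first_trial_time * trial ** (-rate)

times = [practice_time(n) for n in range(1, 6)]
# Improvement from each trial to the next.
savings = [earlier - later for earlier, later in zip(times, times[1:])]

for n, t in enumerate(times, start=1):
    print(f"trial {n}: {t:.2f}")
print("savings per trial:", [round(s, 2) for s in savings])
```

Each successive trial saves less time than the one before, which is the sense in which the rate of learning slows down.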

Learning in rats: not just associations, but rules.

E.g. if tone then shock. Conditioning is learning rules.

Learning of social rules, e.g. if given something, say thank you. Also more interesting sorts, e.g. rules used in explaining other people's behavior (abduction).

Knowledge of physical systems: mental models. If you turn the key, the engine starts.

Learning logical and statistical rules, e.g. law of large numbers: want larger samples.

Learning language: grammatical rules. Pronunciation.

"finger", "singer", "longer", "bringer"

If an "ing" word is formed from a verb, the "g" is not pronounced.

Forming past tense: "goed".
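The "goed" error is what a general past-tense rule produces before the exception has been learned. A minimal sketch, assuming a simplified spelling rule (no consonant doubling) and a hypothetical dictionary of irregular forms learned so far:

```python
def past_tense(verb, irregulars):
    """Use a stored irregular form if one is known; otherwise apply the '-ed' rule."""
    if verb in irregulars:
        return irregulars[verb]
    if verb.endswith("e"):
        return verb + "d"   # "bake" -> "baked"
    return verb + "ed"      # "kick" -> "kicked"

print(past_tense("kick", {}))
print(past_tense("go", {}))              # rule applied with no exception stored
print(past_tense("go", {"go": "went"}))  # exception learned
```

With an empty exception list the rule overgeneralizes and yields "goed"; once "went" is stored, the specific form takes priority over the general rule.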

Disadvantages of rules

Representational power

1. It's hard to state rules to express subtle perceptual factors, such as a guitar sounding right, a sauce tasting good, or a picture looking attractive.

2. Rules may be overgeneral, or too specific to be used. E.g. how to decide what classes to enrol for.

Computational power

1. Rules lack the flexibility of cases and distributed representations: just use a past success and adapt it. E.g. filling out income tax forms.

2. Rules need to be retrieved from memory, and therefore need some kind of supplemental indexing system.

3. It can be difficult to construct a system of rules that ensures that rules are fired in appropriate ways, getting reasonable chains.

4. Expert systems are no longer just rule-based

- frame-based (concepts)

- case-based (analogy)

- connectionist (neural networks)

Psychological plausibility.

Rules may provide the explanation for lots of our psychological abilities, but their computational and informational limitations suggest that other kinds of representation are needed too.

N.B.: The human mind, like current expert systems, may use a multitude of representational schemes for different purposes. Minsky: the brain is a kludge.


Basic questions:

1. What representations are required for our ability to understand and produce language?

2. How is language learned?

Behaviorist answer:

Language is based on a set of associations, learned by trial and error.

Chomsky's revolution

1957 Syntactic Structures

Rejected associationism

- grammars are complex, rule-like structures

- universal grammar is innate

- we are born with readiness to notice what kind of grammar our native language has.

Evidence for new view:

a. productivity of language: we can understand sentences that have never been uttered before.

"Colorless green ideas sleep furiously." Compare

"The hit ball girl the."

b. ease of language learning: almost all children acquire language with relatively little feedback.

Example of Chomskyan grammar (earlier sort):

1. The girl kicked the ball.

2. The ball was kicked by the girl.

A syntactic transformation produces a new structure.

This explains the productivity of language: transformations can produce an unlimited number of structures.
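The active-to-passive example can be sketched as a function from one structure to another. This is a toy illustration only, not Chomsky's formalism: the Clause class and function names are hypothetical, and the verb is assumed to be regular so that its past tense and past participle are the same word.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Clause:
    subject: str
    verb: str   # past form, assumed identical to the participle (e.g. "kicked")
    obj: str

def sentence(words):
    """Capitalize the first letter and add a final period."""
    return words[0].upper() + words[1:] + "."

def active(clause):
    return sentence(f"{clause.subject} {clause.verb} {clause.obj}")

def passive(clause):
    """Passive transformation: promote the object, demote the subject to a by-phrase."""
    return sentence(f"{clause.obj} was {clause.verb} by {clause.subject}")

clause = Clause("the girl", "kicked", "the ball")
print(active(clause))
print(passive(clause))
```

The same transformation applies to any clause of this shape, including ones never encountered before, which is a small-scale picture of productivity.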

Learning of grammars is relatively easy because we have innate expectations about kinds of structures and transformations.

Learning is a kind of abduction: children form hypotheses to explain the utterances they hear.

Chomsky's 1980s view: government and binding.

1. Less emphasis on transformations.

2. More emphasis on constraints on what can count as grammatical.

3. Innate universal grammar: E.g. asymmetry of subject and object.

All languages have nouns, verbs, adjectives, and adpositions (pre- or post-positions).

XP = X - YP.

For each X (verb, noun, adjective, preposition) there is a phrase YP (noun phrase, etc.) that can follow it.

Children merely need to learn parameters: set of switches to be set. But the basics of universal grammar are not learned, and could not be in the time available.

Concepts are also innate and preexisting: children just need to learn what labels to apply to them.

Chomsky and his followers are very hostile to connectionism, which emphasizes learning and less complex structures than universal grammar. But connectionists can be innatists: distributed representations acquired through evolution.

Summary: Problems with Chomsky's views:

1. Not computational: competence not performance.

This limits its explanatory power. No detailed learning simulations.

2. Divorced from psychology.

Only methodology is linguistic judgments of grammaticality.

3. Extreme nativism.

4. Divorce of syntax from semantics and pragmatics. Overconcentration on syntax.

Practical Applicability

Rule-based ideas are widely used in expert systems and computer tutors.

Key points in Steven Pinker, "Rules of Language"

Computational Epistemology Laboratory.

Paul Thagard

This page updated Sept. 23, 2015