Phil/Psych 446, Cognitive Modeling, Week 7

 

Open Mind Common Sense Project

 

Cognitive Wheels in Symbolic Artificial Intelligence

Comparison: Digital and natural intelligence

                   Digital intelligence    Human mind
 Architecture      serial                  massively parallel
 Component speed   very fast               slow (about 1 ms)
 Search processes  deep                    shallow (pattern matching)
 Representation    verbal, mathematical    multimodal, e.g. visual
 Embodiment        simple robots           evolved
 Emotions          no                      yes
 Accuracy          exact                   approximations

Cognitive wheels

A cognitive wheel is a system that is computationally powerful but is neither psychologically nor biologically natural (Dennett, 1984).

It is fine for Artificial Intelligence to use cognitive wheels for technological purposes, but cognitive modelling cannot, since its aim is to understand human minds.

Discussion question: what are your best candidates for cognitive wheels?

What is the opposite of a cognitive wheel? Perhaps a cognitive leg, a system that is biologically natural but not so easy to turn into technology.

Cognitive wheels in AI

Problem solving by domain-independent search

Logical, deductive techniques such as theorem proving

Probability theory and Bayesian networks

Various learning methods (see 7b below).

Of course, it could turn out that what I think is a cognitive wheel is really what the brain does.

Cognitive legs

A cognitive leg is a biologically natural system that is hard to computerize.

Examples:

Non-verbal processes, e.g. vision

Language processing that can deal with ambiguity

Expertise arising from pattern matching

Emotional cognitive processing

Consciousness

Why learning is important to cognitive modelling

People learn new concepts, rules, and tasks.

People incrementally improve their ability to solve problems.

People aren't easily programmed - they have to learn to do things themselves.

We have already discussed learning of rules, frames and analogs (weeks 4-6).

Discuss: What have you learned so far in this class?

Localist neural networks and constraint satisfaction

Approaches to Artificial Neural Networks as Cognitive Models

1. Localist networks that perform parallel constraint satisfaction (this week)

2. Networks that learn distributed representations by backpropagation (9a)

3. Networks with more structured distributed representations (9b)

4. Networks that model emotional cognition (week 10)

Local ("localist") versus distributed representations:

1. Local: each unit (neuron) represents something cognitive like a concept or a proposition.

2. Distributed: concepts, propositions, images, etc. are distributed over multiple units

Coherence as constraint satisfaction (Thagard & Verbeurgt, 1998)

Many philosophical problems and psychological phenomena can be understood in terms of coherence (making sense).

Coherence can be understood in terms of maximal satisfaction of multiple constraints, in a manner informally summarized as follows:

1. Elements are representations such as concepts, propositions, parts of images, goals, actions, and so on.

2. Elements can cohere (fit together) or incohere (resist fitting together). Coherence relations include explanation, deduction, facilitation, association, and so on. Incoherence relations include inconsistency, incompatibility, and negative association.

3. If two elements cohere, there is a positive constraint between them. If two elements incohere, there is a negative constraint between them.

4. Elements are to be divided into ones that are accepted and ones that are rejected.

5. A positive constraint between two elements can be satisfied either by accepting both of the elements or by rejecting both of the elements.

6. A negative constraint between two elements can be satisfied only by accepting one element and rejecting the other.

7. The coherence problem consists of dividing a set of elements into accepted and rejected sets in a way that satisfies the most constraints.

Formal definition of coherence

Consider a set E of elements which may be propositions or other representations. Two members of E, e1 and e2, may cohere with each other because of some relation between them, or they may resist cohering with each other because of some other relation. We need to understand how to make E into as coherent a whole as possible by taking into account the coherence and incoherence relations that hold between pairs of members of E. To do this, we can partition E into two disjoint subsets, A and R, where A contains the accepted elements of E, and R contains the rejected elements of E. We want to perform this partition in a way that takes into account the local coherence and incoherence relations. For example, if E is a set of propositions and e1 explains e2, we want to ensure that if e1 is accepted into A then so is e2. On the other hand, if e1 is inconsistent with e3, we want to ensure that if e1 is accepted into A, then e3 is rejected into R. The relations of explanation and inconsistency provide constraints on how we decide what can be accepted and rejected.

More formally, we can define a coherence problem as follows. Let E be a finite set of elements {ei} and C be a set of constraints on E understood as a set {(ei, ej)} of pairs of elements of E. C divides into C+, the positive constraints on E, and C-, the negative constraints on E. With each constraint is associated a number w, which is the weight (strength) of the constraint. The problem is to partition E into two sets, A and R, in a way that maximizes compliance with the following two coherence conditions:

1. if (ei, ej) is in C+, then ei is in A if and only if ej is in A.

2. if (ei, ej) is in C-, then ei is in A if and only if ej is in R.

Let W be the weight of the partition, that is, the sum of the weights of the satisfied constraints. The coherence problem is then to partition E into A and R in a way that maximizes W.
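To make the definition concrete, here is a small illustrative LISP sketch (not part of any course program; representing a constraint as a list (E1 E2 WEIGHT), with a negative weight for members of C-, is an assumption). It computes the weight W of a partition given the list ACCEPTED of elements in A, with every other element taken to be in R.

; CONSTRAINT-SATISFIED-P tests coherence conditions 1 and 2 for one constraint.
(defun constraint-satisfied-p (constraint accepted)
  (let ((in1 (not (null (member (first constraint) accepted))))
        (in2 (not (null (member (second constraint) accepted)))))
    (if (> (third constraint) 0)
        (eq in1 in2)          ; positive: both accepted or both rejected
        (not (eq in1 in2))))) ; negative: exactly one accepted

; COHERENCE-WEIGHT sums the weights of the satisfied constraints.
(defun coherence-weight (constraints accepted)
  (loop for c in constraints
        when (constraint-satisfied-p c accepted)
        sum (abs (third c))))

; Example: accepting E1, E2, E3 and rejecting E4 satisfies all three
; constraints, so the weight of the partition is 3.0.
(coherence-weight '((e1 e2 1.0) (e2 e3 1.0) (e1 e4 -1.0)) '(e1 e2 e3))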

Maximizing coherence using artificial neural networks

Structures

Represent an element by a unit (artificial neuron). The activation of a unit represents the degree of acceptance of the element it represents.

Constraints between elements are represented by links between units.

To bias acceptance of some elements, create a special unit that is always activated, and link favored units to it.

Procedures

Set activation of all units (except the special unit) to near 0.

Repeatedly update the activation of each unit until all activations have stabilized. For each unit, update its activation as a function of its current activation, a decay parameter, and the weighted activations of the units to which it is linked (see the update equation below).

Summary:

 Coherence            Neural network
 element              unit
 positive constraint  excitatory link
 negative constraint  inhibitory link
 maximize coherence   parallel updating of activation
 element accepted     unit activated
 element rejected     unit deactivated

Note: non-connectionist algorithms can also do constraint satisfaction, e.g. Winston, ch. 11.

Applications of coherence networks

Analogy, explanatory coherence, decision making, conceptual coherence

Kintsch, W. (1998) Comprehension: A paradigm for cognition. New York: Cambridge University Press.

Neural network simulators

Stuttgart Neural Network Simulator

Tlearn

PDP++

Other simulators and links

Implementing neural networks in LISP

Neural network programs are usually written in a conventional language like C++, but here is how to write a simple program in LISP that does parallel constraint satisfaction using localist representations.

Structures

Units

Units (highly simplified artificial neurons) can be represented in various ways, for example by objects in the Common LISP object system. The simplest way to represent a unit is as a symbol whose property list includes the unit's activation and its links to other units.

Write a procedure MAKE-UNIT (NAME) that gives the unit an initial activation of 0. Add the unit to a list *ALL-UNITS* that you can use for settling the network.

For data abstraction, write a function GET-ACTIVATION (UNIT) that returns the activation of a unit from its property list or whatever data structure you have used to represent the unit.
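As a minimal sketch, assuming the symbol-and-property-list representation just described, the two procedures could look like this:

(defvar *all-units* nil "All units created so far, used when settling the network.")

(defun make-unit (name)
  "Create a unit named NAME with an initial activation of 0 and no links."
  (setf (get name 'activation) 0.0)
  (setf (get name 'links) nil)
  (push name *all-units*)
  name)

(defun get-activation (unit)
  "Return the current activation of UNIT from its property list."
  (get unit 'activation))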

Links

To specify constraints, you need to make links between units. Note that links in this kind of network are symmetrical: if unit1 is linked to unit2 then unit2 has the same link to unit1.

So write a procedure MAKE-LINK (UNIT1 UNIT2 WEIGHT) that associates with each unit a link to the other unit with the given weight. A link could thus be a list of the form (UNIT WEIGHT). For example, if you represent units as symbols with property lists, this procedure could do something like:

(setf (get unit1 'links) (cons (list unit2 weight) (get unit1 'links)))

For data abstraction, write a procedure GET-WEIGHT (UNIT1 UNIT2) that returns the weight between the two units by searching through the links of UNIT1 to find one with UNIT2 and returning its weight.
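A minimal sketch of both procedures, continuing the property-list representation above (returning 0 when there is no link is an assumption):

(defun make-link (unit1 unit2 weight)
  "Create a symmetric link of the given WEIGHT between UNIT1 and UNIT2."
  (setf (get unit1 'links) (cons (list unit2 weight) (get unit1 'links)))
  (setf (get unit2 'links) (cons (list unit1 weight) (get unit2 'links))))

(defun get-weight (unit1 unit2)
  "Return the weight on the link between UNIT1 and UNIT2, or 0 if there is none."
  (let ((link (assoc unit2 (get unit1 'links))))
    (if link (second link) 0)))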

There are other ways of representing links, for example as an n x n matrix of numbers for n units.

The weights should be small. I recommend .04 as the default weight of an excitatory link and -.06 as the default weight of an inhibitory link. To encourage settling, inhibition should be greater than excitation.
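These defaults can be recorded as global parameters (the variable names here are illustrative):

(defvar *default-excit-weight* 0.04 "Default weight of an excitatory link.")
(defvar *default-inhib-weight* -0.06 "Default weight of an inhibitory link.")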

Procedures

Creating the network

Use MAKE-UNIT and MAKE-LINK to create a network of units and links.

Create a unit SPECIAL that has maximum activation and is not updated. Make links from SPECIAL to those units that you want your network to tend to activate, i.e. those units that represent elements that should tend to be accepted, such as ones that describe what has been observed.
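For example, here is a minimal sketch using the procedures above (E1 is a hypothetical unit representing an observed element; removing SPECIAL from *ALL-UNITS* keeps it from being updated when the network settles):

(make-unit 'e1)                                   ; hypothetical evidence unit
(make-unit 'special)
(setf (get 'special 'activation) 1.0)             ; clamped at maximum activation
(setf *all-units* (remove 'special *all-units*))  ; exclude SPECIAL from updating
(make-link 'special 'e1 0.04)                     ; bias E1 toward acceptance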

Settling the network

Write a function RUN-NETWORK (TIMES) that will update the activations of all the units at most TIMES times.

Updating can be done either synchronously (all units are updated simultaneously in parallel) or asynchronously (units are updated one at a time). Logically, synchronous updating is preferable, since it gives no advantage to units whose activation is updated first. To implement synchronous updating, you need to distinguish between the NEW-ACTIVATION that a unit acquires on the current cycle of updating and the OLD-ACTIVATION that it had from the last cycle. The NEW-ACTIVATION of a unit is then based on the OLD-ACTIVATIONs of the other units. To do asynchronous updating properly, the unit to be updated next should be chosen at random.

Synchronous networks typically settle (i.e. all units reach unchanging activation values) in under 100 cycles of updating. The network should stop updating when it has settled.
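Here is a minimal sketch of synchronous settling, assuming the property-list representation sketched above; the accessor functions and the settling threshold *SETTLE-EPSILON* are illustrative assumptions, and UPDATE-UNIT-ACTIVN is the function given below.

; Property-list accessors assumed by the updating code (illustrative):
(defun activation (unit) (get unit 'activation))
(defun links-from (unit) (get unit 'links))
(defun new-activation (unit) (get unit 'new-activation))
(defun (setf new-activation) (value unit)
  (setf (get unit 'new-activation) value))

(defvar *settle-epsilon* 0.001
  "Activation change below which a unit counts as settled.")

(defun run-network (times)
  "Update all units synchronously for at most TIMES cycles, stopping
early once no unit's activation changes by more than *SETTLE-EPSILON*."
  (dotimes (cycle times)
    (dolist (unit *all-units*)        ; first compute every new activation ...
      (update-unit-activn unit))
    (let ((settled t))
      (dolist (unit *all-units*)      ; ... then install them all at once
        (when (> (abs (- (new-activation unit) (activation unit)))
                 *settle-epsilon*)
          (setq settled nil))
        (setf (get unit 'activation) (new-activation unit)))
      (when settled (return cycle)))))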

Updating the activation of each node

There are many different equations that can be used to update the activation of a unit based on the activations of the units to which it is linked. I usually use this one (from McClelland and Rumelhart), which specifies the activation of unit j, here represented by Aj, in terms of the net input NETj from all other units i:

Aj(t+1) = Aj(t)(1 - d) + NETj(max - Aj(t))   if NETj > 0
Aj(t+1) = Aj(t)(1 - d) + NETj(Aj(t) - min)   if NETj <= 0

Here d is a decay parameter (e.g. .05) that decrements each unit's activation on every cycle, min is the minimum activation (e.g. -1), and max is the maximum activation (e.g. 1).

Based on the weight Wij between each unit i and unit j, we can calculate NETj, the net input to unit j, by:

NETj = SUMi Wij Ai(t).

Here is LISP code that implements that equation. You will have to modify it to work with your own implementation. The various float declarations that another programmer added are unnecessary, so they are omitted here.

; UPDATE-UNIT-ACTIVN updates the activation of a unit based on the
; net input from the units it is linked to, keeping the result
; between *MIN-ACTIVATION* and *MAX-ACTIVATION*.
(defun update-unit-activn (unit)
  (let ((net-input-value (net-input unit)))
    (setf (new-activation unit)
          (min *max-activation*
               (max *min-activation*
                    (+ (* (activation unit) (- 1.0 *decay-amount*))
                       (if (> net-input-value 0.0)
                           (* net-input-value
                              (- *max-activation* (activation unit)))
                           (* net-input-value
                              (- (activation unit) *min-activation*)))))))))

; NET-INPUT is the weighted sum of the outputs of all the units linked
; to UNIT. Each link has the form (UNIT WEIGHT), so for a link L,
; (first L) is the linked unit and (second L) is its weight.
(defun net-input (unit)
  (do ((links (links-from unit) (cdr links))
       (result 0.0))
      ((null links) result)
    (setq result (+ (* (float (second (car links)))
                       (max *output-threshold* (activation (first (car links)))))
                    result))))

(defvar *min-activation* -1.0 "Minimum possible activation for a unit.")

(defvar *max-activation* 1.0 "Maximum possible activation for a unit.")

(defvar *decay-amount* 0.05 "Amount that units' activations decay over time.")

(defvar *output-threshold* -1.0 "Minimum activation for an influential unit.")


 

Assignment 4, due March 10.

Don't forget that the Project Plan is due March 3.

Note: all assignments must be handed in on paper in class or to HH 365.


Phil/Psych 446

Computational Epistemology Laboratory.

Paul Thagard

This page updated Feb. 14, 2005