Phil/Psych 446, Cognitive Modelling, Week 11

Vision - Introduction

Why is vision important to Cognitive Science?

Almost 50% of a primate brain is dedicated to vision.

Our ability to navigate the world is heavily dependent on vision.

Much high-level reasoning, e.g. problem solving, employs visual representations. Einstein and many other scientists and inventors claim that their most creative thinking is visual.

Low-level vision

Input: light reflected onto the eye from objects

Output: representations that capture information such as the location, contrast, and sharpness of significant intensity changes, or edges, in the image. These correspond to physical features such as object boundaries.

Procedures: Filter the image to smooth and differentiate the intensities, producing a representation of the gross structure of image contours.
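
The smooth-then-differentiate idea can be sketched in a few lines. The following is a toy 1-D illustration only (real low-level vision convolves a 2-D image with, e.g., a Gaussian and its derivative); the function names and kernel are invented for this example.

```python
# Toy 1-D sketch of low-level edge detection: smooth the intensity
# signal, then differentiate; large derivative magnitudes mark edges.

def smooth(signal, kernel=(0.25, 0.5, 0.25)):
    """Convolve with a small averaging kernel (boundaries clamped)."""
    n = len(signal)
    out = []
    for i in range(n):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = min(max(i + k - 1, 0), n - 1)
            acc += w * signal[j]
        out.append(acc)
    return out

def derivative(signal):
    """Central differences; peaks flag significant intensity changes."""
    n = len(signal)
    return [(signal[min(i + 1, n - 1)] - signal[max(i - 1, 0)]) / 2.0
            for i in range(n)]

# A step edge: a dark region followed by a bright region.
intensities = [10, 10, 10, 10, 200, 200, 200, 200]
d = derivative(smooth(intensities))
edge_at = max(range(len(d)), key=lambda i: abs(d[i]))
```

The derivative peaks at the step between the dark and bright regions, which is the sense in which the output localizes significant intensity changes.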

See the MIT Encyclopedia of the Cognitive Sciences article "Computational Vision."

Mid-level vision

Input: representations produced by low-level vision.

Output: Representations of surfaces and objects, including 3-D shape, motion, orientation, illumination, and occlusion.

Procedures: Use information from binocular stereo, changes in motion, variations in geometric structure, image shading, and other cues to determine shape and motion. Winston ch. 27 is about low- and mid-level vision.

High-level vision

Input: Representations of surfaces and objects.

Output: object and face recognition, and scene perception.


Visual reasoning

Input: High-level visual representations of objects and scenes.

Output: Inferred representations of objects, scenes, and relations.


This kind of reasoning has been relatively neglected in AI and cognitive science. Exceptions:

Uses of visual reasoning:

What is needed:

Visual reasoning with dynamic scene graphs

DIVA: A model of visual reasoning developed by David Croft.

See also D. Croft and P. Thagard, "Dynamic Imagery."


A scene graph is a structure developed to support complex computer graphics.

It has a hierarchical (tree) structure that combines pictorial and propositional information to represent a 3-dimensional scene.

Nodes may represent: a group, translation, rotation, shape, color, 2-D image, or behavior (motion).

Scene graphs can be implemented and manipulated in Java3D and VRML (Virtual Reality Modeling Language).
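
As a rough illustration of the tree idea, here is a minimal scene graph in Python. The node kinds loosely mirror those listed above (group, translation, rotation, shape); all class names are invented for this sketch and are not the Java3D or VRML API.

```python
import math

# Toy scene graph: internal nodes carry transforms or group their
# children; leaves carry shapes (here, bare 2-D points). Walking the
# tree applies each ancestor's transform to the leaf geometry.

class Group:
    def __init__(self, *children):
        self.children = children
    def points(self):
        for c in self.children:
            yield from c.points()

class Translate:
    def __init__(self, dx, dy, child):
        self.dx, self.dy, self.child = dx, dy, child
    def points(self):
        for (x, y) in self.child.points():
            yield (x + self.dx, y + self.dy)

class Rotate:
    def __init__(self, radians, child):
        self.r, self.child = radians, child
    def points(self):
        c, s = math.cos(self.r), math.sin(self.r)
        for (x, y) in self.child.points():
            yield (x * c - y * s, x * s + y * c)

class Shape:
    def __init__(self, *pts):
        self.pts = pts
    def points(self):
        yield from self.pts

# A point at (1, 0), rotated 90 degrees about the origin, then
# translated right by 5: transforms compose down the tree.
scene = Group(Translate(5, 0, Rotate(math.pi / 2, Shape((1, 0)))))
rendered = [(round(x, 6), round(y, 6)) for (x, y) in scene.points()]
```

Because transforms are nodes rather than properties of shapes, moving a whole subtree (e.g. everything in a group) takes a single edit to the tree, which is what makes the representation convenient for imagery operations.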

Long-term memory: database of past sensory input and general semantic information. Similar to associative networks in ACT.


Input: structures in VRML. These are translated into structures in Java3D, and placed in working memory.

Built into Java3D are procedures for transforming, rotating, and grouping objects, and for putting them in motion.

Under development are new algorithms for visual reasoning.

Potential applications

Imagination: construct a novel scene graph.

Analogy: match scene graphs.

Explanation: use a scene graph to generate a causal hypothesis.

Invention: use scene graphs to produce novel devices.
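
The analogy application above amounts to matching two trees. Here is a toy structural matcher over scene graphs represented as nested tuples; it is a crude stand-in for analogical mapping, not the DIVA algorithm, and the scoring rule is an assumption made for illustration.

```python
# Score two scene graphs, each a nested tuple (kind, child, child, ...).
# Node kinds that agree at corresponding positions count 1 each, so a
# higher score means a closer structural match.

def match(a, b):
    kind_a, children_a = a[0], a[1:]
    kind_b, children_b = b[0], b[1:]
    score = 1 if kind_a == kind_b else 0
    for ca, cb in zip(children_a, children_b):
        score += match(ca, cb)
    return score

# Two scenes: both a group containing a translated shape; the second
# also rotates it, so the structures align only partially.
scene1 = ("group", ("translate", ("shape",)))
scene2 = ("group", ("translate", ("rotate", ("shape",))))
similarity = match(scene1, scene2)
```

Here the group and translate nodes align but the rotate node breaks the correspondence below them, so the score credits only the shared upper structure.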

Abductive Inference with Multimodal Representations

- PowerPoint presentation to come.


Computational Epistemology Laboratory.

Paul Thagard

This page updated March 21, 2005