Almost 50% of a primate brain is dedicated to vision.
Our ability to navigate the world is heavily dependent on vision.
Much high-level reasoning, e.g. problem solving, employs visual representations. Einstein and many other scientists and inventors claim that their most creative thinking is visual.
Input: light reflected onto the eye from objects
Output: representations that capture information such as the location, contrast, and sharpness of significant intensity changes, or edges, in the image. These correspond to physical features such as object boundaries.
Procedures: Filter the image to smooth and differentiate the intensities, producing a representation of the gross structure of image contours.
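The smooth-then-differentiate idea can be sketched in miniature. The following is an illustrative example, not from the notes: a one-dimensional intensity profile is smoothed with a small [1, 2, 1]/4 kernel and then differentiated; the largest derivative magnitude marks an intensity edge.

```java
// Illustrative sketch: smooth a 1-D intensity profile, then differentiate
// to locate the significant intensity change (an edge).
public class EdgeSketch {
    // Smooth with a small binomial kernel; endpoints are copied unchanged.
    static double[] smooth(double[] intensity) {
        double[] out = intensity.clone();
        for (int i = 1; i < intensity.length - 1; i++) {
            out[i] = (intensity[i - 1] + 2 * intensity[i] + intensity[i + 1]) / 4.0;
        }
        return out;
    }

    // Central-difference derivative; large magnitude marks an edge.
    static double[] derivative(double[] intensity) {
        double[] d = new double[intensity.length];
        for (int i = 1; i < intensity.length - 1; i++) {
            d[i] = (intensity[i + 1] - intensity[i - 1]) / 2.0;
        }
        return d;
    }

    public static void main(String[] args) {
        // A step edge: dark region then bright region.
        double[] row = {10, 10, 10, 10, 200, 200, 200, 200};
        double[] d = derivative(smooth(row));
        int edge = 0;
        for (int i = 1; i < d.length; i++) {
            if (Math.abs(d[i]) > Math.abs(d[edge])) edge = i;
        }
        System.out.println(edge); // prints 3, just before the dark-to-bright step
    }
}
```

A full system would do this in two dimensions with Gaussian smoothing, but the pipeline is the same: filter, differentiate, then keep the strong responses.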
See the MIT Encyclopedia of Cognitive Science article "Computational Vision."
Input: representations produced by low-level vision.
Output: Representations of surfaces and objects, including 3-D shape, motion, orientation, illumination, and occlusion.
Procedures: Use information from binocular stereo, changes in motion, variations in geometric structure, image shading, and other cues to determine shape and motion. Winston ch. 27 is about low- and mid-level vision.
Input: Representations of surfaces and objects.
Output: object and face recognition; scene perception.
Input: High-level visual representations of objects and scenes.
Output: Inferred representations of objects, scenes, and relations.
This kind of reasoning has been relatively neglected in AI and cognitive science. Exceptions:
Uses of visual reasoning:
What is needed:
DIVA: A model of visual reasoning developed by David Croft.
See also D. Croft and P. Thagard, "Dynamic Imagery."
A scene graph is a structure developed to support complex computer graphics.
It has a hierarchical (tree) structure that combines pictorial and propositional information to represent a 3-dimensional scene.
Nodes may represent: a group, translation, rotation, shape, color, 2-D image, or behavior (motion).
Scene graphs can be implemented and manipulated in Java3D and VRML (Virtual Reality Modeling Language).
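The tree structure described above can be sketched in plain Java. This is an illustrative example loosely modeled on Java3D's grouping and transform nodes; the class and method names here are invented for the sketch and are not the Java3D API.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch of a scene graph: a tree whose nodes are groups,
// transforms (translation, rotation), or leaf shapes.
public class SceneGraphSketch {
    static class Node {
        final String kind;          // "group", "translate", "rotate", "shape", ...
        final List<Node> children = new ArrayList<>();
        Node(String kind) { this.kind = kind; }
        Node add(Node child) { children.add(child); return this; }
        // Count the shape (leaf geometry) nodes in this subtree.
        int countShapes() {
            int n = kind.equals("shape") ? 1 : 0;
            for (Node c : children) n += c.countShapes();
            return n;
        }
    }

    public static void main(String[] args) {
        // A table: one group holding a translated top and four rotated legs.
        Node table = new Node("group")
            .add(new Node("translate").add(new Node("shape")))   // top
            .add(new Node("rotate").add(new Node("shape")))      // leg
            .add(new Node("rotate").add(new Node("shape")))      // leg
            .add(new Node("rotate").add(new Node("shape")))      // leg
            .add(new Node("rotate").add(new Node("shape")));     // leg
        System.out.println(table.countShapes()); // prints 5
    }
}
```

The point of the hierarchy is that a transform node applies to everything below it, so moving the "group" node moves the whole table at once.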
Long-term memory: a database of past sensory input and general semantic information, similar to the association networks in ACT.
Input: structures in VRML. These are translated into structures in Java3D, and placed in working memory.
Built into Java3D are procedures for transforming, rotating, and grouping objects, and for putting them in motion.
Under development are new algorithms for visual reasoning:
Imagination: construct a novel scene graph.
Analogy: match scene graphs.
Explanation: use a scene graph to generate a causal hypothesis.
Invention: use scene graphs to produce novel devices.
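One of the operations above, analogy as scene-graph matching, can be sketched simply. This is an invented illustration, not DIVA's actual algorithm: two scene graphs are scored by recursively counting nodes whose kinds correspond.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch: score the analogy between two scene graphs by
// recursively counting structurally corresponding nodes of the same kind.
public class AnalogySketch {
    static class Node {
        final String kind;
        final List<Node> children = new ArrayList<>();
        Node(String kind, Node... kids) {
            this.kind = kind;
            for (Node k : kids) children.add(k);
        }
    }

    // 1 if the roots share a kind, plus the scores of corresponding
    // children; extra children on either side are ignored.
    static int match(Node a, Node b) {
        int score = a.kind.equals(b.kind) ? 1 : 0;
        int n = Math.min(a.children.size(), b.children.size());
        for (int i = 0; i < n; i++) {
            score += match(a.children.get(i), b.children.get(i));
        }
        return score;
    }

    public static void main(String[] args) {
        // A table and a stool share structure (a top over rotated legs);
        // a bare ball shares none of it.
        Node table = new Node("group", new Node("shape"), new Node("rotate", new Node("shape")));
        Node stool = new Node("group", new Node("shape"), new Node("rotate", new Node("shape")));
        Node ball  = new Node("shape");
        System.out.println(match(table, stool)); // prints 4
        System.out.println(match(table, ball));  // prints 0
    }
}
```

A higher score marks a stronger structural analogy; a real matcher would also consider node attributes and allow children to be reordered.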
- PowerPoint presentation to come.
Computational Epistemology Laboratory.
This page updated March 21, 2005