A visual routine is a means of extracting information from a visual scene.
Shimon Ullman, in his studies on human visual cognition, proposed that the human visual system's task of perceiving shape properties and spatial relations is split into two successive stages: an early "bottom-up" state during which base representations are generated from the visual input, and a later "top-down" stage during which high-level primitives dubbed "visual routines" extract the desired information from the base representations. In humans, the base representations generated during the bottom-up stage correspond to retinotopic maps (more than 15 of which exist in the cortex) for properties like color, edge orientation, speed of motion, and direction of motion. These base representations rely on fixed operations performed uniformly over the entire field of visual input, and do not make use of object-specific knowledge, task-specific knowledge, or other higher-level information.
The visual routines proposed by Ullman are high-level primitives which parse the structure of a scene, extracting spatial information from the base representations. These visual routines are composed of a sequence of elementary visual operators specific to the task at hand. Visual routines differ from the fixed operations of the base representations in that they are not applied uniformly over the entire visual field --- rather, they are only applied to objects or areas specified by the routines. Researchers have also applied the visual routines approach to artificial map representations, for playing real-time 2D video games. In those cases, however, the map of the video game was provided directly, alleviating the need to deal with real-world perceptual tasks like object recognition and occlusion compensation.
