The McGurk effect is a perceptual phenomenon that demonstrates an interaction between hearing and vision in speech perception. The illusion occurs when the auditory component of one sound is paired with the visual component of another sound, leading to the perception of a third sound. The visual information a person gets from seeing a person speak changes the way they hear the sound. If a person is getting poor-quality auditory information but good-quality visual information, they may be more likely to experience the McGurk effect.
Integration abilities for audio and visual information may also influence whether a person will experience the effect. People who are better at sensory integration have been shown to be more susceptible to the effect. This effect was discovered by accident when McGurk and his research assistant, MacDonald, asked a technician to dub a video with a different phoneme from the one spoken while conducting a study on how infants perceive language at different developmental stages. When the video was played back, both researchers heard a third phoneme rather than the one spoken or mouthed in the video.
This effect may be experienced when a video of one phoneme's production is dubbed with a sound recording of a different phoneme being spoken. Often, the perceived phoneme is a third, intermediate phoneme. For example, when the syllables /ba-ba/ are spoken over the lip movements of /ga-ga/, the perception is of /da-da/. McGurk and MacDonald originally believed that this resulted from the common phonetic and visual properties of /b/ and /g/. Two types of illusion in response to incongruent audiovisual stimuli have been observed: fusions ('ba' auditory and 'ga' visual produce 'da') and combinations ('ga' auditory and 'ba' visual produce 'bga'). These responses are the brain's effort to provide consciousness with its best guess about the incoming information. The information coming from the eyes and ears is contradictory, and in this instance the visual information has had the greater effect on the brain, producing the fusion and combination responses.
With the exception of people who can identify most of what is being said from lip reading alone, most people are quite limited in their ability to identify speech from visual-only signals. The effect may also influence daily interactions in ways that people are unaware of. Research in this area can address theoretical questions and can also have therapeutic and diagnostic relevance for people with disorders relating to the audio and visual integration of speech cues.
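As an illustration only, the fusion and combination categories described above can be expressed as a small labelling rule. The function name `classify_response`, the restriction to consonant-vowel syllables written as plain strings, and the rule that a fusion is any report containing neither input consonant are assumptions made for this sketch; they are not taken from McGurk and MacDonald's procedure.

```python
def classify_response(auditory: str, visual: str, reported: str) -> str:
    """Label a reported syllable relative to an incongruent audiovisual pair.

    Assumes simple consonant-vowel syllables such as 'ba' or 'ga', where the
    first character is the consonant (a simplification for illustration).
    """
    if reported == auditory:
        return "auditory"      # the listener reports what was heard
    if reported == visual:
        return "visual"        # the listener reports what was seen

    heard_consonant = auditory[0]
    seen_consonant = visual[0]
    has_heard = heard_consonant in reported
    has_seen = seen_consonant in reported

    if has_heard and has_seen:
        return "combination"   # both consonants appear, e.g. 'bga'
    if not has_heard and not has_seen:
        return "fusion"        # a third consonant appears, e.g. 'da'
    return "other"             # anything this simple rule does not cover


# The two examples given in the text:
print(classify_response("ba", "ga", "da"))   # auditory 'ba' + visual 'ga'
print(classify_response("ga", "ba", "bga"))  # auditory 'ga' + visual 'ba'
```

A real scoring scheme would work on phonetic transcriptions rather than raw strings; the sketch only makes the distinction between the two response types concrete.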
Factors
Internal
Brain damage
Both hemispheres of the brain contribute to the McGurk effect. They work together to integrate speech information received through the auditory and visual senses. A McGurk response is more likely to occur in right-handed individuals, for whom the face has privileged access to the right hemisphere and words to the left hemisphere. People with right hemisphere damage show impairment on both visual-only and audio-visual integration tasks, although they are still able to integrate the information to produce a McGurk effect.
People with dyslexia particularly differed for combination responses, not fusion responses. They use less visual information in speech perception, or have reduced attention to articulatory gestures, but have no trouble perceiving auditory-only cues.
People with autism spectrum disorder (ASD) show a weakened McGurk effect. However, when the stimulus was nonhuman (for example, bouncing a tennis ball to the sound of a bouncing beach ball), children with ASD scored similarly to children without ASD. It has been suggested that the weakened McGurk effect seen in people with ASD is due to deficits in identifying both the auditory and visual components of speech rather than in integrating those components (although distinguishing speech components as speech components may be isomorphic to integrating them).
Language-learning disabilities
Adults with language-learning disabilities exhibit a much smaller McGurk effect than other adults and are not as influenced by visual input as most people.
Often a reduced size of the corpus callosum produces a hemisphere disconnection process.
Schizophrenia slows the development of audiovisual integration and prevents it from reaching its developmental peak; however, no degradation is observed.
For people with aphasia, the greatest difficulty is in the visual-only condition, which shows that they rely more on auditory stimuli in speech perception.
External
Cross-dubbing
Discrepancy in vowel category significantly reduced the magnitude of the McGurk effect for fusion responses. Auditory /a/ tokens dubbed onto visual /i/ articulations were more compatible than the reverse.
Mouth visibility
The McGurk effect is stronger when the right side of the speaker's mouth (on the viewer's left) is visible. People tend to get more visual information from the right side of a speaker's mouth than the left or even the whole mouth. Visual attention modulates audiovisual speech perception.
Syllable structure
A strong McGurk effect can be seen for click-vowel syllables, compared with weak effects for isolated clicks. This shows that the McGurk effect can happen in a non-speech environment.
If a male face is dubbed with a female voice, or vice versa, there is no difference in the strength of the McGurk effect.
The effect is experienced more often and rated as clearer in the semantically congruent condition than in the incongruent condition.
A strong McGurk effect can also be observed when people look at themselves in a mirror and articulate the visual stimulus while listening to a different auditory stimulus.
Subjects are still strongly influenced by auditory stimuli even when these lag the visual stimuli by 180 milliseconds, the point at which the McGurk effect begins to weaken.
Touch is a sensory perception like vision and audition; increasing attention to touch therefore decreases attention to the auditory and visual senses.
Gaze
The eyes do not need to fixate in order to integrate audio and visual information in speech perception. There was no difference in the McGurk effect when the listener was focusing anywhere on the speaker's face.
Listeners of English, Spanish, German, Italian and Turkish experience a robust McGurk effect; Japanese and Chinese listeners experience a weaker one. Most research on the McGurk effect across languages has compared English and Japanese. A smaller McGurk effect occurs in Japanese listeners than in English listeners. The cultural practice of face avoidance among Japanese people, as well as the tone and syllabic structures of the language, may diminish the McGurk effect.
In comparison to normal-hearing individuals, this is not different unless there is more than one syllable, such as a word.
From just minutes to a couple of days old, infants can imitate adult facial movements, and within weeks of birth, infants can recognize lip movements and speech sounds. At this point, the integration of audio and visual information can happen, but not at a proficient level. Through the process of habituating an infant to a certain stimulus and then changing the stimulus (or part of it, such as ba-voiced/va-visual to da-voiced/va-visual), a response that simulates the McGurk effect becomes apparent.
