Psycholinguistics/Neural Basis of Speech Perception

This chapter is limited to the neural basis of speech perception. For information on the non-neural models, please refer to the chapter on Models of Speech Perception.

Introduction
The neuroanatomy of speech perception has been a topic of interest for over 130 years, yet speech perception has still not been fully characterized. In the 1870s, Carl Wernicke developed the first hypothesis about the functional neuroanatomy of speech perception. Based on his patients with auditory language comprehension disorders, who typically had left superior temporal gyrus (STG) lesions (see Figure 1 for neuroanatomical location), Wernicke hypothesized that the auditory cortex supports speech perception. Patients with these symptoms were later termed Wernicke aphasics. In the 1980s, Damasio & Damasio found that large lesions to the left superior temporal gyrus did not lead to auditory speech comprehension problems, but rather to deficits in speech production. These discrepancies in the literature created some ambiguity as to whether the STG was involved in speech perception or not. In addition, damage to the left frontal or inferior parietal areas was found to create deficits in syllable identification and discrimination tasks, which raises the question of whether a frontal-parietal circuit is involved in speech perception. The development of functional magnetic resonance imaging (fMRI) did not eliminate the ambiguity through the use of passive listening tasks, as had been hoped: activations were visible in all of these areas, including the STG bilaterally. Using syllable discrimination tasks, Zatorre and colleagues found prominent activation in the left inferior frontal lobe. However, many authors have suggested that the inferior frontal lobe may have been activated as a result of task-dependent phonological working memory processes.

The goal of these various studies is to understand the neural processes that support the processing of speech sounds, which ultimately link up with the mental lexicon and result in auditory comprehension. From as early as the late 1800s, research has produced much conflicting evidence as to the neural origins of speech recognition as a whole. This is in part because of the use of sub-lexical tasks, the problem being that speech perception and speech recognition are doubly dissociated. This means that speech perception does not necessarily result in speech recognition; the two abilities are computationally distinct. Sub-lexical tasks aim to measure only the early stages of processing phonemic or syllabic representations, yet there are people who cannot perform these tasks accurately but have accurate comprehension. In failing the sub-lexical tasks, these people appear to have comprehension deficits even though they do not. The main difference between the two is that speech perception requires the listener to keep a sub-lexical representation active while completing the task, whereas speech recognition involves lexical access processes.

In the 1870s, when Carl Wernicke (depicted in Figure 2) developed the first hypothesis of speech processing, he also developed the first notion of two streams of auditory information processing. It was not until 2007 that Hickok and Poeppel expanded this idea into what is now called the dual-stream model, which accounts for both speech perception and speech recognition. The publication of their dual-stream model has helped incorporate some of the aforementioned discrepancies in the literature into a working model. Their account is based on the central claim that verbal auditory stimuli are processed by a ventral stream and a dorsal stream. Speech articulation tasks rely on the dorsal stream system, which is mostly left lateralized, while speech perception and recognition rely more heavily on the ventral stream system, which is primarily bilaterally organized. The two streams share a common area of tissue in the left superior temporal gyrus (STG), which would explain the double dissociations that are observed. This common area in the auditory-responsive fields of the STG provides the early cortical stages of speech perception bilaterally. Beyond this point the system diverges into the ventral and dorsal streams for continued processing. Here, the focus will be on the ventral stream, as it is more relevant to the neural aspects of speech perception; however, the dorsal stream will also be discussed, as it helps explain the frontal activations observed in some of the literature.

To understand spoken language, one must first transform the acoustic speech stimulus, recognize it as speech rather than tones or other non-speech sounds, and then convert it into a conceptual phonological representation that can be assessed for meaning through access to the mental lexicon. The chapter on acoustic phonetics covers the transformation of the acoustic signal, so this chapter picks up at the processing of phonological information (creating phonological representations) and semantic information (linking phonological representations with syntactic and morphological operations) that leads to comprehension. To avoid confusion, the term speech processing in this chapter refers to any task that uses aurally presented speech stimuli. Speech perception refers to sub-lexical tasks such as syllable discrimination. Finally, speech recognition refers to the set of computations used to transform the perceived information into a representation that links up with the mental lexicon. The chapter uses the dual-stream model to help break down the various aspects of speech processing. Although the heading is the dual-stream model, other evidence is included as well, some supporting and some opposing. Other models have also been proposed, but the information gathered here generally fits well with this model.

Dual-Stream Model
The dual-stream model of the auditory system shows similarities to the dual-stream model of speech processing proposed by Hickok and Poeppel in 2007. The ventral stream is analogous to the auditory 'what' stream, and processes speech signals for speech recognition and comprehension. The dorsal stream, analogous to the 'where' stream, is involved in translating speech signals into articulatory representations for speech production. The auditory-motor integration function of the dorsal stream has been suggested as the point at which the language-processing dorsal stream and the auditory 'where' stream diverge. To understand the dual-stream model in terms of speech perception, we must break down the ventral stream into phonological and semantic processing. It is also necessary to incorporate other (sometimes conflicting) studies to generate a fuller view of the current research.

Ventral Stream
The literature has not settled exactly which aspects of phonological processing belong to the ventral stream and which do not. As a result, some sub-lexical tasks are considered to precede ventral stream processing, while more advanced aspects, such as forming full phonological representations, are included in it. Here, the initial phonological-level processing is placed before the ventral stream, and involves the middle to posterior portions of the superior temporal sulcus bilaterally. This is the portion of the superior temporal sulcus that is used by both the ventral and dorsal streams before they split. The mapping of the sensory or phonological representations is considered part of the ventral stream, and is the topic of interest when phonological processing is discussed in the rest of the chapter. The ventral stream maps phonological information onto conceptual and semantic representations. This means that the ventral system itself processes the distinctive features of sounds, phonemes, syllabic and phonological structures, and grammatical and semantic information. Hickok and Poeppel propose that the ventral system also uses its own parallel computational pathways, which allow the system to be organized bilaterally but asymmetrically.

Phonological Processing
In a study of phonological processing, Steven Petersen and colleagues examined the cortical anatomy of single-word processing using positron emission tomography. They found the key auditory processing areas to be the primary auditory cortex, activated bilaterally, and the temporoparietal cortex, anterior superior temporal cortex, and inferior anterior cingulate cortex, all left lateralized. They hypothesized that the temporoparietal region near the angular and supramarginal gyri was a good candidate for the phonological coding region. To achieve proper lexical access, the mapping of sound to a representation requires the integration of information on various time scales. Robert Zatorre and colleagues attempted to evaluate phonological processing in terms of both temporal and spectral changes in auditory stimuli. They found distinct auditory cortex areas in each hemisphere, specific to each of these two parameters. The anterior auditory region in the right hemisphere showed greater cerebral blood flow in response to increasing spectral, rather than temporal, changes in stimuli, whereas the same area in the left hemisphere showed the reverse pattern: a greater response to temporal changes in stimuli. The right superior temporal sulcus also showed an increased response to spectral, but not temporal, parameters. Thus, the right auditory cortex appeared to respond less well to the rapidly changing acoustic information characteristic of speech, whereas the left hemisphere was much better able to follow these stimuli. As explained in Hickok 2010, a critical portion of the superior temporal sulcus is involved in phonological processes, particularly the area bounded anteriorly by the anterolateral aspect of Heschl’s gyrus and posteriorly by the posterior end of the Sylvian fissure.

Lexical and Semantic Processing
Phonological processing yields phonological codes (or representations), which are then used in speech processing to reach the higher-level lexical representations that are essential for auditory speech comprehension. Strong empirical evidence shows the involvement of the posterior middle temporal lobe regions in accessing lexical and semantic information, and lesion studies of the posterior temporal lobe have supported this evidence. Elizabeth Bates and her colleagues studied 101 left-hemisphere aphasic patients and found that lesions to the middle temporal gyrus (MTG) most accurately predicted auditory comprehension deficits, with a significant deficit also observed with dorsolateral prefrontal cortical lesions. Lesions to the insula and the arcuate/superior longitudinal fasciculus, by contrast, affected verbal fluency the most. Functional magnetic resonance imaging studies (see Figure 3 for an example of an fMRI scan) have further implicated the posterior middle temporal lobe regions in semantic processing. In a semantic decision task used by Jeffrey Binder and colleagues, activations were found on both sides of the STS and in almost all of the MTG in the left hemisphere. This activation also spread ventrally across the inferior temporal gyrus (ITG). The authors found further activations in the angular gyrus, anterior and posterior cingulate gyrus, portions of the precuneus, retrosplenial cortex, and cingulate isthmus in the left hemisphere, as well as subcortical activations in the left anterior thalamus. Many positron emission tomography (PET) studies (see Figure 4 for an example of a PET scan) have also examined the areas activated in response to semantic processing. They too found left-lateralized non-STG temporoparietal regions, including the MTG, ITG, and angular gyrus.
Finally, Karalyn Patterson and her two colleagues found, in patients with dementia, deficits that affect amodal semantic knowledge of objects rather than the mapping of sound to meaning specifically; that is, the deficits are not restricted to auditory stimuli. Their dementia patients appeared to have deficits in conceptual knowledge of objects in both the auditory and visual modalities. They therefore hypothesized that the posterior lateral and inferior temporal lobe is more involved in accessing semantic knowledge from auditory input, and labeled the anterior temporal lobe as performing more of an integrating function for certain semantic forms across modalities.

Although a substantial body of evidence implicates the temporal lobe in semantic processing, frontal lobe activations have also been observed in a significant number of studies. The research by Jeffrey Binder and colleagues mentioned above found significant prefrontal cortex activation, including much of the inferior and superior frontal gyri, parts of the middle frontal gyrus, and the anterior cingulate cortex. PET studies have found similar but less extensive results, showing left frontal activation during language processing in the inferior frontal gyrus, extending into the posterior middle frontal gyrus. One hypothesis, proposed by Jeffrey Binder, is that these frontal lobe areas act as a “language executive”. His team hypothesized that the frontal lobe coordinates sensory and semantic processing and accommodates momentarily changing goals, but noted that after injury this role could be adopted by other areas. This hypothesis is supported by patients with left frontoparietal lesions, who experience global aphasia acutely after injury but whose symptoms later improve into an expressive aphasia.

Overall, the research suggests that the left STG plays an important role in the analysis of speech sounds for comprehension at a linguistic-semantic level. This converging evidence suggests that Wernicke’s area may not be the primary location for language comprehension. It also suggests that left temporoparietal regions outside of Wernicke’s area, as well as the left frontal lobe, are involved, and that these frontal areas extend beyond Broca’s area to include the prefrontal cortex.

Dorsal Stream
While the role of the ventral stream in the perception and recognition of speech is generally agreed upon, there is less agreement about the dorsal stream. The dorsal stream is responsible for translating speech signals into articulatory representations in the frontal lobe, and involves the posterior frontal lobe, the posterior dorsal temporal lobe, and the parietal operculum. According to Hickok and Poeppel, the dorsal processing stream is the site of auditory-motor interaction; they have suggested that this circuit is crucial in speech development and that it provides the neural mechanisms for phonological short-term memory. They also suggest that the dorsal stream is strongly left lateralized, which would account for the prominence of production deficits following lesions to the dorsal temporal and frontal lobes.

A study conducted by Price and colleagues evaluated the regions involved in auditory word perception and repetition. It is an example of why methods and results need to be read carefully, as results can be interpreted in many ways. Activations of the left inferior frontal region (otherwise known as Broca's area; see Figure 5 for the location of Broca's area and other important language centers) during auditory word processing were found during phonological judgements about auditory stimuli, word retrieval, and semantic judgements. These tasks all require holding auditory stimuli in auditory-verbal short-term memory while making semantic or phonological judgements. Research by Paulesu and colleagues shows that such short-term memory tasks with auditory stimuli increase the activation of Broca's area, so the role of Broca's area in these tasks may be phonological rehearsal. Fiez et al., in contrast, interpret the activation in the frontal operculum as phonological analysis that is not required for listening to or identifying auditory stimuli. Recall also from the discussion of semantic processing in the ventral stream that Binder and colleagues attributed similar frontal activation to "language executive" functions. Thus, this frontal activation site has been interpreted in different ways, and additional research is required before a firm conclusion can be reached about its role in the ventral stream, the dorsal stream, or both.

Conclusion
In summary, the dual-stream model is currently the most appropriate way to characterize the neural basis of speech perception. The model proposes that there are two streams for processing verbal auditory stimuli: the dorsal and ventral processing streams. The primarily bilateral ventral stream maps phonological information onto semantic representations. It uses a critical portion of the superior temporal sulcus for phonological processing. Lexical and semantic processing in the ventral stream makes use of the posterior middle temporal lobe and the left superior temporal lobe. The common activation in the superior temporal lobe for both phonological and semantic processing explains the double dissociation observed in lesion studies. The dorsal stream also uses the superior temporal lobe, which is the location of the early cortical stages of auditory processing that occur bilaterally before the two processing streams diverge. Inferior frontal activations have been explained both as a result of language executive functions and as phonological rehearsal. Therefore, the traditional focus on Wernicke's area and Broca's area in speech processing and production may be too narrow given all the current evidence. Left temporoparietal and frontal lobe regions much broader than initially suspected are likely to be involved.

Learning Exercises
Try to answer all the questions before resorting to the answer key. This will help to solidify your understanding. Note: Answer examples do not include portions of essay questions that are opinion based. These are designed to get you thinking for yourself, rather than just regurgitating your newly acquired information.

Part A) Language Processing
1. The use of passive listening tasks in fMRI language studies failed to highlight STG involvement as hoped, instead showing bilateral activations. Based on what you have read in the Phonological Processing section about manipulating auditory stimuli, what methodological factors might have produced this bilateral activation if the studies had been focusing on the early stages of speech perception? How could the researchers have manipulated their stimuli to favour one of the two hemispheres?

2. What is the difference between the dorsal and ventral streams in the Dual-Stream model? Which hemisphere do they reside in, or are they bilaterally organized?

3. Read through the following information and answer the subsequent questions: Broca’s area was traditionally understood as a frontal “expressive” area for planning and executing speech. An fMRI study by Binder and colleagues on lexical and semantic processing has shown significant and extensive inferior frontal activations. Other PET studies have found similar, but less extensive, left frontal activations. Binder described the frontal area as a “language executive” that controls sensory and semantic processes to accommodate moment-to-moment shifts in goals or strategies. Some researchers attributed similar activations to maintaining words in an active state, while others attributed them to the analysis of these active words.
 * What can account for the similarities and/or differences in these conclusions? Refer back to this question after completing the other questions. See if your response changes.

4. What aspects of phonological processing occur before the processing streams diverge? If a study revealed activations coinciding with this area, to musical stimuli, how might these language-processing activations be related to processing music?

5. Read through the following information and answer the subsequent questions: When looking through research, it is important to read both the methods section and the results section. That way you know the methods that were used, and you can interpret the results accordingly. Authors also have their own biases, so forming your own interpretations will help you include the best information in your own research. For example, Price et al., 1995 conducted a PET study with subjects listening to words and repeating words. Binder et al., 1997 conducted an fMRI study using a semantic decision task (which required subjects to press a button if the aurally presented noun was an animal native to the United States). Both of these articles state that their methodology tests semantic processing, and thus that their activations show semantic-related brain regions.
 * Why is it difficult to compare the results of these two studies if they both evaluate semantics?
 * Which of the two tasks is a better choice for evaluating semantic processes?
 * Can you think of another type of task that might better focus on semantic processing?
 * If you were to conduct a study using this type of task, what methods would you use, and would you expect a different result?

Part B) Neuroanatomy of Language
1. The temporal lobe is involved in which aspects of language?

2. The superior aspect of the temporal lobe is more heavily involved in which aspects of language specifically?

3. The middle aspect of the temporal lobe is believed to be more heavily involved in which aspects of language specifically?

4. Name the three parts of the inferior frontal gyrus, as well as its common name. Name the three different tasks that activated the inferior frontal gyrus in the textbook. What do these tasks all have in common that caused some researchers to attribute the activation to another part of speech processing? What part of speech processing was it?

5. What are the names of the two regions in the inferior parietal lobule that are said to be good candidates for the phonological coding region?

6. Read through the following information and answer the subsequent questions: Many researchers have termed the anterior temporal lobe (ATL) the “semantic hub”. Input and output information from the different sensory and motor areas comes together to form an amodal semantic representation, which is used to make appropriate generalizations based on the central semantic relationships in this hub. A meta-analysis of research examining the ATL in reference to semantics found that PET studies frequently showed activations, whereas fMRI studies did not. The authors pointed out that the fMRI signal from the ATL and orbito-frontal regions is known to produce artifacts because of magnetic susceptibilities in the area. This would explain the small number of ATL activations highlighted in the text.
 * Assuming these hypotheses as true, what would make the ATL region an appropriate location for extracting this conceptual information, particularly as it pertains to linguistic information?
 * Some researchers have said that the motor, sensory, and language aspects of conceptual knowledge are necessary, but not sufficient to form the entire neural basis of semantics as the semantic hub hypothesis implies. What is your position on this matter?

/Learning Exercises Answers/