Psycholinguistics/Development of Speech Perception

Introduction
Speech perception is the process by which humans interpret and understand the sounds used in language. It concerns how we recognize speech sounds and how we use this information to understand spoken language. Researchers have long studied how infants come to perceive speech. Different languages use different sets of speech sounds, so infants must learn which sounds their native language uses and which it does not. This chapter looks at how children manage to do this. Some psychologists have suggested that certain sound categories are genetically specified, while others have argued that infants can learn the sound categories of their native language through passive listening, using a process called "statistical learning". Psychologists use three main techniques, based on sucking, visual habituation, and head turning, to determine whether infants detect changes in phonetic features. These techniques rely on infants' boredom with repeated sounds and on their preferences for certain sounds and toys. The chapter also explains how infants are able to distinguish more categories of speech sounds than adults. Newborns can distinguish between many of the sounds of human languages, but by about 12 months of age their ability to distinguish sounds that are not used in their native language weakens. As infants grow older, different techniques are needed to assess their speech perception abilities.

A Timeline of Speech Perception and Production


The figure, proposed by Kuhl et al. in 2004, shows the changes that occur in speech perception and production in typically developing infants during their first year of life. It presents how a child begins to process basic linguistic units, including phonological features, phonemes, and syllables, in both the auditory input and the articulatory output of spoken language. As seen in the figure, infants discriminate the phonetic contrasts of all languages until they are about 5 months old. At 3 months, they produce non-speech and vowel-like sounds. At 6 months, statistical learning (of distributional frequencies) is evident and infants show language-specific perception of vowels. At 7 months, they begin 'canonical babbling', and by 8 months they detect the typical stress pattern in words, a result of statistical learning (of transitional probabilities). At 9 months, they recognize language-specific sound combinations, and by 10 months their speech production becomes language-specific. At 11 months, perception of foreign-language consonants declines while perception of native-language consonants improves. The timeline ends when the infant turns one year old, the point at which many infants produce their first words.

Prelinguistic Speech Perception
Although many children begin producing words at around one year of age, some still cannot produce speech sounds at that point. Developmentalists therefore make inferences about how preverbal children learn to discriminate the speech sounds they hear in their environments. Evidence indicates that newborn infants have been exposed to at least some characteristics of their native language while still in the womb, and testing neonates allows developmentalists to examine how this is possible. DeCasper and Fifer (1980) found that newborns less than one day old preferred hearing their mother's voice over an unfamiliar female voice, suggesting that prenatal auditory experience is involved. These researchers showed that newborns would suck more when hearing their mother's voice than when hearing a stranger's voice. They also undertook further research in which pregnant women were asked to read a specific passage aloud every day for the last six weeks of their pregnancy and then, after the birth, to read the same passage as well as a novel passage to their newborn infants. The infants were tested to see whether they preferred the familiar passage to the new one. Babies who had been read the passage before birth preferred the familiar passage, whereas babies who had not been read any passage showed no preference. These studies demonstrate a preference for one type of speech over another, and each new investigation adds evidence about which phonetic differences infants can discriminate. Several procedures can be used to test what very young infants can perceive. These include the high amplitude sucking, visual habituation, and conditioned head turn procedures.

HIGH AMPLITUDE SUCKING PROCEDURE
Sucking is one of the few activities over which infants have good motor control. Infants are born with a sucking reflex, and in the first months of life they suck not only for food but on any object they can get into their mouths. One technique used to measure newborns' perceptual abilities is called "High Amplitude Sucking". It is well suited to young infants because sucking behaviour is easily conditioned. The technique was developed by Siqueland and DeLucia (1969), who used visual reinforcement to condition infants' sucking. The brightness of a visual display depended on the strength of the infant's sucking, and they showed that 4-month-olds can learn the contingency between sucking and the presentation of light. In another study, infants are connected to a device that records how vigorously they suck on an artificial nipple to receive formula. The sucking response is a good indicator of the infant's arousal level, since sucking rates increase when environmental stimuli change. For example, an infant will increase its sucking rate in response to a novel stimulus and decrease it if the stimulus is repeated continuously (habituation), so the sucking rate indicates a shift in arousal. In 1971, a study by Eimas et al. assessed the ability of 1- to 4-month-old infants to discriminate between /ba/ and /pa/ stimuli. One group of infants heard a between-category shift, such as from /ba/ to /pa/, while the control group heard a within-category shift, such as from one /ba/ to an acoustically different /ba/. Infants increased their sucking rate after the between-category shift but not after the within-category shift. This study demonstrates that infants, like adults, perceive speech categorically, and that the technique can be used profitably to describe the discrimination abilities of very young infants.

THE VISUAL HABITUATION
The visual habituation procedure has been used to test many aspects of infant visual perception. Werker et al. (1994) found it to be more sensitive than many other behavioural techniques for revealing infants' discrimination capabilities. Infants' ability to categorise stimuli can also be assessed with this method, and performance in habituation tasks can be used as a predictor of information processing capacity. The procedure is based on infants' preference for novelty: an infant attends visually to a display when it is new. When the same display is presented repeatedly, it becomes familiar, the infant's looking time declines, and the experimenter can infer that habituation has occurred. At this point the infant is presented with a novel stimulus; if the infant can discriminate it from the old one, looking time will increase. The infant is usually seated in a dim testing room with little else of interest. A flashing red light is used to attract the infant's attention to the screen, and a trial begins once the experimenter determines that the infant is looking at the screen. The infant is presented repeatedly with both a visual display and an auditory stimulus, such as the syllable /da/. In the fully infant-controlled version of the procedure, the infant hears the familiarisation stimulus for as long as he or she continues to look at the visual display on the screen (Werker et al., 1994). When the infant looks away, the experimenter signals the computer to end the trial, and both the visual and the auditory stimuli are turned off (Werker et al., 1994). The red light is then turned on and the cycle begins again. The procedure is relatively easy to implement, and experimenters can be trained quickly to run a whole session. It also capitalises on infants' tendency to look at a visual stimulus in the presence of an interesting auditory stimulus.
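The inference that habituation has occurred is usually made with an explicit rule applied to looking times. The following is a minimal sketch of one such rule, not the criterion used in the studies cited here: the function name, the three-trial window, and the 50% threshold are all illustrative assumptions.

```python
def has_habituated(looking_times_s, window=3, criterion=0.5):
    """Return True once mean looking time over the last `window` trials
    falls below `criterion` times the mean over the first `window` trials.

    The window size and criterion are illustrative assumptions.
    """
    if len(looking_times_s) < 2 * window:
        # Not enough trials yet to compare a baseline with recent trials.
        return False
    baseline = sum(looking_times_s[:window]) / window
    recent = sum(looking_times_s[-window:]) / window
    return recent < criterion * baseline


# Hypothetical looking times (in seconds) across successive trials.
trials = [12.0, 10.5, 11.0, 8.0, 6.5, 5.0, 4.0]
print(has_habituated(trials[:6]))  # looking time has not yet dropped enough
print(has_habituated(trials))      # criterion met after the seventh trial
```

Under a rule like this, the novel stimulus would be introduced on the trial after the function first returns True, and an increase in looking time would then be taken as evidence of discrimination.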

HEAD TURN TECHNIQUE (Visually Reinforced Infant Speech Discrimination)
A third way to obtain evidence of sound discrimination and preference is the Head Turn Technique. Researchers have suggested that infants are born with the ability to discriminate the universal set of phonetic contrasts and that experience with a specific language narrows these discriminatory capabilities. This view was based on research with infants using the High Amplitude Sucking Technique and on research with adults using labelling tasks. However, the tasks used with infants were more sensitive than those used with adults, so the apparent difference between infants and adults might reflect a difference in methodology rather than a difference in perceptual sensitivity. A procedure was therefore needed that could be adapted, with only minor modifications, for testing people from infancy through adulthood. The Head Turn Technique is one of the most versatile procedures for testing people of different ages. One version, for very young infants, tests for habituation simply by recording how the newborn reacts to new stimuli. Another version tests older infants (from 6 months to 1 year of age) by exploiting their tendency to listen to something new or to look at something interesting, such as a moving toy. The infant learns to turn his or her head in response to a sound or to a change in sounds. Infants initially pay more attention to a word if they notice something familiar about it than to a word that is entirely unfamiliar. In a typical session, the infant sits on the caregiver's lap across a table from an experimental assistant, who shows the infant brightly coloured toys to keep his or her attention. An initial sound is played repeatedly and then changed. When the infant turns his or her head at the change (a correct head turn), the moving toy is displayed and the assistant smiles and praises the infant. An incorrect head turn receives no reinforcement.

Phonological Development
In speech perception, categorization represents the ability to group perceptually distinct sounds in the same category. Unlike computers, infants can classify as similar phonetic units spoken by different talkers, at different rates of speech and in different contexts (Jay, 2003).

Categorical Perception


An early discovery about the innate skills that infants bring to the task of phonetic learning, and about the timeline of early learning, is "categorical perception". Kuhl et al. (1994) describe categorical perception as the tendency of adult listeners of a particular language to classify the sounds used in their language as one phoneme or another, showing no sensitivity to intermediate sounds. It concerns the discrimination of the acoustic events that distinguish phonetic units. By one month of age, infants can discriminate syllables such as /ba/ and /pa/, which differ only in the consonants /b/ and /p/. Two tasks are used to demonstrate categorical perception: identification and discrimination (Kuhl et al., 1994). In an identification task the listener labels each sound (for example, 'was that sound a /pa/ or a /ba/?'), and in a discrimination task the listener has to tell two sounds apart. Perfect categorical perception occurs when three conditions are met: the identification function changes abruptly along the continuum; the discrimination function shows a peak in accuracy at the boundary between categories, with performance at or near chance within a category; and the discrimination function is perfectly predictable from the identification probabilities (Scott et al.).
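The idea that discrimination can be predicted from identification can be made concrete with a small calculation. The sketch below uses a classic prediction from the early categorical perception literature for the ABX discrimination task, in which predicted accuracy is 0.5 * (1 + (p1 - p2)^2), where p1 and p2 are the probabilities of giving the same label to the two stimuli. Both the formula and the identification values are additions for illustration and are not taken from the sources cited in this chapter.

```python
def predicted_abx_accuracy(p_first, p_second):
    """Predicted proportion correct in an ABX discrimination task, given the
    probability that each of the two stimuli is labelled /pa/."""
    return 0.5 * (1 + (p_first - p_second) ** 2)


# Hypothetical identification function: proportion of /pa/ responses at
# seven equally spaced steps along a /ba/-/pa/ continuum.
p_pa = [0.0, 0.0, 0.05, 0.5, 0.95, 1.0, 1.0]

# Predicted accuracy for every pair of stimuli two steps apart.
predictions = [
    predicted_abx_accuracy(p_pa[i], p_pa[i + 2]) for i in range(len(p_pa) - 2)
]
for step, accuracy in enumerate(predictions, start=1):
    print(f"steps {step} vs {step + 2}: {accuracy:.3f}")
```

With these made-up values, predicted accuracy stays near chance (0.5) for pairs that receive the same label and peaks for the pair that straddles the category boundary, which is the pattern the three conditions describe.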

Voice Onset Time (VOT)
Voice Onset Time (VOT) is the length of time between the release of a stop consonant and the onset of voicing, that is, the vibration of the vocal folds. For example, a bilabial consonant is perceived as /b/ if its VOT is less than 25 ms, but as /p/ if its VOT is between 25 and 60 ms (Jay, 2003). When VOT was varied systematically, children, like adults, showed poor discrimination within a phoneme category but good discrimination across phoneme boundaries. Studies using the High Amplitude Sucking Technique suggested that infants habituate to sounds within a category. They increase their sucking rate, however, when the sound shifts categories (from /b/ to /p/), which indicates that they treat it as a novel stimulus (dishabituation).
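The contrast between within-category and between-category pairs can be illustrated with a short sketch. It uses only the 25 ms boundary for bilabial stops given above (Jay, 2003); the function names and the stimulus values are hypothetical, and for simplicity any VOT of 25 ms or more is labelled /p/, ignoring the 60 ms upper limit.

```python
BOUNDARY_MS = 25  # /b/-/p/ boundary for bilabial stops, as given in the text


def label_bilabial_stop(vot_ms):
    """Label a bilabial stop as /b/ or /p/ from its voice onset time (ms)."""
    return "/b/" if vot_ms < BOUNDARY_MS else "/p/"


def crosses_boundary(vot_a_ms, vot_b_ms):
    """True when two stimuli fall on opposite sides of the category boundary."""
    return label_bilabial_stop(vot_a_ms) != label_bilabial_stop(vot_b_ms)


# Hypothetical stimulus pairs, each 20 ms apart on the VOT continuum.
print(crosses_boundary(0, 20))    # both /b/: within category
print(crosses_boundary(15, 35))   # /b/ versus /p/: between categories
print(crosses_boundary(40, 60))   # both /p/: within category
```

All three pairs differ by the same 20 ms, yet only the middle pair crosses the boundary. That is the kind of pair for which infants in the sucking studies would be expected to dishabituate.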

Critical VOT values also vary between languages. In "Perception of Speech from Sound to Meaning", written by Moore et al., a cross-language study of syllable-initial stops identified VOT as an important phonetic correlate of voicing contrasts. Across languages, stops in initial position tend to fall within three VOT sub-ranges: long negative VOTs, in which voicing precedes the articulatory release by more than 45 ms; short positive VOTs, in which voicing follows the release by no more than 20 ms; and long positive VOTs, in which voicing follows the release by more than 35 ms. Languages generally select two adjacent sub-ranges to implement their voicing contrasts. For example, Spanish and Dutch speakers use long negative VOTs for their voiced category and short positive VOTs for their voiceless category, whereas English and Cantonese speakers use short positive VOTs for their voiced category and long positive VOTs for their voiceless category. For all of these language groups, VOT perception is categorical in the sense that listeners show sharp identification boundaries between the categories relevant to their language, relatively good discrimination of stimulus pairs that straddle a category boundary, and relatively poor discrimination of stimulus pairs drawn from the same voicing category. Infants from a Spanish-speaking environment showed enhanced discrimination of VOT differences that straddled either the Spanish or the English voicing boundary, and the same was found for infants from an English-speaking environment (Moore et al., 2009).

Nonhumans and Categorical Perception
A child's ability to discriminate phonemes may rely not only on speech-specific perceptual skills but also on general auditory abilities that apply to non-speech sounds. In "Listening to Speech", written by Greenberg et al., researchers investigated whether nonhuman animals perceive speech sounds categorically. The study found that chinchillas label and discriminate voiced (/b/, /d/, /g/) and voiceless (/p/, /t/, /k/) stop consonants in a fashion remarkably like that of human listeners presented with the same stimuli, that is, they exhibit categorical perception. The chinchillas showed a phoneme boundary effect for /p/ and /b/, which suggests that the boundary reflects a property of mammalian audition that language exploits, and that categorical perception is neither unique to humans nor dependent on knowing a language (Greenberg et al., 2006). The researchers also argued that Kuhl's formulation of the Native Language Magnet theory, which suggests that gross phonetic categories separated by natural boundaries and produced by general auditory processes are common among related species, is best supported by the chinchilla study (Greenberg et al., 2006).

Statistical Learning
Kuhl et al. describe statistical learning as the "Acquisition of knowledge through the computation of information about the distributional frequency with which certain items occur in relation to others, or probabilistic information in sequences of stimuli, such as the odds (transitional probabilities) that one unit will follow another in a given language." One study described in "The Psychology of Language" by Timothy B. Jay (2003) examined statistical learning in 8-month-old infants. The infants listened to two minutes of tape-recorded 'words' combined in random order into a single continuous stream of speech (Jay, 2003). They were then presented either with these same 'words' or with 'nonwords' made up of the same syllables in different orders. The infants listened longer to the nonwords than to the words, indicating that they preferred the novel stimuli to the familiar ones (Jay, 2003). They could tell the two apart because they had learned the transitional probabilities between syllables within the words, for example that /tu/ is followed by /pi/. The nonwords had different transitional probabilities even though they were built from the same syllables, which shows that the infants were tracking how often one particular syllable was followed by another (Jay, 2003). The study therefore suggests that infants can use statistical learning as a strategy for learning language.
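A transitional probability is simply the proportion of times one syllable is followed by another. The sketch below computes it for a small made-up stream. Only the /tu/ to /pi/ transition comes from the text; the three 'words', their ordering, and the function name are hypothetical and are not the stimuli of the original study.

```python
from collections import Counter


def transitional_probabilities(syllables):
    """Return P(next | current) for every adjacent syllable pair in the stream."""
    pair_counts = Counter(zip(syllables, syllables[1:]))
    # Count each syllable only where it has a successor (all but the last).
    first_counts = Counter(syllables[:-1])
    return {
        (a, b): count / first_counts[a]
        for (a, b), count in pair_counts.items()
    }


# Hypothetical three-syllable 'words' joined into one continuous stream.
words = {"A": ["tu", "pi", "ro"], "B": ["go", "la", "bu"], "C": ["da", "ko", "ti"]}
order = ["A", "B", "C", "B", "A", "C", "A", "B", "C", "A", "C", "B"]
stream = [syllable for word in order for syllable in words[word]]

tp = transitional_probabilities(stream)
print(tp[("tu", "pi")])  # within a word: /tu/ is always followed by /pi/
print(tp[("ro", "go")])  # across a word boundary: /ro/ is followed by /go/ only sometimes
```

In this toy stream the within-word transition has a probability of 1.0, while the transition across a word boundary is only 0.5. A drop of this kind is the cue that could let a listener find word boundaries without pauses.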

Patricia Kuhl and Native Language Magnet Theory
Patricia Kuhl proposed the Native Language Magnet (NLM) theory of speech development to account for the influence of the linguistic environment on speech perception. NLM specifies three phases in this development. In phase 1, the initial state, infants are capable of differentiating the speech sounds of all languages, and these abilities derive from general auditory processing mechanisms rather than from a speech-specific mechanism. In phase 2, infants’ sensitivity to the distributional properties of linguistic input produces phonetic representations based on the distributional ‘modes’ in ambient speech input. In phase 3, the resulting distortion of perception, known as the perceptual magnet effect, facilitates native-language phonetic abilities and reduces foreign-language phonetic abilities. The research done by Goodman & Nusbaum (1994) focuses on the developmental transition from infants' universal phonetic capacity to native-language phonetic discrimination. These findings reveal that children up to 1 month old can perceive differences between phonemes that adults cannot perceive. Infants are ready to learn any human language, but their perceptual abilities adjust to their native speech environment within the first year.

Motherese
Kuhl et al. identify motherese as a slower, more stressed, simplified, and repetitive version of adult speech in the native language. When we talk to infants, we use this special register, which has a unique acoustic signature that promotes infants' processing of speech. Compared with adult-directed speech, child-directed speech (motherese) is slower, has a higher average pitch, and contains exaggerated pitch contours. These exaggerated prosodic cues make the speech well suited to learning speech sounds. Motherese also helps infants analyze the structure of speech by highlighting the boundaries between important units, such as words and clauses. One study recorded women who spoke English, Russian, or Swedish while they talked either to another adult or to their young infants. The analyses showed that the vowel sounds (the /i/ in 'see', the /a/ in 'saw', and the /u/ in 'Sue') were more clearly articulated in child-directed speech, because the women exaggerated the acoustic components of the vowels, which benefited the infants (Kuhl et al., 1994). No such emphasis in articulation was found in adult-directed speech. Motherese has also been documented in a variety of cultures and across a typologically diverse set of languages, including sign language (Kuhl et al., 1994).

Prosody
The role of prosody in language acquisition in general, and in the development of speech perception in particular, is of fundamental interest to developmentalists. Empirical research suggests that prosody may help infants segment speech and locate grammatical units. Boundaries between important grammatical units such as clauses and phrases are often marked by prosodic changes, including changes in pitch contour, increases in syllable duration, and pausing. Hirsh-Pasek et al. (1987) were the first to examine whether infants respond to the prosodic marking of clausal units in fluent speech. They collected speech samples of a mother talking to her 18-month-old infant, chose passages that were 15 to 20 seconds long, and modified them by inserting 1-second pauses in one of two ways. The pauses were inserted either between two successive clauses (coincident versions) or between two words within a clause (noncoincident versions). They hypothesized that if infants are sensitive to the prosodic marking of clausal units, they would listen longer to the versions in which the pauses coincided with clause boundaries than to the versions with noncoincident pauses (Hirsh-Pasek et al., 1987). The infants, in groups of 6- and 9-month-olds, were tested with the Head Turn Technique. Both groups showed significant listening preferences for the coincident versions, leading to the conclusion that sensitivity to prosodic markers of clausal units is present in infants as young as 6 months (Hirsh-Pasek et al., 1987).

Critical Period for Learning Speech
For many animals, communication by means of sound is innate and requires no experience to be produced correctly. Humans, on the other hand, require extensive postnatal experience to produce and decode the speech sounds that are the basis of language. Language acquisition during the critical period requires both hearing speech and practicing it, as studies of deaf children show. While most babies begin producing speechlike sounds (babbling) at about 7 months, congenitally deaf infants show distinct deficits in their early vocalizations, and such individuals fail to develop language if they are not provided with an alternative form of symbolic expression (Fitzpatrick et al., 2001). If deaf children are exposed to sign language at an early age, however, they begin to “babble” with their hands just as hearing infants babble audibly. This suggests that, regardless of the modality, early experience shapes language behaviour. Children who have acquired speech but lose their hearing before puberty also suffer a significant decline in spoken language, because they can no longer hear themselves talk and thus lose the opportunity to refine their speech through auditory feedback (Fitzpatrick et al., 2001). A pathological example is the case of a girl who was raised by deranged parents until the age of 13 under conditions of almost total language deprivation. Despite training later in her development, she never learned more than rudimentary communication, which researchers take as support for the importance of early language experience. In contrast, people who learn a language normally at a young age retain their ability to speak and comprehend it as adults, even after a significant amount of time has passed without hearing or speaking it.
In other words, the normal acquisition of human speech is subject to a critical period: the process is sensitive to experience or deprivation before puberty and is largely unaffected by similar experience or deprivation in adulthood (Fitzpatrick et al., 2001).

During an individual's early life, the phonetic structure of the language they hear shapes both the perception and the production of speech. Very young infants can perceive and discriminate differences among all speech sounds and are not innately biased towards the phonemes of any particular language. This universal sensitivity does not persist, however. For example, the phonetic distinction between the /r/ and /l/ sounds of English is not present in Japanese, and adult Japanese speakers cannot reliably distinguish between them. Nonetheless, 4-month-old Japanese infants tested with two procedures, the Head Turn and Sucking Techniques, made this discrimination as reliably as 4-month-olds raised in English-speaking households (Fitzpatrick et al., 2001). By 6 months of age, however, infants begin to show preferences for the phonemes of their native language over those of foreign languages, and by the end of their first year they no longer respond to phonetic elements that do not occur in their native language. The ability to perceive these phonemic contrasts nevertheless persists for several more years, as shown by the fact that children can learn to speak a second language without an accent and with fluent grammar until about age 7 or 8. After that age, performance declines gradually regardless of the extent of practice or exposure (Fitzpatrick et al., 2001). Interestingly, “baby talk” or “motherese” emphasizes phonetic distinctions more than normal speech among adults does. Thus, learning language during the critical period for its development entails an amplification and reshaping of innate biases by appropriate postnatal experience (Fitzpatrick et al., 2001).

Summary
Infants' first perception of speech begins in utero. Sound reaching the fetus is filtered, and the mother's voice is the most familiar of all the sounds heard. Many studies show that one-day-old infants prefer their mother's voice to other female voices, and that they also prefer stories heard frequently while still in utero. This indicates that infants are sensitive to prosody (Mehler et al., 1988). After birth, infants' preference for their native language over foreign languages grows. Infants detect non-native contrasts before they turn one year old and lose this ability afterwards. Infants as young as 4 weeks show categorical perception, which is not unique to humans but is also found in nonhuman animals. Infants also have pattern abstraction capabilities: 6-month-olds can perceptually 'sort' novel instances of phonemes into categories, detecting similarity across different voices, intonation contours, and phonological (coarticulatory) contexts.

Patricia Kuhl is one of the researchers who has focused most extensively on the development of speech perception and production. In 1992, she introduced the phenomenon called the 'perceptual magnet effect', which suggests that as a second language is acquired, the brain gradually groups sounds according to their similarity to phonemes in the native language. Many of the thousands of human languages and dialects use different repertoires of speech elements, called 'phonemes', to produce spoken words. For example, when asked to categorize a continuous spectrum of artificial phonemes between /r/ and /l/, native English speakers, but not Japanese speakers, tend to perceive each sound as either /r/ or /l/. She has also studied the type of child-directed speech known as 'motherese'. It has a slower rate, exaggerated stress, and prolonged vowel sounds, and is used by mothers to communicate with their infants. Motherese is instructive in that it conveys information to infants during their critical period of language acquisition.

In summary, acquiring language may seem deceptively simple, yet researchers have struggled to explain exactly how infants accomplish its initial phases. At all levels, language learning is constrained, meaning that perceptual, social and neural factors affect what can be learned, how, and when. Identifying these constraints on infant learning and how they reflect innate knowledge will be a continuing focus in the next decade.

Learning Exercises
 PART 1: MATCHING TECHNIQUES 

''Several of the procedures discussed in the chapter may be used to study speech perception in infants. For each of the three examples below, select the procedure that corresponds best and explain why: (A) Visual Habituation Technique, (B) The Head Turn Technique, (C) High Amplitude Sucking Technique''

1. In one study, researchers examine which differences between speech sounds infants can detect and how they begin to respond to those differences. They also observe what kind of information infants can remember from these different sounds. To do this, they use one of the techniques mentioned above. In your opinion, which technique should be used for this infant language study, and how will it function?

2. In another study, researchers examine how interesting different speech stimuli are to infants. By comparing infants’ preferences among two or more stimulus types, researchers can determine whether infants are sensitive to the relevant properties that differentiate the stimuli. They also examine how long infants can remember words once they have been familiarized with them. In your opinion, given the procedure of this study, which technique is most suitable for observing infants’ attention to the stimuli, and how will it function?

3. In the last study, researchers examine infants' understanding of word meanings and their ability to discriminate between words. This technique shows parents that, before their infant starts to say his or her first words, the infant is already able to comprehend some of the words he or she hears. In your opinion, which technique determines infants’ ability to discriminate between these words via visual and auditory stimuli, and how will it function?

 PART 2: COMPREHENSION 

Upon reviewing the chapter which consists of information about Development of Speech Perception, write a short answer essay to the questions below:

Question #1: After watching the video clip provided here (Baby Talk: Stimulation Speech and Development), how would you define motherese? Also give an example, from the chapter and from other research articles, of the importance of child-directed speech for infants’ language acquisition.

Question #2: What is categorical speech perception and when does it arise in infants? Which of the techniques discussed in the chapter best illustrates categorical speech perception in infants? Include an important dimension of speech perception, such as Voice Onset Time, in your answer.

 PART 3: TRUE OR FALSE 

Upon reviewing the chapter above, decide which statements are true and which are false and explain why in either case:

1. Deaf children are incapable of developing language, and it is therefore impossible for them to babble in any way.

2. In Kuhl’s Native Language Magnet Theory, phase three is when infants' sensitivity to the distributional properties of linguistic input produces phonetic representations. Experience accumulates and the representations most often activated begin to function as perceptual magnets for other members of the category.

3. Regarding the three phases of the Native Language Magnet Theory, it is predicted that better native phonetic perception at 7 months of age will be associated with accelerated language development between 14 and 30 months, whereas better non-native performance at 7 months will be associated with slower language development between 14 and 30 months.

4. Jean-Philippe is 5 months old and he can already detect the typical stress pattern in words, a result of statistical learning (transitional probabilities).

5. A study found that 8-month-old infants can learn language by using the statistical learning strategy, and that they do so by differentiating between the transitional probabilities of syllables in the words they learned and the transitional probabilities of syllables in the nonwords (created from the same syllables but put in different orders).

Follow the link to the answers