Chapter 3: Speech Perception

Overview of Questions
Can computers perceive speech as well as humans?
Why does an unfamiliar foreign language often sound like a continuous stream of sound, with no breaks between words?
Does each word that we hear have a unique pattern of air pressure changes associated with it?
Are there specific areas in the brain that are responsible for perceiving speech?

Speech perception refers to the processes by which humans are able to interpret and understand the sounds used in language. The study of speech perception is closely linked to the fields of phonetics and phonology in linguistics and cognitive psychology and perception in psychology. Research in speech perception seeks to understand how human listeners recognize speech sounds and use this information to understand spoken language. Speech research has applications in building computer systems that can recognize speech, as well as improving speech recognition for hearing- and language-impaired listeners.

Speech Perception
The first step in comprehending spoken language is to identify the words being spoken, performed in multiple stages:
1. Phonemes are detected (/b/, /e/, /t/, /e/, /r/)
2. Phonemes are combined into syllables (/be/ /ter/)
3. Syllables are combined into words (“better”)
4. Word meaning is retrieved from memory

Spectrogram: “I owe you a yo-yo”

Speech perception: two problems
Words are not neatly segmented (e.g., by pauses)
Lack of phoneme invariance
Coarticulation = consecutive speech sounds blend into each other due to mechanical constraints on articulators
Speaker differences: pitch affected by age and sex; different dialects, talking speeds, etc.

The speech input consists of:
Frequency range 50-5600 Hz
Critical band filters
Dynamic range 50 dB
Temporal resolution of 10 ms
Smallest detectable change in F0 2 Hz
Smallest change in F1 40 Hz
Smallest change in F2 100 Hz
Smallest change in F3 150 Hz

The Speech Stimulus
Phoneme - smallest unit of speech that changes meaning in a word
In English there are 47 phonemes:
23 major vowel sounds
24 major consonant sounds
The number of phonemes in other languages varies:
11 in Hawaiian and 60 in some African dialects

Table 13.1 Major consonants and vowels of English and their phonetic symbols

The Acoustic Signal
Produced by air that is pushed up from the lungs through the vocal cords and into the vocal tract
Vowels are produced by vibration of the vocal cords and changes in the shape of the vocal tract
These changes in shape cause changes in the resonant frequency and produce peaks in pressure at a number of frequencies called formants

Figure 13.1 The vocal tract includes the nasal and oral cavities and the pharynx, as well as components that move, such as the tongue, lips, and vocal cords.

The Acoustic Signal - continued
The first formant has the lowest frequency, the second has the next highest, etc.
Sound spectrograms show the changes in frequency and intensity for speech
Consonants are produced by a constriction of the vocal tract
Formant transitions - rapid changes in frequency preceding or following consonants

Figure 13.3 Spectrogram of the word “had” showing the first (F1), second (F2), and third (F3) formants for the vowel /ae/. (Spectrogram courtesy of Kerry Green.)

Figure 13.4 Spectrogram of the sentence “Roy read the will,” showing the formants such as F1, F2, and F3, and formant transitions such as T2 and T3. (Spectrogram courtesy of Kerry Green.)

The Relationship between the Speech Stimulus and Speech Perception
The segmentation problem - there are no physical breaks in the continuous acoustic signal
How do we segment the individual words?
The variability problem - there is no simple correspondence between the acoustic signal and individual phonemes
Variability from a phoneme's context
Coarticulation - overlap between articulation of neighboring phonemes

Figure 13.5 Spectrogram of “I owe you a yo-yo.” This spectrogram does not contain pauses or breaks that correspond to the words that we hear. The absence of breaks in the acoustic signal creates the segmentation problem. (Spectrogram courtesy of David Pisoni.)

Figure 13.6 Hand-drawn spectrograms for /di/ and /du/. (From “Perception of the Speech Code,” by A. M. Liberman, 1967, Psychological Review, 74, 431-461, figure 1. Copyright 1967 by the American Psychological Association. Reprinted by permission of the author.)

The Relationship between the Speech Stimulus and Speech Perception - continued
Variability from different speakers
Speakers differ in pitch, accent, speed in speaking, and pronunciation
This acoustic signal must be transformed into familiar words
People perceive speech easily in spite of the segmentation and variability problems

Figure 13.7 (a) Spectrogram of “What are you doing?” pronounced slowly and distinctly. (b) Spectrogram of “What are you doing?” as pronounced in conversational speech. (Spectrogram courtesy of David Pisoni.)

Stimulus Dimensions of Speech Perception
Invariant acoustic cues - features of phonemes that remain constant
Short-term spectrograms are used to investigate invariant acoustic cues
A sequence of short-term spectra can be combined to create a running spectral display
Some invariant cues have been discovered from these displays

Figure 13.8 Left: a short-term spectrum of the acoustic energy in the first 26 ms of the phoneme /ga/. Right: sound spectrogram of the same phoneme. The sound for the first 26 ms is indicated in red. The peak in the short-term spectrum, marked a, corresponds to the dark band of energy, marked a in the spectrogram. The minimum in the short-term spectrum, marked b, corresponds to the light area, marked b in the spectrogram. The spectrogram on the right shows the energy for the entire 500 ms duration of the sound, whereas the short-term spectrum only shows the first 26 ms at the beginning of this signal. (Courtesy of James Sawusch.)

Figure 13.9 Running spectral displays for /pi/ and /da/. These displays are made up of a sequence of short-term spectra, like the one in Figure 13.8. Each of these spectra is displaced 5 ms on the time axis, so that each step we move along this axis indicates the frequencies present in the next 5 ms. The low-frequency peak (V) in the /da/ display is a cue for voicing. (From “Time-Varying Features of Initial Stop Consonants in Auditory Running Spectra: A First Report,” by D. Kewley-Port and P. A. Luce, 1984, Perception and Psychophysics, 35, 353-360, figure 1. Copyright 1984 by Psychonomic Society Publications. Reprinted by permission.)

Categorical Perception
This occurs when a wide range of acoustic cues results in the perception of a limited number of sound categories
An example of this comes from experiments on voice onset time (VOT) - the time delay between when a sound starts and when voicing begins
Stimuli are /da/ (VOT of 17 ms) and /ta/ (VOT of 91 ms)

Categorical Perception - continued
Computers were used to create stimuli with a range of VOTs from long to short
Listeners do not hear the incremental changes; instead, they hear a sudden change from /da/ to /ta/ at the phonetic boundary
Thus, we experience perceptual constancy for the phonemes within a given range of VOT

Figure 13.10 Spectrograms for /da/ and /ta/. The voice onset time - the time between the beginning of the sound and the onset of voicing - is indicated at the beginning of the spectrogram for each sound. (Spectrogram courtesy of Ron Cole.)

Figure 13.11 The results of a categorical perception experiment indicate that /da/ is perceived for VOTs to the left of the phonetic boundary, and that /ta/ is perceived at VOTs to the right of the phonetic boundary. (From “Selective Adaptation of Linguistic Feature Detectors,” by P. Eimas and J. D. Corbit, 1973, Cognitive Psychology, 4, 99-109, figure 2. Copyright 1973 Academic Press, Inc. Reprinted by permission.)

Figure 13.12 In the discrimination part of a categorical perception experiment, two stimuli are presented, and the listener indicates whether they are the same or different. The typical result is that two stimuli with VOTs on the same side of the phonetic boundary (solid arrows) are judged to be the same, and that two stimuli on different sides of the phonetic boundary (dashed arrows) are judged to be different.

Figure 13.13 Perceptual constancy occurs when all stimuli on one side of the phonetic boundary are perceived to be in the same category even though their VOT is changed over a substantial range. This diagram symbolizes the constancy observed in Eimas and Corbit's (1973) experiment, in which /da/ was heard on one side of the boundary and /ta/ on the other side.

Speech Perception is Multimodal
Auditory-visual speech perception
The McGurk effect
Visual stimulus shows a speaker saying “ga-ga”
Auditory stimulus has a speaker saying “ba-ba”
Observer watching and listening hears “da-da”, which is the midpoint between “ga” and “ba”
Observer with eyes closed will hear “ba”

McGurk Effect
Figure 13.14 The McGurk effect. The woman's lips are moving as if she is saying /ga-ga/, but the actual sound being presented is /ba-ba/. The listener, however, reports hearing the sound /da-da/. If the listener closes his eyes, so that he no longer sees the woman's lips, he hears /ba-ba/. Thus, seeing the lips moving influences what the listener hears.

Cognitive Dimensions of Speech Perception
Top-down processing, including knowledge a listener has about a language, affects perception of the incoming speech stimulus
Segmentation is affected by context and meaning
“I scream, you scream, we all scream for ice cream”

Figure 13.15 Speech perception is the result of top-down processing (based on knowledge and meaning) and bottom-up processing (based on the acoustic signal) working together.

Meaning and Phoneme Perception
Experiment by Turvey and Van Gelder
Short words (sin, bat, and leg) and short nonwords (jum, baf, and teg) were presented to listeners
The task was to press a button as quickly as possible when they heard a target phoneme
On average, listeners were faster with words (580 ms) than nonwords (631 ms)

Meaning and Phoneme Perception - continued
Experiment by Warren
Listeners heard a sentence that had a phoneme covered by a cough
The task was to state where in the sentence the cough occurred
Listeners could not correctly identify the position, and they also did not notice that a phoneme was missing - called the phonemic restoration effect

Phonemic restoration
Auditory presentation                             Perception
Legislature                                       legislature
Legi_lature                                       legi lature
Legi*lature                                       legislature
It was found that the *eel was on the axle.       wheel
It was found that the *eel was on the shoe.       heel
It was found that the *eel was on the orange.     peel
It was found that the *eel was on the table.      meal
Warren, R. M. (1970). Perceptual restorations of missing speech sounds. Science, 167, 392-393.

Meaning and Word Perception
Experiment by Miller and Isard
Stimuli were three types of sentences:
Normal grammatical sentences
Anomalous sentences that were grammatical
Ungrammatical strings of words
Listeners were to shadow (repeat aloud) the sentences as they heard them through headphones

Meaning and Word Perception - continued
Results showed that listeners were
89% accurate with normal sentences
79% accurate for anomalous sentences
56% accurate for ungrammatical word strings
Differences were even larger if background noise was present

Speaker Characteristics
Indexical characteristics - characteristics of the speaker's voice such as age, gender, emotional state, level of seriousness, etc.
Experiment by Palmeri, Goldinger, and Pisoni
Listeners were to indicate when a word was new in a sequence of words
Results showed that they were much faster if the same speaker was used for all the words

Speech Perception and the Brain
Broca's aphasia - individuals have damage in Broca's area (in frontal lobe)
Labored and stilted speech and short sentences, but they understand others
Wernicke's aphasia - individuals have damage in Wernicke's area (in temporal lobe)
Speak fluently, but the content is disorganized and not meaningful
They also have difficulty understanding others

Figure 13.16 Broca's and Wernicke's areas, which are specialized for language production and comprehension, are located in the left hemisphere of the brain in most people.

Speech Perception and the Brain - continued
Measurements from cats' auditory nerve fibers show that the pattern of firing mirrors the energy distribution in the auditory signal
Brain scans of humans show that there are areas of the human “what” stream that are selectively activated by the human voice

Figure 13.17 (a) Short-term spectrum for /da/. This curve indicates the energy distribution in /da/ between 20 and 40 ms after the beginning of the signal. (b) Nerve firing of a population of cat auditory nerve fibers to the same stimulus. (From “Encoding of Speech Features in the Auditory Nerve,” by M. B. Sachs, E. D. Young, and M. I. Miller, 1981. In R. Carlson and B. Granstrom (Eds.), The Representation of Speech in the Peripheral Auditory System, pp. 115-130. Copyright 1981 by Elsevier Science Publishing, New York. Reprinted by permission.)

Experience Dependent Plasticity
Before age 1, human infants can tell the difference between the sounds that make up all languages
The brain becomes “tuned” to respond best to speech sounds that are in the environment
Other sound differentiation disappears when there is no reinforcement from the environment

Motor Theory of Speech Perception
Liberman et al. proposed that motor mechanisms responsible for producing sounds activate mechanisms for perceiving sound
Evidence from monkeys comes from the existence of mirror neurons
Experiment by Watkins et al.
Participants had their motor cortex for face movements stimulated by transcranial magnetic stimulation (TMS)

Motor Theory of Speech Perception - continued
Results showed small movements for the mouth called motor evoked potentials (MEP)
This response became larger when the person listened to speech or watched someone else's lip movements
In addition, the “where” stream may work with the “what” stream for speech perception

Figure 13.18 The transcranial magnetic stimulation experiment that provides evidence for a link between speech perception
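
The running spectral display described under Stimulus Dimensions of Speech Perception (a sequence of short-term spectra, each displaced 5 ms on the time axis) can be illustrated with a minimal sketch. It is not the procedure used to produce Figures 13.8 or 13.9. The 26 ms window length is borrowed from the Figure 13.8 caption as an assumption, and the Hamming window, the naive DFT, the function names, and the synthetic test tone are all hypothetical choices made only for illustration.

```python
import cmath
import math


def short_term_spectrum(frame):
    """Magnitude spectrum of one frame (naive DFT; Hamming window is an assumption)."""
    n = len(frame)
    windowed = [
        s * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)))
        for i, s in enumerate(frame)
    ]
    magnitudes = []
    for k in range(n // 2 + 1):  # bins from 0 Hz up to half the sample rate
        acc = sum(
            windowed[i] * cmath.exp(-2j * math.pi * k * i / n) for i in range(n)
        )
        magnitudes.append(abs(acc))
    return magnitudes


def running_spectral_display(signal, sample_rate, window_ms=26.0, step_ms=5.0):
    """Sequence of short-term spectra, each displaced step_ms along the time axis.

    window_ms=26 is assumed from the Figure 13.8 caption; step_ms=5 follows
    the Figure 13.9 caption.
    """
    window_len = int(round(sample_rate * window_ms / 1000))
    step_len = int(round(sample_rate * step_ms / 1000))
    display = []
    for start in range(0, len(signal) - window_len + 1, step_len):
        display.append(short_term_spectrum(signal[start:start + window_len]))
    return display


def peak_frequency(spectrum, sample_rate, window_len):
    """Frequency in Hz of the largest peak in one short-term spectrum."""
    k = max(range(len(spectrum)), key=spectrum.__getitem__)
    return k * sample_rate / window_len


if __name__ == "__main__":
    # Hypothetical input: 100 ms of a pure 500 Hz tone sampled at 8000 Hz,
    # standing in for a recorded speech sound.
    rate = 8000
    tone = [math.sin(2 * math.pi * 500 * t / rate) for t in range(800)]
    display = running_spectral_display(tone, rate)
    window_len = int(round(rate * 26.0 / 1000))
    print(len(display), "short-term spectra")
    print("peak of first spectrum (Hz):", peak_frequency(display[0], rate, window_len))
```

Each row of the returned list plays the role of one short-term spectrum in a running spectral display. With a real speech recording in place of the tone, peaks in a row would correspond to bands of energy such as formants.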