




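Translation of Foreign Literature (Chinese Version)

The Contribution of Parsing to Prosodic Phrasing in an Experimental Text-to-Speech System

Introduction

We describe an experimental text-to-speech system that uses a deterministic parser and prosody rules to generate phrase-level pitch and duration information for English input. This information is used to annotate the input sentence, which is then processed by the text-to-speech programs currently under development at Bell Labs. In constructing the system, our goal has been to test two hypotheses: (i) that information available in the syntax tree, in particular grammatical functions such as subject-predicate and head-complement, is by itself useful in determining prosodic phrasing for synthetic speech; and (ii) that a syntactic parser that specifies grammatical functions can be used to determine prosodic phrasing for synthetic speech. Although certain connections between syntax and prosody are well known (e.g., the influence of part of speech on stress in words like "progress", or the setting off of parenthetical expressions), very little practical knowledge is available about which aspects of syntax might be connected to prosodic phrasing. In many studies, investigators have sought connections between constituent structure and prosody (e.g., Cooper and Paccia-Cooper 1980; Umeda 1982; Gee and Grosjean 1983), but, with the exception of Selkirk (1984), they tend to neglect the representation of grammatical functions in the syntax tree. Moreover, previous work has not been specific enough to provide the basis for a full system implementation.

Based on our study of prosodic phrasing in recorded human speech, we decided to emphasize three aspects of structure that relate to phrasing: syntactic constituency, grammatical function, and constituent length. These findings, which we will discuss in detail, have been implemented as a collection of prosody rules in an experimental text-to-speech system. Two important features characterize our system. First, the input to our prosody system is a parse tree generated by a version of the deterministic parser Fidditch (Hindle 1983). The parser's left-corner search strategy and, in particular, its determinism give Fidditch the speed that makes online text-to-speech production feasible. In building a parse tree, Fidditch identifies the core subject-verb-object relations but makes no attempt to represent adjunct or modifier relations. Thus relative clauses, adverbials, and other non-argument constituents have no specified position in the tree and no specified semantic role. Second, the rules in the prosody system build a prosody tree by referring both to the syntactic structure and to earlier stages of prosodic structure. The result is a hierarchical representation that supports the view, also proposed in Selkirk (1984), that grammatical-function information is related to prosodic phrasing, but indirectly, through different levels of processing. Informal tests of the system show that it can produce a significant improvement in the prosodic quality of the resulting synthetic speech. Our investigation of the system's problems, which we describe, has not revealed any serious counterexample to our basic approach. In many cases, it appears that problems with the current version can be resolved by taking our approach a step further and including the lexical information required by the parser as another factor in determining prosodic phrasing.

Text-to-Speech

Most text-to-speech systems comprise two components: pronunciation rules and a speech synthesizer. Pronunciation rules convert the input text into a phonetic transcription, which may be supplemented by a dictionary that provides information about the part of speech, stress pattern, and phonetic makeup of particular words. The speech synthesizer then converts the phonetic transcription into a series of speech parameters, which are subsequently processed to produce digitized speech. Although these systems tend to perform quite well on word pronunciation, they fall short when it comes to providing good prosody for complete sentences. Current text-to-speech systems have no access to the syntactic and semantic properties of a sentence that influence phrase-level prosody. Hence rules for sentence prosody, when provided at all, typically depend on superficial aspects of the text (e.g., punctuation) and on heuristics that vary widely in sophistication. Although such techniques often give the resulting synthetic speech a more natural quality, they can fail in important ways, for example by ignoring the prosodic event between a lengthy subject and its predicate, so that no clear prosodic boundary separates the two. Several authors (e.g., Allen 1976; Elovitz et al. 1976; Luce et al. 1983) have suggested that prosodic differences between synthetic and natural speech are the primary unaddressed factor behind difficulties in comprehending fluent synthetic speech. The relation between phrase-level prosody and its sources, however, is so poorly understood that we have no good sense of the degree to which different levels of explanation (syntactic, semantic, or pragmatic) apply. We currently have reasonable tools for automatic syntactic analysis of text, but nothing equivalently well developed for semantic or pragmatic textual analysis. Thus an obvious goal is to explore the extent to which phrase-level prosody can be explained by the syntax tree and to develop a detailed description of that relation. A further goal is to convert the resulting insights into a system that can work with a speech synthesizer. This allows us to test our description more adequately and perhaps also produce something that will further text-to-speech technology.

Syntactic Structure and Prosodic Phrasing

Beyond the word level, there has been little investigation of systematic connections between syntactic structure and prosodic phrasing. The psycholinguistic and acoustic investigations of Cooper and Paccia-Cooper (1980), Umeda (1982), and Gee and Grosjean (1983), together with the prosodic theory of Selkirk (1984), are among the more notable studies. They represent the two main approaches to the syntax/prosody relation. In Cooper and Paccia-Cooper (1980) and Umeda (1982), the connection from syntax to prosodic phrasing is unmediated by any filtering process. That is, they propose that the details of prosodic phrasing can be determined directly from syntactic structure by associating particular syntactic nodes (or constituent boundaries) with a phonetic value: pausing, segmental lengthening, or blocking of the cross-word conditioning of phonological rules. By contrast, Gee and Grosjean (1983) and Selkirk (1984) hold that the syntax-prosody relation is indirect. On their view, prosodic phrasing is derived by rules that refer to constituent membership, left-to-right ordering, length (or branching patterns), and, in Selkirk's case, grammatical function, in order to infer a hierarchical prosodic structure. But while their respective positions are quite clear, none of these studies is conclusive. All of them lack a syntactic framework sufficiently detailed and formalized to allow extensive testing, and most consider only a small number of sentences and sentence types.

To develop our analysis, we first examined prosodic phrasing in the speech of one of us reading prose from various texts, including four instruction manuals. These texts were later augmented by a professional reading of a prose story. The boundaries between prosodic phrases were identified and then classified according to their syntactic context and semantic function.

Text-to-Speech Synthesis

The programs that make up the speech component are described in Liberman and Buchsbaum (personal communication). These programs take character text as input and produce digitized speech output. By annotating the input text to this system, many aspects of its operation can be overridden or modified, for example:
- the location of major and minor phrase boundaries;
- the stress given to words;
- the transcription of words and the boundaries between them;
- the timing of segments;
- details of the pitch contour.

As we will show, our prosody system lets us use the current text-to-speech system's annotations to produce strings in which four boundary levels are identified and perceptually distinguished.

Prosodic Phrasing

The prosody rules use information about constituent structure, grammatical role, and length to map a surface structure tree onto a prosody tree. The prosody tree identifies the location of phrase boundaries (signified by its nodes) and the relative strength of each boundary (signified by a number in the node). This information is used to annotate the input text with escape sequences that give the text-to-speech system instructions about prosodic phrasing. In formulating our rules for building the prosodic structure, we began with the idea of simply implementing the model of Gee and Grosjean (1983). This model was initially proposed to predict subjective descriptions of sentence structure known as performance structures. It determines prosodic boundaries from a syntactic tree, but it assumes rather than explicitly presents a syntactic component. We were initially attracted to the Gee and Grosjean model because of its emphasis on relative boundary weighting, i.e., on determining the strength of a given boundary with respect to the other boundaries in the sentence. We found that this weighting played an important role in the data we had collected. In fact, we incorporated one method of doing this weighting directly into our system: Gee and Grosjean's rule for determining the strengths of the prosodic boundaries around a verb using relative length (as measured by terminal node count).

As we extended the Gee and Grosjean model to create an algorithm adequate for a general-purpose system, our algorithm diverged from its starting point. The changes reflect our attempts to correct weaknesses and gaps we encountered in the Gee and Grosjean model. That we encountered these problems is not surprising, given the difference between our goals and theirs. The most important difference between the Gee and Grosjean model and our current algorithm involves the factors that determine boundary weight. Gee and Grosjean assume that this weighting depends only on the number of syntactic nodes, their left-to-right ordering and, in the case of the verb phrase, constituent length. In contrast, our data, in agreement with Selkirk's (1984) theoretical analysis, indicate that boundary strength depends on the grammatical functions that the constituents of a given sentence play. In particular, we observed a hierarchy among these functions with respect to boundary strength, as discussed below. Our adjunction rules are derived for the most part from Selkirk's account. We have also made use of the idea, which Gee and Grosjean (1983) take largely from Selkirk's work, that certain syntactic heads mark off phonological phrase boundaries and provide the basic prosodic constituents for higher-level analysis. Our prosody rules run in four independent stages. Each stage builds on the previous one, so that the rules can refer to both syntactic and prosodic structure as they build successively higher levels of prosodic structure.

Conclusions

We have described an online experimental system that uses prosody rules to infer prosodic phrasing from constituent structure, grammatical functions, and length. The system contains three modules:
- a deterministic parser;
- a set of prosodic phrasing rules;
- an algorithm that converts the output of the prosodic phrasing rules into signals for the Bell Labs text-to-speech system.

A Unit Selection-Based Speech Synthesis Approach for Chinese Mandarin Text-to-Speech

1 Introduction

A text-to-speech (TTS) system converts free text into speech; in effect, it reads text aloud to the listener. TTS systems have a wide range of applications. A typical TTS system consists of three main parts:
- text analysis, which analyzes the input text and determines the pronunciation of each sentence;
- prosody generation, which produces the parameters that control variation in the speech;
- speech synthesis, which generates the utterance according to the pronunciation and prosody requirements.

Over the past two decades, many methods have been used to synthesize speech. The main approaches fall into two categories: rule-based formant synthesis and concatenative synthesis. Formant synthesis generates speech from a set of rules, usually derived through a long process of experimentation. The method needs little computer memory, but its speech quality is limited by the method itself. Concatenative synthesis instead uses prerecorded speech units as templates. During synthesis, the units are modified with signal-processing techniques and then joined to form an utterance. This method usually needs more memory, but its speech quality is correspondingly better. With the progress of technology, however, users are no longer satisfied with speech produced through such signal processing. Conventional concatenative synthesis keeps a small inventory of units in the system. During synthesis, a unit is selected and then modified with signal-processing techniques to match the prosodic features. Speech synthesized this way can be of relatively high quality, but the signal processing distorts it to a greater or lesser degree.

A simple way to produce good-quality speech is to store a large number of human speech segments in a database and, at run time, concatenate the required segments without any modification. The longer the concatenated segments, the more natural the resulting speech. Each speech unit can have many variants in different phonetic or prosodic contexts, so this method needs a large memory to store many speech segments. Limits on computing power and memory made it impractical until a few years ago. With advances in hardware, it has become possible to synthesize speech by directly concatenating units from a large speech corpus. Unit selection-based speech synthesis (also called corpus-based synthesis) has been applied to English and other languages for several years. Several studies (Liu and Wang 1998; Chu et al. 2001; Wang et al. 2000; Li et al. 2001) have applied the unit-selection approach to Chinese TTS. Wu et al. (2001) also proposed a scheme that first selects the phonetically and linguistically best units and then applies prosodic modification. However, all of the proposed methods are limited in how they apply prosody. Without proper consideration of prosody, the generated speech can sometimes be of poor quality. This paper focuses on how to apply prosody in unit selection-based synthesis.

2 Unit Selection Model

A unit selection model relies on a well-organized unit database. The database contains speech units taken from a large corpus designed to cover all phones and a wide range of prosodic variants of each unit. Each unit has a number of variants in the database, suited to different phonetic and prosodic contexts. The large corpus is analyzed offline, and the results of all computations are stored in the unit database. Each unit instance in the database is described by a feature vector whose features may be discrete or continuous. The features describe both the unit itself and its context. The unit's own features are used to select units that match the segmental requirements. The context features are used to select the best context-dependent units, which reduces discontinuities between the selected units.

Corpus-based synthesis is in effect a concatenative pattern-matching process. During synthesis, the task is to select the units whose pronunciation and prosody best match the target units. At the same time, discontinuities between the selected units should be as small as possible. To meet these requirements, two kinds of cost are defined:
- the unit cost, which describes how close a selected unit is to the desired unit;
- the concatenation cost, which describes how smoothly consecutive selected units join.

The total cost is a weighted sum of the two.

3 Unit Selection

During synthesis, the unit selector receives information from the prosody-generation part. It then searches the speech-unit database to find an appropriate unit for each target speech unit. The selection process is illustrated in Figure 1, where the target sentence is "今天很热" ("It is very hot today"), consisting of four syllables. Each syllable has a set of candidate units, and the thick-bordered boxes joined by bold lines show the selected unit sequence. To obtain the best speech, unit selection must consider two things: (1) how well each candidate unit matches its target unit, and (2) how smoothly the selected units join. The selection process therefore amounts to finding the best path among all possible paths through the candidate lattice. The search is guided by a cost function that describes how appropriate each unit is and how smooth the join between two units is.

4 Corpus

As mentioned above, unit selection-based synthesis uses a large corpus. The corpus contains a large collection of utterances from which the synthesis units are extracted. Ideally, it covers as many context-dependent units and prosodic variants as possible. In practice, it is usually impossible to build a corpus large enough to cover every unit variant completely. Building a large, high-quality corpus is very expensive, so a balance is usually struck between coverage and size. In this study, we built a corpus of about 38,000 syllables. Its script was selected from a large text corpus (about 300 million Chinese characters) to cover frequently used isolated and context-dependent syllables as fully as possible. We used the Peking University (PKU) People's Daily text corpus as a reference body of real text to evaluate the script. By our calculation, the corpus covers 99.8% of the syllables occurring in the PKU corpus. We also grouped unit contexts by initial and final classes, defining 11 initial classes and 10 final classes. Under this grouping, the corpus covers 76.8% of the unit classes occurring in the PKU text corpus. With this coverage, we consider the corpus suitable for unit selection-based synthesis.

Foreign Literature (English Original)

The Contribution of Parsing to Prosodic Phrasing in an Experimental Text-to-Speech System

Introduction

We describe an experimental text-to-speech system that uses a deterministic parser and prosody rules to generate phrase-level pitch and duration information for English input. This information is used to annotate the input sentence, which is then processed by the text-to-speech programs currently under development at Bell Labs. In constructing the system, our goal has been to test the hypotheses (i) that information available in the syntax tree, in particular grammatical functions such as subject-predicate and head-complement, is by itself useful in determining prosodic phrasing for synthetic speech, and (ii) that it is possible to use a syntactic parser that specifies grammatical functions to determine prosodic phrasing for synthetic speech. Although certain connections between syntax and prosody are well known (e.g., the influence of part of speech on stress in words like "progress", or the setting off of parenthetical expressions), very little practical knowledge is available on which aspects of syntax might be connected to prosodic phrasing. In many studies, investigators have sought connections between constituent structure and prosody (e.g., Cooper and Paccia-Cooper 1980; Umeda 1982; Gee and Grosjean 1983), but, with the exception of Selkirk (1984), they tend to neglect the representation of grammatical functions in the syntax tree. Moreover, previous work has not been specific enough to provide the basis for a full system implementation.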
Based on our study of prosodic phrasing in recorded human speech, we decided to emphasize three aspects of structure that relate to phrasing: syntactic constituency, grammatical function, and constituent length. These findings, which we will discuss in detail, have been implemented as a collection of prosody rules in an experimental text-to-speech system. Two important features characterize our system. First, the input to our prosody system is a parse tree generated by a version of the deterministic parser Fidditch (Hindle 1983). The left-corner search strategy of this parser and, in particular, its determinism give Fidditch the speed that makes online text-to-speech production feasible. In building a parse tree, Fidditch identifies the core subject-verb-object relations but makes no attempt to represent adjunct or modifier relations. Thus relative clauses, adverbials, and other non-argument constituents have no specified position in the tree and no specified semantic role. Second, the rules in the prosody system build a prosody tree by referring both to the syntactic structure and to earlier stages of prosodic structure. The result is a hierarchical representation that supports the view, also proposed in Selkirk (1984), that grammatical function information is related to prosodic phrasing, but indirectly, through different levels of processing. Informal tests of the system show that it is capable of producing a significant improvement in the prosodic quality of the resulting synthesized speech. Our investigations of the system's problems, which we describe, have not revealed any serious counterexample to our basic approach. In many cases, it appears that problems with the current version can be resolved by taking our approach a step further and including lexical information required by the parser as another factor in the determination of prosodic phrasing.
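Elsewhere the paper states that the prosody tree marks phrase boundaries with its nodes and marks each boundary's relative strength with a number in the node. That information is then turned into escape sequences for the synthesizer. The sketch below models this flow under stated assumptions:
- A boundary between two adjacent words takes the number of the lowest node dominating both words.
- Strengths run from 1 to 4, one value per boundary level.
- The `<b3>`-style marker is a hypothetical stand-in, since the Bell Labs escape-sequence syntax is not given here.

The names `ProsodyNode`, `boundary_strengths`, and `annotate` are illustrative only.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ProsodyNode:
    """Internal node of a prosody tree; leaves are plain word strings.

    `strength` is the weight of the boundaries between this node's children
    (assumed scale: 1 = weakest ... 4 = strongest).
    """
    strength: int
    children: list[ProsodyNode | str] = field(default_factory=list)


def boundary_strengths(node: ProsodyNode | str) -> tuple[list[str], list[int]]:
    """Flatten a prosody tree into its words and the strengths of the
    boundaries between adjacent words (len(strengths) == len(words) - 1).

    Assumption: a boundary takes the strength of the lowest node that
    dominates the words on both sides of it.
    """
    if isinstance(node, str):
        return [node], []
    words: list[str] = []
    strengths: list[int] = []
    for i, child in enumerate(node.children):
        child_words, child_strengths = boundary_strengths(child)
        if i > 0:
            # Boundary between the previous sibling subtree and this one.
            strengths.append(node.strength)
        words.extend(child_words)
        strengths.extend(child_strengths)
    return words, strengths


def annotate(words: list[str], strengths: list[int], min_level: int = 2) -> str:
    """Insert a hypothetical marker such as "<b3>" before each word whose
    preceding boundary has strength >= min_level."""
    if not words:
        return ""
    out = [words[0]]
    for word, strength in zip(words[1:], strengths):
        if strength >= min_level:
            out.append(f"<b{strength}>")
        out.append(word)
    return " ".join(out)


# "the new manual | describes the procedure": the subject boundary is strongest.
tree = ProsodyNode(3, [
    ProsodyNode(1, ["the", "new", "manual"]),
    ProsodyNode(2, ["describes", ProsodyNode(1, ["the", "procedure"])]),
])
words, strengths = boundary_strengths(tree)
print(strengths)                   # [1, 1, 3, 2, 1]
print(annotate(words, strengths))  # the new manual <b3> describes <b2> the procedure
```

Raising `min_level` suppresses weaker boundaries. This mirrors the idea that only some of the tree's boundary levels need to be signalled to the synthesizer.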
Text-to-Speech

Most text-to-speech systems comprise two components: pronunciation rules and a speech synthesizer. Pronunciation rules convert the input text into a phonetic transcription; this information may also be supplemented by a dictionary that provides information about the part of speech, stress pattern, and phonetic makeup of particular words. The speech synthesizer then converts this phonetic transcription into a series of speech parameters, which are subsequently processed to produce digitized speech. While these systems tend to perform quite well on word pronunciation, they fall short when it comes to providing good prosody for complete sentences. Current text-to-speech systems have no access to the syntactic and semantic properties of a sentence that influence phrase-level prosody. Hence rules for sentence prosody, when they are provided at all, typically depend on superficial aspects of text (e.g., punctuation) and on heuristics that vary widely in sophistication. Although such techniques often add a more natural quality to the resulting synthetic speech, they can fail in important ways. For example, they may ignore the prosodic event between a lengthy subject and a predicate, so that no clear prosodic boundary separates the two. Several authors (e.g., Allen 1976; Elovitz et al. 1976; Luce et al. 1983) have suggested that prosodic differences between synthetic and natural speech are the primary unaddressed factor leading to difficulties in the comprehension of fluent synthetic speech. The relation between phrase-level prosody and its sources, however, is so poorly understood that we have no good sense of the degree to which different levels of explanation (syntactic, semantic, or pragmatic) are applicable. We currently have reasonable tools for automatic syntactic analysis of a text, but there is nothing equivalently well developed for semantic or pragmatic textual analysis.
Thus an obvious goal is to explore the extent to which phrase-level prosody can be explained by the syntax tree and to develop a detailed description of that relation. A further goal is to convert the resulting insights about this relation into a system that can work with a speech synthesizer. This allows us to test our description more adequately and perhaps also produce something that will further text-to-speech technology.

Syntactic Structure and Prosodic Phrasing

Beyond the word level, however, there has been little investigation of systematic connections between syntactic structure and prosodic phrasing. The psycholinguistic and acoustic investigations of Cooper and Paccia-Cooper (1980), Umeda (1982), and Gee and Grosjean (1983), and the prosodic theory of Selkirk (1984), are among the more notable studies. They represent the two main approaches to syntax/prosody relations. In Cooper and Paccia-Cooper (1980) and Umeda (1982), the connection from syntax to prosodic phrasing is unmediated by any filtering process. That is, they propose that the details of prosodic phrasing can be determined directly from syntactic structure by associating particular syntactic nodes (or constituent boundaries) with a phonetic value: pausing, segmental lengthening, or the blocking of the cross-word conditioning of phonological rules. By contrast, Gee and Grosjean (1983) and Selkirk (1984) believe that the syntax-prosody relation is indirect. In their view, prosodic phrasing is derived by rules that refer to left-to-right ordering, length (or branching patterns), and, in Selkirk's case, grammatical function, as well as constituent membership, in order to infer a hierarchical prosodic structure. But while their respective positions are quite clear, none of these studies is conclusive.
All lack a syntactic framework sufficiently detailed and formalized to allow extensive testing, and most consider only a small number of sentences and sentence types.

To develop our analysis, we first examined prosodic phrasing in the speech of one of us reading prose from various texts, including four instruction manuals. These texts were later augmented by a professional reading of a prose story. The boundaries between prosodic phrases were identified and then classified according to their syntactic context and semantic function.

Text-to-Speech Synthesis

The programs that make up the speech component are described in Liberman and Buchsbaum (personal communication). These programs take character text as input and produce digitized speech output. By annotating the input text to this system, many aspects of its operation can be overridden or modified, e.g.:
- the location of major and minor phrase boundaries;
- the stress given to words;
- the transcription of words and the boundaries between them;
- the timing of segments;
- details of the pitch contour.

As we will show, with our prosody system we are able to produce strings in which four boundary levels are identified and perceptually distinguished, using the current text-to-speech system's annotations.

Prosodic Phrasing

The prosody rules use information about constituent structure, grammatical role, and length to map a surface structure tree onto a prosody tree. The prosody tree identifies the location of phrase boundaries (signified by the nodes) and the relative strength of each boundary (signified by a number in the node). It is this information that is used to annotate the input text with escape sequences that provide the text-to-speech system with instructions about prosodic phrasing. In formulating our rules for building the prosodic structure, we began with the idea of simply implementing the model of Gee and Grosjean (1983).
This model was initially proposed to predict a form of psychological data describing subjective sentence structure, known as performance structure. It determines prosodic boundaries from a syntactic tree, but it assumes rather than explicitly presents a syntactic component. We were initially attracted to the Gee and Grosjean model because of its emphasis on relative boundary weighting, i.e., on the determination of the strength of a given boundary with respect to the other boundaries in the sentence. We found that in the data we had collected, this weighting played an important role. In fact, we incorporated one method of doing this weighting directly into our system: Gee and Grosjean's rule to determine the strengths of the prosodic phrase boundaries around a verb using relative length (as measured by terminal node count).

As we extended the Gee and Grosjean model to create an algorithm adequate for use in a general-purpose system, our algorithm diverged from its starting point. The changes reflect our attempts to correct weaknesses and lacunae that we encountered in the Gee and Grosjean model. That we encountered these problems is not surprising given the difference between our goals and those of Gee and Grosjean. The most important difference between the Gee and Grosjean model and our current algorithm involves the factors determining boundary weight. Gee and Grosjean assume that this weighting depends only on the number of syntactic nodes, their left-to-right ordering and, in the case of the verb phrase, constituent length. In contrast, our data, in agreement with Selkirk's (1984) theoretical analysis, indicated that boundary strength depends on the grammatical functions that the constituents in a given sentence play. In particular, we observed a hierarchy among these functions with respect to boundary strength, as discussed below. Our adjunction rules are derived for the most part from Selkirk's account.
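Gee and Grosjean's rule weights the boundaries around a verb by relative length, measured in terminal node counts. The sketch below shows one simple way such length balancing can work. It is a hypothetical, simplified reading and not Gee and Grosjean's exact formulation. The verb is grouped with whichever neighbour yields the more evenly sized halves, so the stronger boundary falls on the verb's other side. The function name and the default strengths of 3 and 2 are illustrative assumptions.

```python
from __future__ import annotations


def verb_boundary_strengths(subject_len: int, verb_len: int, complement_len: int,
                            strong: int = 3, weak: int = 2) -> tuple[int, int]:
    """Return (strength of boundary before the verb, strength after the verb).

    Simplified balancing heuristic (an illustrative assumption): lengths are
    terminal-node counts, and the verb is grouped with whichever neighbour
    gives the more evenly sized halves. Ties favour grouping the verb with
    its complement.
    """
    # Grouping [subject] | [verb + complement]: strong break before the verb.
    imbalance_if_before = abs(subject_len - (verb_len + complement_len))
    # Grouping [subject + verb] | [complement]: strong break after the verb.
    imbalance_if_after = abs((subject_len + verb_len) - complement_len)
    if imbalance_if_before <= imbalance_if_after:
        return strong, weak
    return weak, strong


# Long subject ("the manual for the new disk drive"), short object.
print(verb_boundary_strengths(7, 1, 3))  # (3, 2): strong break before the verb
# Pronoun subject ("it"), long complement.
print(verb_boundary_strengths(1, 1, 6))  # (2, 3): strong break after the verb
```

As the surrounding text notes, the authors' own algorithm goes beyond length. It also makes boundary strength depend on grammatical function, which this length-only sketch deliberately leaves out.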
We have also made use of the idea, which Gee and Grosjean (1983) take largely from the work of Selkirk, that certain syntactic heads mark off phonological phrase boundaries and provide the basic prosodic constituents for higher-level analysis. Our prosody rules run in four independent stages. Each stage builds on the previous stage, so that the rules can refer to both syntactic and prosodic structure as they build successively higher levels of prosodic structure.

Conclusions

We have described an online experimental system that uses prosody rules to infer prosodic phrasing from constituent structure, grammatical functions, and length considerations. The system contains three modules:
- a deterministic parser;
- a set of prosodic phrasing rules;
- an algorithm to convert the output of the prosodic phrasing rules into signals for the Bell Labs text-to-speech system.

A Unit Selection-Based Speech Synthesis Approach for Chinese Mandarin Text-to-Speech

1 Introduction

A text-to-speech system is a system that converts free text into speech. This is a process that reads out the text for people. There is a wide range of applications for text-to-speech systems. A typical text-to-speech system consists of thre