




版权说明:本文档由用户提供并上传,收益归属内容提供方,若内容存在侵权,请进行举报或认领
文档简介
1、转 一种新的累积频率表的数据结构转一种新的累积频率表的数据结构2010-03-30 14:23A New Data Structure for Cumulative Frequency Tables一种新的累积频率表的数据结构PETER M.FENWICK Department of Computer Science,University of Auckland,Private Bag 92019,Auckland,New Zealand(email:p_fenwickcs.auckland.ac.nz)SUMMARY概要A new method(thebinary indexed tree)
2、is presented for maintaining the cumulative frequencies which are needed to support dynamic arithmetic data compression.It is based on adecomposition of the cumulative frequencies into portions which parallel the binary representation of the index of the table element(or symbol).The operations to tr
3、averse the data structure are based on the binary coding of the index.In comparison with previous methods,the binary indexed tree is faster,using more compact data and simpler code.The access time for all operations is either constant or proportional to the logarithm of the table size.In conjunction
4、 with the compact data structure,this makes the new method particularly suitable for large symbol alphabets.一种新的方法(二进制索引树)是提出了维持所需要的支持动态数据压缩算法的累积频率。它是根据累积频率部分分解成平行的表元素(或符号)索引的二进制表示。遍历的业务数据结构上的索引的二进制编码的。与以前的方法相比,二进制索引树的速度更快,使用更紧凑的数据和简单的代码。对所有业务的访问时间要么是不变,或成比例表的大小数。在与数据结构紧凑的结合,这使得新方法,特别是适合大字母符号。KEY WO
5、RDS:Binary indexed tree Arithmetic coding Cumulative frequencies关键词:二进制索引树,算术编码,累积频率INTRODUCTION引言A major cost in adaptive arithmetic data compression is the maintenance of the table of cumulative frequencies which is needed in reducing the range for successive symbols.Witten,Neal and Cleary1ease th
6、e problem by providing amove-to-front mapping of the symbols which ensures that the most frequent symbols are kept near the front of the search space.It works well for highly skewed alphabets(which may be expected to compress well)but is much less efficient for more uniform distributions of symbol f
7、requency.Moffat2describes atree structure(actually aheap)which provides alinear-time access to all symbols.Jones3uses splay trees to provide an optimized data structure for handling the frequency tables.The three techniques will be referred to in this paper as MTF,HEAP and SPLAY,respectively.In all
8、cases they attempt to keep frequently used symbols in quickly-referenced positio ns within the data structure,but at the cost of sometimes extensive data reorganization.一个自适应算术数据压缩成本是主要的是在减少连续符号的范围内所需的累积频率表维护。威滕,奥尼尔和克利1通过提供方便,此举对前面的符号映射,保证了最常见的符号附近搜索空间方面保持了问题。它可以很好地用于高度倾斜字母(可将压缩好),但要少得多的象征频率更有效地均匀分布
9、。莫法特2描述了一个树状结构(实际上是堆)提供了一个线性时间访问所有符号。琼斯3利用决口扇树提供一个处理的频率表优化的数据结构。这三个技术将被提到一个的MTF,堆和伸展,分别纸。在所有情况下,他们试图保持快速内引用数据结构经常使用的符号位置,但在大量的数据,有时重组成本。This current paper describes anew method which uses only asingle array to store the frequencies,but stores them in acarefully chosen pattern to suit anovel search t
10、echnique whose cost is proportional to the number of 1bits in the element index.This cost applies to both updating and interrogating the table.In comparison with the other methods it is simple,compact and fast and involves no reorganization or movement of the data.此电流介绍了一种新的方法,只使用一个单一的存储阵列的频率,但在精心选择
11、的模式,并将它们存储来适应新的搜索技术,其成本是成正比1元素的索引位数。这笔费用适用于更新和审讯表。在与其他方法相比,这很简单,紧凑和快速,不涉及重组或移动数据。PRINCIPLES原理The basic idea is that,just as an integer is the sum of appropriate powers of two,so can acumulative frequency be represented as the appropriate sum of sets of cumulativesubfrequencies.Thus,if the index cont
12、ains a2 bitwe include two frequencies,if it has an8 bitwe include 8frequencies,and so on.Figure 1shows atable of size 16.基本想法是,正如一个整数,是两个适当的权力,总之,这样才能累积频率被看作是集合适当数额的代表累积次谐波频率。因此,如果该指数包含2位,我们有两个频率,如果它有一个8位,我们包括8个频率,等等。图1显示了一个大小为16表。The first row is simply the index.The second shows the contents of th
13、at entry of the table;for example,element 4contains the sum of frequencies 1to 4inclusive,and element 6has the sum of frequency 5and frequency 6.The final three rows show an actual example,with the individual frequencies,the true cumulative frequencies and the values stored in the table.In the follo
14、wing discussion,the stored values and item frequencies will be regarded as two arrays,V and F,respectively.第一行是简单的指数。第二表明该表项的内容,例如,元素4载频1款项4包容性,元素六有5的频率和频率6总和。最后的3行显示的频率与个人的实际例子,真正的累积频率和存储的值表。在随后的讨论中,存储的价值和项目的频率,将被视为两个数组,第五和F分别。The fundamental operation involves calculating anew index by stripping t
15、he least significant 1from the old index,and repeating this operation until the index is zero.For an initial index of 11 this process yields the sequence 11,10,8,0.To read the cumulative frequency for element 11,we form the sum V11+V10+V8+V0.Referring back to the second row of the table,we see that
16、this sequence corresponds to the frequencies F11+F9.10+F1.8+F0,where F1.8means F1+.+F8.The final value is thus F0.11,which is the desired result.基本的运作涉及计算剥夺一个新的指数最重要的一旧指数,反复直到该指数为零此操作。对于11初始索引这个过程产生的序列11,10,8,0。要阅读的要素11累积频率,我们形成了一笔V11+V10+V8+V0。转回表的第二行,我们看到,这个序列对应的频率F11+F9.10+F1.8+F0,其中F1.8指F1+.+F8。
17、最后的价值,因而F0.11,这是期望的结果。The indexing method generates atree within the table of partial frequencies,with the structure shown in Figure 2.Each bar represents the range of frequencies held in the array element corresponding to its topmost position(the shaded rectangles).It is clear that traversing the t
18、ree from any node to the root will accumulate all of the necessary frequencies.方法生成的索引内的部分频率表一树,在图2所示的结构。每个酒吧代表了数组元素相应的最上面位置(阴影矩形)频率范围内举行。很明显,任何节点遍历树的根会积累必要的频道。Alternatively,we can draw the tree in amore conventional form.The table is,in effect,two different trees superimposed on the same table and
19、differentiated by their access algorithms.Theinterrogation tree(to read acumulative frequency)is adecidedly unbalanced tree,as shown in Figure 3.Theupdate treewill be described later.或者,我们可以得出一个更常规的形式树。该表实际上是两种不同的树木叠加在同一个表,并通过他们获得不同的算法。在审讯树(阅读累积频率)是一个决定性不平衡树,如图3所示。在更新树会在后面介绍。The branching ratio of e
20、ach node is the number of trailing zeros in its binary representation(each child is formed by converting one of the trailing zeros to aone).The depth at each node is the Hamming weight of its binary index.It is unusual in that its power derives,not from its structure or shape,but from the indexing a
21、lgorithm.In recognition of the close relationship between the tree traversal algorithms and the binary representation of an element index,the namebinary indexed treeis proposed for the new structure.每个节点的分支比,是落后于它的二进制表示零(每个孩子的数量由转换的尾随零一至一个建制)。在每个节点的深度是它的二进制指数汉明重量。它的特别之处在于它的权力来自,而不是它的结构或形状,但是从索引算法。在树
22、之间的密切关系遍历算法和元素的索引,名称二进制表示二进制索引树,是为新的结构提出的承认。Figure 1.Example of the table图1、例如表Figure 2.The tree of partial frequencies图2、频率的部分树Figure 3.The interrogation tree图3、在审讯树OPERATIONS AND CODE TO HANDLE THE STRUCTURE作业和代码来处理结构We need the following functions when processing asymbol in conjunction with arith
23、metic coding.In all cases theindexis synonymous with the coding symbol.我们需要以下功能在处理与算术编码结合的象征。在所有情况下,索引,就是编码的象征。1.Read the cumulative frequency for an index.2.Update the table according anew frequency at agiven position.3.Read the actual frequency at aposition.4.Find the symbol position within which
24、agiven frequency lies.5.Scaling the entire tree by aconstant factor(usually halving all counts).1、读取索引的累积频率。2、更新表根据一个给定位置新的频率。3、阅读能力的实际频率。4、找到符号位置内给定频率所在。5、缩放一个常数因子整个树(通常是所有罪名减半)。In all of these functions we need efficient ways of isolating and manipulating the least significant 1bit of anumber.Isol
25、ating the bit is most easily done(for atwos complement number)by considering the operation of complementing apositive number.The serial complementing algorithm examines the bits in order from the right(least significant bit),copying all of the least significant zeros and the rightmost 1and then comp
26、lementing each bit to the left.Thus taking the logical AND of anumber and its twos complement isolates the least significant one bit;the bit is present in both values,to its right both values are all-zero,and to its left one or other of the numbers always has azero.For example,20 is represented to 8
27、bits as 00010100 with atwos complement of 11101100;ANDing the two values gives the result 00000100.With the function BitAnd,as used in many dialects of Pascal,we have that LSOne:=BitAnd(ix,-ix).在这些功能,我们需要孤立和操作最重要的1位数字的有效方法。隔离位是最容易做(两年补数目考虑补充正数操作)。串行算法补充检查位在右(最低有效位)的顺序,复制所有的最重要的零和最右边的1,然后互补位到左边。因此,到逻
28、辑与数字的和补分离最重要的一个位,该位目前在这两个值,其权利两个值都是零,和它的左边的一个或数其他人都会有一个为零。例如,20是代表与一两个的8位为00010100补充11101100;安定两个值给出结果00000100。随着功能BitAnd,正如在许多方言Pascal使用,我们有LSOne:=BitAnd(ix,-ix)。From the above discussion we see that the assignment ix:=ix-BitAnd(ix,-ix)will strip off the least significant one bit of abinary number.
29、A slightly simpler realization of the function is ix:=BitAnd(ix,ix-1).The discussion is similar to the above,noting that(ix-1)replaces atrailing.10000.by.01111.,leaving unchanged the bits to the left of the rightmost 1.从上面的讨论,我们看到,转让ix:=ix-BitAnd(ix,-ix)将脱掉最重要的一个二进制数位。函数的一个稍微简单的实现是ix:=BitAnd(ix,ix-1
30、)。讨论类似上述情况,指出,(ix-1)取代了落后.10000.通过.01111.,让不变的最右边1左位。Both of the operations(extracting the bit and removing the bit)are simple and can be done with negligible overhead on most computers.这些行动都(提取位和消除位)很简单,可与大多数计算机上完成的开销可以忽略不计。Although these techniques were developed for twos complement binary numbers
31、,it is worth noting that stripping the least significant one with ix:=BitAnd(ix,ix-1)will work equally well with oness complement or signed magnitude binary arithmetic,as long as the initial value of ix is strictly positive,as is assumed throughout this paper.Starting with this as abasis,the operati
32、on LSOne:=ix-BitAnd(ix,ix-1)will extract the least significant one from anumber for all three representations.An alternative method is LSOne:=BitAnd(ix,-ix),where is apower of 2greater than the table size.虽然这些技术是开发补二进制数,但值得注意的是,剥离与ix:=BitAnd(ix,ix-1)不重要将与那些补码或二进制算术签署的规模,只要同样出色的初始值的ix是严格正的,因为在整个本文假设。
33、这起为基础,操作LSOne:=ix-BitAnd(ix,ix-1)将提取的所有三个意见书至少重要的一年。另一种方法是LSOne:=BitAnd(ix,-ix),其中正为2比表的大小更大的权力。THE CUMULATIVE FREQUENCY累积频率A Pascal function to read the cumulative frequency is shown in Figure 4.For this and the following examples,the array Tree contains the appropriate subfrequencies.The number of
34、 iterations is clearly just the number of 1bits in the desired index.一个Pascal函数来读取累积频率,如图4所示。对于这一点,下面的例子,该数组树包含适当的次谐波频率。的迭代次数显然只是一所期望的指数位数。As asimple indication of the cost of reading avalue from the table,we can count the number of memory accesses into the data table.For atable of entries,this is c
35、learly 1+N/2 on average.(Unless otherwise stated we will assume auniform frequency distribution.)Note that this is an average value only.The combination of an irregular symbol distribution and the non-uniform access costs of the binary indexed tree can lead to some variations for real symbol alphabe
36、ts.作为阅读从表中的值成本简单说明,我们可以计算到数据表中的内存访问次数。如需为项表,这显然是1+N/2的平均。(除非另有说明,我们将承担统一的频率分布。)请注意,这是一个平均的价值。一个不规则的象征分配和非结合的二进制索引树的统一接入的费用可能会导致真正的符号字母的一些变化。function GetCumul(Ix:integer):integer;read cumulative valuevar Sum:integer;begin Sum:=Tree0;initial valuewhile Ix 0do begin Sum:=Sum+TreeIx;include this valueIx
37、:=BitAnd(Ix,Ix-1);remove Least Sig oneend;GetCumul:=Sum;end;Figure 4.The GetCumul function图4、函数GetCumul procedure PutValue(Val,Ix:integer);begin repeat TreeIx:=TreeIx+Val;Ix:=Ix+BitAnd(Ix,-Ix);add least sig oneuntil Ix=TableSz;eend;Figure 5.Updating the table图5、更新表UPDATING THE TABLE更新表In reading ava
38、lue we strip off 1bits and move back towards the start of the table.In updating the table we must increment all subfrequencies above the position being incremented(see Figure 5).Referring to Figure 1,an adjustment to element 9must be accompanied by adjustments to elements 10 and 12(those whose range
39、s cover 9).From 9we step to 10(add 1)and then to 12(add 2).Instead of stripping off the least significant 1bit(i.e.subtracting),we now add it on at each stage to get the next entry to adjust.在阅读的价值,我们带了1位,并朝着表的开始回来。在更新表中,我们必须增加上述被递增(见图5)立场所有次谐波频率。参考图1,1至9项调整的同时,必须调整元素10和12(那些范围覆盖9)。从9我们的步骤,10(加1),然后
40、到12(加2)。剥离而不是最不显着1位(即扣除)外,我们现在在每个阶段,到下一个条目得到调整。In the example of Figure 1,if we wish to adjust position 5,the successive indices are 5,then 6(5+1),and finally 8(6+2).These three changes affect all of the cumulative frequencies from position 5up.在图1的示例中,如果我们要调整的位置5,指数连续第5,那么6(5+1),最后8(6+2)。这三个变化影响从位置
41、5时起所有的累积频率。The tree for updating is essentially the mirror-image of the interrogation tree,with each element resembling its 16-complement in the earlier one.It is shown in Figure 6.Each parent still has a1 with trailing zeros,but the child indices are formed by successively replacing the first 1,2,3
42、.of those zeros by ones.The interrogation tree has element 0at its root;the update tree has an implicit element 16 at the root and element 0stands apart as aspecial case.(Element 16,or its equivalent,is used as the END symbol in Wittens implementation of arithmetic coding and is then avalid part of
43、the table.)树的更新基本上是镜子的审讯树形象每个类似的16元,补中较早的一个。这是如图6所示。每个家长仍与尾部零一,但儿童指数,先后成立了第一个取代1,2,3.受的那些零。审讯树元素从根本上0;更新树有一个隐含的元素在根元素0代表除了作为一种特殊情况16。(元素16,或同等学历,用作在维滕的算术编码符号,执行完,然后一表的有效组成部分。)Figure 6.The updating tree图6、树的更新The cost in table references is most easily found by noting that the interrogation and updat
44、ing trees are mirror-images of each other.The cost of adding afrequency into the table is therefore still references to N/2 elements,or Nreferences with separate reads and writes.Once again,we can expect real alphabets to have some deviations from this average.在表引用的成本是最容易被发现时指出,审讯和更新树是一面镜子,彼此的图像。在表中
45、加入一个频率的费用,因此仍对引用N/2的元素,或有独立的读取和写入N参考。再次,我们可以期待真正字母,从这个平均一些偏差。READING ASINGLE FREQUENCY阅读单一频率We read asingle frequency by taking the difference of two adjacent cumulative frequencies.If the paths from the two nodes to the root have some part in common the shared parts will cancel and we need evaluate
46、 the paths only as far as the junction.Detection of the common path is facilitated by an interesting property of the binary indexed tree.We consider anode,its predecessor and its parent,with indices,andrespectively and=+1.Then(being aparent)is some bit pattern followed by one or more zeros,say x0000
47、.We obtainfromby changing one zero to a1,say x0100.This gives=x0011 which clearly hasas an ancestor because removing trailing ones fromwill eventually yield.Ifis odd then=and the parent and predecessor coincide.In general:The parent of any node is either an ancestor of the predecessor of that node,o
48、r is the predecessor itself.我们阅读到的两个相邻的累积频率的差异单一频率。如果从两个节点到根的路径有一些共同的共享部分部分将取消,我们需要评估的路径只到路口。常见的路径检测是促进了一个二进制索引树有趣的特性。我们认为一个节点,其前身及其母公司,与指数,和分别=+1。然后,(即父母)的位模式是一些由一个或多个零,其次说x0000。我们取得从更改了一个零到一,说x0100。这使=x0011,显然已成为一个祖先因为摘除尾随的最终收益率。如果是奇数则=和家长和前身是一致的。一般来说:任何节点的父节点可以是一个该节点的前任,祖先或本身是前身。For element iwe t
49、hen read the value at node i,obtain the parent to node iand then trace back from node i-1 to the parent of node i,subtracting off the values traversed.Code is shown in Figure 7.对于元素i那么,我们在读节点i的值,获取父母节点i,然后跟踪回从节点i-1到节点i的父,减去关闭值走过。代码显示在图7。The cost is one plus the number of trailing zeros in the index.
50、Half the time(with an odd initial index)it is necessary to read only asingle value from the table,for one quarter of the time(indices 2,6,10,12,.)it is necessary to read two values,in one of eight cases to read three values,and so on.Each term has the form and the series has asum of 2,which value ma
51、y be taken as areasonable approximation of the cost in most cases.For a256-element table the sum is actually 1.93.费用是加上尾随在索引中零的数字。有一半的时间(以一个奇怪的初始索引)有必要只读表从单一价值,其中四分之一的时间(指数2,6,10,12,.)有必要读两个值,在8起案件一读三个值,等等。每学期的形式为和系列有2总和,该值可能被视作费用在大多数情况下合理的近似。对于一个256元素表的数额实际上是1.93。function GetProb(Ix:integer):intege
52、r;read individual frequencyvar Val,Parent:integer;begin Val:=TreeIx;get the current probif Ix 0thenIx=0 is aspecial casebegin Parent:=BitAnd(Ix,Ix-1);get the parent nodeIx:=Ix-1;the previous nodewhile Parent Ix dorepeat until at prev parentbegin Val:=Val-TreeIx;adjust for traversed valueIx:=BitAnd(I
53、x,Ix-1);get this parentend;end;GetProb:=Val;end;Figure 7.Finding asingle frequency图7、寻找一个单频FINDING AN ELEMENT CORRESPONDING TO AFREQUENCY找到一个元素对应频率The last operation is that of finding the element corresponding to agiven cumulative frequency.This action is performed by the modified binary search sho
54、wn in Figure 8.最后一次操作是,找到相应的元素给定的累积频率。这一行动,是由改性二元图8所示的搜索。It is called with the test value and amask which initially locates the midpoint of the table.(With the 16-element table of the examples here,the initial value of Mask would be 8.)At each stage Index defines the base of the area still to be sea
55、rched.The midpoint is probed and,if the value is above the midpoint,the value is subtracted off the desired frequency and the midpoint becomes the new Index value(or base of the search area).Finally,the Mask value is halved to search at afiner resolution.这就是所谓的测试值和面罩最初位于表的中点。(与16元素表的例子在这里,在面具的初始值将是8
56、)。指数在每个阶段确定了该地区的基地仍进行搜索。中点的探讨,如果该值高于中点,值减去关闭所需频率的中点将成为新的指数值(或搜索领域的基础)。最后,掩码值减半寻找在一个较好的解决。The program of Figure 8fails if the true frequency of the element is zero.This is not aproblem with the arithmetic coding algorithm of Witten et al.which requires non-zero frequencies.There seems to be no effici
57、ent programming solution to this problem,but asimple detour is to assume aconstant base frequency for all values,adjusting the cumulative or real frequencies as they are read.在图8程序失败如果元素的真实频率是零。这不是与编码算法威滕等算术问题。这需要非零频率。似乎没有有效的方案解决这个问题,但绕道是一个简单的假设为基础,所有值恒定频率,调整的累积或真正的,因为他们看频率。The average cost in table
58、 references is an initial test of the zero element,followed by aprobe for each bit of the index and a50 per cent probability of having to read the value to revise the frequency(although this last reference may disappear with compiler optimization).The cost in table references,for aentry table,is then either 1+N,or 1+3N/2.The cost is again logarithmic in the table size.在表引用的平均成本是零元素的初始测试中,每个索引的位探头跟踪和百分之五十不得不读值调整的频率可能性(尽管这最后一个引用的编译器可能会消失优化)。在表参考价格为项表,然后是1+N,或1+3N/2。费用又是对数表的大小。function getIndex(Prob,Mask:integer):integer var In
温馨提示
- 1. 本站所有资源如无特殊说明,都需要本地电脑安装OFFICE2007和PDF阅读器。图纸软件为CAD,CAXA,PROE,UG,SolidWorks等.压缩文件请下载最新的WinRAR软件解压。
- 2. 本站的文档不包含任何第三方提供的附件图纸等,如果需要附件,请联系上传者。文件的所有权益归上传用户所有。
- 3. 本站RAR压缩包中若带图纸,网页内容里面会有图纸预览,若没有图纸预览就没有图纸。
- 4. 未经权益所有人同意不得将文件中的内容挪作商业或盈利用途。
- 5. 人人文库网仅提供信息存储空间,仅对用户上传内容的表现方式做保护处理,对用户上传分享的文档内容本身不做任何修改或编辑,并不能对任何下载内容负责。
- 6. 下载文件中如有侵权或不适当内容,请与我们联系,我们立即纠正。
- 7. 本站不保证下载资源的准确性、安全性和完整性, 同时也不承担用户因使用这些下载资源对自己和他人造成任何形式的伤害或损失。
最新文档
- 2024年春七年级语文下册 第4单元 13 叶圣陶先生二三事教学实录 新人教版
- 6、有余数的除法(教学设计) -2023-2024学年二年级下册数学人教版
- 10 青山处处埋忠骨 教学设计-2023-2024学年语文五年级下册统编版
- 2024年六年级品社下册《第四单元 放飞和平鸽》教学实录 未来版
- 某学院景观工程施工组织设计方案
- 2023七年级数学下册 第一章 整式的乘除5 平方差公式第1课时 平方差公式的认识教学实录 (新版)北师大版
- 2024-2025学年高中化学 第1章 第3节 有机化合物的命名教学实录 新人教版选修5
- 2024年学年九年级语文上册 第六单元 世间百态 第25课《竞选州长》教学实录2 沪教版五四制
- 2024年学年七年级地理下册 第六章 认识大洲 第一节 亚洲及欧洲教学实录 (新版)湘教版
- 8 这些东西哪里来 第三课时 教学设计-2023-2024学年道德与法治四年级下册统编版
- 安徽2025年安徽医科大学第一附属医院临床医技护理管理岗位招聘156人笔试历年参考题库附带答案详解
- 传染病习题库与参考答案
- 旅游景区股份合作开发协议书范本
- 2025年湖南信息职业技术学院单招职业技能测试题库参考答案
- 学情分析方案及学情分析报告范文
- 《CRISPR-Cas9及基因技术》课件
- 【博观研究院】2025年跨境进口保健品市场分析报告
- 游戏直播平台推广合作协议
- 《高科技服装与面料》课件
- 2025中国船舶集团限公司招聘高频重点模拟试卷提升(共500题附带答案详解)
- 土壤侵蚀与碳汇-深度研究
评论
0/150
提交评论