




版权说明:本文档由用户提供并上传,收益归属内容提供方,若内容存在侵权,请进行举报或认领
文档简介
1、Chapter 10HashingADT of Dictionary const int DefaultSize = 26;templateclass Dictionarypublic: Dictionary ( int size = DefaultSize ); int IsIn ( Name name ); Attribute *Find ( Name name ); void Insert ( Name name, Attribute attr ); void Remove ( Name name ); Hash tableSupport the following operations
2、Find InsertDelete. (deletions may be unnecessary in some applications)Unlike binary search tree, AVL tree and B+-tree, the following functions cannot be done:Minimum and maximumSuccessor and predecessorReport data within a given rangeList out the data in orderUnrealistic solution Each position (slot
3、) corresponds to a key in the universe of keysTk corresponds to an element with key kIf the set contains no element with key k, then Tk=NULLUnrealistic solutioninsert, delete and find all take O(1) (worst-case) timeProblem:The scheme wastes too much space if the universe is too large compared with t
4、he actual number of elements to be stored. E.g. student IDs are 8-digit integers, so the universe size is 108, but we only have about 4000 students in Software College.HashingUsually, m m, where m is the size of the hash tableDesign a good hash functionthat is fast to compute and can minimize the nu
5、mber of collisionsDesign a method to resolve the collisions when they occurHash FunctionThe division method h(k) = k mod me.g. m=12, k=100, h(k)=4 Requires only a single division operation (quite fast)Certain values of m should be avoidede.g. if m=2p, then h(k) is just the p lowest-order bits of k;
6、the hash function does not depend on all the bitsSimilarly, if the keys are decimal numbers, should not set m to be a power of 10Hash FunctionIts a good practice to set the table size m to be a prime numberGood values for m: primes not too close to exact powers of 2e.g. the hash table is to hold 200
7、0 numbers, and we dont mind an average of 3 numbers being hashed to the same entrychoose m=701Hash Function.Can the keys be strings?Most hash functions assume that the keys are natural numbersif keys are not natural numbers, a way must be found to interpret them as natural numbersHash Function.Metho
8、d 1Add up the ASCII values of the characters in the stringProblems:Different permutations of the same set of characters would have the same hash valueIf the table size is large, the keys are not distribute well. e.g. Suppose m=10007 and all the keys are eight or fewer characters long. Since ASCII va
9、lue a reasonably equitable distributionProblemEnglish is not randomOnly 28 percent of the table can actually be hashed to (assuming a table size of 10,007)Method 3computesinvolves all characters in the key and be expected to distribute wellCollision Handling: (1) Separate ChainingInstead of a hash t
10、able, we use a table of linked listkeep a linked list of keys that hash to the same valueh(K) = K mod 10Separate ChainingTo insert a key KCompute h(K) to determine which list to traverse If Th(K) contains a null pointer, initiatize this entry to point to a linked list that contains K alone. If Th(K)
11、 is a non-empty list, we add K at the beginning of this list.To delete a key Kcompute h(K), then search for K within the list at Th(K). Delete K if it is found.Separate Chaining Assume that we will be storing n keys. Then we should make m the next larger prime number. If the hash function works well
12、, the number of keys in each linked list will be a small constant.Therefore, we expect that each search, insertion, and deletion can be done in constant time.Disadvantage: Memory allocation in linked list manipulation will slow down the program. Advantage: deletion is easy.Collision Handling:(2) Ope
13、n AddressingOpen addressing: relocate the key K to be inserted if it collides with an existing key. That is, we store K at an entry different from Th(K). Two issues arisewhat is the relocation scheme?how to search for K later?Three common methods for resolving a collision in open addressingLinear pr
14、obingQuadratic probingDouble hashingOpen AddressingTo insert a key K, compute h0(K). If Th0(K) is empty, insert it there. If collision occurs, probe alternative cell h1(K), h2(K), . until an empty cell is found.hi(K) = (hash(K) + f(i) mod m, with f(0) = 0f: collision resolution strategyLinear Probin
15、gf(i) =icells are probed sequentially (with wraparound) hi(K) = (hash(K) + i) mod mInsertion:Let K be the new key to be inserted. We compute hash(K)For i = 0 to m-1compute L = ( hash(K) + I ) mod mTL is empty, then we put K there and stop. If we cannot find an empty entry to put K, it means that the
16、 table is full and we should report an error. Linear Probinghi(K) = (hash(K) + i) mod mE.g, inserting keys 89, 18, 49, 58, 69 with hash(K)=K mod 10To insert 58, probe T8, T9, T0, T1To insert 69, probe T9, T0, T1, T2 Primary ClusteringWe call a block of contiguously occupied table entries a clusterOn
17、 the average, when we insert a new key K, we may hit the middle of a cluster. Therefore, the time to insert K would be proportional to half the size of a cluster. That is, the larger the cluster, the slower the performance. Primary ClusteringLinear probing has the following disadvantages:Once h(K) f
18、alls into a cluster, this cluster will definitely grow in size by one. Thus, this may worsen the performance of insertion in the future.If two cluster are only separated by one entry, then inserting one key into a cluster can merge the two clusters together. Thus, the cluster size can increase drast
19、ically by a single insertion. This means that the performance of insertion can deteriorate drastically after a single insertion.Large clusters are easy targets for collisions.Quadratic Probingf(i) = i2hi(K) = ( hash(K) + i2 ) mod mE.g., inserting keys 89, 18, 49, 58, 69 with hash(K) = K mod 10To ins
20、ert 58, probe T8, T9, T(8+4) mod 10To insert 69, probe T9, T(9+1) mod 10, T(9+4) mod 10Quadratic ProbingTwo keys with different home positions will have different probe sequencese.g. m=101, h(k1)=30, h(k2)=29probe sequence for k1: 30,30+1, 30+4, 30+9probe sequence for k2: 29, 29+1, 29+4, 29+9If the
21、table size is prime, then a new key can always be inserted if the table is at least half empty Quadratic ProbingSecondary clusteringKeys that hash to the same home position will probe the same alternative cellsTo avoid secondary clustering, the probe sequence need to be a function of the original ke
22、y value, not the home positionDouble HashingTo alleviate the problem of clustering, the sequence of probes for a key should be independent of its primary position = use two hash functions: hash() and hash2()f(i) = i + hash2(K)E.g. hash2(K) = R - (K mod R), with R is a prime smaller than mDouble Hash
23、inghi(K) = ( hash(K) + f(i) ) mod m; hash(K) = K mod mf(i) = i + hash2(K); hash2(K) = R - (K mod R),Example: m=10, R = 7 and insert keys 89, 18, 49, 58, 69To insert 49, hash2(49)=7, 2nd probe is T(9+7) mod 10To insert 58, hash2(58)=5, 2nd probe is T(8+5) mod 10To insert 69, hash2(69)=1, 2nd probe is
24、 T(9+1) mod 10Choice of hash2()Hash2() must never evaluate to zeroFor any key K, hash2(K) must be relatively prime to the table size m. Otherwise, we will only be able to examine a fraction of the table entries. E.g.,if hash(K) = 0 and hash2(K) = m/2, then we can only examine the entries T0, Tm/2, a
25、nd nothing else!Choice of hash2()One solution is to make m prime, and choose R to be a prime smaller than m, and set hash2(K) = R (K mod R)Quadratic probing, however, does not require the use of a second hash function likely to be simpler and faster in practice例: 给定关键字序列为 19,14,23,1,68,20,84,27,55,1
26、1,10,79 散列函数:H(key)=key%13 散列表空间地址为012, 试用线性探查法建立散列表0123456789101112191423111682084848484272727272755111079存储的是数据元素1 891011765312421014 27 55 68 84 19 20 10 23 11 Deletion in open addressingActual deletion cannot be performed in open addressing hash tablesotherwise this will isolate records further
27、down the probe sequenceSolution: Add an extra bit to each table entry, and mark a deleted slot by storing a special value DELETED (tombstone)class HashTable /用线性探查法处理溢出时散列表类的定义public: enum KindOfEntry Active, Empty, Deleted ; HashTable ( ) : TableSize ( DefaultSize ) AllocateHt ( ); CurrentSize = 0;
28、 HashTable ( ) delete ht; const HashTable & operator = ( const HashTable & ht2 ); int Find ( const Type & x ); int Insert ( const Type & x ); int Remove ( const Type & x ); int IsIn ( const Type & x ) return ( i = Find (x) ) = 0 ? 1 : 0; void MakeEmpty ( );private: struct HashEntry Type Element; Kin
29、dOfEntry info; int operator= ( HashEntry &, HashEntry & ); int operator!= ( HashEntry &, HashEntry & ); HashEntry ( ) : info (Empty ) HashEntry (const Type & E, KindOfEntry i = Empty ) : Element (E), info (i) ; enum DefaultSize = 11 HashEntry *ht; int CurrentSize, TableSize; void AllocateHt ( ) ht =
30、 new HashEntryTableSize ; int FindPos ( const Type & x ) const; template int HashTable:Find ( const Type & x ) /线性探查法的搜索算法,函数返回找到位置。/若返回负数可能是空位,若为-TableSize则失败。 int i = FindPos ( x ), j = i; while ( != Empty & htj.Element != x ) j = ( j + 1 ) % TableSize; if ( j = i ) return -TableSize; if ( = Active ) return j; else -j;template void HashTabe : MakeEmpty ( ) /置表中所有表项为空 for ( int i = 0; i TableSize; i+) = Empty; CurrentSize = 0;template const HashTable & HashTable :operator = ( const HashTab
温馨提示
- 1. 本站所有资源如无特殊说明,都需要本地电脑安装OFFICE2007和PDF阅读器。图纸软件为CAD,CAXA,PROE,UG,SolidWorks等.压缩文件请下载最新的WinRAR软件解压。
- 2. 本站的文档不包含任何第三方提供的附件图纸等,如果需要附件,请联系上传者。文件的所有权益归上传用户所有。
- 3. 本站RAR压缩包中若带图纸,网页内容里面会有图纸预览,若没有图纸预览就没有图纸。
- 4. 未经权益所有人同意不得将文件中的内容挪作商业或盈利用途。
- 5. 人人文库网仅提供信息存储空间,仅对用户上传内容的表现方式做保护处理,对用户上传分享的文档内容本身不做任何修改或编辑,并不能对任何下载内容负责。
- 6. 下载文件中如有侵权或不适当内容,请与我们联系,我们立即纠正。
- 7. 本站不保证下载资源的准确性、安全性和完整性, 同时也不承担用户因使用这些下载资源对自己和他人造成任何形式的伤害或损失。
最新文档
- 煤矿下料口管理办法
- CQI审核管理办法
- 临床质控员管理办法
- 煤矿灾害防治与重大事故处理课件
- 福建省薪酬管理办法
- 中国洗染业管理办法
- 物流分包方管理办法
- 甲方与结算管理办法
- 陕西浮桥管理办法
- j酒店车辆管理办法
- 招商银行智慧营销体系规划方案(2022年-2023年)
- 2023年中国(浦东)知识产权保护中心专利预审员招聘笔试参考题库附带答案详解
- 勘界定标技术报告
- von frey丝K值表完整版
- 危险性较大的分部分项工程施工前安全生产条件核查表
- GB/T 5696-2006预应力混凝土管
- GB/T 528-2009硫化橡胶或热塑性橡胶拉伸应力应变性能的测定
- GB/T 3299-1996日用陶瓷器吸水率测定方法
- 精轧机组机械设备使用说明书
- 2022年机械制图期末试卷及答案
- 设备维护保养制度-设备维护保养制度规定
评论
0/150
提交评论