




版权说明:本文档由用户提供并上传,收益归属内容提供方,若内容存在侵权,请进行举报或认领
文档简介
1、searchgiven: distinct keys k1, k2, , kn and collection t of n records of the form(k1, i1), (k2, i2), , (kn, in)where ij is the information associated with key kj for 1 = j = n.search problem: for key value k, locate the record (kj, ij) in t such that kj = k.searching is a systematic method for locat
2、ing the record(s) with key value kj = k.successful vs. unsuccessfula successful search is one in which a record with key kj = k is found.an unsuccessful search is one in which no record with kj = k is found (and presumably no such record exists). approaches to search1. sequential and list methods (l
3、ists, tables, arrays).2. direct access by key value (hashing)3. tree indexing methods.searching ordered arrayssequential searchbinary searchdictionary searchlists ordered by frequencyorder lists by (expected) frequency of occurrence.perform sequential searchcost to access first record: 1cost to acce
4、ss second record: 2expected search cost:.2121nnnpppcexamples(1)(1) all records have equal frequency.2/ ) 1(/1nnicninexamples(2)(2) exponential frequencyninipniiif2/111if2/11niinic1. 2)2/(zipf distributionsapplications: distribution for frequency of word usage in natural languages. distribution for p
5、opulations of cities, etc.80/20 rule: 80% of accesses are to 20% of the records. for distributions following 80/20 rule,niennnnnniic1.log/h/.1 . 0 ncnself-organizing listsself-organizing lists modify the order of records within the list based on the actual pattern of record accesses.self-organizing
6、lists use a heuristic for deciding how to reorder the list. these heuristics are similar to the rules for managing buffer pools.heuristics1. order by actual historical frequency of access. (similar to lfu buffer pool replacement strategy.)2. move-to-front: when a record is found, move it to the fron
7、t of the list.3. transpose: when a record is found, swap it with the record ahead of it.text compression exampleapplication: text compression.keep a table of words already seen, organized via move-to-front heuristic.if a word not yet seen, send the word.otherwise, send (current) index in the table.t
8、he car on the left hit the car i left.the car on 3 left hit 3 5 i 5.this is similar in spirit to ziv-lempel coding.searching in setsfor dense sets (small range, high percentage of elements in set).can use logical bit operators.example: to find all primes that are odd numbers, compute:001101010001010
9、0 & 0101010101010101hashing (1)hashing: the process of mapping a key value to a position in a table.a hash function maps key values to positions. it is denoted by h.a hash table is an array that holds the records. it is denoted by ht.ht has m slots, indexed form 0 to m-1.hashing (2)for any value
10、 k in the key range and some hash function h, h(k) = i, 0 = i m, such that key(hti) = k.hashing is appropriate only for sets (no duplicates).good for both in-memory and disk-based applications.answers the question “what record, if any, has key value k?”simple examples(1) store the n records with key
11、s in range 0 to n-1. store the record with key i in slot i. use hash function h(k) = k.(2) more reasonable example: store about 1000 records with keys in range 0 to 16,383. impractical to keep a hash table with 16,384 slots. we must devise a hash function to map the key range to a smaller table.coll
12、isions (1)given: hash function h with keys k1 and k2. is a slot in the hash table.if h(k1) = = h(k2), then k1 and k2 have a collision at under h.search for the record with key k:1. compute the table location h(k).2. starting with slot h(k), locate the record containing key k using (if necessary) a c
13、ollision resolution policy.collisions (2)collisions are inevitable in most applications.example: 23 people are likely to share a birthday.hash functions (1)a hash function must return a value within the hash table range.to be practical, a hash function should evenly distribute the records stored amo
14、ng the hash table slots.ideally, the hash function should distribute records with equal probability to all hash table slots. in practice, success depends on distribution of actual records stored.hash functions (2)if we know nothing about the incoming key distribution, evenly distribute the key range
15、 over the hash table slots while avoiding obvious opportunities for clustering.if we have knowledge of the incoming distribution, use a distribution-dependent hash function.examples (1)int h(int x) return(x % 16);this function is entirely dependent on the lower 4 bits of the key.mid-square method: s
16、quare the key value, take the middle r bits from the result for a hash table of 2r slots.examples (2)for strings: sum the ascii values of the letters and take results modulo m.int h(char* x) int i, sum; for (sum=0, i=0; xi != 0; i+) sum += (int) xi; return(sum % m);this is only good if the sum is la
17、rge compared to m.examples (3)elf hash: from executable and linking format (elf), unix system v release 4.int elfhash(char* key) unsigned long h = 0; while(*key) h = (h 24; h &= g; return h % m;open hashingwhat to do when collisions occur?open hashing treats each hash table slot as a bin.bucket
18、hashingdivide the hash table slots into buckets.example: 8 slots/bucket.include an overflow bucket.records hash to the first slot of the bucket, and fill bucket. go to overflow if necessary.when searching, first check the proper bucket. then check the overflow.closed hashingclosed hashing stores all
19、 records directly in the hash table.each record i has a home position h(ki).if another record occupies is home position, then another slot must be found to store i.the new slot is found by a collision resolution policy.search must follow the same policy to find records not in their home slots.collis
20、ion resolutionduring insertion, the goal of collision resolution is to find a free slot in the table.probe sequence: the series of slots visited during insert/search by following a collision resolution policy.let 0 = h(k). let (0, 1, ) be the series of slots making up the probe sequence.insertion/ i
21、nsert e into hash table httemplate bool hashdict:hashinsert(const elem& e) int home; / home position for e int pos = home = h(getkey(e); / init for (int i=1; !(eecomp:eq(empty, htpos); i+) pos = (home + p(getkey(e), i) % m; if (eecomp:eq(e, htpos) return false; / duplicate htpos = e; / insert e
22、return true;search/ search for the record with key ktemplate bool hashdict:hashsearch(const key& k, elem& e) const int home; / home position for k int pos = home = h(k); / initial posit for (int i = 1; !kecomp:eq(k, htpos) & !eecomp:eq(empty, htpos); i+) pos = (home + p(k, i) % m; / next
23、 if (kecomp:eq(k, htpos) / found it e = htpos; return true; else return false; / k not in hash tableprobe functionlook carefully at the probe function p().pos = (home + p(getkey(e), i) % m;each time p() is called, it generates a value to be added to the home position to generate the new slot to be e
24、xamined.p() is a function both of the elements key value, and of the number of steps taken along the probe sequence.not all probe functions use both parameters.linear probinguse the following probe function:p(k, i) = i;linear probing simply goes to the next slot in the table.past bottom, wrap around
25、 to the top.to avoid infinite loop, one slot in the table must always be empty.linear probing exampleprimary clustering: records tend to cluster in the table under linear probing since the probabilities for which slot to use next are not the same for all slots.improved linear probinginstead of going
26、 to the next slot, skip by some constant c.warning: pick m and c carefully.the probe sequence should cycle through all slots of the table.pick c to be relatively prime to m.there is still some clusteringex: c=2, h(k1) = 3; h(k2) = 5.probe sequences for k1 and k2 are linked together.pseudo-random pro
27、bing(1)the ideal probe function would select the next slot on the probe sequence at random.an actual probe function cannot operate randomly. (why?)pseudo-random probing(2)select a (random) permutation of the numbers from 1 to m-1:r1, r2, , rm-1all insertions and searches use the same permutation.exa
28、mple: hash table size of m = 101r1=2, r2=5, r3=32.h(k1)=30, h(k2)=28.probe sequence for k1:probe sequence for k2:quadratic probingset the ith value in the probe sequence ash(k, i) = i2;example: m=101 h(k1)=30, h(k2) = 29. probe sequence for k1 is: probe sequence for k2 is:secondary clusteringpseudo-
29、random probing eliminates primary clustering.if two keys hash to the same slot, they follow the same probe sequence. this is called secondary clustering.to avoid secondary clustering, need probe sequence to be a function of the original key value, not just the home position.double hashingp(k, i) = i * h2(k)be sure that all probe seq
温馨提示
- 1. 本站所有资源如无特殊说明,都需要本地电脑安装OFFICE2007和PDF阅读器。图纸软件为CAD,CAXA,PROE,UG,SolidWorks等.压缩文件请下载最新的WinRAR软件解压。
- 2. 本站的文档不包含任何第三方提供的附件图纸等,如果需要附件,请联系上传者。文件的所有权益归上传用户所有。
- 3. 本站RAR压缩包中若带图纸,网页内容里面会有图纸预览,若没有图纸预览就没有图纸。
- 4. 未经权益所有人同意不得将文件中的内容挪作商业或盈利用途。
- 5. 人人文库网仅提供信息存储空间,仅对用户上传内容的表现方式做保护处理,对用户上传分享的文档内容本身不做任何修改或编辑,并不能对任何下载内容负责。
- 6. 下载文件中如有侵权或不适当内容,请与我们联系,我们立即纠正。
- 7. 本站不保证下载资源的准确性、安全性和完整性, 同时也不承担用户因使用这些下载资源对自己和他人造成任何形式的伤害或损失。
最新文档
- 二零二五年度商铺租赁合同终止与竞业禁止协议
- 买现房合同范本
- 二零二五年度金融产品居间推广费用标准合同
- 幼儿园中秋节手工活动
- 2025年度老年公寓护工雇佣与管理协议
- 二零二五年度股东借款合同争议解决合同
- 2025年度电子证书跨行业应用合作协议书
- 护士N1晋级N2述职报告
- 2025新学期开学典礼校长讲话:从李大钊到人工智能!一场跨越百年的青春对话
- “她”绽芳华共谱新章-学校2025年三八妇女节活动方案
- GB/T 44718-2024城市轨道交通无障碍运营服务规范
- DB41T 2567-2023 消防技术服务机构服务规范
- 2024年职工普法教育宣讲培训课件
- 音乐鉴赏与实践 第一单元第四课音乐的力量(下)
- 《外科护理学(第七版)》考试复习题库-上(单选题)
- 92枪械课件教学课件
- 追觅科技在线测评逻辑题
- (人教PEP2024版)英语一年级上册Unit 1 教学课件(新教材)
- 凝中国心铸中华魂铸牢中华民族共同体意识-小学民族团结爱国主题班会课件
- 2024义务教育2022版《道德与法治课程标准》真题库与答案
- 全国职业院校技能大赛高职组(市政管线(道)数字化施工赛项)考试题库(含答案)
评论
0/150
提交评论