Fundamentals of Programming and Algorithms: course slides


Fundamentals of Programming and Algorithms (6)
潘爱民 (Pan Aimin)
2006/10/30

Outline
  Hash tables
  Bloom filter
  Inverted index

Searching problem again
  For a linked list, search takes O(n)
  For a sorted array, O(log n)
  Can we expect O(1)? That would mean:
    Regardless of the number of elements being searched, the run time is always the same
    Given a key, the position in the table can be accessed directly

Operations on hash tables
  A hash table is a collection of pairs (key, element); the key may be a string, a number, a record, etc.
  Pairs have different keys
  Operations
    Get(theKey)
    Delete(theKey)
    Insert(theKey, theElement)

Ideal Hashing
  Uses an array (or table) table[0…m-1]
    Each position of this array is a bucket
    A bucket can normally hold only one pair
  Uses a hash function h that converts each key k into an index in the range [0, m-1]
    h(k) is the home bucket for key k
  Every pair (key, element) is stored in its home bucket table[h(key)]

Ideal Hashing Example
  Pairs are: (22,a), (33,c), (3,d), (73,e), (85,f)
  Hash table is table[0…7], m = 8
  Hash function is h(key) = key/11 (integer division)
  Pairs are stored in the table as below:

    [0]     [1]     [2]     [3]     [4]     [5]     [6]     [7]
    (3,d)           (22,a)  (33,c)                  (73,e)  (85,f)

  Get, Insert, and Delete take O(1) time

What Can Go Wrong?

    [0]     [1]     [2]     [3]     [4]     [5]     [6]     [7]
    (3,d)           (22,a)  (33,c)                  (73,e)  (85,f)

  Where does (26,g) go?
  Keys that have the same home bucket are synonyms
    22 and 26 are synonyms with respect to the hash function that is in use
  The home bucket for (26,g) is already occupied

What Can Go Wrong? (cont.)
  A collision occurs when the home bucket for a new pair is occupied by a pair with a different key
  An overflow occurs when there is no space in the home bucket for the new pair
  When a bucket can hold only one pair, collisions and overflows occur together
  Need a method to handle overflows

Hash Table Issues
  Choice of hash functions
  Collision resolution
  Size of hash table
    The size of a bucket
    The number of buckets

Hash functions
  Definition: a hash function h transforms a key K into an index in the table used for storing items of the same type as K
  If a hash function transforms different keys into different numbers, it is called a perfect hash function
  Therefore, a hash function is a mapping from n items to m positions
    The total number of possible mappings is $m^n$
    If $n \le m$, the number of perfect hash functions is $m!/(m-n)!$

Hash Functions
  Two parts
    Convert the key into an integer in case the key is not an integer
      h(k)
    Map an integer into a home bucket
      f(k) is an integer in the range [0, m-1], where m is the number of buckets in the table

String To Non-negative Integer
  Each character is 1 byte long
  An int is 4 bytes
  A two-character string s may be converted into a unique 4-byte non-negative int using the code:
    int answer = s.at(0);
    answer = (answer << 8) + s.at(1);
  Strings that are longer than 3 characters do not have a unique non-negative int representation

String To Nonnegative Integer
    unsigned long operator()(const string theKey)
    {// Convert theKey to a nonnegative integer.
       unsigned long hashValue = 0;
       int length = (int) theKey.length();
       for (int i = 0; i < length; i++)
          hashValue = hashValue + theKey.at(i);
       return hashValue;
    }

Map into a home bucket
  Most common method is by division
    homeBucket = h(theKey) % divisor;
  divisor equals the number of buckets m
  0 <= homeBucket < divisor = m
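The two parts of a hash function described here (convert the key to a nonnegative integer, then map that integer to a home bucket by division) might be sketched as follows. The function names and the multiplier 31 are assumptions made for illustration; they are not taken from the slides. The multiplier makes the result depend on character order, which a plain sum of characters does not.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Part 1: convert a string key into a nonnegative integer.
// The multiplier 31 is an assumption of this sketch; it makes "ab" and "ba"
// produce different values, unlike a plain sum of the characters.
unsigned long stringToInteger(const std::string& theKey) {
    unsigned long hashValue = 0;
    for (unsigned char c : theKey)
        hashValue = 31 * hashValue + c;  // unsigned arithmetic wraps, never negative
    return hashValue;
}

// Part 2: map the integer into a home bucket by division.
// divisor is the number of buckets m, so the result lies in [0, m-1].
std::size_t homeBucket(unsigned long integerKey, std::size_t divisor) {
    assert(divisor > 0);
    return static_cast<std::size_t>(integerKey % divisor);
}
```

For example, `homeBucket(stringToInteger("ab"), 17)` yields a bucket index below 17 for a 17-bucket table.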

Uniform Hash Function
  Let keySpace be the set of all possible keys
  A uniform hash function maps the keys in keySpace into buckets such that approximately the same number of keys get mapped into each bucket
  Equivalently, the probability that a randomly selected key has bucket i as its home bucket is 1/m, 0 <= i < m
  A uniform hash function minimizes the likelihood of an overflow when keys are selected at random

Hashing by Division
  keySpace = all ints
  For every m, the number of ints that get mapped (hashed) into bucket i is approximately $2^{32}/m$
  Therefore, the division method results in a uniform hash function when keySpace = all ints
  In practice, keys tend to be correlated
    So, the choice of the divisor m affects the distribution of home buckets

Selecting The Divisor
  Because of this correlation, applications tend to have a bias towards keys that map into odd integers (or into even ones)
  When the divisor is an even number, odd integers hash into odd home buckets and even integers into even home buckets
    20%14 = 6, 30%14 = 2, 8%14 = 8
    15%14 = 1, 3%14 = 3, 23%14 = 9
  The bias in the keys results in a bias toward either the odd or even home buckets

Selecting The Divisor (cont.)
  When the divisor is an odd number, odd (even) integers may hash into any home bucket
    20%15 = 5, 30%15 = 0, 8%15 = 8
    15%15 = 0, 3%15 = 3, 23%15 = 8
  The bias in the keys does not result in a bias toward either the odd or even home buckets
  Better chance of uniformly distributed home buckets
  So do not use an even divisor

Selecting The Divisor (cont.)
  A similar biased distribution of home buckets is seen, in practice, when the divisor is a multiple of small prime numbers such as 3, 5, 7, …
  The effect of each prime divisor p of m decreases as p gets larger
  Ideally, choose m so that it is a prime number
  Alternatively, choose m so that it has no prime factor smaller than 20
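The effect of an even divisor on biased keys can be checked with a small experiment. The sketch below counts how many keys land in odd-numbered home buckets; the function names and the sample keys are illustrative assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Count how many keys land in each home bucket for a given divisor.
std::vector<int> bucketCounts(const std::vector<unsigned>& keys, unsigned divisor) {
    assert(divisor > 0);
    std::vector<int> counts(divisor, 0);
    for (unsigned key : keys)
        ++counts[key % divisor];
    return counts;
}

// Number of keys whose home bucket has an odd index.
int keysInOddBuckets(const std::vector<unsigned>& keys, unsigned divisor) {
    std::vector<int> counts = bucketCounts(keys, divisor);
    int total = 0;
    for (std::size_t b = 1; b < counts.size(); b += 2)
        total += counts[b];
    return total;
}
```

With the all-even keys 20, 30, 8, 40, 52, 66, divisor 14 puts no key in an odd bucket, while divisor 15 puts two of them (20 and 52) in odd buckets, so the bias in the keys no longer maps directly onto the buckets.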

Hashing by folding
  The key is divided into several parts, and these parts are combined or folded together
  Shift folding
    The parts are combined in a certain way using a simple operation such as addition
    For example, the social security number (SSN) 123-45-6789 gives (123 + 456 + 789) % m (m = 1000)
  Boundary folding
    The pieces of the key are folded on the borders between different parts, for example (123 + 654 + 789) % m (m = 1000)

About folding
  Simple and fast; bit patterns can be used instead of numerical values
  In the case of strings
    XOR the characters together and use the result for the address
    XOR chunks of the string rather than single characters; the length of a chunk is equal to the number of bytes in an integer
  Typically, the result of folding is divided modulo m
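Shift folding and boundary folding differ only in whether every second piece is reversed before the pieces are added. A minimal sketch, assuming the key is given as a string of decimal digits (the function name and parameters are hypothetical):

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>

// Split a digit string into pieces of `chunk` digits and add them up.
// With boundary == true every second piece is reversed before adding,
// as if the key were folded on the borders between the parts.
// Assumes `digits` contains decimal digits only.
unsigned long foldDigits(const std::string& digits, std::size_t chunk,
                         bool boundary, unsigned long m) {
    assert(chunk > 0 && m > 0);
    unsigned long sum = 0;
    std::size_t part = 0;
    for (std::size_t start = 0; start < digits.size(); start += chunk, ++part) {
        std::string piece = digits.substr(start, chunk);
        if (boundary && part % 2 == 1)
            std::reverse(piece.begin(), piece.end());
        sum += std::stoul(piece);
    }
    return sum % m;
}
```

For the SSN example, `foldDigits("123456789", 3, false, 1000)` gives 1368 % 1000 = 368, and `foldDigits("123456789", 3, true, 1000)` gives 1566 % 1000 = 566.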

Hashing by Mid-square function
  The key is squared and the middle (mid) part of the result is used as the hash value
  The entire key participates in generating the hash value, so there is a better chance that different values are generated for different keys
  In practice, it is more efficient to choose a power of 2 for the size of the table and to extract the mid part of the bit representation of the square of the key, using just a mask and a shift operation
  For example, $3121^2$ = 100101001010000101100001 in binary, and the mid part (the middle 10 bits) is 0101000010 = 322
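The mask-and-shift extraction can be sketched as below. The parameter names are assumptions: `squareBits` is the width assumed for the square (24 in the 3121 example) and `tableBits` is the base-2 logarithm of the table size (10 for a table of 1024 cells).

```cpp
#include <cassert>
#include <cstdint>

// Mid-square hashing for a table of size 2^tableBits.
// The middle tableBits bits of the squareBits-wide square are extracted
// with one shift and one mask.
std::uint32_t midSquare(std::uint32_t key, unsigned squareBits, unsigned tableBits) {
    assert(tableBits > 0 && tableBits <= 32);
    assert(tableBits <= squareBits && squareBits <= 64);
    std::uint64_t square = static_cast<std::uint64_t>(key) * key;
    unsigned shift = (squareBits - tableBits) / 2;            // bits dropped on the right
    std::uint64_t mask = (std::uint64_t{1} << tableBits) - 1; // keeps tableBits bits
    return static_cast<std::uint32_t>((square >> shift) & mask);
}
```

With these assumptions, `midSquare(3121, 24, 10)` shifts 9740641 right by 7 bits and keeps 10 bits, which gives 322.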

Hashing by extraction
  Only a part of the key is used to compute the address
  If the part is distributed uniformly, it can be sufficient for hashing (provided the omitted portion distinguishes the keys only in an insignificant way)
  For example, in a student ID, some digits can safely be omitted in a hash function
    The ISBN code is similar

Hashing by radix transformation
  The key is transformed into another number base
    For example, if K is the decimal number 345, then its value in base 9 is 423(9)
  The digits of the result are then read in the original base and divided modulo m; the resulting number is used as the hash value
    If m = 100, 345(10) and 245(10) are not hashed to the same value, but 345(10) and 264(10) will be hashed to the same value, 23
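The radix transformation example can be reproduced with a short function. This is a sketch under the assumption that the key is small enough for its base-`radix` digits, read as a decimal number, to fit in an `unsigned long`; the function name is hypothetical.

```cpp
#include <cassert>

// Rewrite key in base `radix`, read the resulting digit string as a
// decimal number, then reduce it modulo the table size m.
unsigned long radixHash(unsigned long key, unsigned radix, unsigned long m) {
    assert(radix >= 2 && radix <= 10 && m > 0);
    unsigned long transformed = 0;  // digits of key in base radix, read in base 10
    unsigned long place = 1;
    while (key > 0) {
        transformed += (key % radix) * place;
        place *= 10;
        key /= radix;
    }
    return transformed % m;
}
```

With radix 9 and m = 100, the keys 345 and 264 become 423 and 323 and both hash to 23, while 245 becomes 302 and hashes to 2.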

Hashing by cryptographic hash algorithms
  Strong hash algorithms
    A change of any bit in the input will cause a change of any bit in the output with probability approximately 0.5
    Encryption algorithms have a similar feature
  Using cryptographic algorithms to hash a key
    For example, compute MD5(key) or SHA1(key), or encrypt the key with a fixed encryption key k, DESk(key)
    Then divide the result modulo m

Collision resolution
  Avoiding collisions
    Increasing the table size may lead to better hashing, but sometimes not
    The two factors, hash function and table size, may minimize the number of collisions, but they cannot completely eliminate them (if the size of the key space is larger than the table size)
  Some strategies
    Open addressing
    Chaining
    Bucket addressing

Collision resolution by open addressing
  A collision occurs when the home bucket for a new pair (key, element) is occupied
  We may handle collisions by:
    Searching the hash table in some systematic fashion for a bucket that is not occupied
      Linear probing
      Quadratic probing
      Random probing
    Eliminating overflows by permitting each bucket to keep a list of all pairs for which it is the home bucket
      Dynamic array
      Linked list

Open addressing
  If the position h(K) is occupied, then the positions in the probing sequence
    norm(h(K)+p(1)), norm(h(K)+p(2)), …, norm(h(K)+p(m))
  are tried until either
    an available cell is found, or
    the same positions are tried repeatedly, or
    the table is full
  In the probing sequence,
    the function p is a probing function
    i is a probe
    norm is a normalization function (division modulo the size of the table)

Linear probing
  In open addressing, p(i) = c*i; so if c = 1, the position to be tried is (h(K) + i), normalized modulo the table size
  In linear probing,
    Insertion: the position in which to store a key is found by sequentially searching all positions, starting from the position calculated by the hash function, until an empty cell is found
    There is a tendency to create clusters in the table; the empty cells following clusters have a much greater chance of being filled than other positions

Linear Probing: Get and Insert
  divisor = m (number of buckets) = 17
  Home bucket = key % 17
  Insert pairs whose keys are 6, 12, 34, 29, 28, 11, 23, 7, 0, 33, 30, 45

    [0]  [1]  [2]  [3]  [4]  [5]  [6]  [7]  [8]  [9]  [10] [11] [12] [13] [14] [15] [16]
    34   0    45                  6    23   7              28   12   29   11   30   33
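Get and Insert with linear probing (c = 1) might look like the sketch below. The class and member names are hypothetical, keys are assumed to be nonnegative ints, and an empty bucket is marked with the sentinel -1; none of this is taken from the slides.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

const int kEmptySlot = -1;  // sentinel for an empty bucket (assumption of this sketch)

// Open-addressing table of nonnegative int keys with linear probing.
class LinearProbingTable {
public:
    explicit LinearProbingTable(std::size_t m) : slots_(m, kEmptySlot) { assert(m > 0); }

    // Returns the bucket where key ends up, or -1 if the table is full.
    int insert(int key) {
        std::size_t home = homeBucket(key);
        for (std::size_t i = 0; i < slots_.size(); ++i) {
            std::size_t b = (home + i) % slots_.size();
            if (slots_[b] == kEmptySlot || slots_[b] == key) {
                slots_[b] = key;
                return static_cast<int>(b);
            }
        }
        return -1;
    }

    // Returns the bucket holding key, or -1 if key is absent.
    int get(int key) const {
        std::size_t home = homeBucket(key);
        for (std::size_t i = 0; i < slots_.size(); ++i) {
            std::size_t b = (home + i) % slots_.size();
            if (slots_[b] == kEmptySlot) return -1;  // an empty bucket ends the cluster
            if (slots_[b] == key) return static_cast<int>(b);
        }
        return -1;
    }

private:
    std::size_t homeBucket(int key) const {
        assert(key >= 0);
        return static_cast<std::size_t>(key) % slots_.size();
    }
    std::vector<int> slots_;
};
```

Inserting 6, 12, 34, 29, 28, 11, 23, 7, 0, 33, 30, 45 into a 17-bucket table places 29 in bucket 13, 0 in bucket 1, and 45 (home bucket 11) in bucket 2 after wrapping around the end of the table.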

Quadratic probing
  In open addressing, in general, $p(i) = c_2 i^2 + c_1 i$
  For example, with $p(i) = (-1)^{i-1} \lfloor (i+1)/2 \rfloor^2$, the positions to be tried are
    h(K), h(K)+1, h(K)-1, h(K)+4, h(K)-4, …, $h(K)+(m-1)^2/4$, $h(K)-(m-1)^2/4$
  A good choice in practice for m (the table size) is a prime of the form 4j+3; for such m this sequence visits every position
  Question: why not use $p(i) = i^2$, i < m?
    $i^2 - j^2 = (i+j)(i-j)$
  Clustering is less severe than with linear probing
  Secondary clusters still form for keys hashed to the same location

Random probing
  The function p is defined by a pseudo-random number generator
  A requirement
    In order to find the probing sequence again, the random number generator should be initialized to the same seed for the same key
  Avoids secondary clusters
    The same probing sequence for a given key
    Different probing sequences for different keys with the same hash value

Double hashing
  The function p is defined with another hash function, hp(K)
    p(i) = i*hp(K)
  The probing sequence is
    h(K), h(K)+hp(K), h(K)+2hp(K), h(K)+3hp(K), …, h(K)+(m-1)hp(K)
  The probing sequence depends on the choice of the hash function hp(K)
  The hash function hp(K) can be defined with the original hash function h(K); for example, with hp(K) = i*h(K)+1 the probing sequence is
    $h(K) + i\,(i\,h(K)+1) = (i^2+1)\,h(K) + i$

Secondary clusters (figure)
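Whether a probing sequence reaches every position depends on the table size. The sketch below generates the quadratic sequence h, h+1, h-1, h+4, h-4, … and the double-hashing sequence h, h+hp, h+2hp, …, and checks coverage; all function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Quadratic probing: h, h+1, h-1, h+4, h-4, ..., h +/- ((m-1)/2)^2, all mod m.
std::vector<std::size_t> quadraticProbes(std::size_t h, std::size_t m) {
    assert(m > 0 && h < m);
    std::vector<std::size_t> seq{h};
    for (std::size_t i = 1; i <= (m - 1) / 2; ++i) {
        std::size_t sq = (i * i) % m;
        seq.push_back((h + sq) % m);
        seq.push_back((h + m - sq) % m);  // h - i^2, kept nonnegative
    }
    return seq;
}

// Double hashing: h, h + hp, h + 2*hp, ..., h + (m-1)*hp, all mod m.
std::vector<std::size_t> doubleHashProbes(std::size_t h, std::size_t hp, std::size_t m) {
    assert(m > 0 && h < m && hp % m != 0);
    std::vector<std::size_t> seq;
    for (std::size_t i = 0; i < m; ++i)
        seq.push_back((h + i * (hp % m)) % m);
    return seq;
}

// True if seq contains every position 0..m-1.
bool coversAll(const std::vector<std::size_t>& seq, std::size_t m) {
    std::vector<bool> seen(m, false);
    for (std::size_t p : seq) seen[p] = true;
    for (bool s : seen)
        if (!s) return false;
    return true;
}
```

For m = 7 (a prime of the form 4j+3) the quadratic sequence starting at 3 is 3, 4, 2, 0, 6, 5, 1 and covers the table; for m = 13 (of the form 4j+1) the values +i² and -i² coincide as sets, so about half of the positions are never probed. Double hashing covers the table when hp(K) and m share no common factor, which is one reason to prefer a prime m.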

Performance of open addressing
  Successful searches vs. unsuccessful searches [The Art of Computer Programming, Volume 3]

                           Linear probing      Quadratic search        Double hashing
    Successful searches    [1+1/(1-LF)]/2      1-ln(1-LF)-LF/2         -ln(1-LF)/LF
    Unsuccessful searches  [1+1/(1-LF)^2]/2    1/(1-LF)-LF-ln(1-LF)    1/(1-LF)

  LF = n/m, the load factor: (number of elements in the table)/(table size)

Performance of Linear Probing
  Worst-case Get/Insert time is Θ(n), where n is the number of keys in the table
    This happens when all pairs are in the same cluster
  Expected performance
    Sn = expected number of buckets examined in a successful search
    Un = expected number of buckets examined in an unsuccessful search

    [0]  [1]  [2]  [3]  [4]  [5]  [6]  [7]  [8]  [9]  [10] [11] [12] [13] [14] [15] [16]
    34   0    45                  6    23   7              28   12   29   11   30   33

Expected Performance
  Sn ≈ [1+1/(1-LF)]/2
  Un ≈ [1+1/(1-LF)^2]/2
  Note that 0 <= LF <= 1
  LF <= 0.65 is recommended

    LF     Sn     Un
    0.50   1.5    2.5
    0.65   1.9    4.6
    0.75   2.5    8.5
    0.90   5.5    50.5
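The expected probe counts quoted for the three open-addressing schemes are easy to tabulate for any load factor. A minimal sketch (the struct and function names are hypothetical):

```cpp
#include <cassert>
#include <cmath>

// Expected number of buckets examined for a given load factor LF = n/m.
struct ProbeCounts {
    double successful;    // Sn
    double unsuccessful;  // Un
};

ProbeCounts linearProbing(double lf) {
    assert(lf >= 0.0 && lf < 1.0);
    return { 0.5 * (1.0 + 1.0 / (1.0 - lf)),
             0.5 * (1.0 + 1.0 / ((1.0 - lf) * (1.0 - lf))) };
}

ProbeCounts quadraticProbing(double lf) {
    assert(lf >= 0.0 && lf < 1.0);
    return { 1.0 - std::log(1.0 - lf) - lf / 2.0,
             1.0 / (1.0 - lf) - lf - std::log(1.0 - lf) };
}

ProbeCounts doubleHashing(double lf) {
    assert(lf > 0.0 && lf < 1.0);  // the Sn formula divides by lf
    return { -std::log(1.0 - lf) / lf,
             1.0 / (1.0 - lf) };
}
```

At LF = 0.5 linear probing gives Sn = 1.5 and Un = 2.5, and at LF = 0.9 it gives Un = 50.5, while double hashing at LF = 0.5 gives Un = 2.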

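Linear Probing: Delete
  Initial table (divisor 17):

    [0]  [1]  [2]  [3]  [4]  [5]  [6]  [7]  [8]  [9]  [10] [11] [12] [13] [14] [15] [16]
    34   0    45                  6    23   7              28   12   29   11   30   33

  Delete(0)
    Search the cluster for a pair (if any) to fill the vacated bucket
    Bucket [1] is vacated; 45 moves from [2] to [1]

    [0]  [1]  [2]  [3]  [4]  [5]  [6]  [7]  [8]  [9]  [10] [11] [12] [13] [14] [15] [16]
    34   45                       6    23   7              28   12   29   11   30   33

Linear Probing: Delete(34)
  Search the cluster for a pair (if any) to fill the vacated bucket
  Starting from the initial table, bucket [0] is vacated; 0 moves from [1] to [0], then 45 moves from [2] to [1]

    [0]  [1]  [2]  [3]  [4]  [5]  [6]  [7]  [8]  [9]  [10] [11] [12] [13] [14] [15] [16]
    0    45                       6    23   7              28   12   29   11   30   33

Linear Probing: Delete(29)
  Search the cluster for a pair (if any) to fill the vacated bucket
  Starting from the initial table, bucket [13] is vacated; 11 moves from [14] to [13], 30 from [15] to [14], and 45 from [2] to [15]

    [0]  [1]  [2]  [3]  [4]  [5]  [6]  [7]  [8]  [9]  [10] [11] [12] [13] [14] [15] [16]
    34   0                        6    23   7              28   12   11   30   45   33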

Hash Table Design
  Performance requirements are given; determine the maximum permissible load factor
  We want a successful search to make no more than 10 compares (expected)
    Sn ≈ ½(1 + 1/(1 - LF))
    LF <= 18/19
  We want an unsuccessful search to make no more than 13 compares (expected)
    Un ≈ ½(1 + 1/(1 - LF)^2)
    LF <= 4/5
  So LF <= min{18/19, 4/5} = 4/5
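The two load-factor bounds follow from solving the linear-probing formulas for LF. A short derivation, assuming limits of 10 expected compares for a successful search and 13 for an unsuccessful one:

```latex
\begin{aligned}
S_n \approx \tfrac{1}{2}\Bigl(1 + \frac{1}{1-LF}\Bigr) \le 10
  &\iff \frac{1}{1-LF} \le 19
  \iff 1 - LF \ge \frac{1}{19}
  \iff LF \le \frac{18}{19}, \\[4pt]
U_n \approx \tfrac{1}{2}\Bigl(1 + \frac{1}{(1-LF)^2}\Bigr) \le 13
  &\iff \frac{1}{(1-LF)^2} \le 25
  \iff 1 - LF \ge \frac{1}{5}
  \iff LF \le \frac{4}{5}.
\end{aligned}
```

The unsuccessful-search requirement is the tighter one, so the permissible load factor is min{18/19, 4/5} = 4/5.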

Hash Table Design (cont.)
  Dynamic resizing of table
    Whenever the load factor exceeds the threshold (4/5 in our example), rehash into a table of approximately twice the current size
  Fixed table size
    Know the maximum number of pairs
    No more than 1000 pairs
    LF <= 4/5 => m >= 5/4 * 1000 = 1250
    Pick m (equal to the divisor) to be a prime number or an odd number with no prime divisors smaller than 20

Collision resolution: chaining
  Each bucket keeps a linear list of all pairs for which it is the home bucket
    There is never an overflow if the capacity of the linear list is unlimited
  The linear list may or may not be sorted by key
  The linear list may be an array linear list or a chain
  Performance
    Increasing the length of the lists can degrade retrieval performance
    Requires additional space for pointers

Sorted Chains
  Put in pairs whose keys are 6, 12, 34, 29, 28, 11, 23, 7, 0, 33, 30, 45
  Home bucket = key % 17

    [0]   0 -> 34
    [6]   6 -> 23
    [7]   7
    [11]  11 -> 28 -> 45
    [12]  12 -> 29
    [13]  30
    [16]  33
    (all other buckets are empty)

Expected Performance
  Note that LF >= 0
  Expected chain length is LF
  Sn ≈ 1 + LF/2 (in the case of an unsorted list)
  Un ≈ LF
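A chained table with sorted chains might be sketched as follows. The class and member names are hypothetical, and only nonnegative int keys are stored (no associated elements) to keep the example short.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <vector>

// Hash table with sorted chains: bucket b keeps, in ascending order,
// every key whose home bucket is b.
class ChainedTable {
public:
    explicit ChainedTable(std::size_t m) : buckets_(m) { assert(m > 0); }

    void insert(int key) {
        std::list<int>& chain = buckets_[home(key)];
        std::list<int>::iterator it = chain.begin();
        while (it != chain.end() && *it < key) ++it;
        if (it == chain.end() || *it != key)  // ignore duplicate keys
            chain.insert(it, key);
    }

    bool contains(int key) const {
        const std::list<int>& chain = buckets_[home(key)];
        for (int k : chain) {
            if (k == key) return true;
            if (k > key) return false;        // sorted chain: can stop early
        }
        return false;
    }

    // Read-only access to the chain of bucket b.
    const std::list<int>& chainAt(std::size_t b) const { return buckets_.at(b); }

private:
    std::size_t home(int key) const {
        assert(key >= 0);
        return static_cast<std::size_t>(key) % buckets_.size();
    }
    std::vector<std::list<int>> buckets_;
};
```

With 17 buckets and the keys 6, 12, 34, 29, 28, 11, 23, 7, 0, 33, 30, 45, bucket 11 ends up holding the chain 11, 28, 45 and bucket 0 the chain 0, 34. Keeping chains sorted lets an unsuccessful search stop as soon as it passes the place where the key would be.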

Coalesced hashing (coalesced chaining)
  Combines linear probing and chaining
    h(K) = K % 17
    Insert pairs whose keys are 6, 12, 34, 29, 28, 11, 23, …
  Alternatively, the colliding key can be put in an overflow area (called the cellar)

Bucket addressing
  Associate a bucket with each address; a bucket is a block of space large enough to store multiple items
  Collisions are not totally avoided; if a bucket is full, then
    A new item hashed to the bucket can be stored in another bucket, just as in the open addressing approach
    The new item can also be stored in an overflow area; the bucket is then marked with a flag indicating that it has additional items to be searched

Perfect hash functions
  Perfect hash functions are ideal hash functions
    A perfect hash function hashes each key to its proper position, and no collisions occur
    An assumption is that the key set is known
  If the number of cells in the table is equal to the number of data items, it is called a minimal perfect hash function, so no space is wasted
  Examples
    Reserved words used by assemblers and compilers, files on unerasable optical disks, dictionaries, …
  It is not easy to obtain a perfect hash function

Cichelli's method to construct a minimal perfect hash function
  Developed by Richard Cichelli
  Used to hash a relatively small number of reserved words
  The function is of the form
    h(word) = (length(word) + g(firstletter(word)) + g(lastletter(word))) mod m
  where the function g is to be constructed

Cichelli's algorithm
  Choose a value for Max
  Compute the number of occurrences of each first and last letter in the set of all words
  Order all words in accordance with the frequency of occurrence of the first and the last letters
  Example
    Words: Calliope, Clio, Erato, Euterpe, Melpomene, Polyhymnia, Terpsichore, Thalia, Urania
    Letter frequencies: E(6), A(3), C(2), O(2), T(2), M(1), P(1), U(1)
    Ordered words: Euterpe, Calliope, Erato, Terpsichore, Melpomene, Thalia, Clio, Polyhymnia, Urania

Cichelli's algorithm (cont.)
    Search(wordList)
      if wordList is empty
        halt;
      word = first word from wordList;
      wordList = wordList with the first word detached;
      if the first and the last letters of word are assigned g-values
        try(word, -1, -1);    // -1 signifies 'value already assigned'
        if success
          Search(wordList);
          put word at the beginning of wordList and detach its hash value;
      else if neither the first nor the last letter has a g-value
        for each n, m in {0, …, Max}
          try(word, n, m);
          if success
            Search(wordList);
            put word at the beginning of wordList and detach its hash value;
      else if either the first or the last letter has a g-value
        for each n in {0, …, Max}
          try(word, -1, n) or try(word, n, -1), depending on which letter already has a g-value;
          if success
            Search(wordList);
            put word at the beginning of wordList and detach its hash value;

Cichelli's algorithm (cont.)
    try(word, firstLetterValue, lastLetterValue)
      if h(word) has not been claimed
        reserve h(word);
        if not -1 (i.e., not reserved)
          assign firstLetterValue and/or lastLetterValue as g-values of firstletter(word) and/or lastletter(word);
        return success
      return failure

Invocations of the searching procedure (m = 9, Max = 4; * marks a failed attempt)

    word         g-value  h      reserved hash values
    Euterpe      E=0      h=7    {7}
    Calliope     C=0      h=8    {7 8}
    Erato        O=0      h=5    {5 7 8}
    Terpsichore  T=0      h=2    {2 5 7 8}
    Melpomene    M=0      h=0    {0 2 5 7 8}
    Thalia       A=0      h=6    {0 2 5 6 7 8}
    Clio                  h=4    {0 2 4 5 6 7 8}
    Polyhymnia   P=0      h=1    {0 1 2 4 5 6 7 8}
    Urania       U=0      h=6*   {0 1 2 4 5 6 7 8}
    Urania       U=1      h=7*   {0 1 2 4 5 6 7 8}
    Urania       U=2      h=8*   {0 1 2 4 5 6 7 8}
    Urania       U=3      h=0*   {0 1 2 4 5 6 7 8}
    Urania       U=4      h=1*   {0 1 2 4 5 6 7 8}
    Polyhymnia   P=1      h=2*   {0 2 4 5 6 7 8}
    Polyhymnia   P=2      h=3    {0 2 3 4 5 6 7 8}
    Urania       U=0      h=6*   {0 2 3 4 5 6 7 8}
    Urania       U=1      h=7*   {0 2 3 4 5 6 7 8}
    Urania       U=2      h=8*   {0 2 3 4 5 6 7 8}
    Urania       U=3      h=0*   {0 2 3 4 5 6 7 8}
    Urania       U=4      h=1    {0 1 2 3 4 5 6 7 8}
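The g-values that the search ends with (0 for A, C, E, M, O, T; 2 for P; 4 for U) can be checked mechanically. The sketch below evaluates h(word) = (length + g(first letter) + g(last letter)) mod m and tests whether the mapping is one-to-one onto the table. The function names are hypothetical, and treating letters case-insensitively and missing letters as 0 are conveniences of this sketch, not part of Cichelli's method.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <vector>

// Cichelli-style hash: (length + g(first letter) + g(last letter)) mod m.
// Letters are looked up in upper case; letters missing from g count as 0.
std::size_t cichelliHash(const std::string& word,
                         const std::map<char, int>& g, std::size_t m) {
    assert(!word.empty() && m > 0);
    auto value = [&g](char c) {
        char key = static_cast<char>(std::toupper(static_cast<unsigned char>(c)));
        std::map<char, int>::const_iterator it = g.find(key);
        return it == g.end() ? 0 : it->second;
    };
    return (word.size() + value(word.front()) + value(word.back())) % m;
}

// True if h maps the words one-to-one onto 0..words.size()-1,
// i.e. h is a minimal perfect hash function for this word set.
bool isMinimalPerfect(const std::vector<std::string>& words,
                      const std::map<char, int>& g) {
    std::set<std::size_t> used;
    for (const std::string& w : words)
        used.insert(cichelliHash(w, g, words.size()));
    return used.size() == words.size();
}
```

With the final g-values, Polyhymnia hashes to (10 + 2 + 0) mod 9 = 3 and Urania to (6 + 4 + 0) mod 9 = 1, and all nine words receive distinct values in 0..8. With P = 0 and U = 0 instead, Urania collides with Thalia at 6, which is the first failure recorded in the trace.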

About Cichelli's algorithm
  It is a brute-force search for the g function; the search process is exponential
    It is not applicable to a large number of words
  It does not guarantee that a perfect hash function can be found
  Extensions
    (1) The second-to-last letter of the word is also involved in the hash function
    (2) h(word) = length(word) + $g_1$(firstletter(word)) + … + $g_{length(word)}$(lastletter(word))
    (3) h(word) = $bucket_{gr(word)}$ + $h_{gr(word)}$(word)

FHCD algorithm
  Searches for a minimal perfect hash function of the form
    h(word) = h0(word) + g(h1(word)) + g(h2(word))
    h0: key space -> [0, m)
    h1: key space -> [0, r)
    h2: key space -> [r, 2r)
  Steps:
    Mapping (dependency graph): an edge between h1(word) and h2(word) for each word
    Ordering: first determine the node with the most links
    Searching: assign hash values to keys by choosing an appropriate g-value for each vertex
      For each vertex, either g(h1(word)) or g(h2(word)) is known

Extensible hashing
  (Figure: a directory of depth 2 pointing to buckets b00, b01, b1; inserting a key with h(k) = 11001 splits b1 into b10 and b11, and inserting a key with h(k) = 00001 splits b00 into b000 and b001 and doubles the directory to depth 3.)

An example of hash function
  In distributed environments, how do we partition a large number of jobs (or records) across a cluster of machines?
    key -> v = h(key)

Linear hashing
  No index is needed
  A bucket is split if necessary
  At each level of splitting, linear hashing maintains two hash functions, $h_{level}$ and $h_{level+1}$, such that
    $h_{level}(K) = K \bmod (TSize \cdot 2^{level})$

STL hash_map
  Simply uses a divisor that is an odd number
  This simplifies implementation because we must be able to resize the hash table as more pairs are put into the dictionary
    Array doubling, for example, requires you to go from an array table whose length is m (which is odd) to an array whose length is 2m+1 (which is also odd)

About hash tables
  Perfect hash functions (n <= m)
  Uniform hash functions (n >= m)
  Performance requirements
  How to deal with collisions
  How to deal with overflows if the size of buckets is fixed

Bloom Filters: introduction
  Differential files
    Simple large database
      File of records residing on disk
      Single key
      Index to records
    Operations
      Retrieve
      Update
        Insert a new record
        Make changes to an existing record
        Delete a record

Naïve Mode of Operation
  (Diagram: Key -> Index -> File)
  Problems
    Index and File change with time
    Recovery =>
      Copy Master File (MF) from backup
      Copy Master Index (MI) from backup
      Process all transactions since the last backup
    Recovery time depends on MF and MI size and on the number of transactions since the last backup

Differential File
  (Diagram: Key -> Index -> File and DF)
  Make no changes to the master file
  Alter the index and write the updated record to a new file called the differential file (DF)

Differential File Operation
  Advantage
    DF is smaller than File and so may be backed up more frequently
    Index needs to be backed up whenever DF is; so, the index should be no larger than DF
    Recovery time is reduced
  Disadvantage
    Eventually DF becomes large and can no longer be backed up with the desired frequency
    Must integrate File and DF now
    Following integration, DF is empty

Differential File & Index Operation
  Large Index
    Index cannot be backed up as frequently as desired
    Time to recover the current state of the index and DF is excessive
    Use a differential index (DI)
      Make no changes to Index
      DI is an index to all deleted records and updated records in DF
  Performance hit
    Most queries search both DI and Index
    Increa
