Symbol-table Problem
Table S holding n records; each record has a key and some satellite data.
Operations
1) Insert 2) Delete 3) Search
Direct Access Table
Suppose keys are drawn from U = {0,1,…,m-1}.
Assume keys are distinct.
Set up an array T[0,1,…,m-1] to represent the dynamic set S.
T[k] = x if x∈S and key[x] = k, otherwise nil.
Here all operations take Θ(1) time.
Limitation: the key range can be huge. With 64-bit keys, T would need 2^64 slots, which is unaffordable to store even when S itself is small.
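The direct-access table above can be sketched in a few lines of Python. This is only an illustration: the class name DirectAccessTable and its method names are made up for this sketch, and None stands in for nil.

```python
class DirectAccessTable:
    """Dynamic set over keys 0..m-1 with one array slot per possible key."""

    def __init__(self, m):
        # T[0..m-1]; None plays the role of nil
        self.slots = [None] * m

    def insert(self, key, data):
        # Keys are assumed distinct and in range, so the key is the index
        self.slots[key] = (key, data)

    def delete(self, key):
        self.slots[key] = None

    def search(self, key):
        return self.slots[key]


table = DirectAccessTable(m=10)
table.insert(3, "three")
table.insert(7, "seven")
table.delete(7)
print(table.search(3))  # (3, 'three')
print(table.search(7))  # None
```

Each operation is a single array access, which is where the Θ(1) bound comes from. The cost is the array itself: its size is the size of the key universe, not the number of stored records.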
Hashing
We use a hash function h which maps the keys “randomly” into the slots of table T.
Collision
When a record to be inserted maps to an already occupied slot, a collision occurs.
Resolving collisions by chaining
- Idea: link records in the same slot into a list.
- Analysis
  - Worst case
    Every key hashes to the same slot.
    Access takes Θ(n) time if |S| = n.
  - Average case
    Under the assumption of simple uniform hashing, each key k∈S is equally likely to be hashed to any slot in T, independent of where other keys are hashed.
Definition of load factor
The load factor of a hash table with n keys and m slots is α = n/m = average number of keys per slot.
Under simple uniform hashing, the expected search time is Θ(1+α): Θ(1) to compute the hash and reach the slot, plus Θ(α) to scan the list.
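A minimal sketch of chaining, assuming integer keys and using key mod m as a placeholder hash function. The class ChainedHashTable and its method names are hypothetical, and Python lists stand in for the linked lists.

```python
class ChainedHashTable:
    """Hash table with m slots; colliding records share a per-slot chain."""

    def __init__(self, m):
        self.m = m
        self.n = 0                            # number of stored keys
        self.slots = [[] for _ in range(m)]   # one chain per slot

    def _hash(self, key):
        # Placeholder hash for integer keys (an assumption of this sketch)
        return key % self.m

    def insert(self, key, data):
        chain = self.slots[self._hash(key)]
        for i, (k, _) in enumerate(chain):
            if k == key:                      # key already present: overwrite
                chain[i] = (key, data)
                return
        chain.append((key, data))
        self.n += 1

    def search(self, key):
        # Scans only the chain of one slot: expected length is alpha
        for k, data in self.slots[self._hash(key)]:
            if k == key:
                return data
        return None

    def delete(self, key):
        chain = self.slots[self._hash(key)]
        for i, (k, _) in enumerate(chain):
            if k == key:
                del chain[i]
                self.n -= 1
                return True
        return False

    def load_factor(self):
        return self.n / self.m                # alpha = n/m


t = ChainedHashTable(m=4)
for key, name in [(1, "one"), (5, "five"), (9, "nine"), (2, "two")]:
    t.insert(key, name)

print(t.load_factor())   # 1.0
print(len(t.slots[1]))   # 3, because keys 1, 5 and 9 all collide in slot 1
print(t.search(5))       # five
```

Keys 1, 5 and 9 show the worst-case behaviour in miniature: they all land in one slot, so searching for any of them walks the same chain.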
How to choose a hash function?
- It should distribute keys uniformly into slots.
- Regularity in the key distribution should not affect uniformity.
Several common hash functions
1) Division method
h(k) = k mod m
- Do not pick m with a small divisor d.
  e.g. if d = 2 and all keys are even, then odd slots are never used.
  If m = 2^r, the hash does not depend on all bits of k; it is just the low-order r bits.
- Pick m to be a prime not too close to a power of 2 or 10.
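The two pitfalls of the division method can be checked directly. The sketch below is illustrative only: the function name division_hash and the choices m = 8 and m = 11 are picked for the example.

```python
def division_hash(k, m):
    """Division method: h(k) = k mod m."""
    return k % m


even_keys = range(0, 200, 2)

# m = 8 shares the divisor 2 with every key, so odd slots stay empty
used_with_8 = sorted({division_hash(k, 8) for k in even_keys})
print(used_with_8)        # [0, 2, 4, 6]

# m = 11 is prime, so the same keys reach every slot
used_with_11 = sorted({division_hash(k, 11) for k in even_keys})
print(used_with_11)       # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

# With m = 2^3 only the low 3 bits of k matter
print(division_hash(0b10110101, 8) == division_hash(0b00000101, 8))  # True
```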
2) Multiplication method
Assume m = 2^r and the computer has w-bit words.
h(k) = (A*k mod 2^w) rsh (w-r), where rsh denotes bitwise right shift and A is an odd integer with 2^(w-1) < A < 2^w.
- Do not pick A too close to 2^(w-1) or 2^w.
- Multiplication modulo 2^w is fast compared to division, and rsh is fast too.
e.g. m = 8 = 2^3, w = 7, w-r = 4
                1 0 1 1 0 0 1   A
*               1 1 0 1 0 1 1   k
= 1 0 0 1 0 1 0 0 1 1 0 0 1 1
The high-order 7 bits of the product are discarded by mod 2^w, leaving 0 1 1 0 0 1 1; right-shifting by 4 then gives h(k) = 0 1 1.
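The worked example can be reproduced with a short sketch of the multiplication method. The function name multiplication_hash is an assumption of this sketch; the mask (1 << w) - 1 implements mod 2^w, which a real machine gets for free from w-bit overflow.

```python
def multiplication_hash(k, A, w, r):
    """h(k) = (A*k mod 2^w) rsh (w-r), mapping k to one of 2^r slots."""
    low_word = (A * k) & ((1 << w) - 1)   # keep the low w bits (mod 2^w)
    return low_word >> (w - r)            # keep the top r bits of that word


A = 0b1011001   # 89, odd and between 2^6 and 2^7
k = 0b1101011   # 107
h = multiplication_hash(k, A, w=7, r=3)

print(format(A * k, "014b"))  # 10010100110011
print(format(h, "03b"))       # 011
```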
Resolving collisions by open addressing (no storage for links)
- Idea:
  - Probe the table systematically until an empty slot is found.
  - The probe sequence for each key should be a permutation of the slots {0,1,…,m-1}.
- Limitation: deletion is difficult.
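- Probing strategies
  - Linear probing: h(k,i) = (h(k,0) + i) mod m
    Suffers from “primary clustering”: long runs of filled slots build up.
  - Double hashing: h(k,i) = (h1(k) + i*h2(k)) mod m
    An excellent method in practice. Usually pick m = 2^r and h2(k) odd, so that h2(k) is relatively prime to m and the probe sequence visits every slot.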
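A minimal sketch of open addressing with double hashing, assuming integer keys and r ≥ 1. The class OpenAddressTable and the particular h1 and h2 below are hypothetical choices made for illustration; the only property relied on is that h2(k) is always odd. Deletion is left out, in line with the limitation noted above.

```python
class OpenAddressTable:
    """Open-addressed table with m = 2^r slots and double hashing."""

    def __init__(self, r):
        self.m = 1 << r
        self.slots = [None] * self.m

    def _h1(self, k):
        return k % self.m

    def _h2(self, k):
        # Hypothetical second hash: 2x + 1 is odd, and an odd number
        # mod an even m stays odd, hence relatively prime to m = 2^r
        return (2 * (k // self.m) + 1) % self.m

    def probe_sequence(self, k):
        # h(k, i) = (h1(k) + i * h2(k)) mod m for i = 0..m-1
        h1, h2 = self._h1(k), self._h2(k)
        return [(h1 + i * h2) % self.m for i in range(self.m)]

    def insert(self, k, data):
        for slot in self.probe_sequence(k):
            entry = self.slots[slot]
            if entry is None or entry[0] == k:
                self.slots[slot] = (k, data)
                return slot
        raise OverflowError("hash table is full")

    def search(self, k):
        for slot in self.probe_sequence(k):
            entry = self.slots[slot]
            if entry is None:        # empty slot: k cannot be further along
                return None
            if entry[0] == k:
                return entry[1]
        return None


t = OpenAddressTable(r=3)
print(t.probe_sequence(11))   # [3, 6, 1, 4, 7, 2, 5, 0]
print(t.insert(3, "a"))       # 3
print(t.insert(11, "b"))      # 6, since slot 3 is taken
print(t.insert(19, "c"))      # 0, since slot 3 is taken
print(t.search(11))           # b
print(t.search(27))           # None
```

Keys 3, 11 and 19 all start at slot 3 but then step by 1, 3 and 5 respectively, so they do not pile up in one run the way linear probing would.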
- Analysis of open addressing
  Assumption of uniform hashing: each key is equally likely to have any one of the m! permutations as its probe sequence, independent of other keys.
  One can prove that the expected number of probes satisfies E[#probes] ≤ 1/(1-α) if α < 1, i.e. n < m.
  So if α is a constant, an access takes Θ(1) probes in expectation.
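The bound can be sanity-checked numerically. The sketch below assumes the usual setting for this bound: a search that ends at an empty slot (an unsuccessful search or an insertion) under uniform hashing. The first probe always happens; with probability n/m it hits an occupied slot, and the rest of the search then behaves like a search among n-1 keys in m-1 slots. The function names expected_probes and bound are made up for this sketch.

```python
from fractions import Fraction


def expected_probes(n, m):
    """Exact E[#probes] via the recurrence E(n, m) = 1 + (n/m) * E(n-1, m-1)."""
    if not 0 <= n < m:
        raise ValueError("need 0 <= n < m")
    e = Fraction(1)                      # E(0, m-n): first probe finds an empty slot
    for j in range(1, n + 1):            # build up to E(n, m)
        e = 1 + Fraction(j, m - n + j) * e
    return e


def bound(n, m):
    """The bound 1/(1 - alpha) with alpha = n/m."""
    return 1 / (1 - Fraction(n, m))


for n in (4, 6, 7):
    print(n, expected_probes(n, 8), bound(n, 8))
# 4 9/5 2
# 6 3 4
# 7 9/2 8
```

With m = 8 the exact values stay below the bound at every load, and the bound itself makes the trade-off visible: a half-full table needs at most 2 expected probes, while a table that is 7/8 full may need up to 8.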