Collision Handling
Hash collisions are inevitable: different keys can hash to the same bucket index. A robust hash table must handle these collisions efficiently and correctly.
This article covers the two primary collision handling strategies and their trade‑offs.
1. Separate Chaining
In separate chaining:
- Each bucket in the hash table array stores a collection of entries.
- Common implementations:
- Linked list of
(key, value)pairs. - Balanced BST (e.g. Java’s
HashMaptreeifies buckets when they get too large).
- Linked list of
On insert:
- Compute
idx = hash(key) mod capacity. - If bucket is empty, create a new list/tree.
- Insert
(key, value)into that bucket (update if key already exists).
On lookup:
- Compute
idxand fetch bucket. - Scan the bucket’s collection, comparing keys for equality.
Pros:
- Simple to implement.
- Handles high load factors reasonably (though with longer chains).
- Deletion is straightforward (remove from list/tree).
Cons:
- Extra pointer overhead for lists/trees.
- Cache locality can be worse than open addressing.
2. Open Addressing
In open addressing:
- All entries live directly in the array.
- On collision, you probe alternative slots according to some rule.
Common probing strategies:
- Linear probing:
- Try
idx, idx+1, idx+2, ...wrapping around with modulo.
- Try
- Quadratic probing:
- Use an increasing step size (e.g.
idx + 1², idx + 2², ...).
- Use an increasing step size (e.g.
- Double hashing:
- Use a second hash function to determine the step size.
On insert:
- Compute
idx = hash(key) mod capacity. - If slot is occupied by a different key, probe to next candidate slot.
- Insert in the first free slot found.
On lookup:
- Probe through slots using the same rule until:
- You find the key, or
- You hit an empty slot that indicates “not found” (depending on deletion scheme).
Pros:
- Better cache locality (data stored in contiguous array).
- No extra pointers or node allocations.
Cons:
- More complicated deletion logic (tombstones or rehashing).
- Performance degrades sharply at high load factors.
- Clustering issues (especially with naive linear probing).
Choosing a Strategy
High‑level trade‑offs:
- Use separate chaining when:
- You want simplicity and robust performance under varied load.
- You don’t want to deal with complex deletion semantics.
- Use open addressing when:
- You care about memory overhead and cache locality.
- You can keep the load factor low and control insert/delete patterns.
Many high‑level languages choose chaining because it is easier to implement correctly and tune.
Practical Tips
- In interviews, you usually don’t need to implement collision handling from scratch.
- You should, however, understand:
- Why
HashMap/HashSetlookups are averageO(1)but can degrade in worst cases. - That collision handling determines how the structure behaves under heavy load or bad hashes.
- Why
When designing your own hash‑based structures (LRU cache, custom index), choosing the right strategy is key to predictable performance.