Collision Handling

Hash collisions are inevitable: different keys can hash to the same bucket index. A robust hash table must handle these collisions efficiently and correctly.

This article covers the two primary collision handling strategies and their trade‑offs.

1. Separate Chaining

In separate chaining:

  • Each bucket in the hash table array stores a collection of entries.
  • Common implementations:
    • Linked list of (key, value) pairs.
    • Balanced BST (e.g. Java’s HashMap treeifies buckets when they get too large).

On insert:

  1. Compute idx = hash(key) mod capacity.
  2. If bucket is empty, create a new list/tree.
  3. Insert (key, value) into that bucket (update if key already exists).

On lookup:

  1. Compute idx and fetch bucket.
  2. Scan the bucket’s collection, comparing keys for equality.

Pros:

  • Simple to implement.
  • Handles high load factors reasonably (though with longer chains).
  • Deletion is straightforward (remove from list/tree).

Cons:

  • Extra pointer overhead for lists/trees.
  • Cache locality can be worse than open addressing.

2. Open Addressing

In open addressing:

  • All entries live directly in the array.
  • On collision, you probe alternative slots according to some rule.

Common probing strategies:

  • Linear probing:
    • Try idx, idx+1, idx+2, ... wrapping around with modulo.
  • Quadratic probing:
    • Use an increasing step size (e.g. idx + 1², idx + 2², ...).
  • Double hashing:
    • Use a second hash function to determine the step size.

On insert:

  1. Compute idx = hash(key) mod capacity.
  2. If slot is occupied by a different key, probe to next candidate slot.
  3. Insert in the first free slot found.

On lookup:

  1. Probe through slots using the same rule until:
    • You find the key, or
    • You hit an empty slot that indicates “not found” (depending on deletion scheme).

Pros:

  • Better cache locality (data stored in contiguous array).
  • No extra pointers or node allocations.

Cons:

  • More complicated deletion logic (tombstones or rehashing).
  • Performance degrades sharply at high load factors.
  • Clustering issues (especially with naive linear probing).

Choosing a Strategy

High‑level trade‑offs:

  • Use separate chaining when:
    • You want simplicity and robust performance under varied load.
    • You don’t want to deal with complex deletion semantics.
  • Use open addressing when:
    • You care about memory overhead and cache locality.
    • You can keep the load factor low and control insert/delete patterns.

Many high‑level languages choose chaining because it is easier to implement correctly and tune.

Practical Tips

  • In interviews, you usually don’t need to implement collision handling from scratch.
  • You should, however, understand:
    • Why HashMap/HashSet lookups are average O(1) but can degrade in worst cases.
    • That collision handling determines how the structure behaves under heavy load or bad hashes.

When designing your own hash‑based structures (LRU cache, custom index), choosing the right strategy is key to predictable performance.