Hash Table Fundamentals

Hash tables (or hash maps) are the workhorses behind dictionaries, sets, and many caches. They provide:

  • Average O(1) insert, lookup, and delete (with a good hash function and load factor).

Understanding how they work internally helps you reason about performance and edge cases.

Core Idea

To store a key k:

  1. Compute its hash: h = hash(k).
  2. Map h to an array index: idx = h mod capacity.
  3. Store (k, value) in bucket idx.

To look up k, repeat the process and search within the chosen bucket.

Because many keys may map to the same bucket index (hash collisions), we need a collision handling strategy.
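The index computation in steps 1–2 can be sketched as follows (a minimal sketch; `bucketFor` is a hypothetical helper name, not a standard API). Note that Java's `hashCode()` may return a negative `int`, so `Math.floorMod` is used instead of plain `%` to keep the index non-negative:

```java
public class BucketIndex {
    // Map a key to a bucket index, following steps 1-2 above.
    // Math.floorMod handles negative hash codes, which plain %
    // would turn into a negative (invalid) array index.
    static int bucketFor(Object key, int capacity) {
        int h = key.hashCode();            // step 1: compute the hash
        return Math.floorMod(h, capacity); // step 2: map the hash to an index
    }

    public static void main(String[] args) {
        // The same key always maps to the same bucket for a fixed capacity.
        System.out.println(bucketFor("apple", 8)); // always in [0, 8)
    }
}
```

Step 3 then stores the (key, value) pair at that index in the backing array.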

Collision Handling Strategies

Two common approaches:

  1. Separate chaining:
    • Each bucket holds a list (or tree) of (key, value) pairs.
    • On collision, append to that bucket’s list.
    • Lookup: scan the bucket’s list and match on exact key equality.
  2. Open addressing:
    • All entries live directly in the array.
    • On collision, probe to another slot (linear, quadratic, double hashing).
    • Requires careful handling of deletes and load factor.

Most high‑level languages use:

  • Separate chaining with lists or trees (Java’s HashMap converts a bucket’s list to a balanced tree once it grows past a threshold).
  • Or a variant of open addressing (e.g. CPython’s dict).

Load Factor and Resizing

Performance depends on the load factor:

  • loadFactor = size / capacity

As load factor increases:

  • More collisions.
  • Buckets get longer (for chaining) or probe sequences get longer (for open addressing).

Typical strategy:

  • When load factor exceeds a threshold (e.g. 0.75), allocate a larger array and rehash all entries.

This keeps average operations near O(1) while trading off occasional O(n) resize steps.

Complexity Summary

Under normal conditions (good hash, controlled load factor):

  Operation | Average | Worst case
  Insert    | O(1)    | O(n) (all keys in one bucket)
  Lookup    | O(1)    | O(n)
  Delete    | O(1)    | O(n)

In practice:

  • With a good implementation and balanced keys, worst cases are rare.
  • Hash table performance is usually dominated by constant factors like hashing cost.

Hash Tables in Interviews

Typical use cases:

  • Frequency counting / grouping by key.
  • Checking if an element has been seen before.
  • Mapping values to indices or other metadata.
  • Implementing sets (HashSet) and deduplication.

Many “easy to medium” problems become trivial with a hash map, but be mindful of:

  • Space usage.
  • Hash collisions and potential performance degradation for adversarial inputs.
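The “seen before” pattern from the list above can be sketched with a HashSet (a minimal sketch; the method name is invented for illustration):

```java
import java.util.HashSet;
import java.util.Set;

public class FirstDuplicate {
    // Returns the first value that appears a second time, or null if
    // all values are distinct. Expected O(n) time, O(n) extra space.
    static Integer firstDuplicate(int[] values) {
        Set<Integer> seen = new HashSet<>();
        for (int v : values) {
            if (!seen.add(v)) return v; // add() returns false if already present
        }
        return null;
    }
}
```

Using the boolean return of `Set.add` avoids a separate `contains` check per element.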

Gotchas

  • Using mutable keys:
    • If you mutate a key after inserting into a hash map, its hash and equality may change, making it unreachable.
  • Bad hash functions:
    • Simple or predictable hashes may cause many collisions under adversarial input.
  • Unmanaged load factor:
    • Very high load factors degrade performance; very low ones waste memory.
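The mutable-key gotcha can be demonstrated directly in Java: mutating a List used as a HashMap key changes its hashCode, so lookups probe the wrong bucket and the entry becomes unreachable (a small illustration, not production code):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class MutableKeyGotcha {
    public static void main(String[] args) {
        List<Integer> key = new ArrayList<>();
        key.add(1);

        Map<List<Integer>, String> map = new HashMap<>();
        map.put(key, "value"); // stored under the hash of [1]

        key.add(2); // mutating the key changes its hashCode()

        // The lookup hashes [1, 2] and probes a different bucket,
        // so the entry is still in the map but effectively lost.
        System.out.println(map.containsKey(key)); // false
        System.out.println(map.size());           // 1
    }
}
```

This is why immutable keys (String, Integer, records with final fields) are the safe default for hash maps.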

When using built‑in hash maps, most of these details are handled for you, but it’s important to understand them when performance matters.

Example: Frequency Counting with HashMap (Java)

Map<String, Integer> freq = new HashMap<>();
for (String word : words) {
    freq.put(word, freq.getOrDefault(word, 0) + 1);
}

This is one of the most common hash table patterns in interviews: turning a list of values into counts per key in O(n) expected time.