Hash Table Fundamentals

Hash tables (or hash maps) are the workhorses behind dictionaries, sets, and many caches. They provide:

  • Average O(1) insert, lookup, and delete (with a good hash function and load factor).

Understanding how they work internally helps you reason about performance and edge cases.

Core Idea

To store a key k:

  1. Compute its hash: h = hash(k).
  2. Map h to an array index: idx = h mod capacity.
  3. Store (k, value) in bucket idx.

To look up k, repeat the process and search within the chosen bucket.

Because many keys may map to the same bucket index (hash collisions), we need a collision handling strategy.
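The index computation in steps 1–2 can be sketched as follows (a minimal sketch; `bucketFor` is a hypothetical helper name, not a standard API). Note that Java's `hashCode()` may return a negative `int`, so `Math.floorMod` is used instead of plain `%` to keep the index non-negative:

```java
public class BucketIndex {
    // Map a key to a bucket index, following steps 1-2 above.
    // Math.floorMod handles negative hash codes, which plain %
    // would turn into a negative (invalid) array index.
    static int bucketFor(Object key, int capacity) {
        int h = key.hashCode();            // step 1: compute the hash
        return Math.floorMod(h, capacity); // step 2: map the hash to an index
    }

    public static void main(String[] args) {
        // The same key always maps to the same bucket for a fixed capacity.
        System.out.println(bucketFor("apple", 8)); // always in [0, 8)
    }
}
```

Step 3 then stores the (key, value) pair at that index in the backing array.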

Collision Handling Strategies

Two common approaches:

  1. Separate chaining:
    • Each bucket holds a list (or tree) of (key, value) pairs.
    • On collision, append to that bucket’s list.
    • Lookup: scan the bucket’s list and match on exact key equality.
  2. Open addressing:
    • All entries live directly in the array.
    • On collision, probe to another slot (linear, quadratic, double hashing).
    • Requires careful handling of deletes and load factor.

Most high‑level languages use:

  • Separate chaining with lists or trees (Java’s HashMap converts a bucket’s list to a balanced tree once it grows past a threshold).
  • Or a variant of open addressing (e.g. CPython’s dict).

Load Factor and Resizing

Performance depends on the load factor:

  • loadFactor = size / capacity

As load factor increases:

  • More collisions.
  • Buckets get longer (for chaining) or probe sequences get longer (for open addressing).

Typical strategy:

  • When load factor exceeds a threshold (e.g. 0.75), allocate a larger array and rehash all entries.

This keeps average operations near O(1) while trading off occasional O(n) resize steps.

Complexity Summary

Under normal conditions (good hash, controlled load factor):

  Operation | Average | Worst case
  Insert    | O(1)    | O(n) (all keys in one bucket)
  Lookup    | O(1)    | O(n)
  Delete    | O(1)    | O(n)

In practice:

  • With a good implementation and balanced keys, worst cases are rare.
  • Hash table performance is usually dominated by constant factors like hashing cost.

Hash Tables in Interviews

Typical use cases:

  • Frequency counting / grouping by key.
  • Checking if an element has been seen before.
  • Mapping values to indices or other metadata.
  • Implementing sets (HashSet) and deduplication.

Many “easy to medium” problems become trivial with a hash map, but be mindful of:

  • Space usage.
  • Hash collisions and potential performance degradation for adversarial inputs.
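The “seen before” pattern from the list above can be sketched with a HashSet (a minimal sketch; the method name is invented for illustration):

```java
import java.util.HashSet;
import java.util.Set;

public class FirstDuplicate {
    // Returns the first value that appears a second time, or null if
    // all values are distinct. Expected O(n) time, O(n) extra space.
    static Integer firstDuplicate(int[] values) {
        Set<Integer> seen = new HashSet<>();
        for (int v : values) {
            if (!seen.add(v)) return v; // add() returns false if already present
        }
        return null;
    }
}
```

Using the boolean return of `Set.add` avoids a separate `contains` check per element.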

Gotchas

  • Using mutable keys:
    • If you mutate a key after inserting into a hash map, its hash and equality may change, making it unreachable.
  • Bad hash functions:
    • Simple or predictable hashes may cause many collisions under adversarial input.
  • Unmanaged load factor:
    • Very high load factors degrade performance; very low ones waste memory.
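The mutable-key gotcha can be demonstrated directly in Java: mutating a List used as a HashMap key changes its hashCode, so lookups probe the wrong bucket and the entry becomes unreachable (a small illustration, not production code):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class MutableKeyGotcha {
    public static void main(String[] args) {
        List<Integer> key = new ArrayList<>();
        key.add(1);

        Map<List<Integer>, String> map = new HashMap<>();
        map.put(key, "value"); // stored under the hash of [1]

        key.add(2); // mutating the key changes its hashCode()

        // The lookup hashes [1, 2] and probes a different bucket,
        // so the entry is still in the map but effectively lost.
        System.out.println(map.containsKey(key)); // false
        System.out.println(map.size());           // 1
    }
}
```

This is why immutable keys (String, Integer, records with final fields) are the safe default for hash maps.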

When using built‑in hash maps, most of these details are handled for you, but it’s important to understand them when performance matters.

Example: Frequency Counting with HashMap (Java)

Map<String, Integer> freq = new HashMap<>();
for (String word : words) {
    freq.put(word, freq.getOrDefault(word, 0) + 1);
}

This is one of the most common hash table patterns in interviews: turning a list of values into counts per key in O(n) expected time.