HashMap Deep Dive - Resize, Collisions, Treeify

Understand how Java's HashMap works internally: its array-and-bucket structure, hash distribution, collision handling with linked lists and red-black trees, resize behavior, and treeification conditions. This article explains the mechanisms behind HashMap and how to use it effectively in real systems.

Overview

HashMap stores key-value pairs in an array of buckets. Each bucket can hold a linked list of nodes (or a red-black tree when the list grows long). The structure enables average O(1) get and put operations when the hash function distributes keys well, but understanding internals helps you avoid performance pitfalls and choose the right data structure for concurrent access.

Key concepts:

  • Hash and bucket index: hash = (h = key.hashCode()) ^ (h >>> 16); then index = (n - 1) & hash, where n is the table length (a power of 2). XORing in the high 16 bits reduces collisions when hash codes differ mainly in their upper bits.
  • Collisions: Keys that map to the same bucket are chained in a linked list. In Java 8+, when a bucket's list exceeds a threshold, it is converted to a red-black tree to avoid O(n) lookups.
  • Resize: When size > threshold (threshold = capacity × loadFactor, default 0.75), the table doubles in size and all entries are rehashed.

Example

Basic put and get

Java
Map<String, Integer> map = new HashMap<>();
map.put("a", 1);
map.put("b", 2);
map.put("c", 3);
int v = map.get("a");  // 1
// Initial capacity is 16, threshold is 12; the 13th put triggers resize

Collision and chaining

When multiple keys produce the same bucket index, they form a chain. In Java 8+, new entries are appended at the tail of the list (or inserted into the tree); Java 7 inserted at the head. A poor hashCode() that clusters keys into few buckets degrades performance.

Java
// If hashCode() always returns the same value, every entry chains in one bucket
class BadKey { @Override public int hashCode() { return 42; } }
Map<BadKey, String> map = new HashMap<>();
map.put(new BadKey(), "x");
map.put(new BadKey(), "y");  // same bucket as "x"; lookups degrade toward O(n)

Treeification

When a bucket's list reaches length 8 and the table has at least 64 buckets, that bucket is converted to a red-black tree. This avoids O(n) lookups when many keys collide (e.g. due to a weak hash or a deliberate attack). When the tree shrinks to 6 or fewer nodes, it is converted back to a list.

Java
// Treeification needs a table of at least 64 buckets plus 8+ colliding keys in one bucket
Map<Integer, String> map = new HashMap<>(64);
for (int i = 0; i < 10; i++) {
    map.put(i * 64, "value" + i);  // multiples of 64 all land in bucket 0 of a 64-slot table
}
// With the default capacity of 16, the same chain would trigger a resize instead,
// because the table is below MIN_TREEIFY_CAPACITY

Hash and Bucket Index

The hash is computed as:

Java
int h = key.hashCode();
int hash = h ^ (h >>> 16);   // spread the high bits into the low bits
int index = (n - 1) & hash;  // n is the table length, always a power of 2
  • The high 16 bits are XORed with the low 16 bits so that variations in the high bits affect the lower bits used for indexing. This improves distribution when n is small.
  • The table length n is always a power of 2, so (n - 1) & hash equals the non-negative remainder hash mod n (i.e. Math.floorMod(hash, n)) but is faster than division. Note that Java's % operator can return negative values for negative hashes, so the mask is not identical to hash % n.
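Both points can be checked directly. A small sketch with illustrative values (the variable names are ours, not the JDK's):

```java
int n = 16;                    // table length, a power of 2
int h = "a".hashCode();        // 97
int hash = h ^ (h >>> 16);     // spread step; still 97, since the high bits are zero
int index = (n - 1) & hash;    // 97 & 15 = 1

// For power-of-2 n, the mask equals the non-negative remainder, even for negative hashes
boolean same = (index == Math.floorMod(hash, n));  // true
```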

Resize Behavior

When size > threshold:

  1. Allocate a new table with twice the capacity.
  2. Rehash every entry: each node either stays at its old index or moves to index + oldCapacity. Because the capacity doubles, a single extra hash bit (hash & oldCapacity) decides the new bucket, so each chain splits into at most two.
  3. If the old bucket was a tree and splits into two, each half may be converted back to a list if it has 6 or fewer nodes.

Resize is expensive (O(n)), so if you know the expected size, use new HashMap<>(expectedSize) or slightly larger to reduce resizes. The default load factor 0.75 balances space and collision probability.
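The stay-or-move rule can be verified with plain arithmetic. This is a sketch using a sample hash value, not JDK code:

```java
int oldCap = 16;
int hash = 21;                           // sample spread hash (binary 10101)
int oldIndex = (oldCap - 1) & hash;      // 21 & 15 = 5

// After doubling, the bit (hash & oldCap) decides stay vs. move:
int newIndex = ((hash & oldCap) == 0) ? oldIndex : oldIndex + oldCap;  // 5 + 16 = 21

// Recomputing against the doubled table gives the same bucket
boolean matches = (newIndex == ((2 * oldCap - 1) & hash));  // true
```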

Treeification and Untreeify

  Condition                                       | Action
  Bucket list length ≥ 8 and table length ≥ 64    | Convert bucket to red-black tree
  Bucket tree size ≤ 6 (after split or removal)   | Convert back to linked list

Treeification avoids worst-case O(n) lookups when many keys collide. The threshold of 8 is chosen based on statistical analysis of hash distributions; the 6-node untreeify threshold avoids frequent convert-back-and-forth when the size fluctuates around 8.

OpenJDK Constants You Should Memorize

If you search in OpenJDK source (java.util.HashMap), these constants explain most interview questions:

Java
static final int DEFAULT_INITIAL_CAPACITY = 1 << 4; // 16
static final float DEFAULT_LOAD_FACTOR = 0.75f;
static final int TREEIFY_THRESHOLD = 8;
static final int UNTREEIFY_THRESHOLD = 6;
static final int MIN_TREEIFY_CAPACITY = 64;

How they work together:

  • Capacity 16 with load factor 0.75 gives a default resize threshold of 16 * 0.75 = 12.
  • 8 is the chain length at which a bucket becomes a candidate for treeification.
  • 64 means that even when a chain reaches 8, HashMap prefers to resize while the table is still small; treeification is allowed only once the table has at least 64 buckets.
  • 6 means a tree bucket that shrinks to 6 or fewer nodes (after removal or a resize split) converts back to a list.

This design balances:

  • Runtime speed under collisions.
  • Memory overhead (tree nodes are heavier than list nodes).
  • Stability (avoiding frequent tree/list conversions).

Why "treeify threshold 8 and 64"?

This is one of the most common HashMap interview questions: why does treeification involve both the number 8 and the number 64?

The short answer:

  • 8 is not enough by itself.
  • Treeify only when both conditions are true:
    • bucket length >= 8
    • table capacity >= 64

Why this two-step rule exists:

  1. When the table is small, long chains usually just mean the table needs to grow.
  2. Expanding the table (resize) can naturally split chains into different buckets.
  3. Building a red-black tree too early would add extra memory and compare overhead.
  4. Only when capacity is already 64+ and collisions still concentrate does treeification become worthwhile.

So during early growth, HashMap resizes first and treeifies later.
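That decision can be summarized in a few lines. The helper below is our own illustration of the check inside HashMap.treeifyBin, not JDK code:

```java
// Our own helper (not a JDK method): what happens when a bucket chain reaches length 8
String actionForLongChain(int tableLength) {
    final int MIN_TREEIFY_CAPACITY = 64;  // same value as the JDK constant
    return (tableLength < MIN_TREEIFY_CAPACITY) ? "resize" : "treeify";
}

String small = actionForLongChain(16);  // "resize": grow the table first
String big = actionForLongChain(64);    // "treeify": collisions persist at full size
```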

Why "resize threshold is size > threshold"?

Another common point of confusion is the exact trigger condition in the OpenJDK source.

In Java 8 HashMap insertion path, resize is triggered when:

  • ++size > threshold

This means:

  • The check runs after the size counter has been incremented.
  • The insert that pushes size past the threshold is the one that triggers the resize.

For defaults:

  • capacity = 16
  • loadFactor = 0.75
  • threshold = 12

Then:

  • 1st to 12th insert: no resize.
  • 13th insert: size becomes 13, now 13 > 12, so resize to 32.

This is why the 13th put is said to trigger the first resize.
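The arithmetic can be replayed without touching HashMap internals. This is a simulation of the bookkeeping, not JDK code:

```java
int capacity = 16;
int threshold = (int) (capacity * 0.75f);  // 12
int size = 0;
int resizes = 0;
for (int i = 1; i <= 13; i++) {
    if (++size > threshold) {                  // same shape as the JDK check
        capacity <<= 1;                        // 16 -> 32
        threshold = (int) (capacity * 0.75f);  // 12 -> 24
        resizes++;
    }
}
// resizes == 1, capacity == 32, threshold == 24: only insert #13 resized
```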

Default Capacity 16 and Load Factor 0.75: Practical Sizing Formula

These defaults combine into a practical rule: in production code, avoid repeated resizes by pre-sizing.

If you expect expectedSize elements, use:

Java
int cap = (int) Math.ceil(expectedSize / 0.75d);
Map<K, V> map = new HashMap<>(cap);

Example:

  • expected 1,000 entries
  • required threshold >= 1,000
  • required capacity >= 1,334
  • next power of two = 2,048

So new HashMap<>(2048) is a reasonable choice for stable latency.
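The worked example above can be computed directly. The power-of-two rounding below approximates what HashMap.tableSizeFor does internally and assumes cap >= 2:

```java
int expectedSize = 1000;
int cap = (int) Math.ceil(expectedSize / 0.75d);      // 1334, so threshold >= 1000
// Next power of two >= cap; HashMap rounds the requested capacity the same way
int tableSize = Integer.highestOneBit(cap - 1) << 1;  // 2048
// new HashMap<>(cap) therefore allocates a 2048-slot table with threshold 1536
```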

Interview and Production Pitfalls

  1. Treeify is not global: only one bucket is treeified, not the whole map.
  2. Treeify does not mean no collisions: it changes bucket lookup from list O(n) to tree O(log n) for that bucket.
  3. Even a large map can be slow if hashCode() quality is poor.
  4. A mutable key is a correctness bug, not just a performance issue.
  5. HashMap is not thread-safe. Concurrent mutation can corrupt internal state during resize or collision handling; use ConcurrentHashMap instead.
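The mutable-key pitfall is easy to reproduce. MutableKey below is an illustrative class written for this demo, not from any library:

```java
import java.util.HashMap;
import java.util.Map;

class MutableKey {
    int id;
    MutableKey(int id) { this.id = id; }
    @Override public int hashCode() { return id; }
    @Override public boolean equals(Object o) {
        return o instanceof MutableKey && ((MutableKey) o).id == id;
    }
}

Map<MutableKey, String> map = new HashMap<>();
MutableKey k = new MutableKey(1);
map.put(k, "v");
k.id = 99;  // mutated after insertion: the stored hash no longer matches the key

String lost = map.get(k);                      // null: the wrong bucket is searched
String alsoLost = map.get(new MutableKey(1));  // null: right bucket, but equals() fails
// The entry still occupies memory yet is unreachable by normal lookup
```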

FAQ (Based on Search Queries)

java 8 hashmap treeify threshold 8 64

  • TREEIFY_THRESHOLD = 8
  • MIN_TREEIFY_CAPACITY = 64
  • Both must be satisfied to treeify. Otherwise resize is preferred.

openjdk hashmap resize threshold size > threshold

  • Trigger condition is ++size > threshold.
  • With defaults (16, 0.75), threshold is 12, so first resize occurs on insert #13.

hashmap resize threshold size > threshold meaning

  • threshold is not "max size forever".
  • It is the resize trigger for current capacity only.
  • After resize, threshold is recalculated using new capacity and load factor.

Key Rules

  • Implement hashCode() and equals() correctly — equal objects must have equal hash codes; unequal objects should ideally have different hash codes.
  • Prefer immutable keys — mutable keys changed after insertion can break the map (wrong bucket, lost entries).
  • Initial capacity — use new HashMap<>(expectedSize) to avoid repeated resizes when the size is known.
  • Thread safety — HashMap is not thread-safe. Use ConcurrentHashMap for concurrent access.

What's Next

For the concurrent version, see ConcurrentHashMap Internals. For collection overview, see Java Collections in Practice.