Hash Tables
A fitness app tracking hundreds of workouts across months needs to find this week’s Monday run. Linear search through an unsorted list means checking every entry until finding the match—taking O(n)
time. With 365 days of workouts, that’s potentially 365 comparisons just to find one value.
Hash tables solve this problem with mathematics instead of searching. A hash function transforms a key like “Monday Run” into a number. That number becomes an array index, allowing instant O(1)
access regardless of collection size. One calculation, direct retrieval—no iteration required.
In this chapter, we’ll explore hash table design principles and implement a robust, modern hash table in Swift using generics from Chapter 7 and collision resolution via linked lists from Chapter 9.
Understanding hash tables
Hash tables solve a fundamental problem in computer science: how do we store and retrieve data in constant time? The answer lies in using a hash function to transform keys into array indices, allowing direct access to stored values.
Keys & values
Compared to other data structures:
- Linked Lists: Flexible but require
O(n)
search time due to sequential traversal - Arrays: Fast indexed access but require knowing the exact position
- Hash Tables: Combine the best of both worlds - fast
O(1)
average case operations with flexible key-based access
A well-designed hash table can achieve constant time O(1)
for insertion, deletion, and lookup operations, making it one of the most efficient data structures available.
iOS apps use hash tables extensively. UserDefaults stores app preferences as key-value pairs for instant lookup. HTTP headers in network requests use hash table lookups. Workout tracking apps retrieve today’s stats by date using hash calculations instead of iterating through all dates. Caching systems store expensive computations for instant retrieval.
The mechanics are elegant: hash the key to produce a number, use modulo to convert that number into an array index, jump directly to that position in O(1)
time. No loops. No comparisons. Just mathematics. This is why key-value lookups feel instant even with thousands of entries.
Hash function fundamentals
A hash table consists of two essential components:
- Hash Function: Transforms keys into array indices
- Bucket Array: Stores the actual key-value pairs
The hash function is the heart of any hash table. It takes an input key and produces a numeric hash value that determines where to store the data:
// Example: "Swift" → hash function → index 7
// The key "Swift" will always map to the same index
Hash function properties
A good hash function should be:
- Deterministic: Same input always produces same output
- Fast:
O(1)
computation time - Uniform: Distributes keys evenly across buckets
- Avalanche Effect: Small input changes create large output changes
Bucket storage
Values are stored in non-contiguous “buckets” within an array. The hash function determines which bucket to use, enabling direct access without searching through the entire structure.
Consider hashing workout dates to bucket indices. Different keys hash to different buckets, avoiding collisions and maintaining O(1)
lookup:
Key transformation example:
"2024-01-15" → hash() → 872349 → % 16 → bucket 13
"2024-01-16" → hash() → 103847 → % 16 → bucket 7
"2024-01-17" → hash() → 445092 → % 16 → bucket 4
The hash function transforms any string (workout date, exercise name, user ID) into an integer. The modulo operation wraps that integer into a valid bucket index (0 to capacity-1). This is why hash tables can store millions of workout records with instant lookups—calculating where data lives rather than searching for it.
Modern hash table implementation
Our hash table implementation uses generics from Chapter 7 and modern Swift patterns for type safety and performance. We’ll start with the supporting structures:
// Generic node for collision resolution via chaining (linked list from Chapter 9)
public class HashNode<Key: Hashable, Value> {
public let key: Key
public var value: Value
public var next: HashNode<Key, Value>?
public init(key: Key, value: Value) {
self.key = key
self.value = value
}
}
// Result types for hash table operations
public enum HashTableResult {
case success
case collision
case notFound
case updated
}
Protocol-based design
We’ll use Swift’s Hashable
protocol which provides built-in hash functionality, making our implementation more robust and compatible with Swift’s type system:
// All Hashable types automatically work with our hash table
// This includes String, Int, Double, and custom types that conform to Hashable
Core hash table structure
Our modern hash table implementation focuses on performance, type safety, and Swift best practices:
// Hash table with dynamic resizing and load factor management - O(1) average operations
public class HashTable<Key: Hashable, Value> {
private var buckets: [HashNode<Key, Value>?]
private var capacity: Int
private var size: Int = 0
// Load factor threshold for resizing
private let maxLoadFactor: Double = 0.75
public init(capacity: Int = 16) {
self.capacity = capacity
self.buckets = Array(repeating: nil, count: capacity)
}
public var count: Int {
return size
}
public var isEmpty: Bool {
return size == 0
}
// Current load factor for performance monitoring
public var loadFactor: Double {
return Double(size) / Double(capacity)
}
}
Dynamic resizing
Unlike our earlier simple implementation, this hash table automatically resizes when the load factor becomes too high, maintaining optimal performance:
// Resize and rehash all elements when load factor exceeds threshold - O(n)
extension HashTable {
private func shouldResize() -> Bool {
return loadFactor > maxLoadFactor
}
private func resize() {
let oldBuckets = buckets
capacity *= 2
size = 0
buckets = Array(repeating: nil, count: capacity)
// Rehash all existing elements
for head in oldBuckets {
var current = head
while let node = current {
insert(node.key, value: node.value)
current = node.next
}
}
}
}
Advanced hash function implementation
Instead of creating a custom protocol, we leverage Swift’s built-in Hashable
protocol, which provides robust hashing for all standard types and allows custom types to easily conform:
// Modern hash function using Swift's built-in Hasher for uniform distribution
extension HashTable {
// Modern hash function using Swift's built-in hasher
private func hashIndex(for key: Key) -> Int {
return abs(key.hashValue) % capacity
}
// Alternative hash function for better distribution
private func betterHashIndex(for key: Key) -> Int {
var hasher = Hasher()
hasher.combine(key)
return abs(hasher.finalize()) % capacity
}
}
Custom type support
Any type can be used as a key by conforming to Hashable
:
// Example: Custom vertex type for graph algorithms (Chapter 13)
struct Vertex: Hashable {
let id: String
let coordinates: (x: Int, y: Int)
// Hashable conformance
func hash(into hasher: inout Hasher) {
hasher.combine(id)
hasher.combine(coordinates.x)
hasher.combine(coordinates.y)
}
static func == (lhs: Vertex, rhs: Vertex) -> Bool {
return lhs.id == rhs.id &&
lhs.coordinates.x == rhs.coordinates.x &&
lhs.coordinates.y == rhs.coordinates.y
}
}
// Usage
let vertexTable = HashTable<Vertex, String>()
Hash function quality
Our implementation uses Swift’s robust hashing system which implements SipHash (a cryptographically secure hash function), seed randomization (producing different hash values across program runs), and collision resistance to minimize hash collisions.
Collision resolution strategies
Even with excellent hash functions, collisions are inevitable. Collisions occur when different keys map to the same index. Our implementation uses separate chaining with linked lists (Chapter 9) to handle these situations.
Consider what happens when two different workout names hash to the same bucket:
// Two workouts colliding at the same bucket
"Morning Run" → hash() → 582934 → % 16 → bucket 6
"Evening Walk" → hash() → 647766 → % 16 → bucket 6 // Collision!
// Solution: Chain them in a linked list at bucket 6
Bucket 6: [("Morning Run", 450)] → [("Evening Walk", 320)] → nil
Even with collisions, lookups remain fast. Instead of checking all workouts, lookups only check the 2-3 items that happened to hash to the same bucket. This is why Dictionary maintains O(1)
average case even with millions of entries—collisions are rare and shallow.
Here’s how our implementation handles collisions:
// Handle collisions by chaining nodes in linked list - O(1) insertion, O(k) search where k=chain length
extension HashTable {
private func insertInChain(key: Key, value: Value, at index: Int) -> HashTableResult {
if buckets[index] == nil {
// No collision - direct insertion
buckets[index] = HashNode(key: key, value: value)
size += 1
return .success
}
// Handle collision via chaining
var current = buckets[index]
while let node = current {
if node.key == key {
// Update existing key
node.value = value
return .updated
}
if node.next == nil {
// Add to end of chain
node.next = HashNode(key: key, value: value)
size += 1
return .collision
}
current = node.next
}
return .success
}
}
Alternative collision resolution
While we use chaining, other strategies include:
- Open Addressing: Find next available slot
- Robin Hood Hashing: Minimize variance in probe distances
- Cuckoo Hashing: Guarantees
O(1)
worst-case lookup
Complete CRUD operations
Our modern hash table provides full Create, Read, Update, Delete functionality with automatic resizing:
// Complete CRUD operations with automatic resizing - O(1) average case
extension HashTable {
// Insert or update a key-value pair
@discardableResult
public func insert(_ key: Key, value: Value) -> HashTableResult {
let index = hashIndex(for: key)
let result = insertInChain(key: key, value: value, at: index)
// Resize if load factor is too high
if shouldResize() {
resize()
}
return result
}
// Retrieve value for a given key
public func getValue(for key: Key) -> Value? {
let index = hashIndex(for: key)
var current = buckets[index]
while let node = current {
if node.key == key {
return node.value
}
current = node.next
}
return nil
}
// Remove a key-value pair
@discardableResult
public func remove(_ key: Key) -> Value? {
let index = hashIndex(for: key)
guard let head = buckets[index] else {
return nil
}
// Handle removal of first node
if head.key == key {
let removedValue = head.value
buckets[index] = head.next
size -= 1
return removedValue
}
// Search through chain
var current = head
while let next = current.next {
if next.key == key {
let removedValue = next.value
current.next = next.next
size -= 1
return removedValue
}
current = next
}
return nil
}
// Check if key exists
public func contains(_ key: Key) -> Bool {
return getValue(for: key) != nil
}
}
Subscript support
Make our hash table feel like a native Swift collection:
// Subscript support for Dictionary-like syntax - O(1) average
extension HashTable {
public subscript(key: Key) -> Value? {
get {
return getValue(for: key)
}
set {
if let value = newValue {
insert(key, value: value)
} else {
remove(key)
}
}
}
}
Performance analysis
Hash tables offer excellent performance characteristics when properly implemented, as analyzed using Chapter 8 principles:
Time complexity
- Average Case:
O(1)
for insert, search, delete - Worst Case:
O(n)
when all keys hash to same bucket - Space:
O(n)
where n is the number of elements
Load factor impact
Load Factor | Average Chain Length | Performance |
---|---|---|
0.25 | 0.25 | Excellent (fast, memory wasteful) |
0.50 | 0.50 | Very Good (balanced) |
0.75 | 0.75 | Good (our resize threshold) |
1.00 | 1.00 | Fair (getting slower) |
2.00 | 2.00 | Poor (approaching O(n)) |
Comparison with other data structures
Operation | Array | Linked List (Ch 9) | BST (Ch 11) | Hash Table |
---|---|---|---|---|
Search | O(n) | O(n) | O(log n) | O(1) avg |
Insert | O(n) | O(1) | O(log n) | O(1) avg |
Delete | O(n) | O(n) | O(log n) | O(1) avg |
Ordered Traversal | ✓ | ✗ | ✓ | ✗ |
Memory Overhead | Low | Medium | Medium | Medium-High |
Hash tables in iOS development
Every iOS app uses hash tables. Data caching provides one of the most common use cases. Rather than repeatedly formatting the same distance value, cache the result in a Dictionary:
// Cache formatted strings to avoid repeated work
var formattedDistances = [Double: String]()
if let cached = formattedDistances[5.2] {
return cached // O(1) lookup
} else {
let formatted = formatDistance(5.2)
formattedDistances[5.2] = formatted // O(1) store
return formatted
}
Session management in web applications relies on hash tables to map session IDs to user data. With one billion users, hash tables still deliver O(1)
lookup performance. Workout aggregation benefits from hash tables too. Grouping workouts by month becomes trivial:
// Group workouts by month using hash table
var workoutsByMonth = [String: [Workout]]()
workoutsByMonth["2024-01"] = [...] // Instant grouping
Why does Dictionary beat arrays for lookups? With arrays, finding a workout by name requires checking every element—O(n)
time. With Dictionary, hashing directly to the workout takes O(1)
time. For 10 workouts, the difference doesn’t matter. For 10,000 workouts, Dictionary wins by 10,000× in the worst case.
Building algorithmic intuition
Hash tables achieve O(1)
average-case performance through clever use of hashing and array indexing. Converting keys to array indices enables direct access, eliminating the need for comparisons. Collision handling reveals different strategies for managing conflicts: separate chaining uses linked lists for simplicity and flexibility, while open addressing probes for alternative locations, optimizing cache performance.
Hash tables complement other data structures throughout this book. Binary search trees (Chapter 11) maintain order but require O(log n)
operations, tries (Chapter 14) excel at prefix searches, and hash tables sacrifice ordering for O(1)
lookups. Recognizing when constant-time access matters more than ordering determines when hash tables provide the optimal solution.