Hash Tables

Hash tables solve a fundamental problem in computer science: how do we store and retrieve data in O(1) constant time? The answer lies in using a hash function to transform keys into array indices, allowing direct access to stored values.

Keys and values

Compared to other data structures, linked lists from Chapter 9 are flexible but require O(n) search time due to sequential traversal, and arrays offer fast indexed access but require knowing the exact position. Hash tables combine the best of both — fast O(1) average operations with flexible key-based access. A well-designed hash table achieves constant time for insertion, deletion, and lookup, making it one of the most efficient data structures available.

iOS apps use hash tables extensively. UserDefaults stores app preferences as key-value pairs for instant lookup. HTTP headers in network requests use hash table lookups. Workout tracking apps retrieve today’s stats by date using hash calculations instead of iterating through all entries.

How hashing works

A hash table consists of two essential components: the hash function, which transforms keys into array indices, and the bucket array, which stores the actual values. The hash function is the heart of any hash table. It takes an input key and produces a numeric hash value that determines where to store the data.

Values are stored in non-contiguous slots called buckets within an array. The hash function determines which bucket to use, enabling direct access without searching through the entire structure. Unlike a dictionary, the computed index is not stored alongside the data — it can be recalculated at any time from the key itself.

The Indexable protocol

An interesting question arises when building a generic hash table. Since elements are arranged based on their hash index, how can our algorithm learn to interpret different data types? To manage these requirements, the Structures package defines a protocol called Indexable:

// Requires conforming types to provide a numeric hash representation
public protocol Indexable: Hashable {
    var asciiRepresentation: Int {get}
}

The Indexable protocol extends Hashable and requires a single computed property — asciiRepresentation — that converts the type’s content into an integer suitable for hash table indexing. Here’s how String conforms to this requirement:

// Computes a hash value by summing unicode scalar values
extension String: Indexable {
    public var asciiRepresentation: Int {
        var divisor: Int = 0
        for item in self.unicodeScalars {
            divisor += Int(item.value)
        }
        return divisor
    }
}

This approach converts each character to its unicode scalar value and sums them together. The string “Swift” produces a sum of its character values (83 + 119 + 105 + 102 + 116 = 525). This numeric representation becomes the input to our hash function.

Educational example

Our asciiRepresentation is an educational hash function designed to illustrate the core mechanics of hashing. Production systems use far more sophisticated algorithms — MD5, SHA-256, and SipHash (which Swift’s standard library uses internally) all employ advanced techniques to minimize collisions. These algorithms are beyond the scope of this book, but the fundamental principle remains the same: transform input into a numeric index.

Building a basic hash set

With the Indexable protocol in place, we can build our first hash table. HashSet is the simplest implementation — it stores elements directly in buckets with no collision handling. When two elements hash to the same index, insertion simply fails:

// Basic hash table storing unique elements without collision handling
public class HashSet<T: Indexable> {

    // Array of buckets for storing elements
    private var buckets: Array<T?>

    // Number of remaining empty slots in the hash table
    private var slots: Int = 0

    // Creates a new empty hash set with the specified initial capacity
    public init(capacity: Int = 20) {
        self.buckets = Array<T?>(repeatElement(nil, count: capacity))
        self.slots = buckets.capacity
    }
}

The buckets array holds optional elements — each position is either nil (empty) or contains a single value. The slots counter tracks remaining capacity to know when the table needs more space.

The hash function

The hash function is where mathematics replaces searching. It takes an element’s asciiRepresentation and uses the modulo operator to wrap it into a valid bucket index:

// Computes the hash index for an element using modulo-based hashing
private func hash(_ element: T) -> Int {
    var remainder: Int = 0
    remainder = element.asciiRepresentation % buckets.count
    return remainder
}

The modulo operation ensures the result falls within 0..<buckets.count, regardless of how large the ascii representation is. The string “Swift” with an ascii sum of 525 placed into a 20-bucket table would hash to index 525 % 20 = 5.

Inserting elements

With the hash function defined, insertion computes the hash and places the element directly into the corresponding bucket:

// Inserts an element if its hash index is unoccupied
public func insert(_ element: T) -> Bool {

    //compute hash value
    let hvalue = self.hash(element)

    if buckets[hvalue] == nil {
        buckets[hvalue] = element
        slots -= 1

        //determine if more slots are needed
        if slots == 1 {
            buckets.append(nil)
            slots = 1
        }

        return true
    }

    else {
        //separate chaining..
    }

    return false
}

When the bucket is empty, the element is stored and slots is decremented. If only one slot remains, a new bucket is appended to maintain capacity. The else branch is where collision handling would go — but in HashSet, the method simply returns false. The comment hints at the solution we’ll implement next.

Finding elements

Looking up an element follows the same pattern — compute the hash and check the corresponding bucket:

// Checks whether an element exists in the hash set
public func contains(_ element: T) -> Bool {

    //compute hash value
    let hvalue = self.hash(element)

    guard buckets[hvalue] != nil else {
        return false
    }

    return true
}

This is why hash tables achieve O(1) lookup. Instead of iterating through every element like a linked list, we compute a single hash value and jump directly to the answer.

Discovering collisions

The creation of hash algorithms is considered more art than science. With HashSet, our straightforward modulo-based hash function works well until two different inputs produce the same index. Consider what happens when we insert the names “Albert Einstein” and “Andrew Collins” into a table with 20 buckets.

Computing the ascii representation for “Albert Einstein” sums the unicode scalar values of each character to produce 1451. The modulo operation yields 1451 % 20 = 11. Computing the same for “Andrew Collins” produces a different sum, but % 20 also yields 11. Both names hash to the same bucket — a collision. Since HashSet stores only one element per bucket, the second insertion fails and returns false.

This limitation is not rare. As more elements fill the table, collisions become increasingly likely. We need a strategy that allows multiple elements to share the same bucket index.

Separate chaining

To handle collisions, we introduce a supporting data structure: a singly-linked list called Chain. Each bucket will hold a chain instead of a single element, allowing multiple values at the same index:

// Singly-linked list for hash table collision resolution
public class Chain<T: Equatable> {

    // Reference to the first node in the chain
    private var head = LLNode<T>()

    // Cached value of the last element for O(1) tail access
    private var lastvalue: T?

    // Creates a new empty chain
    public init() {
        //package support
    }

    // Returns the last value in the chain without traversing
    var last: T? {
        return lastvalue
    }

    // Returns all values stored in the chain as an array
    public var values: Array<T> {
        var current: LLNode? = head
        var results = Array<T>()

        while let item = current {
            if let tvalue = item.tvalue {
                results.append(tvalue)
            }
            current = item.next
        }

        return results
    }
}

The Chain class uses LLNode from Chapter 9 to build a forward-linked list. Each node holds a value and a reference to the next node in the chain. The values property traverses the entire chain and collects all stored elements into an array.

Adding and finding chain elements

The chain supports two essential operations — appending new values and checking whether a value exists:

// Appends a new value to the end of the chain
public func append(_ tvalue: T) {

    guard head.tvalue != nil else {
        head.tvalue = tvalue
        lastvalue = tvalue
        return
    }

    let childToUse = LLNode<T>()
    childToUse.tvalue = tvalue

    var current: LLNode<T> = head

    //find next position - O(n)
    while let item = current.next {
        current = item
    }

    childToUse.previous = current
    current.next = childToUse

    lastvalue = tvalue
}

// Checks whether a specific value exists in the chain
public func contains(_ tvalue: T) -> Bool {

    var current: LLNode<T>? = head

    //find possible match - O(n)
    while current != nil {
        if let item = current {
            if let chainValue = item.tvalue {
                if chainValue == tvalue {
                    return true
                }
            }
            current = item.next
        }
    }

    return false
}

The append method traverses to the end of the chain and links a new node. The contains method walks the chain comparing each value until it finds a match or exhausts all nodes. Both operations are O(k) where k is the chain length — but since well-distributed hash functions keep chains short, this remains fast in practice.

Solving collisions with HashChain

With the Chain data structure in place, we can build HashChain — a hash table that handles collisions through separate chaining. The structure mirrors HashSet, but each bucket now holds an optional Chain instead of a single element:

// Hash table with collision resolution via separate chaining
public class HashChain<T: Indexable> {

    // Number of remaining empty buckets in the hash table
    private var slots: Int = 0

    // Array of bucket chains for storing colliding elements
    var buckets: Array<Chain<T>?>

    // Creates a new empty hash table with separate chaining
    public init(capacity: Int = 20) {
        self.buckets = Array<Chain<T>?>(repeatElement(nil, count: capacity))
        self.slots = buckets.capacity
    }
}

Inserting with collision handling

The insert method now handles both cases — empty buckets and collisions:

// Inserts an element, handling collisions via separate chaining
public func insert(_ element: T) {

    //compute hash value
    let hvalue = self.hash(element)

    if buckets[hvalue] == nil {

        //new chain
        let chain = Chain<T>()
        chain.append(element)

        buckets[hvalue] = chain
        slots -= 1

        if slots == 1 {
            buckets.append(nil)
            slots = 1
        }

    }
    else {
        print("collision detected!")

        //use existing chain
        if let chain = buckets[hvalue] {
            if chain.contains(element) == false {
                chain.append(element)
            }
        }

    }

}

When a bucket is empty, the method creates a new Chain, appends the element, and stores the chain at the computed index. When a collision occurs — the bucket already has a chain — the element is appended to the existing chain after checking for duplicates. Unlike HashSet, no element is rejected. Both “Albert Einstein” and “Andrew Collins” now coexist at bucket 11, linked together in a chain.

Collision-aware lookup

The contains method delegates to the chain’s own contains method for collision-aware searching:

// Checks whether an element exists in the hash table
public func contains(_ element: T) -> Bool {

    //compute hash value
    let hvalue = self.hash(element)

    guard let chain = buckets[hvalue] else {
        return false
    }

    return chain.contains(element)
}

If the bucket is empty, the element cannot exist and we return false immediately. If a chain exists at the computed index, we search through it for the target value. The hash function narrows the search to a single bucket, and the chain handles the rest.

The hash function

HashChain uses the same hash function as HashSet — the only difference is how collisions are handled after the hash is computed:

// Computes the hash index for an element using modulo-based hashing
private func hash(_ element: T) -> Int {
    var remainder: Int = 0
    remainder = element.asciiRepresentation % buckets.count
    return remainder
}

Performance analysis

Hash tables offer excellent performance characteristics when properly implemented, as analyzed using Chapter 8 principles. In the average case, insert, search, and delete all run in O(1) time. The worst case degrades to O(n) when all keys hash to the same bucket, forming a single long chain. Space complexity is O(n) where n is the number of stored elements.

The ratio of stored elements to bucket count is called the load factor. A load factor below 0.75 generally maintains O(1) performance. As the load factor increases, chains grow longer and performance approaches O(n). Production hash tables like Swift’s Dictionary automatically resize when the load factor exceeds a threshold, rehashing all elements into a larger bucket array to restore short chains.

Comparison with other data structures

Operation Array Linked List (Ch 9) BST (Ch 11) Hash Table
Search O(n) O(n) O(log n) O(1) avg
Insert O(n) O(1) O(log n) O(1) avg
Delete O(n) O(n) O(log n) O(1) avg
Ordered Traversal Yes No Yes No
Memory Overhead Low Medium Medium Medium-High

Hash tables in iOS development

Every iOS app uses hash tables. Swift’s Dictionary is a hash table, providing the same O(1) key-value operations we built in this chapter. Data caching provides one of the most common use cases — rather than repeatedly formatting the same distance value, cache the result:

// Cache formatted strings to avoid repeated computation
var formattedDistances = [Double: String]()

if let cached = formattedDistances[5.2] {
    return cached  // O(1) lookup
} else {
    let formatted = formatDistance(5.2)
    formattedDistances[5.2] = formatted  // O(1) store
    return formatted
}

Grouping data by category is another natural fit. Workout aggregation by month becomes trivial with hash table semantics:

// Group workouts by month using hash table
var workoutsByMonth = [String: [Workout]]()
workoutsByMonth["2024-01"] = [...]  // Instant grouping

With arrays, finding a workout by name requires checking every element — O(n) time. With Dictionary, hashing directly to the value takes O(1) time. For 10 workouts, the difference is negligible. For 10,000 workouts, Dictionary wins by orders of magnitude.

Building algorithmic intuition

Hash tables achieve O(1) average-case performance through clever use of hashing and array indexing. The progression from HashSet to HashChain reveals a fundamental design pattern — start with the simplest possible solution, identify its limitations through concrete examples, then layer on complexity only where needed. Collision handling through separate chaining demonstrates how combining data structures (arrays and linked lists) creates solutions more powerful than either alone.

Hash tables complement other data structures throughout this book. Binary search trees from Chapter 11 maintain sorted order but require O(log n) operations, tries from Chapter 14 excel at prefix searches, and hash tables sacrifice ordering for O(1) lookups. Recognizing when constant-time access matters more than ordering is a practical skill that extends to shortest path algorithms in Chapter 16 and dynamic programming in Chapter 18.