The Iterator pattern provides a way to access elements of a collection sequentially without exposing its underlying structure. Python has this pattern baked into the language itself. Every time you use a for loop, you’re using iterators. But understanding how to build custom iterators unlocks powerful patterns for memory-efficient data processing.
Python’s Iterator Protocol
At its core, Python’s iterator protocol requires two methods:
- __iter__(): Returns the iterator object itself
- __next__(): Returns the next value, or raises StopIteration when exhausted
Any object implementing these methods can be used in a for loop, with list(), or anywhere an iterable is expected.
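For example, here is roughly the smallest class that satisfies the protocol (the CountDown name is just for illustration):

class CountDown:
    """Counts down from n to 1, then stops."""

    def __init__(self, n):
        self.n = n

    def __iter__(self):
        return self

    def __next__(self):
        if self.n <= 0:
            raise StopIteration
        value = self.n
        self.n -= 1
        return value

print(list(CountDown(3)))  # [3, 2, 1]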
Example 1: A Custom Linked List
Let’s implement a linked list with proper iterator support:
class Node:
    def __init__(self, data):
        self.data = data
        self.next = None

class LinkedList:
    def __init__(self):
        self.head = None

    def append(self, data):
        """Add a node to the end of the list."""
        new_node = Node(data)
        if not self.head:
            self.head = new_node
        else:
            current = self.head
            while current.next:
                current = current.next
            current.next = new_node

    def __iter__(self):
        """Return an iterator for the linked list."""
        return LinkedListIterator(self.head)

class LinkedListIterator:
    """Separate iterator class to allow multiple simultaneous iterations."""

    def __init__(self, start_node):
        self.current = start_node

    def __iter__(self):
        return self

    def __next__(self):
        if self.current is None:
            raise StopIteration
        data = self.current.data
        self.current = self.current.next
        return data
Usage
linked_list = LinkedList()
linked_list.append(1)
linked_list.append(2)
linked_list.append(3)
for value in linked_list:
    print(value)  # 1, 2, 3
# Can also use with list(), sum(), etc.
print(list(linked_list)) # [1, 2, 3]
The key insight here is that LinkedListIterator is a separate class. This allows multiple iterations over the same list simultaneously, which is important for nested loops or concurrent access.
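For example, because each for loop gets its own LinkedListIterator, nested loops over the same list behave independently (a quick sketch reusing the linked_list built above):

# Each loop calls __iter__() and receives a fresh iterator,
# so the inner loop doesn't disturb the outer one.
for a in linked_list:
    for b in linked_list:
        print(a, b)  # prints all nine pairs: 1 1, 1 2, ..., 3 3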
Example 2: A Chunked CSV Reader
Here’s a more practical example: reading large CSV files in chunks to avoid loading everything into memory.
import csv
from typing import Optional

class CsvReader:
    """
    A CSV reader that supports chunked iteration for memory-efficient
    processing of large files.
    """

    def __init__(
        self,
        file_path: str,
        chunk_size: Optional[int] = None,
        skip_rows: int = 0
    ):
        self.file_path = file_path
        self.chunk_size = chunk_size
        self.skip_rows = skip_rows

    def __enter__(self):
        self.file = open(self.file_path, "r")
        self.reader = csv.reader(self.file)
        self.headers = next(self.reader, None)  # Store headers
        # Skip requested rows
        for _ in range(self.skip_rows):
            next(self.reader, None)
        return self

    def __exit__(self, exc_type, exc_val, exc_tb):
        self.file.close()

    def __iter__(self):
        if self.chunk_size is not None:
            return self  # Iterate in chunks
        else:
            return iter(self.read_all())  # Return all rows at once

    def __next__(self):
        """Return the next chunk of rows."""
        if self.chunk_size is None:
            raise ValueError("Use read_all() when chunk_size is None")
        chunk = []
        try:
            for _ in range(self.chunk_size):
                chunk.append(next(self.reader))
        except StopIteration:
            if chunk:
                return chunk  # Return the final partial chunk
            raise
        return chunk

    def read_all(self):
        """Read entire CSV as a list (only when not chunking)."""
        if self.chunk_size is not None:
            raise ValueError("Cannot use read_all() with chunk_size set")
        return list(self.reader)
Usage
# Process a large file in 5000-row chunks
with CsvReader("large_data.csv", chunk_size=5000) as reader:
    for chunk in reader:
        print(f"Processing {len(chunk)} rows...")
        # Process chunk without loading entire file

# Skip to near the end of a file
with CsvReader("large_data.csv", chunk_size=1000, skip_rows=9500) as reader:
    for chunk in reader:
        print(f"Found {len(chunk)} rows")

# Read entire small file at once
with CsvReader("small_data.csv") as reader:
    all_data = reader.read_all()
    print(f"Total rows: {len(all_data)}")
This pattern is essential for data engineering work where files can be gigabytes in size. By iterating in chunks, you maintain constant memory usage regardless of file size.
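For example, an aggregate can be computed chunk by chunk without ever holding the full file in memory (a sketch using the CsvReader above; the file name and column index are hypothetical):

total = 0.0
row_count = 0
with CsvReader("large_data.csv", chunk_size=5000) as reader:
    for chunk in reader:
        # Only one 5000-row chunk is in memory at any point
        total += sum(float(row[2]) for row in chunk)  # hypothetical numeric column
        row_count += len(chunk)
print(f"Average of column 2: {total / row_count:.2f}")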
Why Use Custom Iterators?
Memory Efficiency
Instead of loading everything into memory, iterators process one element (or chunk) at a time:
# Bad: Loads entire file into memory
data = open("huge_file.txt").readlines()
for line in data:
    process(line)

# Good: Processes line by line
with open("huge_file.txt") as f:
    for line in f:  # File objects are iterators!
        process(line)
Lazy Evaluation
Iterators only compute values when requested:
# This doesn't compute anything yet
squares = (x**2 for x in range(1_000_000))

# Only computes when we iterate
for square in squares:
    if square > 100:
        break  # Stops early, didn't compute all million squares
Separation of Concerns
The collection doesn’t need to know how it will be traversed:
class Database:
    def __iter__(self):
        return DatabaseIterator(self.connection)

class DatabaseIterator:
    def __next__(self):
        # Handles pagination, connection management, etc.
        pass
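To make that concrete, here is a minimal, self-contained sketch of an iterator that hides pagination from the caller; fetch_page is a hypothetical stand-in for whatever your database client actually provides:

class PaginatedResults:
    """Iterable over query results fetched one page at a time."""

    def __init__(self, fetch_page, page_size=100):
        self.fetch_page = fetch_page  # callable: (offset, limit) -> list of rows
        self.page_size = page_size

    def __iter__(self):
        return PaginatedResultsIterator(self.fetch_page, self.page_size)

class PaginatedResultsIterator:
    def __init__(self, fetch_page, page_size):
        self.fetch_page = fetch_page
        self.page_size = page_size
        self.offset = 0
        self.buffer = []

    def __iter__(self):
        return self

    def __next__(self):
        if not self.buffer:
            # Fetch the next page lazily; an empty page means we're done
            self.buffer = list(self.fetch_page(self.offset, self.page_size))
            if not self.buffer:
                raise StopIteration
            self.offset += len(self.buffer)
        return self.buffer.pop(0)

# Usage with a fake in-memory "database"
rows = list(range(10))

def fake_fetch(offset, limit):
    return rows[offset:offset + limit]

for row in PaginatedResults(fake_fetch, page_size=4):
    print(row)  # 0 through 9, fetched four rows at a time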
Iterator vs Iterable
A common source of confusion:
- Iterable: Has __iter__() that returns an iterator (e.g., list, dict)
- Iterator: Has both __iter__() and __next__() (e.g., file objects, generators)
my_list = [1, 2, 3] # Iterable
my_iter = iter(my_list) # Iterator
next(my_iter) # 1
next(my_iter) # 2
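Continuing the example shows the practical difference: an iterator is consumed once, while the underlying iterable can hand out fresh iterators on demand, and calling iter() on an iterator returns the iterator itself:

next(my_iter)            # 3
next(my_iter)            # raises StopIteration -- the iterator is exhausted

fresh = iter(my_list)    # the list (an iterable) produces a brand-new iterator
iter(fresh) is fresh     # True: an iterator's __iter__() returns itself
list(fresh)              # [1, 2, 3]
list(fresh)              # [] -- iterators are single-use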
Generators: The Easy Way
For simple cases, Python’s generators provide iterator functionality with minimal code:
def chunked_reader(file_path, chunk_size):
    """Generator function that yields chunks of lines."""
    with open(file_path) as f:
        chunk = []
        for line in f:
            chunk.append(line)
            if len(chunk) >= chunk_size:
                yield chunk
                chunk = []
        if chunk:
            yield chunk
# Usage is identical to a class-based iterator
for chunk in chunked_reader("data.txt", 100):
    process(chunk)
Generators are often the right choice for simple iteration patterns. Reach for class-based iterators (or the hybrid sketched after this list) when you need:
- Multiple independent iterations
- Complex state management
- Methods beyond just iteration
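When those needs apply, one common middle ground (just a sketch; LogFile and app.log are hypothetical) is a class whose __iter__() is written as a generator method: you keep the convenience of yield while the class supports multiple independent iterations and extra methods:

class LogFile:
    """Wraps a log file and adds behavior beyond plain iteration."""

    def __init__(self, path):
        self.path = path

    def __iter__(self):
        # Each call re-opens the file and returns a new generator,
        # so every for loop gets its own independent iteration.
        with open(self.path) as f:
            for line in f:
                yield line.rstrip("\n")

    def count_errors(self):
        # A method beyond iteration, built on top of __iter__()
        return sum(1 for line in self if "ERROR" in line)

logs = LogFile("app.log")
for line in logs:            # first pass over the file
    print(line)
print(logs.count_errors())   # second, independent pass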