How indexing works internally in database
The main purpose of indexing is to dramatically speed up data retrieval (queries). While it makes queries faster, indexing almost always makes data modification (updates, inserts, and deletes) slower.
Here is how indexing works internally and why it has this trade-off:
1. How Indexing Works Internally (The Library Analogy) 📚
An index works like the card catalog in a library, while the actual table data is the books themselves.
The Problem Without an Index (Sequential Scan)
Imagine you want to find a book by a specific author in a library with millions of books, but there's no catalog.
- You would have to walk up and down every aisle, opening every single book's cover to check the author's name.
- In database terms, this is a Sequential Scan—the server reads every row on the disk until it finds the one it needs. This is very slow for large tables.
The Solution With a B-Tree Index (Index Scan)
A B-Tree (the standard index type) organizes the indexed column's values in a structured, hierarchical tree:
- Creation: When you create an index on the
last_namecolumn, PostgreSQL builds a separate, highly organized data structure (the index). - Search: When you query
WHERE last_name = 'Smith', the database starts at the top of the B-Tree and follows a path (like a digital "road map") straight to the specific location of the 'Smith' data. - Result: The index gives the database the exact physical address (the disk location) of the data row. The server skips searching millions of pages and goes directly to the one it needs. This is called an Index Scan and is incredibly fast.
2. The Indexing Trade-Off: Speed vs. Cost
The internal structure of the index explains the trade-off between read speed and write cost:
A. Queries are Faster (Reads) ✅
- Benefit: The index structure is optimized solely for searching. It allows the database to instantly locate a small subset of data, making
SELECTqueries much faster.
B. Updates are Slower (Writes) ❌
- Cost: When you perform an
INSERT,UPDATE, orDELETE, the database doesn't just change the data in the main table; it must also update the index(es) associated with that table.- INSERT: The database has to insert the new row into the main table and insert the new key into the index structure.
- UPDATE: If you update a column that is indexed, the database has to delete the old key from the index and insert a new key into the index.
- DELETE: The database has to delete the row from the main table and remove the key from the index.
Since the database must maintain two synchronized data structures (the table and the index), any write operation takes more time and resources.
Conclusion
You should only create an index if the gain in read speed outweighs the cost in write speed.
| Operation | Index Effect |
|---|---|
| SELECT (Query) | Faster |
| INSERT, UPDATE, DELETE | Slower |
In most real-world applications, you have many more reads than writes, so using indexes is a net positive for overall application performance.