Indexing in PostgreSQL vs MySQL

I explain how PostgreSQL and MySQL store their indexes and their effect on reads vs. writes.

Table

Data in the database is organized using tables. A database table is a data structure containing columns and rows. A column is a data unit having a specific data type (number, text, etc.). A table can have multiple columns. A row is a data record with filled-in values in each column.

/posts/db/indexing_in_postgresql/table.jpg

Index

An index is a separate data structure that holds the field the index is sorting and a pointer from each record to their corresponding record in the original table where the data is stored.

/posts/db/indexing_in_postgresql/index.png

PostgreSQL includes built-in support for regular B-tree (not to be confused with a Binary tree or B+-tree.) and hash table indexes.

B-tree visualization

PostgreSQL vs MySQL InnoDB index

Indexes in PostgreSQL all point to the table directly.

MySQL default storage engine InnoDB works differently than PostgreSQL. Although the previous default MySQL storage engine MyISAM worked similar to PostgreSQL.

InnoDB always uses a primary key index. Even if you are not creating the primary key index, MySQL InnoDB still makes it for you. As with PostgreSQL, the primary key index points to the table, BUT any secondary index you create does NOT point to the table directly. Instead, it points to the primary key index value, which corresponds to the table. The secondary index has multiple hops.

So what are the consequences of such design differences?

Update or Delete

PostgreSQL - such change triggers all index updates. Slower.

MySQL InnoDB engine - updates primary key (deletion key). Updates specific index if a value in that index column has been updated. We must avoid the primary key update because this triggers all index updates.

Read

PostgreSQL - read extremely fast because directly jumping from index to value.

MySQL InnoDB engine - doing multiple hoops in case searching, but non primary index. Have to jump to the primary key index and then get the row. Slower. Therefore it is critical to decide which key is a primary key. This will improve performance significantly.