PostgreSQL 10 Bits - WAL-Logged Hashindex

Julian Schauder

In addition to index types like B-tree and BRIN, PostgreSQL has supported hash indexes for a long time. This index, however, was usually not recommended, as the lack of REDO information in the transaction log (Write Ahead Log, WAL) prevented replication and crash safety. A crash of the service could therefore make it necessary to rebuild the index via REINDEX to prevent inconsistencies and errors. PostgreSQL standby systems were unable to use this index, as it was not shipped via WAL.

This is about to change in PostgreSQL 10 thanks to the 'write-ahead logging support to hash indexes' patch. This patch and several others bring major changes and add the missing functionality for hash indexes to the PostgreSQL 10 feature release. A perfect time to look at hash indexes in general and see what they can be used for.

Theory

In theory, hash indexes are quite simple. The hashed value of a key is assumed to define the exact lookup position within the index structure. This yields the heap address in a single index step. In other words, a runtime of O(1) is possible, which ideally provides constant access times regardless of the index size. In contrast, a sorted index like the B-tree needs to traverse multiple index pages. The resulting runtime of O(log(n)) implies multiple lookup and comparison cycles to reach the target.

PostgreSQL implements hash indexes as described in the paper A New Hashing Package for UNIX by Seltzer and Yigit. This implementation utilizes meta pages, bucket pages and overflow pages to enable dynamic index growth while maintaining the typical runtime characteristics.
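
As a minimal sketch of this in practice (table and index names below are purely illustrative and not part of the benchmark that follows), an equality lookup is answered by a single scan of the hash index:

CREATE TABLE lookup_demo(key text);
CREATE INDEX lookup_demo_hash_idx ON lookup_demo USING HASH(key);
INSERT INTO lookup_demo SELECT md5(id::text) FROM generate_series(1, 100000) id;
ANALYZE lookup_demo;
-- the equality predicate lets the planner use the hash index
EXPLAIN SELECT * FROM lookup_demo WHERE key = md5('42');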

Runtime

In practice this mostly results in less computation. The hash index should therefore be quicker than a B-tree whenever hashing a value is cheaper than the B-tree's many comparison operations on the way to the target. The comparison operation has to be differentiated, though: comparing integers is a lot cheaper than alphanumeric comparisons of variable-length strings, as we demonstrate below.

EXPLAIN ANALYZE SELECT id FROM generate_series(1,100000) id ORDER BY 1 DESC;        -- Time: 0.3s
EXPLAIN ANALYZE SELECT id::text FROM generate_series(1,100000) id ORDER BY 1 DESC;  -- Time: 2.4s

If we transfer this cost behavior to index lookups, the following should emerge:

The described benchmark displays the combined TPS of four concurrent connections, each of which selects a random tuple from the table, using text as the primary key. The test batch 'similar' uses keys whose first letters are identical; 'random', in contrast, uses completely random strings. For this test the data is completely cached to prevent unwanted artifacts in the benchmark, since the goal is to measure the required computation per lookup. The keys are generated by functions within the queries. MD5, for instance, is a heavyweight, so the two batches themselves differ slightly and should not be compared directly with each other.

CREATE TABLE test(key_1 text, key_2 text);
CREATE INDEX ON test USING HASH(key_1);
CREATE INDEX ON test(key_2);
 
Sorted:
INSERT INTO test SELECT repeat('x', :width )||id , repeat('x', :width )|| id FROM generate_series(1, :scale ) id;
 
Random:
INSERT INTO test SELECT substr(md5(id::text)||repeat('x', :width -32 ), 0, :width ) , substr(md5(id::text) || repeat('x', :width -32),0, :width ) FROM generate_series(1, :scale ) id;
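
The lookup queries themselves are not listed above. A sketch of how the 'similar' batch could be driven as a pgbench script, reusing the :width and :scale variables via pgbench's -D option (an assumption on my part, not the original benchmark files), might look like this:

\set id random(1, :scale)
-- rebuild the key exactly as it was inserted and fetch the matching tuple
SELECT * FROM test WHERE key_1 = repeat('x', :width) || :id;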

It is observable that the CPU costs of a B-tree scale with the length of the key. As a B-tree compares from left to right, a clear regression becomes visible for strings that share a long common prefix. This seems unlikely in general, but it is rather typical for URIs.

PageFetches

The hash index's runtime relies on the assumption that a target tuple can be reached by checking the calculated index page. Meta pages and overflow pages exist as well, but the meta page is usually cached while overflow pages statistically are not.

This leads to very acceptable runtimes even with slow random reads. It is most effective for barely cached indexes which are rarely used or simply too large for the available RAM.

This benchmark shows the page fetches with a growing number of tuples. It becomes clear that hash index lookups need significantly fewer pages than the B-tree. Assuming an average duration of 8 ms per I/O request, as is expected for a 7200 rpm HDD, a difference of four pages results in roughly 32 ms of runtime difference per query. Additionally, the random reads of a system are limited by the hardware, so hash indexes could provide better concurrency.
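
On a concrete system, the pages touched by a single lookup can be inspected with the BUFFERS option of EXPLAIN; this is a generic check rather than part of the benchmark above, and the key value below assumes :width = 40 from the earlier setup:

-- reports shared buffer hits and reads for the index and heap pages of one lookup
EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM test WHERE key_1 = repeat('x', 40) || 123;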

Size

Hash indexes store a hash value of fixed length instead of the key itself. This results in an index size that only correlates with the number of tuples. For integers, a hash index is approximately 40% bigger than a B-tree, but for variable-length types the picture may differ substantially.
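
Both index sizes can be compared directly on the benchmark table, for example by listing all of its indexes together with pg_relation_size (a generic query, not from the benchmark itself):

-- list every index on the benchmark table together with its on-disk size
SELECT indexrelid::regclass AS index_name, pg_size_pretty(pg_relation_size(indexrelid)) AS size
FROM pg_index
WHERE indrelid = 'test'::regclass;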

Upgrading from previous PostgreSQL Versions with pg_upgrade

Due to the many changes, the physical representation of hash indexes differs between previous versions and PostgreSQL 10. If an upgrade to PostgreSQL 10 is performed via pg_upgrade, hash indexes need to be rebuilt via REINDEX. pg_upgrade will invalidate the hash indexes due to this incompatibility and inform the user about the necessity of a rebuild:

Your installation contains hash indexes.  These indexes have different
internal formats between your old and new clusters, so they must be
reindexed with the REINDEX command.  After upgrading, you will be given
REINDEX instructions.

After pg_upgrade has finished, an SQL script called reindex_hash.sql is placed in the working directory. This file contains the necessary REINDEX commands for the rebuild.
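
The script can then be fed to psql once the new cluster is running, for example (a session sketch; the exact invocation depends on your setup and should be run as a database superuser):

-- execute the generated REINDEX commands on the upgraded cluster
\i reindex_hash.sql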

Conclusion

Hash indexes are worth a try if expensive operators or big datasets define the index. Access should consist of exact-key lookups, because hash indexes do not support range scans (<, >, >= and <=). As no key grouping occurs, cache affinity for similar values is unlikely either. If these assumptions can be met, a hash index might significantly improve performance.
