Tuning Snowflake Performance Using the Query Cache
In
terms of performance tuning in Snowflake, there are very few options
available. However, it is worth understanding how the Snowflake
architecture includes various levels of caching to help speed your
queries. This article provides an overview of the techniques used, and
some best practice tips on how to maximise system performance using
caching.
Snowflake Database Architecture
Before
starting it’s worth considering the underlying Snowflake architecture,
and explaining when Snowflake caches data. The diagram below illustrates
the overall architecture which consists of three layers:-
- Service Layer: Which accepts SQL requests from users, coordinates queries, managing transactions and results. Logically, this can be assumed to hold the result cache – a cached copy of the results of every query executed.
- Compute Layer: Which actually does the heavy lifting. This is where the actual SQL is executed across the nodes of a Virtual Data Warehouse. This layer holds a cache of data queried, and is often referred to as Local Disk I/O although in reality this is implemented using SSD storage. All data in the compute layer is temporary, and only held as long as the virtual warehouse is active.
- Storage Layer: Which provides long term storage of results. This is often referred to as Remote Disk, and is currently implemented on either Amazon S3 or Microsoft Blob storage.
Snowflake Cache Layers
The diagram below illustrates the levels at which data and results are cached for subsequent use. These are:-
- Result Cache: Which holds the results of every query executed in the past 24 hours. These are available across virtual warehouses, so query results returned to one user is available to any other user on the system who executes the same query, provided the underlying data has not changed.
- Local Disk Cache: Which is used to cache data used by SQL queries. Whenever data is needed for a given query it's retrieved from the Remote Disk storage, and cached in SSD and memory.
- Remote Disk: Which holds the long term storage. This level is responsible for data resilience, which in the case of Amazon Web Services, means 99.999999999% durability. Even in the event of an entire data centre failure.
Comments
Post a Comment