ablog

Notes from a clumsy, restless engineer

About the local SSD cache on Amazon Redshift ra3 instances


Compute nodes use large, high performance SSDs as local caches. Redshift leverages workload patterns and techniques such as automatic fine-grained data eviction and intelligent data prefetching, to deliver the performance of local SSD while scaling storage automatically to Amazon S3.

Figure 5 shows the key components of RMS extending from in-memory caches to committed data on Amazon S3. Snapshots of data on Amazon S3 act as logical restore points for the customer. Redshift supports both the restore of a complete cluster, as well as of specific tables, from any available restore point. Amazon S3 is also the data conduit and source of truth for data sharing and machine learning. RMS accelerates data accesses from S3 by using a prefetching scheme that pulls data blocks into memory and caches them to local SSDs. RMS tunes cache replacement to keep the relevant blocks locally available by tracking accesses to every block. This information is also used to help customers decide if scaling up their cluster would be beneficial. RMS makes in-place cluster resizing a pure metadata operation since compute nodes are practically stateless and always have access to the data blocks in RMS. RMS is metadata bound and easy to scale since data can be ingested directly into Amazon S3. The tiered nature of RMS where SSDs act as cache makes swapping out of hardware convenient.
RMS-supported Redshift RA3 instances provide up to 16PBs of capacity today. The in-memory disk-cache size can be dynamically changed for balancing performance and memory needs of queries.

Amazon Redshift re-invented - Amazon Science

Write Latency in Redshift CloudWatch metrics

Write latency – Shows the average amount of time in milliseconds taken for disk write I/O operations. You can evaluate the time for the write acknowledgment to return. When latency is high, it means that the sender spends more time idle (not sending any new packets), which reduces how fast throughput grows.

Viewing cluster performance data - Amazon Redshift

Checking the number of blocks per node and slice in Amazon Redshift

The blocks per node and slice in Amazon Redshift can be checked with the following query.

select b.node, a.slice, sum(blocknum)
from svv_diskusage a, stv_slices b
where a.slice = b.slice
group by b.node, a.slice
order by b.node, a.slice, sum(blocknum) desc;

node|slice|sum
0|0|2821012504
0|1|2885748816
0|2|2830922313
0|3|2967025944
1|4|5211643368
1|5|2820034230
1|6|2844777960
1|7|2769496495
2|8|2811089919
2|9|2843159619
2|10|2835198368
2|11|2783326269
3|12|2799212712
3|13|2765290236
3|14|2870821537
3|15|2874467798
(16 rows)
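
Note that sum(blocknum) adds up the block numbers rather than counting blocks. If what you actually want is the number of 1 MB blocks per slice, counting rows should work instead; a minimal sketch, assuming one SVV_DISKUSAGE row per allocated block:

-- Each row in svv_diskusage corresponds to one 1 MB disk block, so counting
-- rows per slice gives the number of blocks (roughly MB of disk used).
select b.node, a.slice, count(*) as blocks
from svv_diskusage a, stv_slices b
where a.slice = b.slice
group by b.node, a.slice
order by b.node, a.slice;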

redistribute and broadcast in Amazon Redshift

Notes on the difference between redistribute and broadcast in Amazon Redshift.

  • There are two data-movement patterns: redistribute and broadcast (a copy of the entire table is broadcast to all the compute nodes).

DS_DIST_OUTER
The outer table is redistributed.

DS_BCAST_INNER
A copy of the entire inner table is broadcast to all the compute nodes.

DS_DIST_ALL_INNER
The entire inner table is redistributed to a single slice because the outer table uses DISTSTYLE ALL.

DS_DIST_BOTH
Both tables are redistributed.

Evaluating the query plan - Amazon Redshift
  • broadcast sends an entire table or a set of rows to all nodes.
  • distribute sends rows to other nodes for parallel processing.
Query plan operation | Label field value | Description
DS_BCAST_INNER | BCAST (broadcast) | Broadcasts an entire table or some set of rows (such as a filtered set of rows from a table) to all nodes.
DS_DIST_NONE, DS_DIST_ALL_NONE, DS_DIST_INNER, DS_DIST_ALL_INNER, DS_DIST_ALL_BOTH | DIST (distribute) | Distributes rows to nodes for parallel joining purposes or other parallel processing.
Mapping the query plan to the query summary - Amazon Redshift
  • When joining a large fact table with a small dimension table, transferring the matching dimension rows once for every matching fact row would be inefficient, so instead the entire dimension table is sent to all nodes (broadcast), I take it.

ALL distribution replicates the entire table across all nodes. Each node holds a full copy of the table, eliminating data movement during query execution.

Use Case: This style is best suited for small, frequently accessed tables, such as lookup tables. Typical scenarios include:

  • Small dimension tables joined with large fact tables.
  • Queries requiring broadcast joins to avoid redistribution costs.
Understanding Amazon Redshift Distribution Styles and Internal Architecture
  • A merge join is used only when the join columns are both distribution keys and sort keys, so presumably no distribute or broadcast occurs in that case.
  • Nested Loop
    • The least optimal join, a nested loop is used mainly for cross-joins (Cartesian products) and some inequality joins.
  • Hash Join and Hash
    • Typically faster than a nested loop join, a hash join and hash are used for inner joins and left and right outer joins. These operators are used when joining tables where the join columns are not both distribution keys and sort keys. The hash operator creates the hash table for the inner table in the join; the hash join operator reads the outer table, hashes the joining column, and finds matches in the inner hash table.
  • Merge Join
    • Typically the fastest join, a merge join is used for inner joins and outer joins. The merge join is not used for full joins. This operator is used when joining tables where the join columns are both distribution keys and sort keys, and when less than 20 percent of the joining tables are unsorted. It reads two sorted tables in order and finds the matching rows. To view the percent of unsorted rows, query the SVV_TABLE_INFO system table.
  • Spatial Join
    • Typically a fast join based on proximity of spatial data, used for GEOMETRY and GEOGRAPHY data types.
Creating and interpreting a query plan - Amazon Redshift
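
The percent of unsorted rows mentioned in the merge join description above can be checked in SVV_TABLE_INFO; a minimal sketch (column names as documented for that view):

-- unsorted is the percent of unsorted rows per table; per the documentation,
-- a merge join is used only when less than 20 percent of the joining tables are unsorted.
select "schema", "table", diststyle, sortkey1, unsorted
from svv_table_info
order by unsorted desc nulls last;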

Redistribution of both tables

In my example database, all of the tables are distributed on userid. But what if they weren’t? In particular, what if they had the EVEN distribution style (which is Redshift’s default)? In that case, you’d see a query plan that looks like this:

XN Limit
   ->  XN HashAggregate
       ->  XN Hash Left Join DS_DIST_BOTH
           Outer Dist Key: atc.userid
           Inner Dist Key: cc.userid
           Hash Cond: (("outer".userid)::text = ("inner".userid)::text)
           ->  XN Seq Scan on add_to_cart atc
           ->  XN Hash
               ->  XN Seq Scan on checkout_complete cc

I have seen cases where both tables were redistributed because the query was based on an alternate — but valid — set of join columns. These tend to be extremely rare, but if you find that you’re frequently doing such joins, the best solution is to create a second copy of the tables, distributed on that alternate key.
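
To see which of these strategies the planner picks for your own tables, EXPLAIN shows the DS_* labels discussed above. A minimal sketch with hypothetical tables (sales as a large fact table, dim_date as a small dimension table):

-- If dim_date is small, the plan will typically show DS_BCAST_INNER (the inner
-- table is broadcast to all compute nodes); if both tables are large and
-- distributed on other columns, expect DS_DIST_BOTH instead.
explain
select d.calendar_date, sum(s.amount)
from sales s
join dim_date d on s.date_id = d.date_id
group by d.calendar_date;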

Checking the history of automatic distribution style and sort key optimization in Redshift

The history of Redshift's automatic distribution style and sort key optimization can be checked with SVL_AUTO_WORKER_ACTION.

select table_id, type, status, eventtime, sequence, previous_state
from SVL_AUTO_WORKER_ACTION;
 table_id |  type   |                        status                        |         eventtime          | sequence | previous_state
----------+---------+------------------------------------------------------+----------------------------+----------+----------------
   118082 | sortkey | Start                                                | 2020-08-22 19:42:20.727049 | 0        |
   118078 | sortkey | Start                                                | 2020-08-22 19:43:54.728819 | 0        |
   118082 | sortkey | Start                                                | 2020-08-22 19:42:52.690264 | 0        |
   118072 | sortkey | Start                                                | 2020-08-22 19:44:14.793572 | 0        |
   118082 | sortkey | Failed                                               | 2020-08-22 19:42:20.728917 | 0        |
   118078 | sortkey | Complete                                             | 2020-08-22 19:43:54.792705 | 0        | SORTKEY: None;
   118086 | sortkey | Complete                                             | 2020-08-22 19:42:00.72635  | 0        | SORTKEY: None;
   118082 | sortkey | Complete                                             | 2020-08-22 19:43:34.728144 | 0        | SORTKEY: None;
   118072 | sortkey | Skipped:Retry exceeds the maximum limit for a table. | 2020-08-22 19:44:46.706155 | 0        |
   118086 | sortkey | Start                                                | 2020-08-22 19:42:00.685255 | 0        |
   118082 | sortkey | Start                                                | 2020-08-22 19:43:34.69531  | 0        |
   118072 | sortkey | Start                                                | 2020-08-22 19:44:46.703331 | 0        |
   118082 | sortkey | Checkpoint: progress 14.755079%                      | 2020-08-22 19:42:52.692828 | 0        |
   118072 | sortkey | Failed                                               | 2020-08-22 19:44:14.796071 | 0        |  
   116723 | sortkey | Abort:This table is not AUTO.                        | 2020-10-28 05:12:58.479233 | 0        |
   110203 | distkey | Abort:This table is not AUTO.                        | 2020-10-28 05:45:54.67259  | 0        | 
SVL_AUTO_WORKER_ACTION - Amazon Redshift
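
To see table names instead of raw table IDs, SVL_AUTO_WORKER_ACTION can be joined to SVV_TABLE_INFO on table_id; a minimal sketch:

-- svv_table_info also exposes table_id, so the join shows which tables
-- Automatic Table Optimization has been working on and what it changed.
select t."schema", t."table", a.type, a.status, a.eventtime, a.previous_state
from svl_auto_worker_action a
join svv_table_info t on a.table_id = t.table_id
order by a.eventtime desc;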

Making load operations run by other users visible in SYS_LOAD_ERROR_DETAIL

By default, when a regular user queries SYS_LOAD_ERROR_DETAIL, only entries for load operations that the user ran themselves are shown.
To also see entries for load operations run by other users, grant SYSLOG ACCESS UNRESTRICTED to that user.
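
For example, for a hypothetical user named etl_user (the ALTER USER ... SYSLOG ACCESS syntax is as documented for Redshift):

-- Allow this user to see rows generated by other users in user-visible
-- system tables and views; the default is SYSLOG ACCESS RESTRICTED.
alter user etl_user syslog access unrestricted;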

By default, in most tables that are visible to users, rows generated by another user are not visible to a regular user. If a regular user is granted SYSLOG ACCESS UNRESTRICTED, that user can see all rows in user-visible tables, including rows generated by another user.

System tables and views reference - Amazon Redshift

What happens if a manual VACUUM is run while automatic VACUUM is in progress

If a manual VACUUM is run while an automatic vacuum operation is in progress, the automatic vacuum pauses and the manual VACUUM runs instead.

Automatic vacuum operations pause if any of the following conditions are met:

  • A user runs a data definition language (DDL) operation, such as ALTER TABLE, that requires an exclusive lock on a table that automatic vacuum is currently working on.
  • A user triggers VACUUM on any table in the cluster (only one VACUUM can run at a time).
  • A period of high cluster load.
VACUUM - Amazon Redshift
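
To check whether a vacuum is currently running on the cluster and how far along it is, SVV_VACUUM_PROGRESS can be queried; a minimal sketch:

-- Shows the table being vacuumed (or the most recently vacuumed table),
-- the current status of the operation, and an estimated time remaining.
select table_name, status, time_remaining_estimate
from svv_vacuum_progress;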