ablog

2017-08-13

When the Source DB Uses ASM, DMS CDC Replication Requires the Oracle Binary Reader

When replicating from Oracle Database with DMS CDC*1, there are two ways to read the REDO logs: Oracle LogMiner and the Oracle Binary Reader.

If the database files are stored on ASM, it appears you need to use the Oracle Binary Reader.

Specifically, you need to set the following in Extra connection attributes:

useLogminerReader=N;asm_user=<asm_username>;asm_server=<first_RAC_server_ip_address>/+ASM

References

AWS Database Migration Service User Guide (API Version 2016-01-01)
  • Using Oracle LogMiner or Oracle Binary Reader for Change Data Capture (CDC)

By default, AWS DMS uses Oracle LogMiner for change data capture (CDC). Alternatively, you can choose to use the Oracle Binary Reader. The Oracle Binary Reader bypasses LogMiner and reads the logs directly. To enable the Binary Reader, you need to modify your source connection to include the following extra connection parameters:

useLogminerReader=N; useBfile=Y                                                                                         

To enable Logminer, you need to modify your source connection to include the following extra connection parameter or leave the Extra Connection Attribute blank (Logminer is the default):

useLogminerReader=Y

If the Oracle source database is using Oracle ASM (Automatic Storage Management), the extra connection parameter needs to include the asm username and asm server address. The password field will also need to have both passwords, the source user password, as well as the ASM password.

useLogminerReader=N;asm_user=<asm_username>;asm_server=<first_RAC_server_ip_address>/+ASM

If the Oracle source database is using Oracle ASM (Automatic Storage Management), the endpoint password field needs to have both the Oracle user password and the ASM password, separated by a comma.

Example: <oracle_user_password>,<asm_user_password>
Using an Oracle Database as a Source for AWS Database Migration Service - AWS Database Migration Service
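For reference, a minimal sketch of setting these attributes when creating the source endpoint from the AWS CLI; the identifier, host, credentials, and ASM values are hypothetical placeholders, and the password combines the Oracle and ASM passwords separated by a comma as described above.

$ aws dms create-endpoint \
    --endpoint-identifier oracle-asm-source \
    --endpoint-type source \
    --engine-name oracle \
    --server-name 10.0.0.10 --port 1521 --database-name ORCL \
    --username dms_user \
    --password 'oracle_user_password,asm_user_password' \
    --extra-connection-attributes 'useLogminerReader=N;asm_user=asm_admin;asm_server=10.0.0.10/+ASM'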

2017/08/13, in flight over the Pacific

*1: When the source DB is Oracle Database.

2017-08-11

Automatically Generating Table DDL from Amazon Redshift

Amazon Redshift itself has no feature for generating the DDL of objects such as tables in a cluster, but the DDL can be generated with the following scripts and tools.

In other words, while there is nothing like Oracle Database Data Pump's CONTENT=METADATA_ONLY or exp's rows=n, scripts and tools exist that can do the equivalent of DBMS_METADATA.GET_DDL.

DDL statements can be generated with the following tools*1.


Concrete example procedures are shown below.

Using SQL Workbench/J

f:id:yohei-a:20170812000948p:image:w640

f:id:yohei-a:20170812000944p:image:w360 f:id:yohei-a:20170812000937p:image:w640

Looking at "25. SQL Workbench/J to generate DDL commands" in the manual, it looks like the same thing can be done from the command line; see the sketch below.
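Presumably something like the following would work in batch mode. This is an unverified sketch: I have not tested WbGenerateScript or its -objects/-file parameters against Build 122, so treat the command and parameter names as assumptions taken from the manual.

$ java -jar sqlworkbench.jar -profile=Redshift \
       -command="WbGenerateScript -objects=public.* -file=ddl.sql"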


Using the Amazon Redshift Utilities scripts

$ git clone https://github.com/awslabs/amazon-redshift-utils.git
$ cd amazon-redshift-utils/src/AdminViews
$ psql "host=******.******.ap-northeast-1.redshift.amazonaws.com user=master dbname=mydb port=5439"
Password:
psql (9.6.2, server 8.0.2)
SSL connection (protocol: TLSv1.2, cipher: ECDHE-RSA-AES256-GCM-SHA384, bits: 256, compression: on)
Type "help" for help.
mydb=#
  • Create the admin schema.
mydb=# CREATE SCHEMA admin;
CREATE SCHEMA
  • Create the v_generate_tbl_ddl view.
mydb=# \i ./v_generate_tbl_ddl.sql
CREATE VIEW
  • Turn off the pager.
mydb=# \pset pager
Pager usage is off.
mydb=# select * from admin.v_generate_tbl_ddl where schemaname='pg_catalog' limit 30;
 schemaname |      tablename      |    seq    |                              ddl
------------+---------------------+-----------+---------------------------------------------------------------
 pg_catalog | padb_config_harvest |         0 | --DROP TABLE "pg_catalog"."padb_config_harvest";
 pg_catalog | padb_config_harvest |         2 | CREATE TABLE IF NOT EXISTS "pg_catalog"."padb_config_harvest"
 pg_catalog | padb_config_harvest |         5 | (
 pg_catalog | padb_config_harvest | 100000001 |         "name" CHAR(136) NOT NULL  ENCODE lzo
 pg_catalog | padb_config_harvest | 100000002 |         ,"harvest" INTEGER NOT NULL  ENCODE lzo
 pg_catalog | padb_config_harvest | 100000003 |         ,"archive" INTEGER NOT NULL  ENCODE lzo
 pg_catalog | padb_config_harvest | 100000004 |         ,"directory" CHAR(500) NOT NULL  ENCODE lzo
 pg_catalog | padb_config_harvest | 299999999 | )
 pg_catalog | padb_config_harvest | 300000001 | DISTSTYLE EVEN
 pg_catalog | padb_config_harvest | 600000000 | ;
 pg_catalog | pg_aggregate        |         0 | --DROP TABLE "pg_catalog"."pg_aggregate";
 pg_catalog | pg_aggregate        |         2 | CREATE TABLE IF NOT EXISTS "pg_catalog"."pg_aggregate"
 pg_catalog | pg_aggregate        |         5 | (
 pg_catalog | pg_aggregate        | 100000001 |         "aggfnoid" REGPROC NOT NULL
 pg_catalog | pg_aggregate        | 100000002 |         ,"aggtransfn" REGPROC NOT NULL
 pg_catalog | pg_aggregate        | 100000003 |         ,"aggfinalfn" REGPROC NOT NULL
 pg_catalog | pg_aggregate        | 100000004 |         ,"aggtranstype" OID NOT NULL
 pg_catalog | pg_aggregate        | 100000005 |         ,"agginitval" TEXT
 pg_catalog | pg_aggregate        | 299999999 | )
 pg_catalog | pg_aggregate        | 300000001 | DISTSTYLE EVEN
 pg_catalog | pg_aggregate        | 600000000 | ;
 pg_catalog | pg_am               |         0 | --DROP TABLE "pg_catalog"."pg_am";
 pg_catalog | pg_am               |         2 | CREATE TABLE IF NOT EXISTS "pg_catalog"."pg_am"
 pg_catalog | pg_am               |         5 | (
 pg_catalog | pg_am               | 100000001 |         "amname" NAME NOT NULL
 pg_catalog | pg_am               | 100000002 |         ,"amowner" INTEGER NOT NULL
 pg_catalog | pg_am               | 100000003 |         ,"amstrategies" SMALLINT NOT NULL
 pg_catalog | pg_am               | 100000004 |         ,"amsupport" SMALLINT NOT NULL
 pg_catalog | pg_am               | 100000005 |         ,"amorderstrategy" SMALLINT NOT NULL
 pg_catalog | pg_am               | 100000006 |         ,"amcanunique" BOOLEAN NOT NULL
(30 rows)
  • Set the output file name.
mydb=# \o ddl.sql
  • Generate only the DDL.
mydb=# select ddl from admin.v_generate_tbl_ddl where schemaname='pg_catalog';
  • Disconnect.
mydb=# \q
  • Check the output file.
$ head -10 ddl.sql 
                                         ddl
--------------------------------------------------------------------------------------
 --DROP TABLE "pg_catalog"."padb_config_harvest";
 CREATE TABLE IF NOT EXISTS "pg_catalog"."padb_config_harvest"
 (
         "name" CHAR(136) NOT NULL  ENCODE lzo
         ,"harvest" INTEGER NOT NULL  ENCODE lzo
         ,"archive" INTEGER NOT NULL  ENCODE lzo
         ,"directory" CHAR(500) NOT NULL  ENCODE lzo
 )
$ tail -10 ddl.sql
 CREATE TABLE IF NOT EXISTS "pg_catalog"."systable_topology"
 (
         "hash" INTEGER NOT NULL  ENCODE lzo
         ,"slice" INTEGER NOT NULL  ENCODE lzo
         ,"mirror" INTEGER NOT NULL  ENCODE lzo
 )
 DISTSTYLE EVEN
 ;
(4815 rows)
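The whole extraction can also be scripted non-interactively; a minimal sketch (host and credentials are placeholders, as above). The -t (tuples only) and -A (unaligned) flags suppress the column header and the "(4815 rows)" footer that ended up in ddl.sql above:

$ psql -t -A -o ddl.sql \
       -c "SELECT ddl FROM admin.v_generate_tbl_ddl WHERE schemaname = 'pg_catalog'" \
       "host=******.******.ap-northeast-1.redshift.amazonaws.com user=master dbname=mydb port=5439"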

Aginity Workbench for Amazon Redshift


References

Using SQL Workbench/J in Console Mode

Environment


Installation


Configuration

  • Start it in GUI mode.
$ cd Workbench-Build122
$ sh ./sqlworkbench.sh

コンソールモードで接続する

$ cd Workbench-Build122
$ sh ./sqlwbconsole.sh -profile=Redshift ★ specify the profile name configured in the GUI
readlink: illegal option -- f
usage: readlink [-n] [file ...]

SQL Workbench/J (122) console interface started.
Enter exit to quit.
Enter WbHelp for a list of SQL Workbench/J specific commands
Config directory: /Users/******/.sqlworkbench

Connection to "User=master, Schema=public, URL=jdbc:redshift://***.***.ap-northeast-1.redshift.amazonaws.com:5439/mydb" successful

master@public> select * from PG_TABLE_DEF limit 10;

schemaname | tablename                | column       | type           | encoding | distkey | sortkey | notnull
-----------+--------------------------+--------------+----------------+----------+---------+---------+--------
pg_catalog | padb_config_harvest      | name         | character(136) | lzo      | false   |       0 | true
pg_catalog | padb_config_harvest      | harvest      | integer        | lzo      | false   |       0 | true
pg_catalog | padb_config_harvest      | archive      | integer        | lzo      | false   |       0 | true
pg_catalog | padb_config_harvest      | directory    | character(500) | lzo      | false   |       0 | true
pg_catalog | pg_aggregate             | aggfnoid     | regproc        | none     | false   |       0 | true
pg_catalog | pg_aggregate             | aggtransfn   | regproc        | none     | false   |       0 | true
pg_catalog | pg_aggregate             | aggfinalfn   | regproc        | none     | false   |       0 | true
pg_catalog | pg_aggregate             | aggtranstype | oid            | none     | false   |       0 | true
pg_catalog | pg_aggregate             | agginitval   | text           | none     | false   |       0 | false
pg_catalog | pg_aggregate_fnoid_index | aggfnoid     | regproc        | none     | false   |       0 | false

(10 Rows)
SELECT executed successfully

master@public>
master@public> \q
$

Notes

$ cat sqlwbconsole.sh
#!/bin/sh
# Start SQL Workbench/J in console mode

SCRIPT_PATH="$(dirname "$(readlink -f "$0")")"

JAVACMD="java"

if [ -x "$SCRIPT_PATH/jre/bin/java" ]
then
  JAVACMD="$SCRIPT_PATH/jre/bin/java"
elif [ -x "$WORKBENCH_JDK/bin/java" ]
then
  JAVACMD="$WORKBENCH_JDK/bin/java"
elif [ -x "$JAVA_HOME/jre/bin/java" ]
then
  JAVACMD="$JAVA_HOME/jre/bin/java"
elif [ -x "$JAVA_HOME/bin/java" ]
then
  JAVACMD="$JAVA_HOME/bin/java"
fi

cp=$SCRIPT_PATH/sqlworkbench.jar
cp=$cp:$SCRIPT_PATH/dom4j-1.6.1.jar
cp=$cp:$SCRIPT_PATH/poi-ooxml-schemas.jar
cp=$cp:$SCRIPT_PATH/poi-ooxml.jar
cp=$cp:$SCRIPT_PATH/poi.jar
cp=$cp:$SCRIPT_PATH/stax-api-1.0.1.jar
cp=$cp:$SCRIPT_PATH/resolver.jar
cp=$cp:$SCRIPT_PATH/serializer.jar
cp=$cp:$SCRIPT_PATH/simple-odf.jar
cp=$cp:$SCRIPT_PATH/ext/*

$JAVACMD -Djava.awt.headless=true \
         -Xmx1024m \
         -Dvisualvm.display.name=SQLWorkbench \
         -cp $cp workbench.console.SQLConsole "$@"
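The "readlink: illegal option -- f" warning above occurs because macOS's BSD readlink does not support -f; $(readlink -f "$0") then expands to an empty string, dirname "" yields ".", and the script still works as long as it is launched from the Workbench directory. A possible workaround is GNU readlink from coreutils (a sketch, assuming Homebrew):

$ brew install coreutils
# then, in sqlwbconsole.sh, resolve the script path with greadlink (GNU readlink) instead:
SCRIPT_PATH="$(dirname "$(greadlink -f "$0")")"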

References

Connecting to Amazon Redshift with SQL Workbench/J

Notes on the steps to install SQL Workbench/J and connect to Amazon Redshift. SQL Workbench/J is a tool written in Java that can run SQL from both a GUI and a CUI, and it works with a variety of RDBMSs.

SQL Workbench/J is a free, DBMS-independent, cross-platform SQL query tool. It is written in Java and should run on any operating system that provides a Java Runtime Environment.

Its main focus is on running SQL scripts (either interactively or as a batch) and export/import features. Graphical query building or more advanced DBA tasks are not the focus and are not planned

SQL Workbench/J -  Home

Environment


Installation

SQL Workbench/J
Amazon Redshift JDBC Driver

Configuration


Connect to the Redshift cluster


Trying it out

  • Select [Tools]-[ShowDbTree].
  • Select a table or other object, right-click, and choose [Put Select Into]-[Statement N].
  • Select the SQL and click the execute button.

f:id:yohei-a:20170811175736p:image


Notes

SQL Workbench/J can also be used from the command line without starting the GUI, e.g. when you only have a console window (Putty, SSH) to access the database. In that case you can either run scripts using the batch mode, or start SQL Workbench/J in console mode, where you can run statements interactively, similar to the GUI mode (but of course with less comfortable editing possibilities).

19. Using SQL Workbench/J in console mode
$ sudo find / -type f -name sqlworkbench.jar
/Applications/SQLWorkbenchJ.app/Contents/Java/sqlworkbench.jar
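Since the jar location is now known, batch mode can also be invoked directly without the launcher scripts; a sketch reusing the Redshift profile created earlier (ddl.sql is an arbitrary script file):

$ java -jar /Applications/SQLWorkbenchJ.app/Contents/Java/sqlworkbench.jar \
       -profile=Redshift -script=ddl.sql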

References

*1: These are only examples; other tools exist.

*2: Not the [Download package for MacOS].

2017-08-01

On the Number of Threads Used for SQL Execution in SAP HANA

HANA has SQL Executor threads, one of which is used per SQL statement, and Job Worker threads, which process a SQL statement by breaking it into parallel*1 tasks; max_concurrency is the parameter that controls the number of parallel execution threads per SQL statement. max_concurrency_hint is a thread count derived dynamically from the recent workload, and it cannot be made larger than max_concurrency.

Therefore, the maximum number of threads used for SQL execution in HANA is max_sql_executors + max_sql_executors * max_concurrency. If you want to strictly cap CPU usage, I suspect it is advisable to set not only max_concurrency_hint but max_concurrency as well.

When the M_SERVICE_THREADS system view shows many running threads with the same STATEMENT_HASH during a CPU usage spike, max_concurrency can rein them in; when many threads with different STATEMENT_HASH values are running, I believe you have to throttle with max_sql_executors instead.
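As an illustration, these parameters live in the execution section of global.ini (or indexserver.ini) and can be changed online with ALTER SYSTEM ALTER CONFIGURATION; a sketch, where the value 32 is an arbitrary example rather than a recommendation:

-- Cap the number of parallel Job Worker threads per SQL statement (example value)
ALTER SYSTEM ALTER CONFIGURATION ('global.ini', 'SYSTEM')
  SET ('execution', 'max_concurrency') = '32' WITH RECONFIGURE;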


References

Parameters for Job Executor

You can change the default settings in the configuration editor of the SAP HANA studio (recommended) or directly in the execution section of the global.ini or indexserver.ini system properties file.

  • max_concurrency - sets the target number of threads that can be used.

Threads are created as required up to the limit defined here. New threads are created, for example, if a thread is blocked waiting for I/O or network. Set this to a reasonable value between the number of logical cores per CPU up to the overall number of logical cores in the system. Does not require a restart. Available since SPS 09. Note that from SPS12, this parameter should only need to be modified for multitenant database container installations.

  • max_concurrency_hint - limit concurrency hint even if more active job workers would be available.

The JobExecutor proposes the number of jobs to create for parallel processing based on the recent load on the system. This parameter limits decisions to parallelize on a low level of code. Multiple parallelization steps may result in far more jobs being created for a statement (and hence higher concurrency) than this parameter. The default is 0 (no limit is given, but hint is never greater than max_concurrency). On large systems (that is more than 4 sockets) setting this parameter to the number of logical cores of one socket may result in better performance. Performance tests with the target workload are needed to confirm this.

Does not require a restart. Available since SPS 08 (revision 85).

SAP Help Portal

The Relationship Between Database Resident Memory and HANA Used Memory in SAP HANA Studio

In a word, Database Resident Memory is the physical memory size used by the HANA processes, while HANA Used Memory is the memory size HANA itself considers logically in use. The difference, Database Resident Memory - HANA Used Memory, is therefore memory that occupies physical RAM from the OS's point of view but that HANA regards as free and available.

When memory is required for table growth or for temporary computations, the SAP HANA code obtains it from the existing memory pool. When the pool cannot satisfy the request, the HANA memory manager will request and reserve more memory from the operating system. At this point, the virtual memory size of the HANA processes grows.

Once a temporary computation completes or a table is dropped, the freed memory is returned to the memory manager, who recycles it to its pool, usually without informing Linux. Thus, from SAP HANA’s perspective, the amount of Used Memory shrinks, but the process’ virtual and resident sizes are not affected. This creates a situation where the Used Memory may even shrink to below the size of SAP HANA’s resident memory, which is perfectly normal.

The following illustration shows the relationship between physical memory, Linux virtual and resident memory, and SAP HANA’s pool and Used Memory indicators. Note how changes in Used Memory do not affect the processes’ virtual and resident sizes.

f:id:yohei-a:20170801235104p:image

https://www.edge-solutions.com/wp-content/uploads/2014/03/HANA_Memory_Usage_v2.pdf

Explained in terms of the slide, HANA Used Memory = Code and Stack + Table Data + Database Management, and it does not include the Allocated Memory Pool. When Database Resident Memory > HANA Used Memory, the reusable Allocated Memory Pool (Free) is presumably physical memory that the process keeps without releasing it to the OS, to be overwritten and reused when needed.


The source of the Database Resident Memory figure is:

3.1 - Hana Studio

3.1.1 - Database Resident

For the SAP Hana process:

SELECT 
   round(SUM(PHYSICAL_MEMORY_SIZE)/1024/1024/1024,2) AS "Database Resident Memory (Gb)" 
FROM 
  M_SERVICE_MEMORY
https://gerardnico.com/wiki/hana/memory/resident_memory

That is, it is the sum of PHYSICAL_MEMORY_SIZE in the M_SERVICE_MEMORY system view:

M_SERVICE_MEMORY System View

Detailed information on memory utilization by services.

Structure
Column name          | Data type | Unit | Description
---------------------+-----------+------+--------------------------------------------------------------
PHYSICAL_MEMORY_SIZE | BIGINT    | Byte | Physical/resident memory size (operating system perspective)
SAP Help Portal

This system view presumably obtains the memory size used by each process from sources such as /proc/[PID]/smaps.

/proc/[pid]/smaps (since Linux 2.6.14)

This file shows memory consumption for each of the process's mappings. (The pmap(1) command displays similar information, in a form that may be easier for parsing.) For each mapping there is a series of lines such as the following:

 00400000-0048a000 r-xp 00000000 fd:03 960637       /bin/bash
 Size:                552 kB
 Rss:                 460 kB
 Pss:                 100 kB
 Shared_Clean:        452 kB
 Shared_Dirty:          0 kB
 Private_Clean:         8 kB
 Private_Dirty:         0 kB
 Referenced:          460 kB
 Anonymous:             0 kB
 AnonHugePages:         0 kB
 ShmemHugePages:        0 kB
 ShmemPmdMapped:        0 kB
 Swap:                  0 kB
 KernelPageSize:        4 kB
 MMUPageSize:           4 kB
 Locked:                0 kB
 ProtectionKey:         0
 VmFlags: rd ex mr mw me dw

The first of these lines shows the same information as is displayed for the mapping in /proc/[pid]/maps. The following lines show the size of the mapping, the amount of the mapping that is currently resident in RAM ("Rss"), the process's proportional share of this mapping ("Pss"), the number of clean and dirty shared pages in the mapping, and the number of clean and dirty private pages in the mapping.

proc(5) - Linux manual page
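For reference, the per-process resident size such a view would report can be approximated by summing the Rss fields in smaps (a sketch; 12345 is a hypothetical PID):

$ grep '^Rss:' /proc/12345/smaps | awk '{ sum += $2 } END { print sum " kB" }'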

Notes

Most OS memory managers, including Linux's, use demand paging: when memory is allocated to a user process, only virtual address space is assigned at that point and no physical memory is consumed; physical memory is used only when data is first written to it. In ps output, VSZ (Virtual Size in Kbytes) is the virtual memory size and RSS (Resident Set Size) is the physical memory size, so RSS is the physical memory actually in use.
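For example, the two sizes can be compared for a single process as follows (12345 is a hypothetical PID):

$ ps -o pid,vsz,rss,comm -p 12345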

*1: I deliberately write "parallel" rather than "concurrent" here. If the tasks cannot be ON CPU at the same time they do not actually run in parallel, but the basic intent is parallel processing on a multiprocessor.