
urekat's Skunk Diary 3

2011-10-29

Reading through conf/schema-sample.txt in Cassandra 1.0.0

/*This file contains an example Keyspace that can be created using the
cassandra-cli command line interface as follows.

bin/cassandra-cli -host localhost --file conf/schema-sample.txt

The cassandra-cli includes online help that explains the statements below. You can
accessed the help without connecting to a running cassandra instance by starting the
client and typing "help;"
*/
Running the file as-is on 1.0.0 produces the following output:

WARNING: [{}] strategy_options syntax is deprecated, please use {}

Line 3 => No enum const class org.apache.cassandra.cli.CliClient$ColumnFamilyArgument.MEMTABLE_FLUSH_AFTER

schema-sample.txt has two bugs, so I fixed them:

  [{replication_factor:1}] → {replication_factor:1}

  and memtable_flush_after = 59 → /*  and memtable_flush_after = 59 */
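The two fixes can also be applied with a small script. This is a minimal sketch, not part of Cassandra: `fix_schema` is a hypothetical helper, and it assumes the offending text appears exactly as quoted above (`strategy_options=[{...}]` and `and memtable_flush_after = 59`). It only rewrites `strategy_options`, because `column_metadata` legitimately uses the `[{...}]` list form.

```python
import re


def fix_schema(text: str) -> str:
    """Hypothetical helper: apply the two schema-sample.txt fixes to a string."""
    # Fix 1: strategy_options=[{...}] -> strategy_options={...}.
    # column_metadata = [{...}] is a real list of maps, so it is left alone.
    text = re.sub(r"(strategy_options\s*=\s*)\[(\{[^{}]*\})\]", r"\1\2", text)
    # Fix 2: comment out memtable_flush_after, which the 1.0.0 CLI rejects.
    text = re.sub(r"(and\s+memtable_flush_after\s*=\s*\d+)", r"/* \1 */", text)
    return text


sample = (
    "create keyspace Keyspace1\n"
    "    with strategy_options=[{replication_factor:1}]\n"
    "    and placement_strategy = 'org.apache.cassandra.locator.SimpleStrategy';\n"
)
print(fix_schema(sample))
```

Run it once on a fresh copy of the file: a second pass would wrap the already-commented memtable_flush_after line in another comment.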
create keyspace Keyspace1
    with strategy_options={replication_factor:1}
    and placement_strategy = 'org.apache.cassandra.locator.SimpleStrategy';

use Keyspace1;

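A keyspace is something like a namespace; in MySQL terms, it corresponds to a database.

replication_factor is the number of replicas.

placement_strategy selects the algorithm that decides where the replicas are placed.

So these are specified per keyspace.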
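To make the per-keyspace replica count concrete: according to the "help create keyspace" output, SimpleStrategy and OldNetworkTopologyStrategy take a single replication_factor, while NetworkTopologyStrategy takes one count per datacenter and the cluster-wide replication factor is their sum. A minimal sketch of that rule, assuming the dicts stand in for the CLI's {key:value} strategy_options (`total_replicas` is a hypothetical name):

```python
from typing import Dict


def total_replicas(strategy_options: Dict[str, int]) -> int:
    """Cluster-wide replica count for one keyspace's strategy_options."""
    # SimpleStrategy / OldNetworkTopologyStrategy: one 'replication_factor' option.
    if "replication_factor" in strategy_options:
        return strategy_options["replication_factor"]
    # NetworkTopologyStrategy: one entry per datacenter; the total is their sum.
    return sum(strategy_options.values())


print(total_replicas({"replication_factor": 1}))  # Keyspace1: 1
print(total_replicas({"DC1": 2, "DC2": 2}))       # Keyspace3 in the help examples: 4
```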

% bin/cassandra-cli -host localhost -port 9160
Connected to: "TakeruCluster001" on localhost/9160
Welcome to the Cassandra CLI.

Type 'help;' or '?' for help.
Type 'quit;' or 'exit;' to quit.

[default@unknown] help create keyspace;
create keyspace <keyspace>;
create keyspace <keyspace> with <att1>=<value1>;
create keyspace <keyspace> with <att1>=<value1> and <att2>=<value2> ...;

Create a keyspace with the specified attributes.

Required Parameters:
- keyspace: Name of the new keyspace, "system" is reserved for
  Cassandra internals. Names may only contain letters, numbers and
  underscores.

Keyspace Attributes (all are optional):
- placement_strategy: Class used to determine how replicas
  are distributed among nodes. Defaults to NetworkTopologyStrategy with
  one datacenter defined with a replication factor of 1 ("[datacenter1:1]").

  Supported values are:
    - org.apache.Cassandra.locator.SimpleStrategy
    - org.apache.Cassandra.locator.NetworkTopologyStrategy
    - org.apache.Cassandra.locator.OldNetworkTopologyStrategy

  SimpleStrategy merely places the first replica at the node whose
  token is closest to the key (as determined by the Partitioner), and
  additional replicas on subsequent nodes along the ring in increasing
  Token order.

  Supports a single strategy option 'replication_factor' that
  specifies the replication factor for the cluster.

  With NetworkTopologyStrategy, for each datacenter, you can specify
  how many replicas you want on a per-keyspace basis. Replicas are
  placed on different racks within each DC, if possible.

  Supports strategy options which specify the replication factor for
  each datacenter. The replication factor for the entire cluster is the
  sum of all per datacenter values. Note that the datacenter names
  must match those used in conf/cassandra-topology.properties.

  OldNetworkToplogyStrategy [formerly RackAwareStrategy]
  places one replica in each of two datacenters, and the third on a
  different rack in in the first.  Additional datacenters are not
  guaranteed to get a replica.  Additional replicas after three are
  placed in ring order after the third without regard to rack or
  datacenter.

  Supports a single strategy option 'replication_factor' that
  specifies the replication factor for the cluster.

- strategy_options: Optional additional options for placement_strategy.
  Options have the form {key:value}, see the information on each
  strategy and the examples.

- durable_writes: When set to false all RowMutations on keyspace will by-pass CommitLog.
  Set to true by default.

Examples:
create keyspace Keyspace2
    with placement_strategy = 'org.apache.cassandra.locator.SimpleStrategy'
    and strategy_options = {replication_factor:4};
create keyspace Keyspace3
    with placement_strategy = 'org.apache.cassandra.locator.NetworkTopologyStrategy'
    and strategy_options={DC1:2, DC2:2};
create keyspace Keyspace4
    with placement_strategy = 'org.apache.cassandra.locator.OldNetworkTopologyStrategy'
    and strategy_options = {replication_factor:1};
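The SimpleStrategy description in the help (first replica on the node whose token is closest to the key, further replicas on the following nodes in increasing token order) can be sketched as a toy ring. This is an illustration under assumptions, not Cassandra's code: integers stand in for the tokens the partitioner would compute, the ring and key token are hypothetical, and "closest" is taken to mean the first node whose token is greater than or equal to the key's token, wrapping around the ring.

```python
import bisect
from typing import List


def simple_strategy_replicas(ring: List[int], key_token: int, rf: int) -> List[int]:
    """Tokens of the nodes holding replicas of a key under a SimpleStrategy-like rule."""
    tokens = sorted(ring)
    # Assumption: the first replica is the first node whose token is >= the key's
    # token, wrapping around to the lowest token at the end of the ring.
    start = bisect.bisect_left(tokens, key_token) % len(tokens)
    # Further replicas: the following nodes in increasing token order (wrapping),
    # never more replicas than there are nodes.
    return [tokens[(start + i) % len(tokens)] for i in range(min(rf, len(tokens)))]


ring = [0, 25, 50, 75]  # hypothetical 4-node ring with evenly spaced tokens
print(simple_strategy_replicas(ring, key_token=30, rf=3))  # [50, 75, 0]
```

Note that rack and datacenter play no part here; that is exactly what NetworkTopologyStrategy adds.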


create column family Standard1
    with comparator = BytesType
    and keys_cached = 10000
    and rows_cached = 1000
    and row_cache_save_period = 0
    and key_cache_save_period = 3600
    and memtable_throughput = 255
    and memtable_operations = 0.29;

Inside a keyspace you create "column families"; in MySQL terms, a column family is a table.

comparator: determines the type of a column's "name" and the order in which names are sorted.
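Why the comparator matters: column names are stored as bytes, and the comparator decides how those bytes are ordered. A minimal sketch, assuming BytesType compares raw bytes lexically and LongType compares 8-byte big-endian signed longs; the column names are hypothetical.

```python
import struct

# Hypothetical column names: 8-byte big-endian signed longs, as LongType stores them.
names = [struct.pack(">q", n) for n in (3, -1, 10)]


def bytes_type_key(b: bytes) -> bytes:
    # BytesType: plain lexical comparison of the raw bytes.
    return b


def long_type_key(b: bytes) -> int:
    # LongType: interpret the 8 bytes as a signed 64-bit integer.
    return struct.unpack(">q", b)[0]


by_bytes = [long_type_key(b) for b in sorted(names, key=bytes_type_key)]
by_long = [long_type_key(b) for b in sorted(names, key=long_type_key)]
print(by_bytes)  # [3, 10, -1]  (-1 is 0xFF...FF, so it sorts last as bytes)
print(by_long)   # [-1, 3, 10]
```

The same stored names come back in a different order depending on the comparator, which is why it has to be chosen when the column family is created.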

cache*: cache settings.
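The help output for "create column family" says keys_cached and rows_cached accept either a fraction between 0 and 1 (inclusive) or an absolute count, and that each key cache hit saves at least 1 seek and each row cache hit at least 2. A minimal sketch of those two rules; the function names and the total of 1,000,000 keys are hypothetical.

```python
def effective_cache_size(setting: float, total: int) -> int:
    """Resolve keys_cached / rows_cached: 0..1 is a fraction, larger is an absolute count."""
    if 0 <= setting <= 1:
        return int(setting * total)
    return int(setting)


def min_seeks_saved(key_cache_hits: int, row_cache_hits: int) -> int:
    """Lower bound from the help text: 1 seek per key cache hit, 2 per row cache hit."""
    return key_cache_hits * 1 + row_cache_hits * 2


# Standard1 uses keys_cached = 10000 and rows_cached = 1000 (absolute counts);
# total = 1_000_000 is a hypothetical number of keys in the column family.
print(effective_cache_size(10000, 1_000_000))  # 10000
print(effective_cache_size(0.5, 1_000_000))    # 500000
```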

memtable_*: Haven't these settings been removed? (They do not appear in the "help create column family" output below.)

[default@unknown] help create column family;
create column family <name>;
create column family <name> with <att1>=<value1>;
create column family <name> with <att1>=<value1> and <att2>=<value2>...;

Create a column family in the current keyspace with the specified
attributes.

Required Parameters:
- name: Name of the new column family. Names may only contain letters,
  numbers and underscores.

column family Attributes (all are optional):
- column_metadata: Defines the validation and indexes for known columns in
  this column family.

  Columns not listed in the column_metadata section will use the
  default_validator to validate their values.

  Column Required parameters:
    - name: Binds a validator (and optionally an indexer) to columns
      with this name in any row of the enclosing column family.

    - validator: Validator to use for values for this column.

      Supported values are:
        - AsciiType
        - BytesType
        - CounterColumnType (distributed counter column)
        - Int32Type
        - IntegerType (a generic variable-length integer type)
        - LexicalUUIDType
        - LongType
        - UTF8Type

      It is also valid to specify the fully-qualified class name to a class
      that extends org.apache.Cassandra.db.marshal.AbstractType.

  Column Optional parameters:
    - index_name: Name for the index. Both an index name and
      type must be specified.

    - index_type: The type of index to be created.

      Suported values are:
        - KEYS: a ColumnFamily backed index
        - CUSTOM: a user supplied index implementaion. You must supply a
          'class_name' field in the index_options with the full classname 
          of the implementation.
    
    - index_options: Optional additional options for index_type.
      Options have the form {key:value}.
           
- column_type: Type of columns this column family holds, valid values are
  Standard and Super. Default is Standard.

- comment: Human readable column family description.

- comparator: Validator to use to validate and compare column names in
  this column family. For Standard column families it applies to columns, for
  Super column families applied to  super columns. Also see the subcomparator
  attribute. Default is BytesType, which is a straight forward lexical
  comparison of the bytes in each column.

  Supported values are:
    - AsciiType
    - BytesType
    - CounterColumnType (distributed counter column)
    - Int32Type
    - IntegerType (a generic variable-length integer type)
    - LexicalUUIDType
    - LongType
    - UTF8Type

  It is also valid to specify the fully-qualified class name to a class that
  extends org.apache.Cassandra.db.marshal.AbstractType.

- default_validation_class: Validator to use for values in columns which are
  not listed in the column_metadata. Default is BytesType which applies
  no validation.

  Supported values are:
    - AsciiType
    - BytesType
    - CounterColumnType (distributed counter column)
    - Int32Type
    - IntegerType (a generic variable-length integer type)
    - LexicalUUIDType
    - LongType
    - UTF8Type

  It is also valid to specify the fully-qualified class name to a class that
  extends org.apache.Cassandra.db.marshal.AbstractType.

- key_validation_class: Validator to use for keys.
  Default is BytesType which applies no validation.

  Supported values are:
    - AsciiType
    - BytesType
    - Int32Type
    - IntegerType (a generic variable-length integer type)
    - LexicalUUIDType
    - LongType
    - UTF8Type

  It is also valid to specify the fully-qualified class name to a class that
  extends org.apache.Cassandra.db.marshal.AbstractType.

- gc_grace: Time to wait in seconds before garbage collecting tombstone
  deletion markers. Default value is 864000 or 10 days.

  Set this to a large enough value that you are confident that the deletion
  markers will be propagated to all replicas by the time this many seconds
  has elapsed, even in the face of hardware failures.

  See http://wiki.apache.org/Cassandra/DistributedDeletes

- keys_cached: Maximum number of keys to cache in memory. Valid values are
  either a double between 0 and 1 (inclusive on both ends) denoting what
  fraction should be cached. Or an absolute number of rows to cache.
  Default value is 200000.

  Each key cache hit saves 1 seek and each row cache hit saves 2 seeks at the
  minimum, sometimes more. The key cache is fairly tiny for the amount of
  time it saves, so it's worthwhile to use it at large numbers all the way
  up to 1.0 (all keys cached). The row cache saves even more time, but must
  store the whole values of its rows, so it is extremely space-intensive.
  It's best to only use the row cache if you have hot rows or static rows.

- key_cache_save_period: Duration in seconds after which Cassandra should
  safe the keys cache. Caches are saved to saved_caches_directory as
  specified in conf/Cassandra.yaml. Default is 14400 or 4 hours.

  Saved caches greatly improve cold-start speeds, and is relatively cheap in
  terms of I/O for the key cache. Row cache saving is much more expensive and
  has limited use.

- read_repair_chance: Probability (0.0-1.0) with which to perform read
  repairs for any read operation. Default is 0.1.

  Note that disabling read repair entirely means that the dynamic snitch
  will not have any latency information from all the replicas to recognize
  when one is performing worse than usual.

- rows_cached: Maximum number of rows whose entire contents we
  cache in memory. Valid values are either a double between 0 and 1 (
  inclusive on both ends) denoting what fraction should be cached. Or an
  absolute number of rows to cache. Default value is 0, to disable row
  caching.

  Each key cache hit saves 1 seek and each row cache hit saves 2 seeks at the
  minimum, sometimes more. The key cache is fairly tiny for the amount of
  time it saves, so it's worthwhile to use it at large numbers all the way
  up to 1.0 (all keys cached). The row cache saves even more time, but must
  store the whole values of its rows, so it is extremely space-intensive.
  It's best to only use the row cache if you have hot rows or static rows.

- row_cache_save_period: Duration in seconds after which Cassandra should
  safe the row cache. Caches are saved to saved_caches_directory as specified
  in conf/Cassandra.yaml. Default is 0 to disable saving the row cache.

  Saved caches greatly improve cold-start speeds, and is relatively cheap in
  terms of I/O for the key cache. Row cache saving is much more expensive and
  has limited use.

- subcomparator:  Validator to use to validate and compare sub column names
  in this column family. Only applied to Super column families. Default is
  BytesType, which is a straight forward lexical comparison of the bytes in
  each column.

  Supported values are:
    - AsciiType
    - BytesType
    - CounterColumnType (distributed counter column)
    - Int32Type
    - IntegerType (a generic variable-length integer type)
    - LexicalUUIDType
    - LongType
    - UTF8Type

  It is also valid to specify the fully-qualified class name to a class that
  extends org.apache.Cassandra.db.marshal.AbstractType.

- max_compaction_threshold: The maximum number of SSTables allowed before a
minor compaction is forced. Default is 32, setting to 0 disables minor
compactions.

Decreasing this will cause minor compactions to start more frequently and
be less intensive. The min_compaction_threshold and max_compaction_threshold
boundaries are the number of tables Cassandra attempts to merge together at
once.

- min_compaction_threshold: The minimum number of SSTables needed
to start a minor compaction. Default is 4, setting to 0 disables minor
compactions.

Increasing this will cause minor compactions to start less frequently and
be more intensive. The min_compaction_threshold and max_compaction_threshold
boundaries are the number of tables Cassandra attempts to merge together at
once.

- replicate_on_write: Replicate every counter update from the leader to the
follower replicas. Accepts the values true and false.

- row_cache_provider: The provider for the row cache to use for this
column family. 

Supported values are:
    - ConcurrentLinkedHashCacheProvider
    - SerializingCacheProvider (requires JNA)

It is also valid to specify the fully-qualified class name to a class
that implements org.apache.cassandra.cache.IRowCacheProvider.

row_cache_provider defaults to SerializingCacheProvider if you have JNA
enabled, otherwise ConcurrentLinkedHashCacheProvider.
SerializingCacheProvider serialises the contents of the row and stores
it in native memory, i.e., off the JVM Heap. Serialized rows take
significantly less memory than "live" rows in the JVM, so you can cache
more rows in a given memory footprint.  And storing the cache off-heap
means you can use smaller heap sizes, reducing the impact of GC pauses.

- compression_options: Options related to compression.
  Options have the form {key:value}.
  The main recognized options are:
    - sstable_compression: the algorithm to use to compress sstables for
      this column family. If none is provided, compression will not be
      enabled. Supported values are SnappyCompressor, DeflateCompressor or
      any custom compressor. It is also valid to specify the fully-qualified
      class name to a class that implements org.apache.cassandra.io.ICompressor.

    - chunk_length_kb: specify the size of the chunk used by sstable
      compression (default to 64, must be a power of 2).

  To disable compression just set compression_options to null like this
  `compression_options = null`.

Examples:
create column family Super4
    with column_type = 'Super'
    and comparator = 'AsciiType'
    and rows_cached = 10000;
create column family Standard3
    with comparator = 'LongType'
    and rows_cached = 10000;
create column family Standard4
    with comparator = AsciiType
    and column_metadata =
    [{
        column_name : Test,
        validation_class : IntegerType,
        index_type : 0,
        index_name : IdxName},
    {
        column_name : 'other name',
        validation_class : LongType
    }];
create column family Standard2
    with comparator = UTF8Type
    and read_repair_chance = 0.1
    and keys_cached = 100
    and gc_grace = 0
    and min_compaction_threshold = 5
    and max_compaction_threshold = 31;

create column family StandardByUUID1
    with comparator = TimeUUIDType;

read_repair_chance: ?
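The help only says read_repair_chance is the probability (0.0-1.0) of doing a read repair on any read, with a default of 0.1. A simplified model of that per-read decision, assuming each read is decided independently (`should_read_repair` is a hypothetical name, not Cassandra's API):

```python
import random


def should_read_repair(read_repair_chance: float, rng: random.Random) -> bool:
    """Decide, for one read, whether to also perform a read repair."""
    # random() is uniform in [0.0, 1.0), so 0.0 never repairs and 1.0 always does.
    return rng.random() < read_repair_chance


rng = random.Random(42)  # fixed seed so the run is reproducible
repairs = sum(should_read_repair(0.1, rng) for _ in range(10_000))
print(repairs)
```

With the default of 0.1, about one read in ten also triggers a repair; setting it to 0 turns read repair off, which the help notes also deprives the dynamic snitch of latency information.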

gc_grace: ?
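Per the help, gc_grace is how long (in seconds, default 864000 = 10 days) a tombstone deletion marker is kept before it may be garbage-collected, so that the deletion has time to reach every replica. A minimal sketch of that age check, assuming times are epoch seconds (`tombstone_purgeable` is a hypothetical name):

```python
TEN_DAYS = 10 * 24 * 60 * 60  # 864000 seconds, the documented default gc_grace


def tombstone_purgeable(deleted_at: int, now: int, gc_grace: int = TEN_DAYS) -> bool:
    """True once the tombstone is older than gc_grace seconds."""
    return now - deleted_at > gc_grace


print(tombstone_purgeable(0, 864_000))            # False: exactly 10 days old
print(tombstone_purgeable(0, 864_001))            # True
print(tombstone_purgeable(100, 101, gc_grace=0))  # True: Standard2 uses gc_grace = 0
```

The danger the help warns about: if a replica misses the delete and stays unreachable longer than gc_grace, the tombstone may already be gone elsewhere when it comes back.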

compaction_*: tuning compaction.
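The help describes min_compaction_threshold (default 4) and max_compaction_threshold (default 32) as the bounds on how many SSTables Cassandra tries to merge at once, with 0 disabling minor compaction. A minimal sketch of those bounds only; it deliberately ignores how Cassandra groups SSTables of similar size and treats the input as one such group. The function name and file names are hypothetical.

```python
from typing import List


def minor_compaction_batch(sstables: List[str], min_threshold: int = 4,
                           max_threshold: int = 32) -> List[str]:
    """SSTables one minor compaction would merge (empty list: no compaction)."""
    # Either threshold set to 0 disables minor compaction.
    if min_threshold == 0 or max_threshold == 0:
        return []
    # Fewer than min_threshold SSTables: not worth compacting yet.
    if len(sstables) < min_threshold:
        return []
    # Merge at most max_threshold SSTables in one go.
    return sstables[:max_threshold]


tables = [f"sstable-{i}" for i in range(40)]  # hypothetical SSTable names
print(len(minor_compaction_batch(tables)))             # 32 with the defaults
print(len(minor_compaction_batch(tables[:4], 5, 31)))  # 0 with Standard2's thresholds
```

Raising min_threshold means fewer but larger compactions; lowering max_threshold keeps each one smaller, matching the trade-off the help text describes.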



create column family Super1
    with column_type = Super
    and comparator = BytesType
    and subcomparator = BytesType;

create column family Super2
    with column_type = Super
    and subcomparator = UTF8Type
    and rows_cached = 10000
    and keys_cached = 50
    and comment = 'A column family with supercolumns, whose column and subcolumn names are UTF8 strings';

create column family Super3
    with column_type = Super
    and comparator = LongType
    and comment = 'A column family with supercolumns, whose column names are Longs (8 bytes)';

When column_type = Super, comparator applies to the "super column" names and subcomparator applies to the "column" names inside each super column.
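One way to picture a super column family is as a two-level sorted map per row: row key, then super column name (ordered by comparator), then column name (ordered by subcomparator). A minimal sketch of that nesting with hypothetical data, assuming UTF-8 names, for which Python's string order (code point order) matches the byte order:

```python
from typing import Dict, Iterator, Tuple

# row key -> super column name -> column name -> value
SuperCF = Dict[str, Dict[str, Dict[str, bytes]]]


def sorted_view(cf: SuperCF, row_key: str) -> Iterator[Tuple[str, str, bytes]]:
    """Yield (super column, column, value) in the order the comparators define."""
    row = cf[row_key]
    for sc_name in sorted(row):  # comparator: orders super column names
        for col_name in sorted(row[sc_name]):  # subcomparator: orders names inside one super column
            yield sc_name, col_name, row[sc_name][col_name]


# Hypothetical data for a Super2-style family with UTF-8 names.
cf: SuperCF = {"user1": {"b": {"y": b"2", "x": b"1"}, "a": {"z": b"3"}}}
print(list(sorted_view(cf, "user1")))
# [('a', 'z', b'3'), ('b', 'x', b'1'), ('b', 'y', b'2')]
```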

create column family Indexed1
    with comparator = UTF8Type
    and default_validation_class = LongType
    and column_metadata = [{
        column_name : birthdate,
        validation_class : LongType,
        index_name : birthdate_idx,
        index_type : 0}
    ];

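default_validation_class: the default type (validator) for column "values".

column_metadata: per-column settings. You can create indexes here!!

index_type: KEYS (a ColumnFamily backed index) or CUSTOM. KEYS supports simple equality queries. In the samples, index_type : 0 means KEYS.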
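The help describes a KEYS index as "a ColumnFamily backed index", and it answers simple equality queries. A simplified model of that idea: map each value of the indexed column to the row keys holding it, then look up a single value. The rows below are hypothetical, the function names are not Cassandra's API, and this ignores how the index is actually stored and queried across nodes.

```python
from collections import defaultdict
from typing import Dict, List, Set


def build_keys_index(rows: Dict[str, Dict[str, int]], column: str) -> Dict[int, Set[str]]:
    """Map each value of `column` to the set of row keys holding that value."""
    index: Dict[int, Set[str]] = defaultdict(set)
    for row_key, columns in rows.items():
        if column in columns:  # rows without the column are simply not indexed
            index[columns[column]].add(row_key)
    return dict(index)


def eq_query(index: Dict[int, Set[str]], value: int) -> List[str]:
    """Equality lookup (column == value): the kind of query a KEYS index answers."""
    return sorted(index.get(value, set()))


# Hypothetical rows for Indexed1: row key -> {column name: LongType value}
rows = {
    "alice": {"birthdate": 19800101},
    "bob": {"birthdate": 19900202},
    "carol": {"birthdate": 19800101},
    "dave": {},
}
idx = build_keys_index(rows, "birthdate")
print(eq_query(idx, 19800101))  # ['alice', 'carol']
```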

create column family Counter1
    with default_validation_class = CounterColumnType;

create column family SuperCounter1
    with column_type = Super
    and default_validation_class = CounterColumnType;

CounterColumnType (distributed counter column)

I wonder what this is.
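A hedged guess at the idea, based on the help's description of replicate_on_write (every counter update is replicated from the leader to the follower replicas): a simplified model in which each replica keeps its own shard of the counter as a (clock, count) pair, merging keeps the newer shard per node, and the counter's value is the sum of the shards. This is an assumption-labeled sketch of a distributed counter, not Cassandra's implementation; node names and function names are hypothetical.

```python
from typing import Dict, Tuple

# node id -> (clock, count): each replica's own shard of one counter
Context = Dict[str, Tuple[int, int]]


def increment(ctx: Context, leader: str, delta: int) -> Context:
    """The leader replica applies the update to its own shard and bumps its clock."""
    clock, count = ctx.get(leader, (0, 0))
    updated = dict(ctx)
    updated[leader] = (clock + 1, count + delta)
    return updated


def merge(a: Context, b: Context) -> Context:
    """Combine two replicas' views: per node, keep the shard with the higher clock."""
    merged = dict(a)
    for node, (clock, count) in b.items():
        if node not in merged or clock > merged[node][0]:
            merged[node] = (clock, count)
    return merged


def value(ctx: Context) -> int:
    """The counter's value is the sum of all shards."""
    return sum(count for _clock, count in ctx.values())


a = increment({}, "A", 5)  # +5 coordinated by hypothetical node A
b = increment({}, "B", 3)  # +3 coordinated by hypothetical node B
print(value(merge(a, b)))  # 8
```

Because merging is per-node and keeps the newer shard, applying the same replicated state twice does not double-count, which is what makes increments safe to replicate.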


Trackback - http://d.hatena.ne.jp/urekat/20111029/1319889723