Darts-clone Q&A

Q: What is the maximum number of strings (keys) that Darts-clone can store?

A double-array is stored in a single array whose size must be less than 2^29 (about 536M). The array size is always greater than the number of keys, so the maximum number of keys is also less than 2^29.

The actual limit depends on the keys and values. In general, the array size is proportional to the number of keys, and longer keys require a larger array. The number of distinct values also affects the array size: the fewer distinct values there are, the smaller the array.

You can estimate the actual limit by building a double-array from a sample of your keys and extrapolating from the resulting array size.

The following are examples (`keys`: the number of keys, `size`: the array size):

Word keys (the average length is 13 bytes).

## All values are zero.
$ mkdarts -t ~/corpus/1gm.zero.1m 1gm.zero.1m.darts
keys: 1000000
total: 12740688
Making double-array: 100% |*******************************************|
size: 1861632
total_size: 7446528
## Unique values.
$ mkdarts ~/corpus/1gm.uniq.1m 1gm.uniq.1m.darts
keys: 1000000
total: 12740688
Making double-array: 100% |*******************************************|
size: 4885248
total_size: 19540992

URL keys (the average length is 53 bytes).

## Unique values.
$ mkdarts ~/corpus/urls.uniq.1m urls.uniq.1m.darts
keys: 1000000
total: 53166751
Making double-array: 100% |*******************************************|
size: 18637312
total_size: 74549248
## All values are zero.
$ mkdarts -t ~/corpus/urls.zero.1m urls.zero.1m.darts
keys: 1000000
total: 53166751
Making double-array: 100% |*******************************************|
size: 11225344
total_size: 44901376
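
The figures above can be turned into a rough estimate of the key limit. The following is a minimal sketch, not part of Darts-clone: the function name `estimate_max_keys` is hypothetical, and it assumes the array size keeps growing linearly with the number of keys, which the answer above says holds only in general. It takes the `keys` and `size` values reported for a sample build and scales them up to the 2^29 limit.

```python
# Rough extrapolation from a sample build to the 2^29 array-size limit.
# Assumption: array size grows linearly with the number of keys, so the
# result is an estimate, not a guarantee.

MAX_SIZE = 2 ** 29  # the array size must stay below this value


def estimate_max_keys(sample_keys: int, sample_size: int) -> int:
    """Estimate how many keys fit, given `keys` and `size` from a sample build."""
    if sample_keys <= 0 or sample_size <= 0:
        raise ValueError("sample_keys and sample_size must be positive")
    # Largest key count whose extrapolated array size stays below MAX_SIZE.
    return (MAX_SIZE - 1) * sample_keys // sample_size


# (keys, size) pairs copied from the mkdarts outputs above, labeled by input file.
samples = {
    "1gm.zero.1m": (1_000_000, 1_861_632),
    "1gm.uniq.1m": (1_000_000, 4_885_248),
    "urls.uniq.1m": (1_000_000, 18_637_312),
    "urls.zero.1m": (1_000_000, 11_225_344),
}

for name, (keys, size) in samples.items():
    print(f"{name}: about {estimate_max_keys(keys, size):,} keys")
```

Under this linear assumption, the 1gm.uniq.1m sample extrapolates to roughly 110 million keys and the urls.uniq.1m sample to roughly 29 million. Treat these as ballpark values and rebuild with a larger sample if your key set is close to the estimate.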