Darts-clone Q&A
Q: What is the maximum number of strings (keys) that Darts-clone can store?
A: A double-array is stored in a single array whose size must be less than 2^29 (about 536M) elements. The array is always larger than the number of keys, so the maximum number of keys is also less than 2^29.
The actual limit depends on both the keys and the values. In general, the array size is proportional to the number of keys, and longer keys require a larger array. The number of distinct values also affects the array size: the fewer distinct values there are, the smaller the array.
You can estimate the actual limit for your data by building a dictionary from a subset of your keys and scaling up the resulting array size.
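
One way to make this estimate, assuming the array size grows roughly linearly with the number of keys as described above: build a dictionary from a sample, read the `keys` and `size` values that `mkdarts` prints, and scale up to the 2^29 cap. The function name `estimate_max_keys` and its `margin` parameter are illustrative and not part of Darts-clone. The figures in the usage line are taken from one of the sample runs in this answer.

```python
MAX_UNITS = 2 ** 29  # array-size cap stated in the answer (about 536M)


def estimate_max_keys(sample_keys: int, sample_size: int, margin: float = 1.0) -> int:
    """Linearly extrapolate from a sample build to the array-size cap.

    sample_keys: the `keys` value printed for the sample build
    sample_size: the `size` value printed for the sample build
    margin:      fraction of the cap to allow (hypothetical safety factor)
    """
    if sample_keys <= 0 or sample_size <= 0:
        raise ValueError("sample_keys and sample_size must be positive")
    if not 0 < margin <= 1:
        raise ValueError("margin must be in (0, 1]")
    units_per_key = sample_size / sample_keys
    return int(MAX_UNITS * margin / units_per_key)


# 1,000,000 keys produced an array of 4,885,248 elements in one sample run.
print(estimate_max_keys(1_000_000, 4_885_248))
```

Treat the result as a rough guide only. A sample that is not representative of the full key set will skew the estimate, so a margin below 1.0 may be a sensible precaution.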
The following are examples (`keys`: the number of keys, `size`: the array size):
Word keys (the average length is 13 bytes).

    ## All values are zero.
    $ mkdarts -t ~/corpus/1gm.zero.1m 1gm.zero.1m.darts
    keys: 1000000
    total: 12740688
    Making double-array: 100% |*******************************************|
    size: 1861632
    total_size: 7446528

    ## Unique values.
    $ mkdarts ~/corpus/1gm.uniq.1m 1gm.uniq.1m.darts
    keys: 1000000
    total: 12740688
    Making double-array: 100% |*******************************************|
    size: 4885248
    total_size: 19540992

URL keys (the average length is 53 bytes).

    ## Unique values.
    $ mkdarts ~/corpus/urls.uniq.1m urls.uniq.1m.darts
    keys: 1000000
    total: 53166751
    Making double-array: 100% |*******************************************|
    size: 18637312
    total_size: 74549248

    ## All values are zero.
    $ mkdarts -t ~/corpus/urls.zero.1m urls.zero.1m.darts
    keys: 1000000
    total: 53166751
    Making double-array: 100% |*******************************************|
    size: 11225344
    total_size: 44901376
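
To make these runs easier to compare, the sketch below recomputes a few ratios from the figures printed above. Runs are labeled by input file name. The helper name `summarize` and the derived field names are illustrative. It also assumes that `total` is the combined length of all keys in bytes, which agrees with the stated average lengths but is not confirmed by the output itself.

```python
# (input file, keys, total, size, total_size), copied from the runs above.
RUNS = [
    ("1gm.zero.1m", 1_000_000, 12_740_688, 1_861_632, 7_446_528),
    ("1gm.uniq.1m", 1_000_000, 12_740_688, 4_885_248, 19_540_992),
    ("urls.uniq.1m", 1_000_000, 53_166_751, 18_637_312, 74_549_248),
    ("urls.zero.1m", 1_000_000, 53_166_751, 11_225_344, 44_901_376),
]


def summarize(runs):
    """Derive per-key and per-element ratios for each run."""
    rows = []
    for name, keys, total, size, total_size in runs:
        rows.append({
            "name": name,
            "avg_key_bytes": total / keys,        # assumes `total` is key bytes
            "units_per_key": size / keys,         # array elements per key
            "bytes_per_unit": total_size / size,  # bytes per array element
        })
    return rows


for row in summarize(RUNS):
    print(f"{row['name']:<13} avg key {row['avg_key_bytes']:5.1f} B  "
          f"{row['units_per_key']:5.2f} units/key  "
          f"{row['bytes_per_unit']:.0f} B/unit")
```

In all four runs `total_size` is exactly four times `size`, and for both key sets the zero-valued input yields a smaller array than the unique-valued one.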