HPACK 05 has following compression mechanisms:
- By sharing learning dictionary of headers, a header can be represented as a small integer.
- Reference set
- By comparing a set of the previous indices and that of the current indices, difference can be calculated.
- Huffman code
- Strings (header names and header values) themselves can be shorten.
Since "reference set" depends on "index", we can consider 6 kinds of HPACK encoding. For convenience, I give a name for each.
To implement Naive and Linear, an HPACK encoder first sends index 0 to clear reference set and encodes the headers in order (see Appendix in detail). The algorithm for Diff was posted to ietf-http-wg ML by Tsujikawa, the author of "nghttp2". The last "H" means using huffman encoding.
Correction: it appeared that multiple :authority values are mixed together. We should make another experiment with better input.
We can also use these test cases to calculate compression ratio. Compression ration is calculated by dividing the length of result representation by that of header set.
Here is the results.
I confirmed the following things.
- The "hpack" library in Go uses NaiveH. I verified that my NaiveH encoding produces the same results as it.
- "nghttp2" uses DiffH. I verified that my DiffH encoding produces the same results as it.
Here is a chart to visualize the results:
- Diff(H) saves only 1.57 byte per HEADERS frame in average against Linear(H).
- Huffman encoding saves 31.36 bytes per HEADERS frame in average against non-huffman.
So, I would suggest the designers of HPACK to remove reference set. This has the following advantages:
- Implementation of HPACK encoder becomes much easier.
- Inter-operability is improved.
- The order of headers are always kept.
- The compression ratio does not go worse.
I first implemented HPACK with simple data structures and algorithms. Profiling tells me there were three bottlenecks:
- Converting a header to an index in HPACK encoding.
- Converting bit sequence to a character in Huffman decoding.
- Concatenating bit sequences in Huffman encoding.
Linear search is slow for the purpose of 1). To solve this problem, I created a reverse index. To avoid hash DOS attacks, I used search trees to implement it. Since we need to find either an index for a pair of header key and header value OR an index for a header key, search trees should be nested. When the outer search tree is looked up with a header key, an inner search tree cab be obtained if exists. Then the inner search tree is looked up with a header value.
My choice is as follows:
Bit-by-bit traversing on a binary tree is slow for the purpose of 2). So, I adopted Fast Prefix Code Processing. I prepared state transition tables for 4 bits. The reason why I chose 4 bits, not 8 bits, is that 4 bits can ensure one character is generated at most.
To solve problem 3, I prepared N shifted bytes for bit sequences in advance where N is from 0 to 7. Concatenation can be done just by copying bytes and ORed a seam point if necessary.
- It has good compression ratio.
- Fast implementation is possible.
- Reference set
- It has poor compression ratio.
- It should be removed from HPACK.
- Huffman encoding
- Its compression ratio is so so.
- Fast implementation is possible.
key1: value1 key2: value2 key3: value3
Both Naive and Linear encodes this to:
Indexed 0 Literal NotAdd (Lit "key1") "value1" Literal NotAdd (Lit "key2") "value2" Literal NotAdd (Lit "key3") "value3"
Then suppose that the next header set is as follows:
key1: value1 key2: value2 key3: value3'
Naive encodes this to:
Indexed 0 Literal NotAdd (Lit "key1") "value1" Literal NotAdd (Lit "key2") "value2" Literal NotAdd (Lit "key3") "value3'"
Linear encodes this to:
Indexed 0 Indexed 3 Indexed 2 Literal Add (Idx 1) "value3'"
- 417 https://www.google.co.jp/
- 101 http://www.google.co.jp/url?sa=t&rct=j&q=&esrc=s&source=web&cd=1&ved=0CCoQFjAA&url=http://d.hatena.ne.jp/kazu-yamamoto/20121107/1352259739&ei=m-XpUvO3DMfNkQWcloDABQ&usg=AFQjCNGQfLztt7JQx6gfSz8lnMnZp9eC4A&bvm=bv.60444564,d.dGI
- 67 http://www.google.co.jp/url?sa=t&rct=j&q=&esrc=s&source=web&cd=2&cad=rja&ved=0CC4QFjAB&url=http://d.hatena.ne.jp/kazu-yamamoto/20081027/1225082163&ei=sfrpUpm6OI7PkgWS5IDYCQ&usg=AFQjCNGWvuAHxOdwFaYsa4eiDIqn8cFv5Q&bvm=bv.60444564,d.dGI
- 48 http://www.google.co.jp/url?sa=t&rct=j&q=&esrc=s&frm=1&source=web&cd=4&cad=rja&ved=0CD8QFjAD&url=http://d.hatena.ne.jp/kazu-yamamoto/20130308/1362724125&ei=xebpUuCzBsyQkwWNtICQBQ&usg=AFQjCNFLj_65vQVwZmUbI8OL3td9Tr5H9A&bvm=bv.60444564,d.dGI
- 39 http://www.google.co.jp/url?sa=t&rct=j&q=&esrc=s&source=web&cd=3&ved=0CCEQFjAF&url=http://d.hatena.ne.jp/kazu-yamamoto/20140129/1391057824&ei=dvzpUsesEZHGqAPYuQE&usg=AFQjCNEFAxPAMZNSTs598NPozJ3-ECxMkw
- 36 http://www.google.co.jp/url?sa=t&rct=j&q=&esrc=s&source=web&cd=4&ved=0CD8QFjAD&url=http://d.hatena.ne.jp/kazu-yamamoto/20110526/1306399383&ei=E-fpUpaBEoGHlAXX9YCYAg&usg=AFQjCNGuPuUDHjQMlefyh-cXJzutgTLlNw&bvm=bv.60444564,d.dGI
- 30 https://www.google.com/
- 24 http://www.google.co.jp/url?sa=t&rct=j&q=&esrc=s&source=web&cd=5&ved=0CEkQFjAE&url=http://d.hatena.ne.jp/kazu-yamamoto/20110531/1306825905&ei=VfXpUu2ALoXwkAWmy4CoBw&usg=AFQjCNF20bMX8kixWZ14vHr8DlUPMKpukw&bvm=bv.60444564,d.dGI
- 20 http://d.hatena.ne.jp/
- 19 http://t.co/9yHUUIlF1m