Explanation of Scaled Dot-Product Attention from the paper "Attention Is All You Need" (quote): "The two most commonly used attention functions are additive attention [2], and dot-product (multiplicative) attention. Dot-product attention is identical to our algorithm, except for the scaling factor of 1/√d_k. Additive attention …"
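The scaled dot-product attention the quote refers to computes softmax(QK^T / √d_k)V. A minimal NumPy sketch (the function name and the random example inputs are illustrative, not from the paper):

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    # Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)  # (n_queries, n_keys) similarity scores
    # Numerically stable row-wise softmax over the key dimension
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

# Example: 2 queries attending over 3 key/value pairs, d_k = 4
rng = np.random.default_rng(0)
Q = rng.standard_normal((2, 4))
K = rng.standard_normal((3, 4))
V = rng.standard_normal((3, 4))
out = scaled_dot_product_attention(Q, K, V)
print(out.shape)  # (2, 4)
```

The 1/√d_k factor keeps the dot products from growing with the key dimension, which would otherwise push the softmax into regions with vanishing gradients.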