ClojureのtrampolineをCLで

Programing Clojureを読んでいたところ、相互再帰のためのツールとしてtrampolineという関数が紹介されていました。クロージャが返される限り、返されたクロージャを呼び続ける関数のようです。

現状のClojureは末尾呼び出しの最適化を行わないため、trampolineが必要なようです。

CLで書くとこんな感じでしょうか？

(defun trampoline (func &rest args)
  (loop 
     :for result = (apply func args) :then (funcall result)
     :unless (functionp result)
     :do (return result)))

(defun odd-p (n)
  (if (= n 0)
      nil
      (lambda () (even-p (- n 1)))))

(defun even-p (n)
  (if (= n 0)
      t
      (lambda () (odd-p (- n 1)))))

Mac OS X 10.5.7上のCCL-1.3での結果です。

CL-USER> (time (trampoline #'even-p 1000000))
(TRAMPOLINE #'EVEN-P 1000000) took 242,967 microseconds (0.242967 seconds) to run 
                    with 2 available CPU cores.
During that period, 176,438 microseconds (0.176438 seconds) were spent in user mode
                    33,438 microseconds (0.033438 seconds) were spent in system mode
41,280 microseconds (0.041280 seconds) was spent in GC.
 64,000,016 bytes of memory allocated.
T
CL-USER> (time (trampoline #'odd-p 1000000))
(TRAMPOLINE #'ODD-P 1000000) took 316,594 microseconds (0.316594 seconds) to run 
                    with 2 available CPU cores.
During that period, 219,069 microseconds (0.219069 seconds) were spent in user mode
                    47,793 microseconds (0.047793 seconds) were spent in system mode
49,062 microseconds (0.049062 seconds) was spent in GC.
 64,000,016 bytes of memory allocated.
NIL
CL-USER>

ちゃんと動作しました。

ちなみに、以下のように普通に相互再帰として定義した場合、

(defun odd-p (n)
  (if (= n 0)
      nil
      (even-p (- n 1))))

(defun even-p (n)
  (if (= n 0)
      t
      (odd-p (- n 1))))

結果は、trampolineを使うよりかなり速くなりました。

CL-USER> (time (even-p 1000000))
(EVEN-P 1000000) took 9,237 microseconds (0.009237 seconds) to run 
                    with 2 available CPU cores.
During that period, 5,285 microseconds (0.005285 seconds) were spent in user mode
                    98 microseconds (0.000098 seconds) were spent in system mode
T
CL-USER> (time (odd-p 1000000))
(ODD-P 1000000) took 7,651 microseconds (0.007651 seconds) to run 
                    with 2 available CPU cores.
During that period, 7,327 microseconds (0.007327 seconds) were spent in user mode
                    57 microseconds (0.000057 seconds) were spent in system mode
NIL
CL-USER>

disassembleの結果を見ると、CCLは末尾呼び出しを最適化してくれるようです。

悔しいので、末尾呼び出し最適化をしない処理系が無いか模索。
ACL8.1評価版で試してみます。

CL-USER> (time (odd-p 1000000))

デバッガが立ち上がりました。期待通りスタックオーバーフローしたようです。

Stack overflow (signal 1000)
   [Condition of type SYNCHRONOUS-OPERATING-SYSTEM-SIGNAL]

ここで先ほどのtrampolineを使うと…

CL-USER> (time (trampoline #'even-p 1000000))
; cpu time (non-gc) 70 msec user, 0 msec system
; cpu time (gc)     30 msec user, 0 msec system
; cpu time (total)  100 msec user, 0 msec system
; real time  108 msec
; space allocation:
;  2 cons cells, 24,000,248 other bytes, 0 static bytes
T
CL-USER> (time (trampoline #'odd-p 1000000))
; cpu time (non-gc) 60 msec user, 0 msec system
; cpu time (gc)     50 msec user, 0 msec system
; cpu time (total)  110 msec user, 0 msec system
; real time  109 msec
; space allocation:
;  2 cons cells, 24,000,024 other bytes, 0 static bytes
NIL

ちゃんと計算出来ました。

なお、SBCL 1.0.28も末尾呼び出しの最適化をしてくれました。
ecl-0.9jは末尾呼び出しの最適化をしないようです。