
Maurice's System Development Diary

2011-11-28 Monitoring delayed_job with the process monitoring tool God


In this post I use God, a process monitoring tool, to monitor delayed_job, which handles asynchronous processing in Rails.

First, get delayed_job running. I use Capistrano to start 10 processes.

# cap deploy:start_delayed_job

The task defined in deploy.rb looks like this.

...
namespace :deploy do
  task :start_delayed_job, :roles => :backyard_app do
    run <<-EOB
      cd #{deploy_to}/current;
      sudo RAILS_ENV=#{rails_env} BUNDLE_GEMFILE=#{current_path}/Gemfile bundle exec './script/delayed_job' -n 10 start
EOB
  end
end
...

Next, install god.

gem install god --no-ri --no-rdoc

Next, create the configuration file

RAILS_ROOT/config/god/delayed_job.god

Its contents look like this.

RAILS_ENV = "production"
APPLICATION_ROOT = "/var/rails/xxxx"

God::Contacts::Email.defaults do |d|
  d.from_email = 'god@xxxx.com'
  d.from_name = 'God'
  d.delivery_method = :smtp
end

God.contact(:email) do |c|
  c.name = 'tech'
  c.group = 'tech_group'
  c.to_email = 'tech@xxxx.com'
end

10.times do |num|
  God.watch do |w|
    w.name = "dj-#{num}"
    w.group = 'delayed_job'
    w.interval = 60.seconds
    w.uid = 'root'
    w.gid = 'root'

    w.start = "/bin/bash -c 'cd #{APPLICATION_ROOT}/current; RAILS_ENV=#{RAILS_ENV} BUNDLE_GEMFILE=#{APPLICATION_ROOT}/current/Gemfile bundle exec ./script/delayed_job --identifier=#{num} start'"
    w.log = "#{APPLICATION_ROOT}/shared/log/delayed_job.log"

    w.start_grace = 30.seconds
    w.restart_grace = 30.seconds

    w.behavior(:clean_pid_file)

    w.pid_file = "#{APPLICATION_ROOT}/shared/pids/delayed_job.#{num}.pid"

    w.transition(:up, :start) do |on|
      on.condition(:process_exits) do |c|
        c.notify = 'tech'
      end
    end
  end
end

If a delayed_job process dies, God starts it again and sends a notification email.
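To make the idea of "noticing that a worker died" concrete, here is a minimal sketch of a pid-file liveness check in plain Ruby. This is not how God itself is implemented (the log further down shows it registering 'proc_exit' events); `process_alive?` and `worker_alive?` are hypothetical helper names introduced only for illustration.

```ruby
# Hypothetical helpers, not part of God: check whether the process
# recorded in a pid file still exists.

# Signal 0 delivers nothing; it only performs the existence/permission check.
def process_alive?(pid)
  Process.kill(0, pid)
  true
rescue Errno::ESRCH
  false # no such process
rescue Errno::EPERM
  true  # the process exists but is owned by another user
end

# Returns false when the pid file is missing, empty, or points at a dead process.
def worker_alive?(pid_file)
  return false unless File.exist?(pid_file)

  pid = File.read(pid_file).to_i
  pid > 0 && process_alive?(pid)
end
```

With the configuration above, one could call `worker_alive?("/var/rails/xxxx/shared/pids/delayed_job.0.pid")` for each of the ten workers as a quick manual check.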

First, run god without daemonizing it. That way the log is printed to the terminal, which makes it easier to follow.

$ god -D -c config/god/delayed_job.god

[root@xxxx]# god -D -c config/god/delayed_job.god
I [2011-11-28 16:33:39]  INFO: Loading config/god/delayed_job.god
I [2011-11-28 16:33:39]  INFO: Syslog enabled.
I [2011-11-28 16:33:39]  INFO: Using pid file directory: /var/run/god
I [2011-11-28 16:33:40]  INFO: Started on drbunix:///tmp/god.17165.sock
I [2011-11-28 16:33:40]  INFO: dj-0 move 'unmonitored' to 'up'
I [2011-11-28 16:33:40]  INFO: dj-0 registered 'proc_exit' event for pid 18024
I [2011-11-28 16:33:40]  INFO: dj-0 moved 'unmonitored' to 'up'
I [2011-11-28 16:33:40]  INFO: dj-1 move 'unmonitored' to 'up'
I [2011-11-28 16:33:40]  INFO: dj-1 registered 'proc_exit' event for pid 18030
I [2011-11-28 16:33:40]  INFO: dj-2 move 'unmonitored' to 'up'
I [2011-11-28 16:33:40]  INFO: dj-2 registered 'proc_exit' event for pid 18036
I [2011-11-28 16:33:40]  INFO: dj-2 moved 'unmonitored' to 'up'
I [2011-11-28 16:33:40]  INFO: dj-3 move 'unmonitored' to 'up'
I [2011-11-28 16:33:40]  INFO: dj-3 registered 'proc_exit' event for pid 18042
I [2011-11-28 16:33:40]  INFO: dj-3 moved 'unmonitored' to 'up'
I [2011-11-28 16:33:40]  INFO: dj-1 moved 'unmonitored' to 'up'
I [2011-11-28 16:33:40]  INFO: dj-4 move 'unmonitored' to 'up'
I [2011-11-28 16:33:40]  INFO: dj-6 move 'unmonitored' to 'up'
I [2011-11-28 16:33:40]  INFO: dj-4 registered 'proc_exit' event for pid 18048
I [2011-11-28 16:33:40]  INFO: dj-4 moved 'unmonitored' to 'up'
I [2011-11-28 16:33:40]  INFO: dj-8 move 'unmonitored' to 'up'
I [2011-11-28 16:33:40]  INFO: dj-6 registered 'proc_exit' event for pid 18060
I [2011-11-28 16:33:40]  INFO: dj-6 moved 'unmonitored' to 'up'
I [2011-11-28 16:33:40]  INFO: dj-5 move 'unmonitored' to 'up'
I [2011-11-28 16:33:40]  INFO: dj-5 registered 'proc_exit' event for pid 18054
I [2011-11-28 16:33:40]  INFO: dj-8 registered 'proc_exit' event for pid 18072
I [2011-11-28 16:33:40]  INFO: dj-8 moved 'unmonitored' to 'up'
I [2011-11-28 16:33:40]  INFO: dj-5 moved 'unmonitored' to 'up'
I [2011-11-28 16:33:40]  INFO: dj-9 move 'unmonitored' to 'up'
I [2011-11-28 16:33:40]  INFO: dj-9 registered 'proc_exit' event for pid 18327
I [2011-11-28 16:33:40]  INFO: dj-7 move 'unmonitored' to 'up'
I [2011-11-28 16:33:40]  INFO: dj-7 registered 'proc_exit' event for pid 18066
I [2011-11-28 16:33:40]  INFO: dj-7 moved 'unmonitored' to 'up'
I [2011-11-28 16:33:40]  INFO: dj-9 moved 'unmonitored' to 'up'
I [2011-11-28 16:33:41]  INFO: dj-0 [ok] process is running (ProcessRunning)
I [2011-11-28 16:33:41]  INFO: dj-2 [ok] process is running (ProcessRunning)
I [2011-11-28 16:33:41]  INFO: dj-3 [ok] process is running (ProcessRunning)
I [2011-11-28 16:33:41]  INFO: dj-1 [ok] process is running (ProcessRunning)
I [2011-11-28 16:33:41]  INFO: dj-4 [ok] process is running (ProcessRunning)
I [2011-11-28 16:33:41]  INFO: dj-8 [ok] process is running (ProcessRunning)
I [2011-11-28 16:33:41]  INFO: dj-6 [ok] process is running (ProcessRunning)
I [2011-11-28 16:33:41]  INFO: dj-5 [ok] process is running (ProcessRunning)
I [2011-11-28 16:33:41]  INFO: dj-7 [ok] process is running (ProcessRunning)
I [2011-11-28 16:33:41]  INFO: dj-9 [ok] process is running (ProcessRunning)
I [2011-11-28 16:33:46]  INFO: dj-0 [ok] process is running (ProcessRunning)
I [2011-11-28 16:33:46]  INFO: dj-2 [ok] process is running (ProcessRunning)
I [2011-11-28 16:33:46]  INFO: dj-3 [ok] process is running (ProcessRunning)
I [2011-11-28 16:33:46]  INFO: dj-1 [ok] process is running (ProcessRunning)
I [2011-11-28 16:33:46]  INFO: dj-4 [ok] process is running (ProcessRunning)
I [2011-11-28 16:33:46]  INFO: dj-8 [ok] process is running (ProcessRunning)
I [2011-11-28 16:33:46]  INFO: dj-6 [ok] process is running (ProcessRunning)
I [2011-11-28 16:33:46]  INFO: dj-5 [ok] process is running (ProcessRunning)
I [2011-11-28 16:33:46]  INFO: dj-7 [ok] process is running (ProcessRunning)
I [2011-11-28 16:33:46]  INFO: dj-9 [ok] process is running (ProcessRunning)

God is monitoring 10 processes, named dj-0 through dj-9.
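Counting ten watches by eye in interleaved output is error-prone, so here is a rough sketch that extracts the same information from the log lines. It assumes the line format shown above and that the watches are named dj-N; `GOD_LOG_LINE`, `latest_messages`, and `running_watches` are names made up for this example, not part of God.

```ruby
# Matches god log lines for the dj-N watches, for example:
#   I [2011-11-28 16:33:41]  INFO: dj-0 [ok] process is running (ProcessRunning)
# Lines that do not concern a dj-N watch (such as "Loading ...") are ignored.
GOD_LOG_LINE = /\A\w \[(?<time>[^\]]+)\]\s+(?<level>\w+): (?<watch>dj-\d+) (?<message>.*)\z/

# Returns a hash of watch name => most recent message seen for that watch.
def latest_messages(log_lines)
  log_lines.each_with_object({}) do |line, latest|
    m = GOD_LOG_LINE.match(line.chomp)
    latest[m[:watch]] = m[:message] if m
  end
end

# Returns the watch names, ordered by worker number, whose most recent
# message reports a running process.
def running_watches(log_lines)
  latest_messages(log_lines)
    .select { |_name, message| message.include?("process is running") }
    .keys
    .sort_by { |name| name[/\d+/].to_i }
end
```

If the output were saved to a file (say a hypothetical `god.log`), `running_watches(File.readlines("god.log"))` would list the watches currently reported as running.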

Let's kill one process.

# ps aux |grep job
root     21797  0.0  2.2 289848 90544 ?        Sl   16:41   0:00 delayed_job.0
root     21803  0.0  2.2 289848 90564 ?        Sl   16:41   0:00 delayed_job.1
root     21809  0.0  2.2 289988 90612 ?        Sl   16:41   0:00 delayed_job.2
root     21815  0.0  2.2 289988 90632 ?        Sl   16:41   0:00 delayed_job.3
root     21821  0.0  2.2 289988 90652 ?        Sl   16:41   0:00 delayed_job.4
root     21827  0.0  2.2 289988 90672 ?        Sl   16:41   0:00 delayed_job.5
root     21833  0.0  2.2 289988 90692 ?        Sl   16:41   0:00 delayed_job.6
root     21839  0.0  2.2 289988 90712 ?        Sl   16:41   0:00 delayed_job.7
root     21845  0.0  2.2 290132 90732 ?        Sl   16:41   0:00 delayed_job.8
root     21851  0.0  2.2 290132 90752 ?        Sl   16:41   0:00 delayed_job.9
root     21854  1.2  0.4 263060 18920 pts/0    Sl+  16:41   0:00 /opt/ruby-1.9.2-p290/bin/ruby /opt/ruby-1.9.2-p290/bin/god -D -c config/god/delayed_job.god
root     21946  0.0  0.0  65464   856 pts/1    R+   16:42   0:00 grep job

# kill 21851

God went to work right away. New output appeared in the window where god is running.

I [2011-11-28 16:42:35]  INFO: dj-8 [ok] process is running (ProcessRunning)
I [2011-11-28 16:42:35]  INFO: dj-7 [ok] process is running (ProcessRunning)
I [2011-11-28 16:42:35]  INFO: dj-9 sent email to tech@xxxx.com via sendmail (Email)
I [2011-11-28 16:42:35]  INFO: dj-9 move 'up' to 'start'
I [2011-11-28 16:42:35]  INFO: dj-9 deregistered 'proc_exit' event for pid 21851
I [2011-11-28 16:42:35]  INFO: dj-9 before_start: no pid file to delete (CleanPidFile)
I [2011-11-28 16:42:35]  INFO: dj-9 start: /bin/bash -c 'cd /var/rails/xxxx/current; RAILS_ENV=production BUNDLE_GEMFILE=/var/rails/xxxx/current/Gemfile bundle exec ./script/delayed_job --identifier=9 start'
I [2011-11-28 16:42:40]  INFO: dj-0 [ok] process is running (ProcessRunning)
I [2011-11-28 16:42:40]  INFO: dj-1 [ok] process is running (ProcessRunning)

So, did the process come back?

[root@xxxx]# ps aux |grep job
root     21797  0.0  2.2 289848 90544 ?        Sl   16:41   0:00 delayed_job.0
root     21803  0.0  2.2 289848 90564 ?        Sl   16:41   0:00 delayed_job.1
root     21809  0.0  2.2 289988 90612 ?        Sl   16:41   0:00 delayed_job.2
root     21815  0.0  2.2 289988 90632 ?        Sl   16:41   0:00 delayed_job.3
root     21821  0.0  2.2 289988 90652 ?        Sl   16:41   0:00 delayed_job.4
root     21827  0.0  2.2 289988 90672 ?        Sl   16:41   0:00 delayed_job.5
root     21833  0.0  2.2 289988 90692 ?        Sl   16:41   0:00 delayed_job.6
root     21839  0.0  2.2 289988 90712 ?        Sl   16:41   0:00 delayed_job.7
root     21845  0.0  2.2 290132 90732 ?        Sl   16:41   0:00 delayed_job.8
root     21854  0.3  0.4 263060 19016 pts/0    Sl+  16:41   0:00 /opt/ruby-1.9.2-p290/bin/ruby /opt/ruby-1.9.2-p290/bin/god -D -c config/god/delayed_job.god
root     21974  0.0  2.2 345276 90748 ?        Sl   16:42   0:00 delayed_job.9
root     21980  0.0  0.0  65464   856 pts/1    S+   16:43   0:00 grep job

It is back: delayed_job.9 is running again, now under a new PID (21974).
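The before/after comparison can also be done mechanically. The sketch below assumes `ps aux` output in the layout shown above (PID in the second column, the process title `delayed_job.N` in the last); `worker_pids` and `restarted_workers` are hypothetical helper names.

```ruby
# Extracts { "delayed_job.N" => pid } from `ps aux` output.
# Lines for other processes (god itself, grep) are skipped.
def worker_pids(ps_output)
  ps_output.each_line.each_with_object({}) do |line, pids|
    fields = line.split
    name = fields.last.to_s
    next unless name.match?(/\Adelayed_job\.\d+\z/)

    pids[name] = fields[1].to_i
  end
end

# Returns the names of workers whose PID differs between two snapshots,
# including workers that appear only in the second snapshot.
def restarted_workers(before_output, after_output)
  before = worker_pids(before_output)
  worker_pids(after_output).reject { |name, pid| before[name] == pid }.keys
end
```

Fed the two `ps aux` listings from this post, `restarted_workers` would report only `delayed_job.9`, whose PID changed from 21851 to 21974.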

That is reassuring.

For example, delayed_job dies when it cannot obtain a DB connection.

God looks quite simple and handy.
