■
I tried porting Peter Norvig's "How to Write a Spelling Corrector" to Ruby.*1
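```ruby
require 'set'

class Correct
  # Build the word-frequency model from a training corpus such as big.txt.
  def initialize(filename)
    @nWords = train(words(File.read(filename)))
  end

  # Lowercase the text and extract runs of letters. Splitting on whitespace and
  # a few punctuation marks would leave empty strings and tokens like "word;".
  def words(text)
    text.downcase.scan(/[a-z]+/)
  end

  # Count each word. Hash.new(1) makes unseen words count as 1 (add-one smoothing).
  def train(features)
    model = Hash.new(1)
    features.each { |word| model[word] += 1 }
    model
  end

  # Every string one edit away: deletion, transposition, replacement, insertion.
  def edits1(word)
    wordList = Set.new
    (0...word.length).each do |n|
      wordList << word[0...n] + word[n+1..-1]                # deletion
      # transposition of the adjacent characters at n and n+1
      wordList << word[0...n] + word[n+1..n+1] + word[n..n] + word[n+2..-1] unless n == word.length - 1
      ('a'..'z').each do |c|
        wordList << word[0...n] + c + word[n+1..-1]          # replacement
        wordList << word[0...n] + c + word[n..-1]            # insertion before position n
        wordList << word + c if n == word.length - 1         # insertion at the end
      end
    end
    wordList
  end

  # Known words two edits away, filtered as they are generated.
  def edits2(word)
    wordList = Set.new
    edits1(word).each do |e1|
      edits1(e1).each do |e2|
        wordList << e2 if @nWords.include?(e2)
      end
    end
    wordList
  end

  # Keep only words that occur in the training corpus.
  def known(words)
    wordList = Set.new
    words.each { |w| wordList << w if @nWords.include?(w) }
    wordList
  end

  # Like Norvig's `known([w]) or known(edits1(w)) or known_edits2(w) or [w]`:
  # take the first non-empty candidate set (an empty Set is truthy in Ruby,
  # so `||` cannot express this), then return its most frequent word.
  def correct(word)
    candidates = known([word])
    candidates = known(edits1(word)) if candidates.empty?
    candidates = edits2(word) if candidates.empty?
    candidates = [word] if candidates.empty?
    candidates.max_by { |c| @nWords[c] }
  end
end

correct = Correct.new('big.txt')
while (w = gets)
  p correct.correct(w.chomp)  # chomp drops the newline that gets leaves on the input
end
```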
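The ranking step depends on two details of Ruby's Hash that are easy to miss. First, with Hash.new(1), looking up a word that never appeared returns 1, but the lookup does not store that word. Second, as a result, include? still reports only words that really occurred in the corpus. The sketch below uses a tiny made-up sentence in place of big.txt, and its variable names are illustrative only.

```ruby
# A made-up corpus standing in for big.txt (assumption for illustration).
corpus = 'the cat sat on the mat and the cat ran'

# Count words the same way train does: start at 1, add 1 per occurrence.
model = Hash.new(1)
corpus.downcase.scan(/[a-z]+/).each { |w| model[w] += 1 }

p model['the']           # 4 (seen 3 times, plus the starting 1)
p model['dog']           # 1 (never seen: the default value is returned)
p model.include?('dog')  # false (reading a default does not add the key)

# Among candidates at the same edit distance, the most frequent word wins.
candidates = %w[mat cat sat]
p candidates.max_by { |c| model[c] }  # "cat"
```

This is why known filters candidates with include? rather than by looking at counts: every string has a count of at least 1.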
Running it (each input word is followed by the suggested correction):
% ruby correct.rb
acess
"access"
forbiden
"forbidden"
supposidly
"supposedly"
Looks like it works.
That may partly be down to my own coding style, but it ended up with more lines than the Python version.
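One way to bring the Ruby closer to the Python in length is to mirror the split-based edits1 from Norvig's essay. The idea is to cut the word at every position into a head and a tail, then derive all four kinds of edit from those pairs. This is a sketch, not the code above, and the LETTERS constant is a name made up for it. Before de-duplication, the four lists hold n deletions, n - 1 transpositions, 26n replacements and 26(n + 1) insertions. That comes to 54n + 25 strings for a word of length n ≥ 1.

```ruby
require 'set'

LETTERS = ('a'..'z').to_a  # hypothetical helper constant for this sketch

# All strings one edit away from word, built from (head, tail) splits.
def edits1(word)
  splits     = (0..word.length).map { |i| [word[0, i], word[i..-1]] }
  deletes    = splits.reject { |_, t| t.empty? }.map { |h, t| h + t[1..-1] }
  transposes = splits.select { |_, t| t.length > 1 }.map { |h, t| h + t[1] + t[0] + t[2..-1] }
  replaces   = splits.reject { |_, t| t.empty? }.flat_map { |h, t| LETTERS.map { |c| h + c + t[1..-1] } }
  inserts    = splits.flat_map { |h, t| LETTERS.map { |c| h + c + t } }
  Set.new(deletes + transposes + replaces + inserts)  # the Set removes duplicates
end

p edits1('acess').include?('access')  # true
```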
Still, I feel I learned a lot along the way, so I'll call it a win. I didn't even know Ruby had a Set class.
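For anyone else meeting Ruby's Set for the first time, the sketch below shows three behaviors worth knowing when porting code like this:

* adding a duplicate is a no-op;
* the | operator builds a union and accepts any Enumerable, such as an Array;
* an empty Set is truthy, so Python's "a or b" fallback idiom does not carry over as "a || b".

The strings are arbitrary examples.

```ruby
require 'set'

s = Set.new
s << 'access' << 'aces' << 'access'  # the second 'access' is ignored
p s.size                             # 2

# Union with |; the right-hand side may be any Enumerable, e.g. an Array.
u = s | ['axes']
p u.size                             # 3

# In Ruby only nil and false are falsy, so an empty Set counts as true and
# `empty || fallback` returns the empty Set rather than the fallback.
empty  = Set.new
chosen = empty || 'fallback'
p chosen.equal?(empty)               # true

# Test emptiness explicitly when a fallback is wanted.
p(empty.empty? ? 'fallback' : empty) # "fallback"
```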
All said, I still like Ruby better than Python.*2