Hatena::ブログ(Diary)

kesnke学習帳 このページをアンテナに追加 RSSフィード

2012-07-26

[]行同士の内積を計算するプログラム

仕事でCで書いてたプログラムHaskellで書いてみる。

やりたいことは以下。

1. 入力は10000 * 30の実数値が入ったtsvファイル

2. 各行ごとにその他の全行との内積をとる

3. 2の結果のうち上位n件を行番号とともに出力する

で、書いてみた。

import System.Environment (getArgs)
import Data.List (sortBy)
import Data.List.Split (splitOn)
import qualified Data.Vector.Unboxed as U

type RowNum    = Int
type RowVec    = U.Vector Double
type RowData   = (RowNum, RowVec)
type Score     = Double
type PairScore = (RowNum, RowNum, Score)

scoreLimit = 5

main = do
    args <- getArgs
    contents <- readFile $ head args
    let rds = parseContents contents
    putStr $ unlines $ map showPairScore $ rowScore rds

parseContents :: String -> [RowData]
parseContents contents = zip [0..] (map parseLine $ lines contents)
    where
        parseLine = U.fromList . map read . filter (/= "") . splitOn "\t"

showPairScore :: PairScore -> String
showPairScore (n1, n2, score) = (show n1) ++ "\t" ++ (show n2) ++ "\t" ++ (show score)

dot :: RowVec -> RowVec -> Score
dot xs ys = U.foldr (+) 0 $ U.zipWith (*) xs ys

genPairScore :: RowData -> RowData -> PairScore
genPairScore rd1 rd2 = (fst rd1, fst rd2, dot (snd rd1) (snd rd2))

rowScore :: [RowData] -> [PairScore]
rowScore rds = concatMap (rowScore1 rds) rds

rowScore1 :: [RowData] -> RowData -> [PairScore]
rowScore1 rds rd1 = take scoreLimit . sortBy cmpScoreR . map (genPairScore rd1) . filter (not . isSameRow) $ rds
    where
        isSameRow rd2 = (fst rd1) == (fst rd2)
        cmpScoreR (_, _, sa) (_, _, sb) = sb `compare` sa

コンパイルはこんな感じ。

ghc -O2 -o ip ip.hs 

実行時間はこんなもの。

% time ./ip data.tsv > /dev/null                                                                  [~/work/haskell/in_prd]
./ip data.tsv > /dev/null  102.40s user 1.00s system 99% cpu 1:43.47 total

工夫したところといえば、リストの代わりにVectorを使ったことくらい。

もっと高速化できないだろうか?

プロファイルとってみると、dotが実行時間の多くを占めている。

単体で遅いというよりも実行回数が多くて時間を取っている様子。

1回の実行をより早くするにも、このコードではどうしたものかちょっと見当つかない。

もう少し調べてみる。

        Thu Jul 26 04:54 2012 Time and Allocation Profiling Report  (Final)

           ip +RTS -p -RTS data.tsv

        total time  =      119.72 secs   (119715 ticks @ 1000 us, 1 processor)
        total alloc = 23,697,084,912 bytes  (excludes profiling overheads)

COST CENTRE         MODULE  %time %alloc

dot                 Main     63.5   16.9
rowScore1           Main     20.3   63.3
parseContents       Main      8.7   15.4
rowScore1.cmpScoreR Main      4.8    0.0
showPairScore       Main      1.0    1.8
main                Main      0.4    2.5


                                                                               individual     inherited
COST CENTRE               MODULE                             no.     entries  %time %alloc   %time %alloc

MAIN                      MAIN                                69           0    0.0    0.0   100.0  100.0
 main                     Main                               139           0    0.4    2.5   100.0  100.0
  showPairScore           Main                               153       50000    1.0    1.8     1.0    1.8
  rowScore                Main                               145           1    0.0    0.0    89.9   80.2
   rowScore1              Main                               146       10000   20.3   63.3    89.9   80.2
    genPairScore          Main                               150    99990000    1.0    0.0    64.5   16.9
     dot                  Main                               151    99990000   63.5   16.9    63.5   16.9
    rowScore1.cmpScoreR   Main                               149   141793430    4.8    0.0     4.8    0.0
    rowScore1.isSameRow   Main                               148   100000000    0.3    0.0     0.3    0.0
  main.rds                Main                               140           1    0.0    0.0     8.7   15.4
   parseContents          Main                               141           1    8.7   15.4     8.7   15.4
 CAF:main1                Main                               136           0    0.0    0.0     0.0    0.0
  main                    Main                               138           1    0.0    0.0     0.0    0.0
 CAF:a_r3Qz               Main                               135           0    0.0    0.0     0.0    0.0
  parseContents.parseLine Main                               152           1    0.0    0.0     0.0    0.0
 CAF:lvl1_r3Qu            Main                               134           0    0.0    0.0     0.0    0.0
  main                    Main                               142           0    0.0    0.0     0.0    0.0
   main.rds               Main                               143           0    0.0    0.0     0.0    0.0
    parseContents         Main                               144           0    0.0    0.0     0.0    0.0
 CAF:scoreLimit_rR0       Main                               133           0    0.0    0.0     0.0    0.0
  scoreLimit              Main                               147           1    0.0    0.0     0.0    0.0
 CAF                      GHC.IO.Handle.FD                   115           0    0.0    0.0     0.0    0.0
 CAF                      GHC.Float                          111           0    0.0    0.0     0.0    0.0
 CAF                      GHC.IO.Encoding                    109           0    0.0    0.0     0.0    0.0
 CAF                      GHC.Conc.Signal                    102           0    0.0    0.0     0.0    0.0
 CAF                      Text.Read.Lex                       96           0    0.0    0.0     0.0    0.0
 CAF                      GHC.IO.Encoding.Iconv               95           0    0.0    0.0     0.0    0.0
 CAF                      GHC.IO.FD                           94           0    0.0    0.0     0.0    0.0
 CAF                      GHC.Integer.Logarithms.Internals    77           0    0.0    0.0     0.0    0.0

2012-04-26

[][]Scheme手習いをHaskellで 第4章

「第4章 数遊び」

まず基本的な演算を作る。

isZero :: S -> Bool
isZero = eq (Number 0)

add1 :: S -> S
add1 (Number n) = Number (n + 1)

sub1 :: S -> S
sub1 (Number n) = Number (n - 1)

plus :: S -> S -> S
plus n (Number 0) = n
plus n m = add1 (plus n (sub1 m))

minus :: S -> S -> S
minus n (Number 0) = n
minus n m = sub1 (minus n (sub1 m)

数のリストに対する操作。

addtup :: [S] -> S
addtup [] = Number 0
addtup (x:xs) = plus x (addtup xs)

tupPlus :: [S] -> [S] -> [S]
tupPlus xs [] = xs
tupPlus [] ys = ys
tupPlus (x:xs) (y:ys) = (plus x y) : tupPlus xs ys

さらに演算を作っていく。

multi :: S -> S -> S
multi n (Number 0) = Number 0
multi n m = plus n (multi n (sub1 m))

lt :: S -> S -> Bool
lt _ (Number 0) = False
lt (Number 0) _ = True
lt n m = lt (sub1 n) (sub1 m)

gt :: S -> S -> Bool
gt (Number 0) _ = False
gt _ (Number 0) = True
gt n m = gt (sub1 n) (sub1 m)

expt :: S -> S -> S
expt _ (Number 0) = Number 1
expt n m = multi n (expt n (sub1 m))

dv :: S -> S -> S
dv n m
    | lt n m = Number 0
    | otherwise = add1 (dv (minus n m) m)

数を使った再帰

len :: [S] -> S
len []     = Number 0
len (x:xs) = add1 (len xs)

pick :: S -> [S] -> S
pick n (x:xs)
    | sub1 n == Number 0 = x
    | otherwise = pick (sub1 n) xs

rempick :: S -> [S] -> [S]
rempick n (x:xs)
    | sub1 n == Number 0 = xs
    | otherwise = x : rempick (sub1 n) xs

occur :: S -> [S] -> S
occur _ [] = Number 0
occur a (x:xs)
    | eq a x = add1 (occur a xs)
    | otherwise = occur a xs

isOne :: S -> Bool
isOne = eq (Number 1)

rempick' :: S -> [S] -> [S]
rempick' n (x:xs)
    | isOne n   = xs
    | otherwise = x : rempick' (sub1 n) x

数と文字が混ざったリストの操作。

isNumber :: S -> Bool
isNumber (Number _) = True
isNumber _          = False

noNums :: [S] -> [S]
noNums [] = []
noNums (x:xs)
    | isNumber x = noNums xs
    | otherwise  = x : noNums xs

allNums :: [S] -> [S]
allNums [] = []
allNums (x:xs)
    | isNumber x = x : allNums xs
    | otherwise  = allNums xs

eqan = eq

2012-04-22

[][]Scheme手習いをHaskellで 第1,2章改

48時間でSchemeを書こうを見ると、リストはそのままHaskellのリストにしているようだ。これのほうがわかりやすいので、Hakellのリストを使った版に今までのを書き直してみる。

それはそうと、今9章まで読み進めてきたんだが、だんだん複雑にえらいことになってきている。Haskellで書いてみるこの試み、無事に最後まで行けるかなぁ。

data S = Atom String | Number Integer | List [S] deriving (Show, Eq)
type L = [S]

car :: L -> S
car = head

cdr :: L -> L
cdr = tail

cons :: S -> L -> L
cons = (:)

isNull :: L -> Bool
isNull [] = True
isNull _  = False

isAtom :: S -> Bool
isAtom (List _) = False
isAtom _        = True

eq :: S -> S -> Bool
eq (Atom a1) (Atom a2) = a1 == a2
eq (Number n1) (Number n2) = n1 == n2
eq (List l1) (List l2) = l1 == l2
eq _ _ = False

isLat :: L -> Bool
isLat [] = True
isLat (x:xs)
    | isAtom x  = isLat xs
    | otherwise = False

isMember :: S -> L -> Bool
isMember _ []     = False
isMember s (x:xs) = or [(eq s x), (isMember s xs)]

[][]Scheme手習いをHaskellで 第3章

「偉大なるCons」。パターンに慣れてしまうと簡単。


rember :: S -> [S] -> [S]
rember _ [] = []
rember a (x:xs)
    | a == x    = xs
    | otherwise = x:(rember a xs)

firsts :: [[S]] -> [S]
firsts [] = []
firsts ((x:xs):xss) = x : firsts xss

insertR :: S -> S -> [S] -> [S]
insertR _ _ [] = []
insertR new old (x:xs)
    | old == x  = x : new : xs
    | otherwise = x : insertR new old xs

insertL :: S -> S -> [S] -> [S]
insertL _ _ [] = []
insertL new old (x:xs)
    | old == x  = new : x : xs
    | otherwise = x : insertL new old xs

subst :: S -> S -> [S] -> [S]
subst _ _ [] = []
subst new old (x:xs)
    | old == x  = new : xs
    | otherwise = x : subst new old xs

subst2 :: S -> S -> S -> [S] -> [S]
subst2 _ _ _ [] = []
subst2 new o1 o2 (x:xs)
    | or [o1 == x, o2 == x] = new : xs
    | otherwise 

これだと最初にあったものしか変更できないので、全体に対して変更するmultiXXXを作る。

multirember :: S -> [S] -> [S]
multirember _ [] = []
multirember a (x:xs)
    | a == x    = multirember a xs
    | otherwise = x:(multirember a xs)

multiinsertR :: S -> S -> [S] -> [S]
multiinsertR _ _ [] = []
multiinsertR new old (x:xs)
    | old == x  = x : new : multiinsertR new old xs
    | otherwise = x : multiinsertR new old xs

multiinsertL :: S -> S -> [S] -> [S]
multiinsertL _ _ [] = []
multiinsertL new old (x:xs)
    | old == x  = new : x : multiinsertL new old xs
    | otherwise = x : multiinsertL new old xs

multisubst :: S -> S -> [S] -> [S]
multisubst _ _ [] = []
multisubst new old (x:xs)
    | old == x  = new : multisubst new old xs
    | otherwise = x : multisubst new old xs

2012-04-18

[][]Scheme手習いをHaskellで 第2章

今日はこれだけ。

でもなんか無理がきてる感じがするなぁ。

isLat :: L -> Bool
isLat Nil = True
isLat l
    | isAtom (car l) = isLat (cdr l)
    | otherwise      = False

isMember :: S -> L -> Bool
isMember _ Nil = False
isMember s l = or [(eq s (car l)), (isMember s (cdr l))]

2012-04-17

[][]Scheme手習いをHaskellで 第1章

思うところあって、Scheme手習いを買ったので、これをHaskellで実装してみる。

Haskellだと型の違うリストを持てないので(多分。。)、データ型を定義してみた。

SとLに分かれているところが不格好な気がする。おいおい見なおしていこうと思う。

data S = Atom String | Number Integer | List L deriving (Show, Eq)
data L = Nil | Cons S L deriving (Show, Eq)

car :: L -> S
car (Cons s _) = s

cdr :: L -> L
cdr (Cons _ l) = l

cons :: S -> L -> L
cons s l = Cons s l

isNull :: L -> Bool
isNull Nil = True
isNull _   = False

isAtom :: S -> Bool
isAtom (Atom   _) = True
isAtom (Number _) = True
isAtom _          = False

eq :: S -> S -> Bool
eq (Atom a1) (Atom a2) = a1 == a2
eq (Number n1) (Number n2) = n1 == n2
eq (List l1) (List l2) = l1 == l2
eq _ _ = False