2011-03-11 53 views
6

我还在学习Haskell和我写了下面的基数排序功能。它似乎工作正常,但问题在于它存储效率低下。如果使用ghc编译,内存已经超过500MB,输入列表的大小为10000个元素。优化基数排序在Haskell

所以我要问你怎么可以在下面的算法/代码改进,使其在速度和内存方面更有效。什么是最好的开始?

import System.Random 

-- radixsort for positive integers. uses 10 buckets 
radixsort :: [Int] -> [Int] 
radixsort [] = [] 
radixsort xs = 
    -- given the data, get the number of passes that are required for sorting 
    -- the largest integer 
    let maxPos = floor ((log (fromIntegral (foldl max 0 xs))/log 10) + 1) 

     -- start sorting from digit on position 0 (lowest position) to position 'maxPos' 
     radixsort' ys pos 
     | pos < 0 = ys 
     | otherwise = let sortedYs = radixsort' ys (pos - 1) 
          newBuckets = radixsort'' sortedYs [[] | i <- [1..10]] pos 
         in [element | bucket <- newBuckets, element <- bucket] 

     -- given empty buckets, digit position and list, sort the values into 
     -- buckets 
     radixsort'' []  buckets _ = buckets 
     radixsort'' (y:ys) buckets pos = 
      let digit = div (mod y (10^(pos + 1))) (10^pos) 
       (bucketsBegin, bucketsEnd) = splitAt digit buckets 
       bucket = head bucketsEnd 
       newBucket = bucket ++ [y] 
      in radixsort'' ys (bucketsBegin ++ [newBucket] ++ (tail bucketsEnd)) pos 
    in radixsort' xs maxPos 

-- get an random array given an seed 
getRandIntArray :: Int -> [Int] 
getRandIntArray seed = (randomRs (0, div (maxBound :: Int) 2) (mkStdGen seed)) 

main = do 
     value <- (\x -> return x) (length (radixsort (take 10000 (getRandIntArray 0)))) 
     print value 
+1

你有没有考虑使用数组从IO单子? – Gabe 2011-03-11 20:43:11

+0

谢谢,我一定会尽快检查出其他数据类型,因为我觉得更舒服哈斯克尔基础。 – Timo 2011-03-11 21:56:21

+0

在'maxPos'你应该使用'foldl'',而不是'foldl'。另外,不是'floor(x + 1)'更好地表达为'ceiling x'? – 2011-03-12 06:34:42

回答

6

主要的问题是你的函数radixsort'',因为++为O(n),每次它拷贝给定的第一个参数列表。

pack (-1) r' _ = r' 
pack n r' relems = 
    let getn = (map snd) . (filter ((n==) . fst)) 
    in pack (n - 1) ((getn relems):r') relems 
radixsort'' elems pos = 
    let digit = \y -> div (mod y (10^(pos + 1))) (10^pos) 
     relems = zip (map digit elems) elems 
    in pack 9 [] relems 
+0

感谢您指出并提供您自己的解决方案。我真的很喜欢你设法将价值分类成桶。我有很多东西需要学习:) – Timo 2011-03-11 22:08:18

+2

''++是*为O(n)*,其中'N'是第一个列表的长度。使用它可能导致* O(n^2)*算法,您期望* O(n)*个算法。 – luqui 2011-03-12 05:06:05

+0

是的,对不起,我写得太快了。然而,在这个问题中,它被使用了几次,导致了一个二次算法。 (固定) – Kru 2011-03-12 13:55:40