哈斯克尔堆栈溢出

我正在写一个遗传算法来生成字符串“helloworld”。但是当n为10,000或更多时，evolve函数会产生堆栈溢出。哈斯克尔堆栈溢出

module Genetics where 

import Data.List (sortBy) 
import Random (randomRIO) 
import Control.Monad (foldM) 

class Gene g where 
    -- How ideal is the gene from 0.0 to 1.0? 
    fitness :: g -> Float 

    -- How does a gene mutate? 
    mutate :: g -> IO g 

    -- How many species will be explored? 
    species :: [g] -> Int 

orderFitness :: (Gene g) => [g] -> [g] 
orderFitness = reverse . sortBy (\a b -> compare (fitness a) (fitness b)) 

compete :: (Gene g) => [g] -> IO [g] 
compete pool = do 
    let s = species pool 
    variants <- (mapM (mapM mutate) . map (replicate s)) pool 
    let pool' = (map head . map orderFitness) variants 
    return pool' 

evolve :: (Gene g) => Int -> [g] -> IO [g] 
evolve 0 pool = return pool 
evolve n pool = do 
    pool' <- compete pool 
    evolve (n - 1) pool'

随着species pool = 8，8个基因池复制到8组。每组进行变异，并选择适合每个组进行进一步进化（回到8个基因）。

GitHub

来源

2011-05-10 mcandre

假设你正在使用'GHC -O2'为你的编译器，你的第一个版本没有在'evolve'功能的堆栈溢出。由于我们看不到“竞争”的实施，这就是可以说的。 – 2011-05-10 21:25:19

上面提供的GitHub链接（https://github.com/mcandre/genetics）指定了“竞争”以及确实使用'ghc -O2'的Makefile。我认为第一个例子中的<-'足以防止堆栈溢出。我不确定问题出在哪里。 – mcandre 2011-05-10 21:32:37

>> =应该等于符号，那里没有区别。 – alternative 2011-05-10 21:36:12

感谢唐的deepseq的建议，我能够将问题缩小到mapM mutate这使得太多thunk。新版本有mutate'，它使用seq来防止thunking。

module Genetics where 

import Data.List (maximumBy) 
import Random (randomRIO) 

class Gene g where 
    -- How ideal is the gene from 0.0 to 1.0? 
    fitness :: g -> Float 

    -- How does a gene mutate? 
    mutate :: g -> IO g 

    -- How many species will be explored in each round? 
    species :: [g] -> Int 

best :: (Gene g) => [g] -> g 
best = maximumBy (\a b -> compare (fitness a) (fitness b)) 

-- Prevents stack overflow 
mutate' :: (Gene g) => g -> IO g 
mutate' gene = do 
    gene' <- mutate gene 
    gene' `seq` return gene' 

drift :: (Gene g) => [[g]] -> IO [[g]] 
drift = mapM (mapM mutate') 

compete :: (Gene g) => [g] -> IO [g] 
compete pool = do 
    let islands = map (replicate (species pool)) pool 
    islands' <- drift islands 
    let representatives = map best islands' 
    return representatives 

evolve :: (Gene g) => Int -> [g] -> IO [g] 
evolve 0 pool = return pool 
evolve n pool = compete pool >>= evolve (n - 1)

GitHub

来源

2011-05-10 23:36:58 mcandre

如果你有兴趣的表现，我会使用一个快速随机数发生器，如：

The mersenne-random package

其次，compete看起来很可疑，因为它完全是懒惰的，尽管构建了一些潜在的大型结构。请尝试重写它是有点严格，使用deepseq锤：

import Control.DeepSeq  

compete :: (Gene g, NFData g) => [g] -> IO [g] 
compete pool = do 
    let s = species pool 
    variants <- (mapM (mapM mutate) . map (replicate s)) pool 
    let pool' = (map head . map orderFitness) variants 
    pool' `deepseq` return pool'

这个东西没有需要在IO，不过，（单独的问题）。像Rand单子可能是more appropriate。

来源

2011-05-10 22:39:12

@Don'deepseq'锤解决了问题（upvote）。但它感觉就像在作弊。我想知道在哪里竞争堆栈溢出发生。 – mcandre 2011-05-10 22:53:21

可能的候选者是使用嵌套'mapM'。但是，可以肯定：简介。像这样：http://stackoverflow.com/questions/5939630/best-way-of-looping-over-a-2-d-array-using-repa/5940537#5940537 – 2011-05-10 23:06:56

@mcandre你可能感兴趣[RWH ＃空间剖析]（http://book.realworldhaskell.org/read/profiling-and-optimization.html#id678078） – 2011-05-10 23:07:58

而不是使用(map head . map orderFitness)其中orderFitness是sortBy你可以使用maximumBy和单一map的。这并不会节省太多（因为你从O（n log n）到O（n），并且可能从消除双映射中获得另一个因子2），但是至少可以更简单和更高效。你也可以摆脱呼叫来撤消。

我怀疑这可以修复这个问题，而不是deepseq，但它应该是一个改进。

编辑：如果标准库和GHC是完美的，那么head . sortBy会产生相同的代码maximumBy和map head . map sortBy会产生对map (head . sortBy)相同的代码黯然没有这些东西很可能在实践中是真实的。 sortBy将倾向于做一堆额外的内存分配，因为它是一个分而治之的算法。结合地图是你有时得到的一种优化，但不应该依靠。

更重要的是，使用maximumBy更具说明性。很容易看到代码的作用和需要多长时间。利用优化也应该更容易，因为我们知道目标是什么，而不仅仅是我们如何得到它。

来源

2011-05-10 23:08:44

谢谢！它不能解决问题，但我会在解决方案中包含您的建议。 – mcandre 2011-05-10 23:40:09

哈斯克尔堆栈溢出

回答

相关问题