2015-03-19

Subtle type error

I'm new to programming, and F# is my first .NET language.

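I'm attempting this problem on Rosalind.info. Basically, given a DNA string, I'm supposed to return four integers counting how many times the symbols 'A', 'C', 'G', and 'T' occur in the string.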

Here is the code I have written so far:

open System.IO 
open System 

type DNANucleobases = {A: int; C: int; G: int; T: int} 

let initialLetterCount = {A = 0; C = 0; G = 0; T = 0} 

let countEachNucleobase (accumulator: DNANucleobases)(dnaString: string) = 
    let dnaCharArray = dnaString.ToCharArray() 
    dnaCharArray 
    |> Array.map (fun eachLetter -> match eachLetter with 
            | 'A' -> {accumulator with A = accumulator.A + 1} 
            | 'C' -> {accumulator with C = accumulator.C + 1} 
            | 'G' -> {accumulator with G = accumulator.G + 1} 
            | 'T' -> {accumulator with T = accumulator.T + 1} 
            | _ -> accumulator) 

let readDataset (filePath: string) = 
    let datasetArray = File.ReadAllLines filePath 
    String.Join("", datasetArray) 

let dataset = readDataset @"C:\Users\Unnamed\Desktop\Documents\Throwaway Documents\rosalind_dna.txt" 
Seq.fold countEachNucleobase initialLetterCount dataset 

However, I received the following error message:

CountingDNANucleotides.fsx(23,10): error FS0001: Type mismatch. Expecting a
    DNANucleobases -> string -> DNANucleobases
but given a
    DNANucleobases -> string -> DNANucleobases []
The type 'DNANucleobases' does not match the type 'DNANucleobases []'

What went wrong? What changes should I make to correct my error?

I think your 'Array.map' should be 'Array.iter', and you need to return the accumulator. – 2015-03-19 11:37:57

You have to 'fold' the 'dnaCharArray' array, because 'countEachNucleobase' needs to return an accumulated 'DNANucleobases' value, not an array. – 2015-03-19 12:03:19

Answer


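countEachNucleobase returns an array of the accumulator type, rather than just the accumulator it received as its first parameter. Therefore, Seq.fold cannot find a valid solution for its 'State type parameter: it is a plain record on the input but an array on the output. The function used for folding must have the accumulator type both as its first input and as its output.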
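To make that constraint concrete, here is a minimal sketch that is not part of the question's code (countA and the sample string are made up for illustration), using a plain int as the state. Seq.fold has the signature ('State -> 'T -> 'State) -> 'State -> seq<'T> -> 'State, so whatever the folder returns has to have the same type as the state it receives:

```fsharp
// Seq.fold : ('State -> 'T -> 'State) -> 'State -> seq<'T> -> 'State
// The folder takes the current state plus one element and returns the new state.
// Here 'State = int and 'T = char.
let countA (count: int) (letter: char) : int =
    if letter = 'A' then count + 1 else count

// A string is a seq<char>, so it can be folded over directly.
let numberOfAs = Seq.fold countA 0 "AGCTTAAC"

printfn "%d" numberOfAs
```

If countA returned an int [] instead of an int, the compiler would reject the call with the same kind of mismatch as in the question.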

In place of Array.map in the question's code, you could have used Array.fold:

let countEachNucleobase (accumulator: DNANucleobases) (dnaString: string) =
    let dnaCharArray = dnaString.ToCharArray()
    dnaCharArray
    |> Array.fold (fun (accumulator : DNANucleobases) eachLetter ->
        match eachLetter with
        | 'A' -> {accumulator with A = accumulator.A + 1}
        | 'C' -> {accumulator with C = accumulator.C + 1}
        | 'G' -> {accumulator with G = accumulator.G + 1}
        | 'T' -> {accumulator with T = accumulator.T + 1}
        | _ -> accumulator) accumulator

and then the call on the last line becomes:

countEachNucleobase initialLetterCount dataset 

A shorter version:

let readChar accumulator = function 
    | 'A' -> {accumulator with A = accumulator.A + 1} 
    | 'C' -> {accumulator with C = accumulator.C + 1} 
    | 'G' -> {accumulator with G = accumulator.G + 1} 
    | 'T' -> {accumulator with T = accumulator.T + 1} 
    | _ -> accumulator 

let countEachNucleobase acc input = Seq.fold readChar acc input 

Since a string is a sequence of characters, input will accept a string as well as a character array or any other sequence of characters.
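As a quick check of that claim, the sketch below calls the same function with a string, a char array, and a char list. The sample inputs are made-up test values, and the type annotation on accumulator is an addition for clarity; the rest repeats the definitions above so the snippet runs on its own:

```fsharp
type DNANucleobases = {A: int; C: int; G: int; T: int}

let initialLetterCount = {A = 0; C = 0; G = 0; T = 0}

// Folder: takes the counts so far and one character, returns the updated counts.
let readChar (accumulator: DNANucleobases) = function
    | 'A' -> {accumulator with A = accumulator.A + 1}
    | 'C' -> {accumulator with C = accumulator.C + 1}
    | 'G' -> {accumulator with G = accumulator.G + 1}
    | 'T' -> {accumulator with T = accumulator.T + 1}
    | _ -> accumulator

let countEachNucleobase acc input = Seq.fold readChar acc input

// The same function applied to three different kinds of char sequence.
let fromString = countEachNucleobase initialLetterCount "AGCTTTTCATTC"
let fromArray = countEachNucleobase initialLetterCount [| 'A'; 'G'; 'C' |]
let fromList = countEachNucleobase initialLetterCount [ 'T'; 'T' ]

// Print the four counts separated by spaces, in the order A C G T.
printfn "%d %d %d %d" fromString.A fromString.C fromString.G fromString.T
```

Because the folder lives in its own function, it can also be reused with List.fold or Array.fold if the input is already a list or an array.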

Thanks for the reply, Vandroiy. When I tried your Array.fold suggestion, I got an error message saying: "This expression was expected to have type char [] but here has type DNANucleobases" – 2015-03-19 15:02:22

@MY_G That's strange. Are you sure you are using exactly this code? Were both 'countEachNucleobase' and the last line replaced? When I try it in F# Interactive, it runs fine. – Vandroiy 2015-03-19 17:36:12

Vandroiy, it works now. Thank you for your help. :-) – 2015-03-20 09:17:05