2012-04-13

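Migrating reduce to map

My question is: how can I rewrite the following reduce-based solution using map and doseq? I have had a lot of trouble with the solution below.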

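The solution is meant to solve the following problem. I have two CSV files parsed by clojure-csv. The resulting vectors of vectors can be called bene-data and gic-data. For each row of bene-data, I want to take the value in one column and see whether that value appears in another column of some row of gic-data. I want to accumulate the bene-data values that are not found in gic-data into a vector. I originally tried to accumulate into a map, and ran into a stack overflow when trying to debug-print. Eventually, I want to combine this data with some static text and write it out to a report file.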

The following functions:

(defn is-a-in-b 
    "This is a helper function that takes a value, a column index, and a 
    returned clojure-csv row (vector), and checks to see if that value 
    is present. Returns value or nil if not present." 
    [cmp-val col-idx csv-row] 

    (let [csv-row-val (nth csv-row col-idx nil)] 
     (if (= cmp-val csv-row-val) 
      cmp-val 
      nil))) 

(defn key-pres? 
    "Accepts a value, like an index, and output from clojure-csv, and looks 
    to see if the value is in the sequence at the index. Given clojure-csv 
    returns a vector of vectors, will loop around until and if the value 
    is found." 

    [cmp-val cmp-idx csv-data] 
    (reduce 
     (fn [ret-rc csv-row] 
      (let [temp-rc (is-a-in-b cmp-val cmp-idx csv-row)] 
       (if-not temp-rc 
        (conj ret-rc cmp-val)))) 
     [] 
     csv-data)) 


(defn test-key-inclusion 
    "Accepts csv-data param and an index, a second csv-data param and an index, 
    and searches the second csv-data instances' rows (at index) to see if 
    the first file's data is located in the second csv-data instance." 

    [csv-data1 pkey-idx1 csv-data2 pkey-idx2 lnam-idx fnam-idx] 

    (reduce 
     (fn [out-log csv-row1] 
      (let [cmp-val (nth csv-row1 pkey-idx1 nil) 
        lnam (nth csv-row1 lnam-idx nil) 
        fnam (nth csv-row1 fnam-idx) 
        temp-rc (first (key-pres? cmp-val pkey-idx2 csv-data2))] 

      (println (vector temp-rc cmp-val lnam fnam)) 
      (into out-log (vector temp-rc cmp-val lnam fnam)))) 
     [] 
     csv-data1)) 

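represent my attempt at solving this problem. I usually hit a wall when trying to use doseq and map, because I have nowhere to accumulate the resulting data unless I use loop/recur.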
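One way to look at the "nowhere to accumulate" problem: `map`, `filter` and `remove` return the new sequence themselves, so the result is the accumulation, and `doseq` is only needed for side effects such as printing. Note also that the reducing function in `key-pres?` returns nil whenever a row matches (the `if-not` has no else branch), which throws away what was accumulated so far. The following is only a sketch under assumptions: the sample data and the names `key-present?` and `missing-keys` are made up for illustration and are not part of the original code.

```clojure
;; Hypothetical sample data standing in for parsed clojure-csv output.
(def bene-data [["101" "Smith" "Ann"]
                ["102" "Jones" "Bob"]
                ["103" "Lee" "Cy"]])
(def gic-data [["x" "101"]
               ["y" "103"]])

(defn key-present?
  "True when cmp-val appears at col-idx in any row of csv-data."
  [cmp-val col-idx csv-data]
  (boolean (some #(= cmp-val (nth % col-idx nil)) csv-data)))

(defn missing-keys
  "Values at idx1 in csv-data1 that appear at idx2 in no row of csv-data2."
  [csv-data1 idx1 csv-data2 idx2]
  (->> csv-data1
       (map #(nth % idx1 nil))                    ; pull out the key column
       (remove #(key-present? % idx2 csv-data2))  ; drop keys that were found
       vec))                                      ; realize into a vector

(def missing (missing-keys bene-data 0 gic-data 1))

;; doseq is used only for the side effect of printing; it returns nil.
(doseq [k missing]
  (println "not in gic-data:" k))
```

Like the original, this rescans the second data set once per key, so it is quadratic in the worst case.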

Answers


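This solution reads all of column 2 into a set (so it is not lazy) for ease of writing. It should also perform better than rescanning column 2 for every value of column 1. Adjust as needed if column 2 is too big to read into memory.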

(defn column 
    "extract the values of a column out of a seq-of-seqs" 
    [s-o-s n] 
    (map #(nth % n) s-o-s)) 

(defn test-key-inclusion 
    "return all values in column1 that aren't in column2" 
    [column1 column2] 
    (filter (complement (into #{} column2)) column1)) 

user> (def rows1 [[1 2 3] [4 5 6] [7 8 9]]) 
#'user/rows1 

user> (def rows2 '[[a b c] [d 2 f] [g h i]]) 
#'user/rows2 

user> (test-key-inclusion (column rows1 1) (column rows2 1)) 
(5 8) 
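Since the question also wants the last and first names for the report, the same set-based idea can be applied to whole rows instead of a single extracted column. This is a rough sketch, not the asker's final code: `rows-missing-from`, `report-lines`, the sample rows and the comma-separated line format are all assumptions made for illustration.

```clojure
(require '[clojure.string :as str])

(defn rows-missing-from
  "Rows of rows1 whose value at idx1 is absent from column idx2 of rows2."
  [rows1 idx1 rows2 idx2]
  (let [seen (set (map #(nth % idx2 nil) rows2))]
    ;; contains? is used so that nil or false keys are also handled correctly
    (remove #(contains? seen (nth % idx1 nil)) rows1)))

(defn report-lines
  "One line of text per row: key, last name, first name."
  [rows pkey-idx lnam-idx fnam-idx]
  (map (fn [row]
         (str/join ", " [(nth row pkey-idx nil)
                         (nth row lnam-idx nil)
                         (nth row fnam-idx nil)]))
       rows))

;; Hypothetical sample data standing in for parsed clojure-csv output.
(def bene-rows [["101" "Smith" "Ann"] ["102" "Jones" "Bob"]])
(def gic-rows [["x" "101"]])

(def unmatched (rows-missing-from bene-rows 0 gic-rows 1))

;; doseq performs the output; the data itself comes from remove and map.
(doseq [line (report-lines unmatched 0 1 2)]
  (println line))
```

The static report text could be added inside `report-lines`, and the `println` could be swapped for whatever writes the report file.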

Thanks. I'm testing it. – octopusgrabbus 2012-04-13 15:28:49


Perhaps '(defn test-key-inclusion [column1 column2] (remove (set column2) column1))'? Help yourself. – Thumbnail 2014-04-15 13:45:20
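For what it's worth, the two spellings should behave the same: `remove` with a predicate is `filter` with its complement, and `set` builds the same set as `(into #{} ...)`. A quick check, using the hypothetical names `missing-via-filter` and `missing-via-remove` so the two versions can sit side by side:

```clojure
(defn missing-via-filter
  "The answer's version: keep values of column1 that the set does not contain."
  [column1 column2]
  (filter (complement (into #{} column2)) column1))

(defn missing-via-remove
  "The comment's version: drop values of column1 that the set contains."
  [column1 column2]
  (remove (set column2) column1))

;; Same columns as in the REPL session of the answer.
(def col1 [2 5 8])
(def col2 '[b 2 h])

(println (missing-via-filter col1 col2))
(println (missing-via-remove col1 col2))
```

One caveat applies to both: a set used as a predicate returns the element it finds, so a nil or false value in column1 always counts as missing, even when column2 contains it.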