You first have to normalize all the values. A regular expression can do that:
;; Month abbreviations, indexed by (month number - 1).
(def months ["JAN" "FEB" "MAR" "APR"
             "MAY" "JUN" "JUL" "AUG"
             "SEP" "OCT" "NOV" "DEC"])

;; "XX_2.5_10/23/2015" -> ["XX" 2.5 "OCT" "2015"]; the day is ignored.
(defn normalize-underscored [value]
  (let [[_ text val month year]
        (re-matches #"(.+?)_([\d.]+)_(\d+)/\d+/(\d{4})" value)]
    [text
     (Float/parseFloat val)
     (months (dec (Long/parseLong month)))
     year]))

;; "XXX 1.000 OCT15" -> ["XXX" 1.0 "OCT" "2015"]; a two-digit year gets a "20" prefix.
(defn normalize-spaced [value]
  (let [[_ text val month year]
        (re-matches #"(.+?)\s([\d.]+)\s(\w{3})(\d{2,4})" value)]
    [text (Float/parseFloat val) month
     (if (== 2 (count year)) (str "20" year) year)]))
This is how the normalization behaves:
user> (normalize-underscored "XX_2.5_10/23/2015")
["XX" 2.5 "OCT" "2015"]
user> (normalize-spaced "XXX 1.000 OCT15")
["XXX" 1.0 "OCT" "2015"]
user> (normalize-spaced "ZZZ 3.500 JAN2016")
["ZZZ" 3.5 "JAN" "2016"]
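One caveat: `re-matches` returns nil when a string does not fit the pattern, and `Float/parseFloat` then throws a NullPointerException. If the input may contain values in neither format, something along these lines could help. This is only a sketch: `normalize-any` is a made-up name, not part of the code above, and it only guards against non-matching input.

```clojure
(def months ["JAN" "FEB" "MAR" "APR"
             "MAY" "JUN" "JUL" "AUG"
             "SEP" "OCT" "NOV" "DEC"])

;; Hypothetical helper: tries the underscored format first, then the
;; spaced one, and returns nil when the value matches neither.
(defn normalize-any [value]
  (if-let [[_ text val month year]
           (re-matches #"(.+?)_([\d.]+)_(\d+)/\d+/(\d{4})" value)]
    [text
     (Float/parseFloat val)
     ;; get yields nil instead of throwing for a month outside 1-12
     (get months (dec (Long/parseLong month)))
     year]
    (when-let [[_ text val month year]
               (re-matches #"(.+?)\s([\d.]+)\s(\w{3})(\d{2,4})" value)]
      [text
       (Float/parseFloat val)
       month
       (if (= 2 (count year)) (str "20" year) year)])))
```

Since it returns nil for unrecognized values, it can be dropped straight into `keep` to skip them.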
Then simply compare the normalized versions:
(def underscored '("XXX_1_10/22/2015" "YYY_1.5_11/22/2015"
                   "XX_2.5_10/23/2015" "YY_5_11/26/2015"))

(def spaced #{"XXX 1.000 OCT15" "XX 2.500 OCT2015"
              "ZZZ 3.500 JAN2016"})

(for [uv (map normalize-underscored underscored)
      s spaced
      :when (= uv (normalize-spaced s))]
  s)
Output:
("XXX 1.000 OCT15" "XX 2.500 OCT2015")
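The `for` expression above calls `normalize-spaced` on every spaced value once per underscored value. For larger inputs you could index one side by its normalized form once and then look the other side up. A rough sketch (`matching-originals` is a hypothetical name, and it assumes no two values in the second collection normalize to the same key, otherwise the later one wins):

```clojure
;; Hypothetical helper: index `bs` by normalized form once, then look up
;; the normalized form of each item in `as`; returns the matching
;; original items of `bs`, in the order of `as`.
(defn matching-originals [norm-a norm-b as bs]
  (let [index (into {} (map (juxt norm-b identity)) bs)]
    (keep #(index (norm-a %)) as)))
```

With the definitions above, `(matching-originals normalize-underscored normalize-spaced underscored spaced)` should return the same two strings as the `for` expression.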
Or, better, format the results into a more consistent form, like this:
(map (partial apply format "%s %.3f %s%s")
     (keep (set (map normalize-spaced spaced))
           (map normalize-underscored underscored)))
Output:
("XXX 1.000 OCT2015" "XX 2.500 OCT2015")
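A small portability note: `format` uses the JVM's default locale, so on a machine whose locale uses a decimal comma, `%.3f` would print `1,000` instead of `1.000`. If that matters, the locale can be pinned. A minimal sketch, where `format-normalized` is a hypothetical name:

```clojure
(import 'java.util.Locale)

;; Hypothetical helper: same output shape as (apply format "%s %.3f %s%s" v),
;; but with a fixed locale so the decimal separator is always ".".
(defn format-normalized [[text val month year]]
  (String/format Locale/ROOT "%s %.3f %s%s"
                 (to-array [text val month year])))
```

It takes one normalized vector, so it can replace `(partial apply format "%s %.3f %s%s")` in the `map` call.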
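Your second data format '{...}' is a *map*. Surely it should be '#{}', a *set*. – Thumbnail
I can't change the format of the data structure. I tested comparing the two with (keep #set list), and it returns the common values. But formatting the dates so that they look alike is the problem I'm facing. – Sri
Feel free to add that code. It makes it easier for people who are willing to answer to build on it. – cfrick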