How much space does ridge regression require?

In Haskell, ridge regression can be expressed as:
import Numeric.LinearAlgebra

createReadout :: Matrix Double -> Matrix Double -> Matrix Double
createReadout a b = oA <\> oB
  where
    mu = 1e-4
    oA = (a <> (tr a)) + (mu * (ident $ rows a))
    oB = a <> (tr b)
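For reference, here is a minimal, self-contained usage of createReadout on small stub matrices (the shapes and values below are arbitrary choices of mine). It solves the regularized normal equations (a·aᵀ + μI)·w = a·bᵀ; I use the explicit `scale mu` instead of the broadcasting `mu *`, which is equivalent:

```haskell
import Numeric.LinearAlgebra

-- Same readout as above; `scale mu` is the explicit scalar-matrix product.
createReadout :: Matrix Double -> Matrix Double -> Matrix Double
createReadout a b = oA <\> oB
  where
    mu = 1e-4
    oA = (a <> tr a) + scale mu (ident (rows a))
    oB = a <> tr b

main :: IO ()
main = do
  let a = (3><5) [1,0,2,0,1, 1,3,1,0,1, 2,1,0,0,4] :: Matrix Double
      b = (2><5) [1,0,0,1,0, 0,1,1,0,1] :: Matrix Double
      w = createReadout a b  -- 3x2 readout weights
  print (rows w, cols w)     -- (3,2)
```

Here `a` is 3×5 and `b` is 2×5, so oA is 3×3, oB is 3×2, and the readout w is 3×2.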
However, this operation is very memory-hungry. Here is a minimalistic example that needs more than 2 GB on my machine and takes 3 minutes to execute:
import Numeric.LinearAlgebra
import System.Random

createReadout :: Matrix Double -> Matrix Double -> Matrix Double
createReadout a b = oA <\> oB
  where
    mu = 1e-4
    oA = (a <> (tr a)) + (mu * (ident $ rows a))
    oB = a <> (tr b)

teacher :: [Int] -> Int -> Int -> Matrix Double
teacher labelsList cols' correctRow = fromBlocks $ f <$> labelsList
  where ones = konst 1.0 (1, cols')
        zeros = konst 0.0 (1, cols')
        rows' = length labelsList
        f i | i == correctRow = [ones]
            | otherwise = [zeros]

glue :: Element t => [Matrix t] -> Matrix t
glue xs = fromBlocks [xs]

main :: IO ()
main = do
  let n = 1500  -- <- The constant to be increased
      m = 10000
      cols' = 12
  g <- newStdGen
  -- Stub data
  let labels = take m . map (`mod` 10) . randoms $ g :: [Int]
      a = (n >< (cols' * m)) $ take (cols' * m * n) $ randoms g :: Matrix Double
      teachers = zipWith (teacher [0..9]) (repeat cols') labels
      b = glue teachers
  print $ maxElement $ createReadout a b
  return ()
$ cabal exec ghc -- -O2 Test.hs
$ time ./Test
./Test  190.16s user 5.22s system 106% cpu 3:03.93 total
The problem is to increase the constant n, at least up to n = 4000, while RAM is limited to 5 GB. What is the theoretical minimum space required by the matrix-inversion operation? How can this operation be optimized space-wise? Can ridge regression be effectively replaced with a cheaper method?
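For a rough lower bound, note that the dominant storage cost is not the n×n system oA (which is small) but the dense matrix a itself. A back-of-envelope sketch, assuming dense 8-byte Double storage (the constants are taken from the example above):

```haskell
-- Storage estimate for the matrices in the example, assuming dense
-- Double (8 bytes per element) representation.
main :: IO ()
main = do
  let n = 1500; m = 10000; cols' = 12
      gb r c = fromIntegral (r * c * 8) / 1e9 :: Double
  putStrLn $ "a  (n x cols'*m): " ++ show (gb n (cols' * m)) ++ " GB"    -- ~1.44 GB
  putStrLn $ "oA (n x n):       " ++ show (gb n n) ++ " GB"              -- ~0.018 GB
  putStrLn $ "a at n = 4000:    " ++ show (gb 4000 (cols' * m)) ++ " GB" -- ~3.84 GB
```

So at n = 1500 the matrix a alone holds about 1.44 GB, and at n = 4000 about 3.84 GB, which by itself nearly exhausts a 5 GB budget before any solve happens; building a from a boxed Haskell list of randoms also allocates transiently on top of that.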
Am I reading this right: 'a' is a 1500 x 120000 matrix? –

Exactly right. It may become even larger. – penkovsky

Are the matrices [sparse](https://en.wikipedia.org/wiki/Sparse_matrix)? That could save you a lot of time and space (but you will probably need a dedicated algorithm such as [conjugate gradient](https://en.wikipedia.org/wiki/Conjugate_gradient_method)). – leftaroundabout
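Following up on the conjugate-gradient suggestion, here is a minimal matrix-free CG sketch (the function name, tolerance, and iteration cap are my own choices). It solves (a·aᵀ + μI)·x = rhs using only matrix-vector products, so the n×n product a <> tr a is never formed:

```haskell
import Numeric.LinearAlgebra

-- Conjugate gradient for a symmetric positive-definite operator given
-- only as a matrix-vector product. Tolerance/cap are arbitrary.
cgSolve :: (Vector Double -> Vector Double) -> Vector Double -> Vector Double
cgSolve mv rhs = go (konst 0 (size rhs)) rhs rhs (rhs <.> rhs) (0 :: Int)
  where
    go x r p rs k
      | sqrt rs < 1e-12 || k >= 1000 = x
      | otherwise =
          let ap    = mv p
              alpha = rs / (p <.> ap)
              x'    = x + scale alpha p
              r'    = r - scale alpha ap
              rs'   = r' <.> r'
              p'    = r' + scale (rs' / rs) p
          in go x' r' p' rs' (k + 1)

main :: IO ()
main = do
  let a   = (4><6) [2,0,1,0,3,1, 0,1,0,2,1,0, 1,1,1,0,0,2, 0,2,1,1,0,1]
              :: Matrix Double
      mu  = 1e-4
      mv v = a #> (tr a #> v) + scale mu v  -- (a a^T + mu I) v, matrix-free
      rhs = konst 1 4 :: Vector Double
      x   = cgSolve mv rhs
  print (norm_2 (mv x - rhs) < 1e-8)  -- residual is tiny
```

Solving for each column of oB this way costs two matrix-vector products per iteration instead of an O(n²·cols) matrix-matrix product, and combined with a sparse representation of a it would also attack the dominant storage cost identified above.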