2016-09-27 82 views
3

I'm trying to compute the frequency of my elements for each row. Let me explain: I'm selecting from a table that contains columns such as "pos, chr, ref, alt, id_disease".

From these I need to extract the frequency of each ref, alt pair, i.e.:

num_occurrencies_of(ref='A' and alt='C')/total number of rows
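To make the ratio concrete, here is a minimal Python sketch of the same computation outside the database. The sample rows are invented for illustration and the helper name `pair_frequencies` is hypothetical; only the formula itself comes from the definition above.

```python
from collections import Counter
from fractions import Fraction

def pair_frequencies(rows):
    """Return {(ref, alt): occurrences / total rows} for a list of (ref, alt) tuples."""
    total = len(rows)
    # Upper-case both values so that 'a'/'c' and 'A'/'C' count as the same pair.
    counts = Counter((ref.upper(), alt.upper()) for ref, alt in rows)
    return {pair: Fraction(n, total) for pair, n in counts.items()}

# Invented sample rows, not taken from the real varianti table.
rows = [("A", "C"), ("A", "C"), ("T", "T"), ("G", "A")]
freqs = pair_frequencies(rows)
print(freqs[("A", "C")])  # 1/2
```

Every row with the same pair shares the same frequency, and the frequencies of all distinct pairs sum to 1.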

With this query I get close to my goal, but it does not compute the correct frequency; it always returns a constant:

SELECT pos, chr, upper(ref||' '||alt) AS refalt, id_disease AS lvl15, t1.tot_var, t1.freq 
FROM varianti 
JOIN (SELECT count(*) AS tot_var,(count(*)::numeric/sum(count(*)) over()) as freq 
     FROM varianti)t1 ON TRUE 
WHERE length(ref)=1 AND length(alt)=1 AND chr similar to 'chr[\d X Y]*' 

All I want is to get data like this:

chr pos refalt lvl15 freq tot_var 
1 120 AT  15 0.3 1000 
1 150 CG  30 0.01 1000 

tot_var = the total count of the rows I need (it can't be 1 when I compute it for each row!)

ref and alt can each take the values (A, T, C, G), in any combination: AA, AT, TA, TC, CT, etc.

What am I missing in my code?

Let me know if you need more information about varianti.


Example:

chr pos ref alt id_disease 
chr1 152 A C 15 
chr3 487 T T 74 

Here is the output of my query:

pos   chr refalt lvl15 tot_var freq 
124338543 chr11 G A  69  1  0.000000677833751782702767 
124338595 chr11 C T  28  1  0.000000677833751782702767 
124361862 chr11 C .  53  1  0.000000677833751782702767 
124361899 chr11 T A  20  1  0.000000677833751782702767 
+2

Could you give some example rows from your table 'varianti'? th – maximilienAndile

+0

There you go, I've updated the question with an example. – xCloudx8

Answers

1

Based on the information you have provided:

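SELECT DISTINCT chr, pos, 
upper(ref||' '||alt) AS refalt, id_disease AS lvl15, 
-- window aggregates (OVER ()) allow per-row columns without GROUP BY; ::numeric avoids integer division 
(SUM(CASE WHEN ref = 'A' AND alt = 'C' THEN 1 ELSE 0 END) OVER ())::numeric / COUNT(*) OVER () AS freq, 
COUNT(*) OVER () AS tot_var 
FROM varianti 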

I'm still not sure what 'tot_var' is. An actual data sample, together with the expected output for that same sample, would be very helpful.

Edit 1: To get the frequency of each pair in the dataset:

SELECT upper(ref||' '||alt) AS refalt, 
-- rows in this pair divided by all rows; ::numeric avoids integer division 
COUNT(*)::numeric/(SELECT COUNT(*) FROM varianti) AS freq 
FROM varianti 
GROUP BY refalt 

Edit 2: Based on the requirements:

SELECT v.chr, v.pos, 
upper(v.ref||' '||v.alt) AS refalt, v.id_disease AS lvl15, 
refalt_table.freq, (SELECT COUNT(*) FROM varianti) AS tot_var 
FROM varianti v 
JOIN 
(SELECT upper(ref||' '||alt) AS refalt, 
    -- rows in this pair divided by all rows; ::numeric avoids integer division 
    COUNT(*)::numeric/(SELECT COUNT(*) FROM varianti) AS freq 
    FROM varianti 
    GROUP BY refalt 
) refalt_table ON refalt_table.refalt = upper(v.ref||' '||v.alt) 

Edit 3: Query updated based on the error:

SELECT v.chr, v.pos, upper(v.ref||' '||v.alt) AS refalt, v.id_disease AS lvl15, refalt_table.freq AS freq, (SELECT COUNT(*) FROM varianti tot WHERE tot.pos = v.pos) AS tot_var 
FROM varianti v 
LEFT JOIN 
(SELECT UPPER(ref) AS ref, UPPER(alt) AS alt, 
    -- ::numeric avoids integer division, which would truncate freq to 0 
    COUNT(pos)::numeric/(SELECT COUNT(*) FROM varianti vcount) AS freq 
    FROM varianti 
    GROUP BY UPPER(ref), UPPER(alt) 
) refalt_table ON refalt_table.ref = UPPER(v.ref) AND refalt_table.alt = UPPER(v.alt) 
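As a quick way to check the idea behind this join, the following Python sketch runs a simplified version against an in-memory SQLite database. Several things here are assumptions rather than part of the original setup: the sample rows are invented, SQLite stands in for the asker's database (so `CAST(... AS REAL)` replaces `::numeric`), and `tot_var` is taken to be the total row count as described in the question rather than a per-`pos` count.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE varianti "
    "(chr TEXT, pos INTEGER, ref TEXT, alt TEXT, id_disease INTEGER)"
)
# Invented sample rows for illustration only.
conn.executemany(
    "INSERT INTO varianti VALUES (?, ?, ?, ?, ?)",
    [
        ("chr1", 152, "A", "C", 15),
        ("chr1", 200, "a", "c", 30),
        ("chr3", 487, "T", "T", 74),
        ("chr11", 900, "G", "A", 69),
    ],
)

QUERY = """
SELECT v.chr, v.pos, UPPER(v.ref || ' ' || v.alt) AS refalt,
       v.id_disease AS lvl15, p.freq, t.tot_var
FROM varianti v
-- one-row subquery holding the total row count
CROSS JOIN (SELECT COUNT(*) AS tot_var FROM varianti) t
-- one row per (ref, alt) pair with its share of all rows
JOIN (SELECT UPPER(ref) AS ref, UPPER(alt) AS alt,
             CAST(COUNT(*) AS REAL) / (SELECT COUNT(*) FROM varianti) AS freq
      FROM varianti
      GROUP BY UPPER(ref), UPPER(alt)) p
  ON p.ref = UPPER(v.ref) AND p.alt = UPPER(v.alt)
ORDER BY v.pos
"""

rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)  # first row: ('chr1', 152, 'A C', 15, 0.5, 4)
```

Grouping on the upper-cased values keeps `a`/`c` and `A`/`C` in the same pair, and casting before dividing is what stops the frequency from being truncated to an integer.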
+0

Hmm, okay, but how can I get the frequency of each pair? – xCloudx8

+0

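Yeah, I can't work out what the input data looks like from the SQL query you provided. It would be great if we could get an input sample and the expected output for that sample. – woodhead92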

+1

@xCloudx8 I edited the solution to get the frequencies. – woodhead92