2017-10-06
-2

How to calculate row-wise proportions in Unix

I have a titanic.txt dataset. It is in the form:

PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
1,0,3,"Braund, Mr. Owen Harris",male,22,1,0,A/5 21171,7.25,,S
2,1,1,"Cumings, Mrs. John Bradley (Florence Briggs Thayer)",female,38,1,0,PC 17599,71.2833,C85,C
3,1,3,"Heikkinen, Miss. Laina",female,26,0,0,STON/O2. 3101282,7.925,,S
4,1,1,"Futrelle, Mrs. Jacques Heath (Lily May Peel)",female,35,1,0,113803,53.1,C123,S

If the Survived column is 1, the passenger survived. Embarked is the port where the passenger boarded.

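For each port of embarkation, I want to calculate the proportion of survivors out of the total passengers. How can this be done with an awk command?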

Expected output:

 1
C 0.553571
Q 0.38961
S 0.336957

+0

Could you add the expected output here? That would make it easier for us to help you. – RavinderSingh13

+0

@RavinderSingh13 I have added the expected output. –

+0

@KarthikK, your output does not match your conditions. Either update your output or explain your conditions in more detail. – RomanPerekhrest

Answers

0

Something like this, untested:

awk -F, 'NR>1 {sum[$NF]+=$2} 
     END {for(k in sum) print k,sum[k]/(NR-1)}' file 

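However, since the denominator is the total number of passengers, the counts themselves may be more meaningful. Perhaps you want the survival rate for each port? If so, add count[$NF]++ and divide by count[k] in the END block.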

+1

You have a typo: you populated the array 'sum' but accessed the array 'count'. –

+0

Right, fixed... – karakfa
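A minimal sketch of the per-port survival rate suggested in the answer above (survivors who boarded at a port divided by all passengers who boarded there), assuming the file has exactly one header line, Survived is field 2 and Embarked is the last field. The function name `port_survival_rates` is hypothetical. The output is piped through `sort` because awk does not define the iteration order of `for (k in array)`.

```shell
# Hypothetical helper: print "<port> <survivors/passengers>" for each port.
# Assumes: header on line 1, Survived in field 2, Embarked in the last field.
# -F, also splits the quoted Name field ("Braund, Mr. ..."), but that only
# shifts the middle fields; $2 and $NF are unaffected.
port_survival_rates() {
    awk -F, '
        NR > 1 {
            survived[$NF] += $2   # Survived is 0 or 1, so this counts survivors
            count[$NF]++          # passengers who boarded at this port
        }
        END {
            for (port in count)
                print port, survived[port] / count[port]
        }
    ' "$1" | sort
}
```

For example, `port_survival_rates titanic.txt` prints one line per port. With only the header and the four sample rows from the question, it prints `C 1` and `S 0.666667`. Rows with an empty Embarked field are grouped under an empty port name.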

0

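Maybe this will help. I don't know where the Q 0.38961 in your expected output comes from. You should explain clearly what you need so that you get an answer sooner; otherwise it causes confusion: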

$ cat f 
1,0,3,"Braund, Mr. Owen Harris",male,22,1,0,A/5 21171,7.25,,S 
2,1,1,"Cumings, Mrs. John Bradley (Florence Briggs Thayer)",female,38,1,0,PC 17599,71.2833,C85,C 
3,1,3,"Heikkinen, Miss. Laina",female,26,0,0,STON/O2. 3101282,7.925,,S 
4,1,1,"Futrelle, Mrs. Jacques Heath (Lily May Peel)",female,35,1,0,113803,53.1,C123,S 

# denominator: total survivors across all ports, with percentage
# example: overall, 3 passengers survived across all ports;
# this shows each port's share of them
$ awk -F, '{sum[$NF]+=$2; total+=$2}END{for(k in sum)print k,sum[k]/total, (sum[k]/total)*100 }' f 
C 0.333333 33.3333 
S 0.666667 66.6667 

# denominator: total records for each port, with percentage
# example: port S had 3 passengers and 2 of them survived, so 66.67%
$ awk -F, '{sum[$NF]+=$2; oc[$NF]++}END{for(k in sum)print k,sum[k]/oc[k],(sum[k]/oc[k])*100 }' f
C 1 100 
S 0.666667 66.6667 

# denominator: total records in the file, as karakfa suggested
$ awk -F, '{sum[$NF]+=$2}END{for(k in sum)print k,sum[k]/NR }' f 
C 0.25 
S 0.5 
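The three commands above differ only in the denominator. As a sketch, they can be computed in a single pass and printed side by side. Like the file `f` above, this assumes there is no header line; the function name `port_ratios` is hypothetical, and `sort` only makes the otherwise unspecified `for (k in ...)` order predictable.

```shell
# Hypothetical helper: for each port print
#   <port> <share of all survivors> <survival rate at port> <survivors / all records>
# Assumes no header line, Survived in field 2, Embarked in the last field.
port_ratios() {
    awk -F, '
        {
            surv[$NF] += $2      # survivors per port
            seen[$NF]++          # records per port
            total_surv += $2     # survivors across all ports
        }
        END {
            for (port in seen)
                printf "%s %.6f %.6f %.6f\n", port,
                       (total_surv ? surv[port] / total_surv : 0),
                       surv[port] / seen[port],
                       surv[port] / NR
        }
    ' "$1" | sort
}
```

Run against the four-row file `f`, `port_ratios f` prints `C 0.333333 1.000000 0.250000` and `S 0.666667 0.666667 0.500000`, the same values as the three separate commands.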
0

For each port of embarkation, this calculates the survival rate of the passengers who boarded there.

awk -F, 'NR>1 {sum[$NF]+=$2; tot[$NF]++} END {for (emb in sum) print(emb, sum[emb]/tot[emb])}' file
0
$ awk -F, '$2==1{a[$NF]++} END{for(i in a){print i,a[i]/NR}}' file 

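$NF refers to the last field, i.e. C or S.
a[$NF] builds an associative array keyed by $NF; whenever $2==1 (that is, the second field, Survived, is 1), the value for that key is incremented by 1. Dividing by NR in the END block gives each port's survivors as a fraction of all records in the file.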

Output:

C 0.25 
S 0.5