2011-04-21 44 views
0

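Parsing a file with bash, finding the first unique values

I have a CSV file that I am trying to parse in bash. The first field of each line is a timestamp in the format yyyy-mm-dd hh:mm:ss. Six lines are generated every 10 minutes; I have added a small sample below.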

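What I want to do is get the first 6 lines of each day. The first entry of each day can occur at any time between 00:00:xx and 00:10:xx, so a grep for "00:0" does not work.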

2010-04-23 00:04:39,0,0,4666724,3217665,28866,28866,0.92,65,
2010-04-23 00:04:39.1,0.4666724,3217663,20832,20832,0.62,65,
2010-04-23 00:04:42.2,0.4666724,3217662,14702,14702,0.46,65,
2010-04-23 00:04:430.3,0.4666724,3217664,27739,27739,0.92,65,
2010-04-23 00:04:39,4,0,4666724,3217664,25105,25105,0.77,65,
2010-04-23 00:04:430.5,0,4666724,3217664,24546,24546,0.77,65,
2010-04-23 00:14:40.0,0.4666724,3217665,29226,29226,0.92,65,
2010-04-23 00:14:430.1,0,4666724,3217663,21552,21552,0.62,65,
2010-04-23 00:14:42,90,90,62,62,63,66,65,
2010-04-23 00:14:430.3,0,4666724,3217664,28459,28459,0.92,65,
2010-04-23 00:14:430.4,0,4666724,3217664,25825,25825,0.77,65,
2010-04-23 00:14:35,906,4666724,3217664,25266,25266,0.77,65,
2010-04-23 00:24:43,0.0,0,4666724,3217665,29586,29586,0.92,65,
2010-04-23 00:24:430.1,0,4666724,3217663,22272,22272,0.77,65,

2010-04-24 00:05:02,0.0,0,4666724,3217701,71388,71388,2.31,65,
2010-04-24 00:05:02,0.1,0,4666724,3217701,70264,70264,2.31,65,
2010-04-24 00:05:02,0.2,0,4666724,3217700,61254,61254,2.00,65,
2010-04-24 00:05:02,0.3,0,4666724,3217701,71011,71011,2.31,65,
2010-04-24 00:05:02,0.4,0,4666724,3217701,68111,68111,2.15,65,
2010-04-24 00:05:02,0.5,0,4666724,3217702,69904,69904,2.31,65,

Thoughts, comments? Bob

Answers

1

It can be as simple as using grep with 2 patterns:

grep -e " 00:0" -e " 00:10" myFIle.csv 

The first pattern matches times between 00:00 and 00:09, and the second pattern matches 00:10.
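To see which lines the two patterns select, here is a small sketch run against a few made-up timestamps. The sample lines and the function name `midnight_matches` are illustrative assumptions, not taken from the question's file. The leading space in each pattern ties the match to the start of the time portion, so a time such as 10:00:12 is not picked up.

```shell
#!/usr/bin/env bash
# Hypothetical sample lines; only the timestamps matter here.
sample='2010-04-23 00:04:39,0
2010-04-23 00:14:39,0
2010-04-23 10:00:12,0
2010-04-24 00:10:02,0
2010-04-24 00:20:02,0'

# Keep lines whose time starts with 00:0x or with 00:10.
midnight_matches() {
    grep -e " 00:0" -e " 00:10"
}

printf '%s\n' "$sample" | midnight_matches
```

Both patterns are applied to every line, so a day that has entries at 00:0x and again at 00:10 would contribute lines from both windows.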

+0

This is good, but on days when there are entries at 00:00, it will also pick up the entries at 00:10. Thanks for reminding me about -e – Jay 2011-04-21 19:41:10

1

Should be easy with Perl:

perl -ane '$l = 0 if $F[0] ne $d; print if $l++ < 6; $d = $F[0]' file 
+0

Nice solution... I really need to learn some Perl one of these days. – Jay 2011-04-21 19:42:49

1

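The following uses read with a custom IFS (input field separator) to split each input line into the datetime field and the rest, then uses bash's substring operator to extract the date from the ISO datetime, and then simply prints the first N lines of each day. In place of the echo you may want to do whatever processing you need on the result, since read + echo will not preserve the input exactly.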

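function first_n_of_each_day() {
    local N="$1"          # number of lines to keep per day
    local lastDate=""     # date seen on the previous line
    local I=0             # lines printed so far for the current day
    # Split each line at the first comma: timestamp, then everything else.
    # -r stops read from treating backslashes as escape characters.
    while IFS=',' read -r DATETIME OTHER ; do
        # The first 10 characters of the timestamp are the date (yyyy-mm-dd).
        local DATE="${DATETIME:0:10}"
        if [ "$DATE" != "$lastDate" ] ; then
            I=0
            lastDate="$DATE"
        fi
        if [ "$I" -lt "$N" ] ; then
            I=$((I + 1))
            # line matches:
            echo "$DATETIME,$OTHER"
        fi
    done
}
first_n_of_each_day 6 < file.csv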
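If the lines need to come out exactly as they went in, one option is to read each line whole and only look at its first ten characters. This is a sketch rather than part of the original answer: the name `first_n_of_each_day_verbatim` is made up for illustration, and it assumes every line starts with a yyyy-mm-dd date.

```shell
#!/usr/bin/env bash
# Sketch: print the first N lines of each day without altering them.
# Assumes each line begins with a yyyy-mm-dd date (10 characters).
first_n_of_each_day_verbatim() {
    local n="$1" last_day="" count=0 line day
    # IFS= and -r keep surrounding whitespace and backslashes intact.
    while IFS= read -r line; do
        day="${line:0:10}"
        if [ "$day" != "$last_day" ]; then
            count=0
            last_day="$day"
        fi
        if [ "$count" -lt "$n" ]; then
            count=$((count + 1))
            printf '%s\n' "$line"
        fi
    done
}
```

It would be called the same way, for example `first_n_of_each_day_verbatim 6 < file.csv`.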
+0

That's it! My solution started out like this, but my brain turned to tapioca along the way. Thanks! – Jay 2011-04-21 19:42:23

2

An AWK version of eugene y's answer:

awk ' 
    $1 != date {count = 0; date = $1} 
    ++count <= 6 {print} 
' filename 
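The count of 6 can be turned into a parameter with awk's `-v` option. The wrapper below is an illustrative sketch (the function name `first_n_per_day` is an assumption). It relies on awk's default whitespace field splitting, so `$1` is the date part of the timestamp.

```shell
#!/usr/bin/env bash
# Sketch: print the first n lines for each distinct date in field 1.
# Reads the files given after n, or standard input if none are given.
first_n_per_day() {
    local n="$1"
    shift
    awk -v n="$n" '
        $1 != date {count = 0; date = $1}   # new day: reset the counter
        ++count <= n                        # print while within the first n
    ' "$@"
}
```

For the question's case this might be invoked as `first_n_per_day 6 filename`.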
+0

+1 This really is a simple and clean solution to the problem. – anubhava 2011-04-22 03:24:37