2012-02-13 133 views
2

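How do I convert HHMMSS to HH:MM:SS in Unix?

I am trying to convert HHMMSS to HH:MM:SS. I can convert it successfully, but because of the file size my script takes 2 hours to finish. Is there a better (faster) way to do this?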

Data File 
data.txt 

10,SRI,AA,20091210,8503,ABCXYZ,D,N,TMP,,, 
10,SRI,AA,20091210,8503,ABCXYZ,D,N,TMP,,071600, 
10,SRI,AA,20091210,8503,ABCXYZ,D,N,TMP,072200,072200, 
10,SRI,AA,20091210,8503,ABCXYZ,D,N,TAB,072600,072600, 
10,SRI,AA,20091210,8503,ABCXYZ,D,N,TMP,073200,073200, 
10,SRI,AA,20091210,8503,ABCXYZ,D,N,TMP,073500,073500, 
10,SRI,AA,20091210,8503,ABCXYZ,D,N,MRO,073700,073700, 
10,SRI,AA,20091210,8503,ABCXYZ,D,N,CPT,073900,073900, 
10,SRI,AA,20091210,8503,ABCXYZ,D,N,TMP,074400,, 
10,SRI,AA,20091210,8505,ABCXYZ,D,N,TMP,,, 
10,SRI,AA,20091210,8505,ABCXYZ,D,N,TMP,,090200, 
10,SRI,AA,20091210,8505,ABCXYZ,D,N,TMP,090900,090900, 
10,SRI,AA,20091210,8505,ABCXYZ,D,N,TMP,091500,091500, 
10,SRI,AA,20091210,8505,ABCXYZ,D,N,TAB,091900,091900, 
10,SRI,AA,20091210,8505,ABCXYZ,D,N,TMP,092500,092500, 
10,SRI,AA,20091210,8505,ABCXYZ,D,N,TMP,092900,092900, 
10,SRI,AA,20091210,8505,ABCXYZ,D,N,MRO,093200,093200, 
10,SRI,AA,20091210,8505,ABCXYZ,D,N,CPT,093500,093500, 
10,SRI,AA,20091210,8505,ABCXYZ,D,N,TMP,094500,, 
10,SRI,AA,20091210,8506,ABCXYZ,U,N,TMP,,, 
10,SRI,AA,20091210,8506,ABCXYZ,U,N,CPT,,, 
10,SRI,AA,20091210,8506,ABCXYZ,U,N,MRO,,, 
10,SRI,AA,20091210,8506,ABCXYZ,U,N,TMP,,, 
10,SRI,AA,20091210,8506,ABCXYZ,U,N,TMP,,, 
10,SRI,AA,20091210,8506,ABCXYZ,U,N,TAB,,, 
10,SRI,AA,20091210,8506,ABCXYZ,U,N,TMP,,, 
10,SRI,AA,20091210,8506,ABCXYZ,U,N,TMP,,, 
10,SRI,AA,20091210,8506,ABCXYZ,U,N,TMP,,, 
10,SRI,AA,20091210,8506,ABCXYZ,U,N,TMP,,, 
10,SRI,AA,20091210,8510,ABCXYZ,U,N,TMP,,170100, 
10,SRI,AA,20091210,8510,ABCXYZ,U,N,CPT,170400,170400, 
10,SRI,AA,20091210,8510,ABCXYZ,U,N,MRO,170700,170700, 
10,SRI,AA,20091210,8510,ABCXYZ,U,N,TMP,171000,171000, 
10,SRI,AA,20091210,8510,ABCXYZ,U,N,TMP,171500,171500, 
10,SRI,AA,20091210,8510,ABCXYZ,U,N,TAB,171900,171900, 
10,SRI,AA,20091210,8510,ABCXYZ,U,N,TMP,172500,172500, 
10,SRI,AA,20091210,8510,ABCXYZ,U,N,TMP,172900,172900, 
10,SRI,AA,20091210,8510,ABCXYZ,U,N,TMP,173500,173500, 
10,SRI,AA,20091210,8510,ABCXYZ,U,N,TMP,174100,, 

My code: script.sh

#!/bin/bash 
awk -F"," '{print $5}' Data.txt > tmp.txt # write field 5 (the number) of every line to tmp.txt 
sort tmp.txt | uniq -d > Uniqe_number.txt # store each number that occurs more than once in Uniqe_number.txt 
rm tmp.txt # remove the tmp file 
while read line; do 
echo $line 
cat Data.txt | grep ",$line," > Numbers/All/$line.txt # grep each number and create one file per number 
awk -F"," '{print $5","$4","$7","$8","$9","$10","$11}' Numbers/All/$line.txt > Numbers/All/tmp_$line.txt 
mv Numbers/All/tmp_$line.txt Numbers/Final/Final_$line.txt 
done < Uniqe_number.txt 
ls Numbers/Final > files.txt 
dos2unix files.txt 
bash time_replace.sh  

When you run the script above, it calls the time_replace.sh script.

My code for time_replace.sh

#!/bin/bash 
for i in `cat files.txt` 
do 
while read aline 
do 
TimeDep=`echo $aline | awk -F"," '{print $6}'` 
#echo $TimeDep 
finalTimeDep=`echo $TimeDep | awk '{for(i=1;i<=length($0);i+=2){printf("%s:",substr($0,i,2))}}'|awk '{sub(/:$/,"")};1'` 
#echo $finalTimeDep 
########## 
TimeAri=`echo $aline | awk -F"," '{print $7}'` 
#echo $TimeAri 
finalTimeAri=`echo $TimeAri | awk '{for(i=1;i<=length($0);i+=2){printf("%s:",substr($0,i,2))}}'|awk '{sub(/:$/,"")};1'` 
#echo $finalTimeAri 
sed -i 's/',$TimeDep'/',$finalTimeDep'/g' Numbers/Final/$i 
sed -i 's/',$TimeAri'/',$finalTimeAri'/g' Numbers/Final/$i 
############################ 
done < Numbers/Final/$i 
done 

Is there a better solution?

Any help is appreciated.

Thanks, Sri

+0

So you are changing '10,SRI,AA,20091210,8503,ABCXYZ,D,N,TMP,072200,072200,' to '10,SRI,AA,20091210,8503,ABCXYZ,D,N,TMP,07:22:00,07:22:00,'? – 2012-02-13 22:14:41

+1

I'm shocked that it only takes 2 hours to run. – 2012-02-13 22:15:46

+0

Yes Mike, that is correct. – user790049 2012-02-13 22:19:16

Answers

0

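It's not clear what all your sorting and uniq-ing is for. I'm assuming that your data file has one entry per line, and that you need to change the 10th and 11th comma-separated fields from HHMMSS to HH:MM:SS.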

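while IFS=, read -r -a line ; do 
    # Bash arrays are zero-based: fields 1-9 are indices 0-8 and are copied unchanged.
    echo -n "${line[0]},${line[1]},${line[2]},${line[3]}," 
    echo -n "${line[4]},${line[5]},${line[6]},${line[7]}," 
    echo -n "${line[8]}," 
    # 10th field (index 9): reformat only if it is not empty.
    if [ -n "${line[9]}" ]; then 
     echo -n "${line[9]:0:2}:${line[9]:2:2}:${line[9]:4:2}" 
    fi 
    echo -n , 
    # 11th field (index 10): same treatment.
    if [ -n "${line[10]}" ]; then 
     echo -n "${line[10]:0:2}:${line[10]:2:2}:${line[10]:4:2}" 
    fi 
    # Restore the trailing comma that ends every input line.
    echo , 
done < data.txt 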

The part that does the work is the ${variable:offset:length} construct, which lets you extract a substring from a variable.
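To make the substring construct concrete, here is a minimal sketch that wraps it in a function. The name fmt_time is hypothetical (it is not part of the answer above), and the construct is a bash feature, so this assumes the script runs under bash rather than a plain POSIX sh.

```shell
#!/bin/bash
# fmt_time is a hypothetical helper name used only for this illustration.
# ${t:offset:length} slices the HHMMSS string into three 2-character pieces.
fmt_time() {
    local t=$1
    # Leave empty fields empty instead of printing "::".
    if [ -z "$t" ]; then
        return 0
    fi
    printf '%s\n' "${t:0:2}:${t:2:2}:${t:4:2}"
}

fmt_time 072200   # prints 07:22:00
fmt_time 174100   # prints 17:41:00
```

Because the slicing happens inside the shell itself, no extra awk or sed process is started per value, which is where the original script loses most of its time.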

+0

Thanks Chris, Jonathan and Evil. I went with Evil's solution (it is easier for me to understand). – user790049 2012-02-14 00:01:17

1

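If the file is large, the pipelines are probably what hurts performance more than anything else. Processes can be cheap, but when you are doing a huge amount of processing, cutting down the number of times you pass data through a pipe can pay dividends.

So you may be better off writing the whole script in awk (or perl). For example, awk can send output to an arbitrary file, so the loop in your first file can be replaced by a single awk script that does this. You also don't need the temporary file.

I assume the sort is only there to track progress, since it tells you how many numbers there are. But if you don't care about the sort, you can simply do this: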

#!/bin/sh 
awk -F ',' ' 
{ 
    # The output file name must be a string expression; build it from field 5.
    out = "Numbers/Final/Final_" $5 ".txt" 
    print $5","$4","$7","$8","$9","$10","$11 > out 
}' datafile.txt 
ls Numbers/Final > files.txt 

Also, if you do need the sort, you can use sort -t, -k5,5 -k4,4 -k10,10 (or whichever fields your sort keys actually need to be).
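A sort on several comma-separated fields is usually written with one -k option per key. The sketch below shows that form on three of the sample rows; sort_records is a hypothetical wrapper name, and the key order (field 5, then 4, then 10) is only an assumption about what the data needs.

```shell
# sort_records is a hypothetical wrapper; adjust the keys to the fields you need.
sort_records() {
    # -t, makes the comma the field separator.
    # Each -kN,N restricts a key to exactly field N.
    sort -t, -k5,5 -k4,4 -k10,10
}

printf '%s\n' \
    '10,SRI,AA,20091210,8510,ABCXYZ,U,N,CPT,170400,170400,' \
    '10,SRI,AA,20091210,8503,ABCXYZ,D,N,TMP,072200,072200,' \
    '10,SRI,AA,20091210,8503,ABCXYZ,D,N,TMP,,071600,' |
    sort_records
```

The rows for 8503 come out before the row for 8510, and within 8503 the tenth field decides the order.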

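As for formatting the time, awk has functions too, so you can have a single awk script that looks like the one below. It should replace both of the scripts above while keeping the same functionality (at least as far as I can tell from a quick analysis). Note that this is untested, so it may contain vague syntax errors: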

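#!/usr/bin/awk -f 
BEGIN { 
    FS="," 
} 
# Turn HHMMSS into HH:MM:SS; empty fields are returned unchanged.
function formattime (t) 
{ 
    if (t == "") return t 
    return substr(t,1,2)":"substr(t,3,2)":"substr(t,5,2) 
} 
{ 
    # The output file name must be a string expression; build it from field 5.
    out = "Numbers/Final/Final_" $5 ".txt" 
    print $5","$4","$7","$8","$9","formattime($10)","formattime($11) > out 
} 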

You can save this, set the file mode to 700, and call it directly as:

dostuff.awk filename 

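Another option in awk is to change the fields in place, so if you want to keep the whole original line but with formatted times, you can modify the script above. Set OFS="," in the BEGIN block as well (otherwise awk rejoins a modified line with spaces), and change the print block to:

{ 
    $10=formattime($10) 
    $11=formattime($11) 
    print $0 
} 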
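Putting the in-place variant together, a minimal sketch might look like this. The wrapper name convert_times is hypothetical, and the six-digit check is an added assumption so that empty or malformed fields pass through untouched.

```shell
# convert_times is a hypothetical wrapper name; it reads CSV lines on stdin.
convert_times() {
    awk '
    # OFS must match FS, or modified lines are rejoined with spaces.
    BEGIN { FS = OFS = "," }
    function formattime(t) {
        # Only rewrite values that are exactly six digits.
        if (t ~ /^[0-9][0-9][0-9][0-9][0-9][0-9]$/)
            return substr(t,1,2) ":" substr(t,3,2) ":" substr(t,5,2)
        return t
    }
    { $10 = formattime($10); $11 = formattime($11); print }
    '
}

printf '%s\n' \
    '10,SRI,AA,20091210,8503,ABCXYZ,D,N,TMP,072200,072200,' \
    '10,SRI,AA,20091210,8503,ABCXYZ,D,N,TMP,,071600,' |
    convert_times
```

For real data this would be called as convert_times < data.txt > converted.txt, which handles the whole file in one awk process.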

If this doesn't do everything you need, hopefully it gives you some ideas that will help with the code.

0

In Perl, this is close to child's play:

#!/usr/bin/env perl 
use strict; 
use warnings; 
use English(-no_match_vars); 

local($OFS) = ","; 
while (<>) 
{ 
    my(@F) = split /,/; 
    $F[9] =~ s/(\d\d)(\d\d)(\d\d)/$1:$2:$3/ if defined $F[9]; 
    $F[10] =~ s/(\d\d)(\d\d)(\d\d)/$1:$2:$3/ if defined $F[10]; 
    print @F; 
} 

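If you don't want to use English, you can write local($,) = ","; instead; it controls the output field separator, which is set to a comma here. The code reads each line of the file, splits it on commas, takes the two time fields (indices 9 and 10, counting from zero) and, if they are not empty, inserts colons between the pairs of digits. I'm sure a 'Code Golf' solution would be much shorter, but this version is reasonably easy to follow if you know any Perl.

This will be far faster than your script, not only because it doesn't sort anything, but also because all the processing is done in a single pass over the file in a single process. Running multiple processes per line of input (as your code does) is a performance disaster when the file is large.

The output for the sample data you provided is: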

10,SRI,AA,20091210,8503,ABCXYZ,D,N,TMP,,, 
10,SRI,AA,20091210,8503,ABCXYZ,D,N,TMP,,07:16:00, 
10,SRI,AA,20091210,8503,ABCXYZ,D,N,TMP,07:22:00,07:22:00, 
10,SRI,AA,20091210,8503,ABCXYZ,D,N,TAB,07:26:00,07:26:00, 
10,SRI,AA,20091210,8503,ABCXYZ,D,N,TMP,07:32:00,07:32:00, 
10,SRI,AA,20091210,8503,ABCXYZ,D,N,TMP,07:35:00,07:35:00, 
10,SRI,AA,20091210,8503,ABCXYZ,D,N,MRO,07:37:00,07:37:00, 
10,SRI,AA,20091210,8503,ABCXYZ,D,N,CPT,07:39:00,07:39:00, 
10,SRI,AA,20091210,8503,ABCXYZ,D,N,TMP,07:44:00,, 
10,SRI,AA,20091210,8505,ABCXYZ,D,N,TMP,,, 
10,SRI,AA,20091210,8505,ABCXYZ,D,N,TMP,,09:02:00, 
10,SRI,AA,20091210,8505,ABCXYZ,D,N,TMP,09:09:00,09:09:00, 
10,SRI,AA,20091210,8505,ABCXYZ,D,N,TMP,09:15:00,09:15:00, 
10,SRI,AA,20091210,8505,ABCXYZ,D,N,TAB,09:19:00,09:19:00, 
10,SRI,AA,20091210,8505,ABCXYZ,D,N,TMP,09:25:00,09:25:00, 
10,SRI,AA,20091210,8505,ABCXYZ,D,N,TMP,09:29:00,09:29:00, 
10,SRI,AA,20091210,8505,ABCXYZ,D,N,MRO,09:32:00,09:32:00, 
10,SRI,AA,20091210,8505,ABCXYZ,D,N,CPT,09:35:00,09:35:00, 
10,SRI,AA,20091210,8505,ABCXYZ,D,N,TMP,09:45:00,, 
10,SRI,AA,20091210,8506,ABCXYZ,U,N,TMP,,, 
10,SRI,AA,20091210,8506,ABCXYZ,U,N,CPT,,, 
10,SRI,AA,20091210,8506,ABCXYZ,U,N,MRO,,, 
10,SRI,AA,20091210,8506,ABCXYZ,U,N,TMP,,, 
10,SRI,AA,20091210,8506,ABCXYZ,U,N,TMP,,, 
10,SRI,AA,20091210,8506,ABCXYZ,U,N,TAB,,, 
10,SRI,AA,20091210,8506,ABCXYZ,U,N,TMP,,, 
10,SRI,AA,20091210,8506,ABCXYZ,U,N,TMP,,, 
10,SRI,AA,20091210,8506,ABCXYZ,U,N,TMP,,, 
10,SRI,AA,20091210,8506,ABCXYZ,U,N,TMP,,, 
10,SRI,AA,20091210,8510,ABCXYZ,U,N,TMP,,17:01:00, 
10,SRI,AA,20091210,8510,ABCXYZ,U,N,CPT,17:04:00,17:04:00, 
10,SRI,AA,20091210,8510,ABCXYZ,U,N,MRO,17:07:00,17:07:00, 
10,SRI,AA,20091210,8510,ABCXYZ,U,N,TMP,17:10:00,17:10:00, 
10,SRI,AA,20091210,8510,ABCXYZ,U,N,TMP,17:15:00,17:15:00, 
10,SRI,AA,20091210,8510,ABCXYZ,U,N,TAB,17:19:00,17:19:00, 
10,SRI,AA,20091210,8510,ABCXYZ,U,N,TMP,17:25:00,17:25:00, 
10,SRI,AA,20091210,8510,ABCXYZ,U,N,TMP,17:29:00,17:29:00, 
10,SRI,AA,20091210,8510,ABCXYZ,U,N,TMP,17:35:00,17:35:00, 
10,SRI,AA,20091210,8510,ABCXYZ,U,N,TMP,17:41:00,, 