2015-12-02 53 views
-1

How to find out column types in shell and report the differences

I have a file that looks like this:

emp_id(int),name(string),age(int) 
1,hasa,34 
2,dafa,45 
3,fasa,12 
8f,123Rag,12 
8,fafl,12 

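Requirement for the sample file: the column data types are specified as string and integer. Emp_id should be an integer, not a string. The same kind of condition applies to the name and age columns.

My output should look like this: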

Actual column Emp_id type is INT but string was found at the position 4, value is 8f 
Actual column name type is STRING but numbers were found at the position 4, value is 123Rag 

and so on..

Here is my shell script code:

read input 
if [ $input -eq $input 2>/dev/null ] 
then 
    echo "$input is an integer" 
else 
    echo "$input is not an integer" 
fi 

In Python I tried isinstance(obj, type), but it did not serve the purpose. Can anyone guide me on this? Any help with a shell/python/perl script would be much appreciated!
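A note on why isinstance falls short here: every field read from a text file is already a str, so isinstance(value, int) is False no matter what the field contains. The check has to look at the text itself. Below is a minimal sketch of such a check; the helper name `looks_like_int` is hypothetical and not part of the original post, and it assumes that only plain base-10 digits with an optional sign count as an integer.

```python
import re

def looks_like_int(text):
    """Return True if text is an optionally signed run of ASCII digits.

    Hypothetical helper: the rule (base-10, optional sign) is an assumption.
    """
    return re.fullmatch(r"[+-]?[0-9]+", text.strip()) is not None

field = "8"  # every value read from a file arrives as a str
print(isinstance(field, int))      # type check on the object, not its content
print(looks_like_int("8"))         # content check
print(looks_like_int("8f"))
print(looks_like_int("1.1"))
```

The regular expression is used instead of int() so that values such as "1_000", which int() accepts in recent Python versions, are not treated as valid.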

+0

Your code has nothing to do with your requirements. At least show an honest attempt. – karakfa

+0

Possible duplicate of [BASH: Test whether string is valid as an integer?](http://stackoverflow.com/questions/2210349/bash-test-whether-string-is-valid-as-an-integer) – tripleee

+0

What is the problem with putting numbers into a string field? –

Answers

1

Here is an awk solution:

awk -F"," '
# Header row: record each column name and its declared type.
NR==1 {
    for (i = 1; i <= NF; i++) {
        split($i, a, "(")
        name[i] = a[1]
        type[i] = ($i ~ "int" ? "INT" : "String")
    }
    next
}
# Data rows: store mismatches; error[i][NR] (array of arrays) needs GNU awk 4+.
{
    for (i = 1; i <= NF; i++) {
        if ($i != int($i) && type[i] == "INT")    { error[i][NR] = $i }
        if ($i ~ /[0-9]+/ && type[i] == "String") { error[i][NR] = $i }
    }
}
END {
    for (i in error) {
        for (key in error[i]) {
            print "Actual column " name[i] " type is " type[i] \
                  " but string was found at the position " key-1 \
                  ", value is " error[i][key]
        }
    }
}' inputFile

The output is as required:

Actual column emp_id type is INT but string was found at the position 4, value is 8f 
Actual column name type is String but string was found at the position 4, value is 123Rag 

However, in my opinion 123Rag is a string and should not be flagged as an incorrect entry in the second column.
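Since the question also mentions Python, the same header-driven check can be sketched there as well. This is only an illustration, not the method of the awk answer: the names `VALIDATORS`, `FOUND` and `validate` are hypothetical, and the string rule (reject any value that contains a digit) is an assumption that mirrors the awk test above. If values like 123Rag should be accepted as strings, the "string" entry is the one to relax.

```python
import re

# Assumed rules: "int" means ASCII digits only; "string" means no digit
# anywhere in the value (mirrors the awk test; relax it if 123Rag is valid).
VALIDATORS = {
    "int": lambda v: re.fullmatch(r"[0-9]+", v) is not None,
    "string": lambda v: re.search(r"[0-9]", v) is None,
}
# Wording used in the report for each declared type.
FOUND = {"int": "string was", "string": "numbers were"}

def validate(lines):
    """Return report lines for values that do not fit the declared type."""
    header = [h.strip() for h in lines[0].split(",")]
    # Each header cell is expected to look like name(type).
    columns = [re.fullmatch(r"(\w+)\((\w+)\)", h).groups() for h in header]
    report = []
    # Positions count data rows, so the first row after the header is 1.
    for position, line in enumerate(lines[1:], start=1):
        values = [v.strip() for v in line.split(",")]
        for (name, kind), value in zip(columns, values):
            if not VALIDATORS[kind](value):
                report.append(
                    f"Actual column {name} type is {kind.upper()} but "
                    f"{FOUND[kind]} found at the position {position}, "
                    f"value is {value}"
                )
    return report

sample = [
    "emp_id(int),name(string),age(int)",
    "1,hasa,34", "2,dafa,45", "3,fasa,12", "8f,123Rag,12", "8,fafl,12",
]
for message in validate(sample):
    print(message)
```

Keeping the rules in a dictionary keyed by the declared type means a new type only needs one more entry, and the disagreement about 123Rag comes down to a single line.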

+0

Your INT test is wrong IMO: the value '1.1' would pass. This is better: '$i != int($i)'. Otherwise, my thoughts are the same. –

+0

@glenn jackman: Yes, you are right, of course! '$i == $i + 0' tests whether the value is a number (int or double does not matter). I somehow forgot about the 'int' restriction. –

0

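With perl I would tackle it like this:

  • Define some regex patterns that match (or fail to match) the content expected for each type.
  • Pick out the header row and split it into names and types. (Optionally report whether a type is not recognised.)
  • Iterate over your fields, matching them up by column, look up the type and apply the regex to validate.

Something like: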

#!/usr/bin/env perl 

use strict; 
use warnings; 
use Data::Dumper; 

#define regex to apply for a given data type 
my %pattern_for = (
    int => qr/^\d+$/, 
    string => qr/^[A-Z]+$/i, 
); 

print Dumper \%pattern_for; 

#read the first line. 
# <> is a magic filehandle, that reads files specified as arguments 
# or piped input - like grep/sed do. 
my $header_row = <>; 
#extract just the names, in order. 
my @headers = $header_row =~ m/(\w+)\(/g; 
#create a type lookup for the named headers. 
my %type_for = $header_row =~ m|(\w+)\((\w+)\)|g; 

print Dumper \@headers; 
print Dumper \%type_for; 

#iterate input again 
while (<>) { 
    #remove trailing linefeed 
    chomp; 

    #parse incoming data into named fields based on ordering. 
    my %fields; 
    @fields{@headers} = split /,/; 
    #print for diag 
    print Dumper \%fields; 

    #iterate the headers, applying the looked up 'type' regex 
    foreach my $field_name (@headers) { 
     if ($fields{$field_name} =~ m/$pattern_for{$type_for{$field_name}}/) { 
      print 
       "$field_name => $fields{$field_name} is valid, $type_for{$field_name} matching $pattern_for{$type_for{$field_name}}\n"; 
     } 
     else { 
      print "$field_name $fields{$field_name} not valid $type_for{$field_name} matching $pattern_for{$type_for{$field_name}}\n"; 
     } 
    } 
} 

For your input this gives (showing just the invalid entries for brevity):

name 123Rag not valid string matching (?^i:^[A-Z]+$) 
emp_id 8f not valid int matching (?^:^\d+$) 

Note: it only supports "simple" CSV style (no embedded commas or quotes), but it could easily be adapted to use the Text::CSV module.
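The same limitation applies to any approach that simply splits on commas. To illustrate what a real CSV parser changes, here is a small sketch using Python's standard csv module (chosen only for convenience; it is not the Text::CSV module named above). The sample row with a quoted comma is made up for the example.

```python
import csv
import io

# Made-up sample: the name field contains a comma inside quotes.
data = 'emp_id(int),name(string),age(int)\n1,"hasa, jr",34\n'

# Naive split on commas, as in the simple approaches.
naive = data.splitlines()[1].split(",")

# A CSV parser respects the quoting.
parsed = list(csv.reader(io.StringIO(data)))[1]

print(len(naive), naive)    # the quoted field is split in two
print(len(parsed), parsed)  # the quoted field stays together
```

With the naive split the row appears to have four fields, so every value after the quoted one would be checked against the wrong column type.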