2016-05-16

Splitting a text file into 3 datasets/tables

I have data like the following in a text file.

How can I split the text file into 3 datasets/tables?

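One with the earnings data, the second with the redemptions data, and the third with the expirations data. Each section has many rows; I have shown only 3-4 rows of each here. I am trying to use an INFILE statement, but I don't know how to split the data. The idea is this: the initial data (Earnings) is read first; whenever SAS finds the word redemptions, it must stop writing to the first dataset and the remaining data must go to the second dataset; and whenever SAS finds the word Expirations, the data below that keyword must go to the third dataset. Any suggestions?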

Earnings 
abc 123 xyz abjjdd 
bhb edw ajd jnjnjknn 
ebc ecc cec cecekckk 
.... 
redemptions 
abc 123 xyz abjjdd 
bhb edw ajd jnjnjknn 
ebc ecc cec cecekckk 
Expirations 
abc 123 xyz abjjdd 
bhb edw ajd jnjnjknn 
ebc ecc cec cecd ccsdc 
djc c djc cjdcjjnc 

Answer

A RETAIN variable will let you do this.

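Use the code below: just replace datalines in the INFILE statement with your file name (in quotes) or a fileref, and set the appropriate INFILE options.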

data rawImport; 
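    * Blank-delimited list input (no DSD, so repeated blanks do not create empty columns), TRUNCOVER keeps short lines from reading into the next record;
    infile datalines delimiter=' ' truncover; 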
    informat C1-C4 $32.; 
    input C1-C4; 
    datalines; 
Earnings 
abc 123 xyz abjjdd 
bhb edw ajd jnjnjknn 
ebc ecc cec cecekckk 
Redemptions 
abc 234 xyz abjjdd 
bhb edw ajd jnjnjknn 
ebc ecc cec cecekckk 
Expirations 
abc 345 xyz abjjdd 
bhb edw ajd jnjnjknn 
ebc ecc cec cecd ccsdc 
djc c djc cjdcjjnc 
; 
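When the data lives in a file on disk rather than in-stream, the INFILE statement points at a fileref (or a quoted path) instead of datalines. The sketch below is a minimal illustration under stated assumptions: the temporary fileref `rawtxt` and the dataset name `rawImportFile` are hypothetical names introduced here, and the first step only writes a few sample lines so the example runs on its own. In practice you would instead point the FILENAME statement at your real file.

```sas
* Hypothetical setup: write sample lines to a temporary file so the example is self-contained;
filename rawtxt temp;
data _null_;
    file rawtxt;
    put "Earnings";
    put "abc 123 xyz abjjdd";
    put "Redemptions";
    put "abc 234 xyz abjjdd";
    put "Expirations";
    put "abc 345 xyz abjjdd";
run;

* Same import as above, but reading the external file through the fileref rawtxt;
data rawImportFile;
    infile rawtxt delimiter=' ' truncover;
    informat C1-C4 $32.;
    input C1-C4;
run;
```

Header lines such as Earnings are read like any other row (C1 holds the keyword, C2-C4 are blank thanks to TRUNCOVER), so the routing step that follows works unchanged on this dataset.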

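With a retained variable, we can now route each row to the appropriate dataset: a header row sets the variable, and RETAIN carries its value over to the data rows that follow until the next header.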

data Earnings Redemptions Expirations; 
    set rawImport; 
    length outputDS $ 12; 
    retain outputDS; 

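    * Header rows set the target dataset (case-insensitive, since the question uses lowercase redemptions). DELETE drops the header row itself;
    if upcase(C1) = "EARNINGS" then do; 
        outputDS = "Earnings"; 
        delete; 
    end; 
    else if upcase(C1) = "REDEMPTIONS" then do; 
        outputDS = "Redemptions"; 
        delete; 
    end; 
    else if upcase(C1) = "EXPIRATIONS" then do; 
        outputDS = "Expirations"; 
        delete; 
    end; 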

    * output to appropriate dataset; 
    if outputDS = "Earnings" then output Earnings; 
    else if outputDS = "Redemptions" then output Redemptions; 
    else if outputDS = "Expirations" then output Expirations; 

    drop outputDS; 
run; 

The log now shows:

NOTE: There were 13 observations read from the data set WORK.RAWIMPORT. 
NOTE: The data set WORK.EARNINGS has 3 observations and 4 variables. 
NOTE: The data set WORK.REDEMPTIONS has 3 observations and 4 variables. 
NOTE: The data set WORK.EXPIRATIONS has 4 observations and 4 variables.
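The same routing can also be done in a single DATA step that reads the raw lines directly, without the intermediate RAWIMPORT dataset. This is a sketch of an alternative, not the approach in the answer above: it reads the first word of each line with a trailing `@` so the line is held, treats the known header words as section switches, and re-reads ordinary data lines from column 1. The variable names `section` and `firstWord` are hypothetical names introduced for illustration, and as before `datalines` would be replaced by your file.

```sas
* Single-step sketch: inspect the first word of each line, then either switch section or output the row;
data Earnings Redemptions Expirations;
    length section $ 12 C1-C4 $ 32;
    retain section;
    infile datalines truncover;
    * Read the first word and hold the line (trailing @) so it can be re-read below;
    input firstWord :$32. @;
    select (upcase(firstWord));
        when ("EARNINGS")    section = "Earnings";
        when ("REDEMPTIONS") section = "Redemptions";
        when ("EXPIRATIONS") section = "Expirations";
        otherwise do;
            * Data row, re-read the held line from column 1 into C1-C4;
            input @1 C1-C4;
            if section = "Earnings" then output Earnings;
            else if section = "Redemptions" then output Redemptions;
            else if section = "Expirations" then output Expirations;
        end;
    end;
    drop section firstWord;
    datalines;
Earnings
abc 123 xyz abjjdd
bhb edw ajd jnjnjknn
ebc ecc cec cecekckk
Redemptions
abc 234 xyz abjjdd
bhb edw ajd jnjnjknn
ebc ecc cec cecekckk
Expirations
abc 345 xyz abjjdd
bhb edw ajd jnjnjknn
ebc ecc cec cecd ccsdc
djc c djc cjdcjjnc
;
```

Because the step contains explicit OUTPUT statements, header rows (which never reach one) are not written anywhere, so this should yield the same three datasets as the two-step version. The trade-off is that the header keywords are checked while reading, so there is no intermediate dataset to inspect if something looks wrong.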