2013-01-09
3

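Flattening multiple observations in SAS

I have a dataset in which a patient can have multiple (an unknown number of) values for certain variables, so it ends up looking like this: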

ID   Var1  Var2    Var3  Var4
1    Blue  Female  17    908
1    Blue  Female  17    909
1    Red   Female  17    910
1    Red   Female  17    911
...
99   Blue  Female  14    908
100  Red   Male    28    911

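I would like to tidy this data up so that each ID has only one entry, with indicators for whether a given value was present or absent in that ID's original entries. For example, something like this: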

ID   YesBlue  Var2    Var3  Yes911
1    1        Female  17    1
99   1        Female  14    0
100  0        Male    28    1

Is there a simple way to do this in SAS? Or, failing that, in Access (which is where the data comes from), which I really don't know how to use.

Answers

3

If your dataset is called PATIENTS1, perhaps something like this:

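proc sql noprint; 
    create table patients2 as 
    select * 
     /* row-level flags: 1 if this row has the value, else 0 */ 
     ,case(var1) 
      when "Blue" then 1 
      else 0 
     end as ablue 
     ,case(var4) 
      when 911 then 1 
      else 0 
     end as a911 
     /* per-ID maxima; SAS remerges them onto every row of the group */ 
     ,max(calculated ablue) as yesblue 
     ,max(calculated a911) as yes911 
    from patients1 
    group by id 
    order by id; 
quit; 

/* keep one row per ID and drop the row-level columns */ 
proc sort data=patients2 out=patients3(drop=var1 var4 ablue a911) nodupkey; 
    by id; 
run; 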
2

Here is a data step solution. I am assuming that the values of Var2 and Var3 are always the same for a given ID.

data have; 
input ID Var1 $ Var2 $ Var3 Var4; 
cards; 
1 Blue Female 17  908 
1 Blue Female 17  909 
1 Red Female 17  910 
1 Red Female 17  911 
99 Blue Female 14  908 
100 Red Male 28  911 
; 
run; 

data want (drop=Var1 Var4 _:); 
set have; 
by ID; 
if first.ID then do; 
    _blue=0; 
    _911=0; 
end; 
_blue+(Var1='Blue'); 
_911+(Var4=911); 
if last.ID then do; 
    YesBlue=(_blue>0); 
    Yes911=(_911>0); 
    output; 
end; 
run; 
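
The assumption that Var2 and Var3 never vary within an ID can be checked before flattening. The following is only a sketch against the HAVE dataset defined above; the column aliases n_var2 and n_var3 are names made up for illustration.

```sas
proc sql;
    /* List any ID whose Var2 or Var3 takes more than one distinct value. */
    select ID
         , count(distinct Var2) as n_var2
         , count(distinct Var3) as n_var3
    from have
    group by ID
    having count(distinct Var2) > 1
        or count(distinct Var3) > 1;
quit;
```

If the query selects no rows, the assumption holds for the non-missing values. Note that count(distinct ...) ignores missing values, so an ID that mixes a real value with a missing one would not be flagged by this check.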
1

Edit: This looks like the same thing Keith said, just written differently.

This should do it:

data test; 
input id Var1 $ Var2 $ Var3 Var4; 
datalines; 
1 Blue Female 17  908 
1 Blue Female 17  909 
1 Red Female 17  910 
1 Red Female 17  911 
99 Blue Female 14  908 
100 Red Male 28  911 
; 
run; 

data flatten(drop=Var1 Var4); 
set test; 
retain YesBlue; 
retain Yes911; 
by id; 

if first.id then do; 
    YesBlue = 0; 
    Yes911 = 0; 
end; 

if Var1 eq "Blue" then YesBlue = 1; 
if Var4 eq 911 then Yes911 = 1; 

if last.id then output; 
run; 
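
One precondition worth stating: a BY statement in a data step expects the input to be sorted (or indexed) by the BY variable, otherwise SAS stops with an error. The sample data is already in ID order, but if the real data is not, a sort along these lines, run before the flatten step, would be needed. This is a sketch that assumes the dataset is named TEST as above.

```sas
/* Sort in place by id so that first.id / last.id work in the data step. */
proc sort data=test;
    by id;
run;
```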
1

PROC SQL is very well suited to this kind of thing. This is similar to DavB's answer, but does away with the extra sort:

data have; 
input ID Var1 $ Var2 $ Var3 Var4; 
cards; 
1 Blue Female 17  908 
1 Blue Female 17  909 
1 Red Female 17  910 
1 Red Female 17  911 
99 Blue Female 14  908 
100 Red Male 28  911 
; 
run; 

proc sql; 
    create table want as 
    select ID 
     , max(case(var1) 
       when 'Blue' 
       then 1 
       else 0 end) as YesBlue 
     , max(var2)   as Var2 
     , max(var3)   as Var3 
     , max(case(var4) 
       when 911 
       then 1 
       else 0 end) as Yes911 
    from have 
    group by id 
    order by id; 
quit; 

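It also safely reduces your original data to one row per ID, but there is a risk of errors if the source is not exactly as you describe: because Var2 and Var3 are collapsed with max(), an ID whose Var2 or Var3 values differ across rows will silently keep only the largest one.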