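!is.na creating NAs in other columns

2013-02-19

In the process of merging multiple data sets, I'm trying to remove all rows of a data frame that have a missing value for one particular variable (I want to keep the NAs in some of the other columns for now). I used the following line: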

data.frame <- data.frame[!is.na(data.frame$year),] 

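This successfully removes all rows with NA for year (and no others), but the other columns, which previously had data, are now entirely NA. In other words, non-missing values are being converted to NA. Any ideas about what is going on here? I have tried these alternatives and got the same result:

data.frame <- subset(data.frame, !is.na(year)) 

data.frame$x <- ifelse(is.na(data.frame$year) == T, 1, 0); 
data.frame <- subset(data.frame, x == 0) 

Am I using is.na incorrectly? Are there any alternatives to is.na in this case? Any help would be greatly appreciated!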

EDIT: Here is code that should reproduce the problem:

#data 
tc <- read.csv("http://dl.dropbox.com/u/4115584/tc2008.csv") 
frame <- read.csv("http://dl.dropbox.com/u/4115584/frame.csv") 

#standardize NA codes 
tc[tc == "."] <- NA 
tc[tc == -9] <- NA 

#standardize spatial units 
colnames(frame)[1] <- "loser" 
colnames(frame)[2] <- "gainer" 
frame$dyad <- paste(frame$loser,frame$gainer,sep="") 
tc$dyad <- paste(tc$loser,tc$gainer,sep="") 
drops <- c("loser","gainer") 
tc <- tc[,!names(tc) %in% drops] 
frame <- frame[,!names(frame) %in% drops] 
rm(drops) 

#merge tc into frame 
data <- merge(tc, frame, by.x = "year", by.y = "dyad", all.x=T, all.y=T) #year column is duplicated in  this process. I haven't had this problem with nearly identical code using other data. 

rm(tc,frame) 

#the first column in the new data frame is the duplicate year, which does not actually contain years. I'll rename it. 
colnames(data)[1] <- "double" 

summary(data$year) #shows 833 NA's 

summary(data$procedur) #note that at this point there are non-NA values 

#later, I want to create 20 year windows following the events in the tc data. For simplicity, I want to remove cases with NA in the year column. 

new.data <- data[!is.na(data$year),] 

#now let's see what the above operation did 
summary(new.data$year) #missing years were successfully removed 
summary(new.data$procedur) #this variable is now entirely NA's 

Please give us reproducible data. Also, please don't name your data.frame 'data.frame', since there is already a function named 'data.frame'. – Arun 2013-02-19 21:20:27

@Arun But can he name his 'data.frame' 'function', or is there already a 'data.frame' called 'function'? :) – juba 2013-02-19 21:23:27

:) My head is spinning. lol. – Arun 2013-02-19 21:38:39

Answers

2

I think the actual problem is your merge.

After merging, with the result in data, if you do:

# > table(data$procedur, useNA="always") 

# 1  2  3  4  5  6 <NA> 
# 122 112 356  59  39  19 192258 

You can see that there are this many (122+112+...+19) non-NA values of data$procedur. However, all of these values correspond to rows where data$year is NA:

> all(is.na(data$year[!is.na(data$procedur)])) 
# [1] TRUE # every value of procedur occurs where year = NA 

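So, basically, all the values of procedur get removed as well, because they sit in exactly the rows you drop when you filter out the NAs in year.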
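The effect described above can be reproduced on a small scale. This is only a sketch on made-up data: tc_toy, frame_toy and the column name frame_year are hypothetical stand-ins for the asker's files, chosen so that the result has no duplicated column names.

```r
# Hypothetical toy data; frame_year stands in for the year column of frame.
tc_toy <- data.frame(year = c(1900, 1950), dyad = c("AB", "CD"),
                     procedur = c(1, 3), stringsAsFactors = FALSE)
frame_toy <- data.frame(dyad = c("AB", "CD", "EF"),
                        frame_year = c(1900, 1950, 2000),
                        stringsAsFactors = FALSE)

# Same key mismatch as in the question: tc's year is compared with frame's dyad,
# so no row matches and the full outer join simply stacks both tables.
bad <- merge(tc_toy, frame_toy, by.x = "year", by.y = "dyad",
             all.x = TRUE, all.y = TRUE)

# Every non-NA procedur sits in a row whose frame_year is NA ...
all(is.na(bad$frame_year[!is.na(bad$procedur)]))

# ... so dropping rows with a missing frame_year drops every procedur value too.
kept <- bad[!is.na(bad$frame_year), ]
all(is.na(kept$procedur))
```

In this sketch is.na does exactly what it is asked to do; the surprising result comes from how the rows were laid out by the merge.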

To fix this, I think you should use merge as:

merge(tc, frame, all=T) # it'll automatically calculate common columns 
# also this will not result in duplicated year column. 

Check whether this merge gives you the result you want.
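As a rough illustration of merging on the shared columns, here is a sketch on made-up data. tc_toy and frame_toy are hypothetical, and it assumes both tables carry year and dyad columns, as they do in the question after the dyad column is built.

```r
# Hypothetical toy data with the same two shared columns: year and dyad.
tc_toy <- data.frame(year = c(1900, 1950), dyad = c("AB", "CD"),
                     procedur = c(1, 3), stringsAsFactors = FALSE)
frame_toy <- data.frame(year = c(1900, 1950, 2000),
                        dyad = c("AB", "CD", "EF"),
                        stringsAsFactors = FALSE)

# With no by argument, merge() joins on the column names both tables share.
good <- merge(tc_toy, frame_toy, all = TRUE)
names(good)  # year appears only once

# Filtering on year now keeps the procedur values of the matched rows.
good_kept <- good[!is.na(good$year), ]
sum(!is.na(good_kept$procedur))
```

Whether joining on both year and dyad is the right key for the real data is something only the asker can confirm.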

0

Try complete.cases:

data.frame.clean <- data.frame[complete.cases(data.frame$year),] 

...though, as mentioned above, you may want to choose a more descriptive name.

The usage of 'is.na' is right, so I doubt this will make any difference. – Arun 2013-02-19 21:54:07

Thanks for the suggestion. However, the result is exactly the same. – davy 2013-02-19 22:07:13
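A quick check of why the two filters agree, sketched on a made-up data frame (toy is hypothetical): on a single column, complete.cases flags the same rows as !is.na, so swapping one for the other should not change which rows are kept.

```r
# Hypothetical toy data with one missing year.
toy <- data.frame(year = c(1990, NA, 2001), procedur = c(NA, 2, 5))

by_is_na    <- toy[!is.na(toy$year), ]
by_complete <- toy[complete.cases(toy$year), ]

# Both filters select the same rows.
identical(by_is_na, by_complete)
```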