2017-04-06
1

Replacing dates in a character vector with a specific format

I am given the following character vector:

"On the evening of 2017-04-23, I was too tired" 
"to complete my homework that was due on 24.04.2017." 

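I need to search through it for all occurrences of dates and replace them with the format MONTHNAME d, YYYY.

I know the general format should be %B %d, %Y, and that I probably have to use the sub() function, but I am not sure how to combine the two.

When I try something like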

sub("[0-9]{2}.[0-9]{2}.[0-9]{4}","%B %d, %Y",x) 

I just get the following result:

"On the evening of 2001-01-15, I was too tired to complete my homework that was due on %B %d, %Y." 

Could someone please help me figure out how to put this together?


With the help of fellow Stack Overflow users, my new code is as follows:

streamlineDates <- function(x) 
{ 
    # set pattern to dates in the form YYYY-MM-DD or DD.MM.YYYY 
    pattern <- "\\d{2,4}[.-]\\d{2}[.-]\\d{2,4}" 

    y <- c(x) 

    val <- unlist(regmatches(y, gregexpr(pattern, y))) 

    val1 <- as.Date(val, format = c("%Y-%m-%d", "%d.%m.%Y")) 
    val2 <- format(val1, "%B %d, %Y") 

    y1 <- list() 
    for (i in seq_along(y)) { 
        y1[i] <- gsub(pattern, val2[i], y[i]) 
    } 
    unlist(y1)  # return the replaced strings 
} 

However, when I enter only:

x <- "to complete my homework that was due on 24.04.2017." 

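...it just returns NA. I have narrowed the problem down to gsub, whose documentation says of the replacement value: "If NA, all elements in the result corresponding to matches will be set to NA." So when only the last line is entered, the first date is missing and it just returns NA.

How can I get it to accept either one or two dates?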

+0

A date format (e.g. '%B %d %Y') cannot be used in the 'sub' or 'gsub' functions; it has to be used in 'as.Date'. – emilliman5

+0

@sooki-sooki see my solution; I hope it helps. Thanks – PKumar
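To illustrate the comment above, here is a minimal sketch showing that sub() inserts its replacement string literally, while the % codes are only interpreted by as.Date() (for parsing) and format() (for printing). The variable names literal, d and pretty are illustrative only, and the month name produced by %B depends on the session locale.

```r
x <- "to complete my homework that was due on 24.04.2017."

# sub() does not interpret "%B %d, %Y"; it pastes the text in as-is
literal <- sub("[0-9]{2}\\.[0-9]{2}\\.[0-9]{4}", "%B %d, %Y", x)
literal
# "to complete my homework that was due on %B %d, %Y."

# Parse the date first, then format it, then substitute the formatted text
d <- as.Date("24.04.2017", format = "%d.%m.%Y")
pretty <- format(d, "%B %d, %Y")  # month name is locale dependent

sub("[0-9]{2}\\.[0-9]{2}\\.[0-9]{4}", pretty, x)
```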

Answers

2

First approach:

A base R solution (without using any packages):

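pattern <- "\\d{2,4}[.-]\\d{2}[.-]\\d{2,4}" 
rep <- c("On the evening of 2017-04-23, I was too tired","to complete my homework that was due on 24.04.2017.") 

# Extract every date-like match from the vector 
val <- unlist(regmatches(rep, gregexpr(pattern, rep))) 

# The two formats are recycled along val, so format i is applied to match i 
val1 <- as.Date(val, format = c("%Y-%m-%d", "%d.%m.%Y")) 
val2 <- format(val1, "%B %d, %Y") 
val2 

# Replace the date in each element (assumes one date per element) 
rep1 <- list() 
for (i in seq_along(rep)) { 
    rep1[i] <- gsub(pattern, val2[i], rep[i]) 
} 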

Result:

do.call("c",rep1) 

> do.call("c",rep1)             
[1] "On the evening of April 23, 2017, I was too tired"  
[2] "to complete my homework that was due on April 24, 2017." 
> 

Second approach:

Using the stringr library:

library(stringr) 
rep <- c("On the evening of 2017-04-23, I was too tired","to complete my homework that was due on 24.04.2017.") 
val <- str_extract(rep,"\\d{2,4}[.-]\\d{2}[.-]\\d{2,4}") 
val1 <- as.Date(val,format=c("%Y-%m-%d","%d.%m.%Y")) 
val2 <- format(val1,"%B %d, %Y") 
rep1 <- str_replace_all(rep,"\\d{2,4}[.-]\\d{2}[.-]\\d{2,4}",val2) 
rep1 

Result:

> rep1 
[1] "On the evening of April 23, 2017, I was too tired"  
[2] "to complete my homework that was due on April 24, 2017." 
> 

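Edit: since the OP has changed the question a little, here is a more generic solution, assuming that the month is always in the middle position and that the separators are limited to the dash (-) and the dot (.):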

pattern <- "\\d{2,4}[.-]\\d{2}[.-]\\d{2,4}" 
rep <- c("On the evening of 2017-04-23, I was too tired","to complete my homework that was due on 24.04.2017.") 

val <- unlist(regmatches(rep, gregexpr(pattern, rep))) 

# Extract year, month and day, assuming the month is always the middle field 
year <- regmatches(val, gregexpr("\\d{4}", val)) 
month <- regmatches(val, gregexpr("(?<=[.-])\\d{1,2}(?=[.-])", val, perl = TRUE)) 
date <- regmatches(val, gregexpr("(?<=[.-])\\d{2}$|^\\d{2}(?=[.-])", val, perl = TRUE)) 

# Rebuild each date as YYYY-MM-DD so that a single format parses all of them 
date1 <- paste0(year, "-", month, "-", date) 
date1 <- as.Date(date1, "%Y-%m-%d") 
val2 <- format(date1, "%B %d, %Y") 

# Assumes exactly one date per element of rep 
rep1 <- list() 
for (i in seq_along(rep)) { 
    rep1[i] <- gsub(pattern, val2[i], rep[i]) 
} 

do.call("c", rep1) 
+0

This is great, but would there be a way to do this without any additional libraries, i.e. with only the standard preloaded R libraries? –

+0

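I just did some further testing of the code and noticed that if only "to complete my homework that was due on 24.04.2017." is provided as input, the code does not work and just returns NA. Would you know how to fix this? –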

+1

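Please understand that if you want correct results for every scenario, you have to supply the corresponding correct formats in 'as.Date(val, format = c("%Y-%m-%d", "%d.%m.%Y"))', just as here we supplied two formats for the two different date styles. A single format cannot match every date format you have. Give me some time; I am trying to make it generic. I cannot promise it is possible, but I will certainly try. – PKumar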
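The NA reported above appears to come from how as.Date() recycles a vector of formats: each format is paired with an input by position, so a single string given two formats is parsed once per format. The sketch below shows this, followed by one possible workaround that tries every format on every string. The helper name parse_any and the variable fmts are hypothetical, not part of the original answer.

```r
fmts <- c("%Y-%m-%d", "%d.%m.%Y")

# One string, two formats: the result has length 2, and the format that does
# not fit yields NA in the first position
one <- as.Date("24.04.2017", format = fmts)
is.na(one)
# TRUE FALSE

# Hypothetical helper: try each format in turn on the strings that are
# still unparsed, instead of relying on positional recycling
parse_any <- function(s, formats = fmts) {
  out <- as.Date(rep(NA_character_, length(s)))
  for (f in formats) {
    todo <- is.na(out)
    out[todo] <- as.Date(s[todo], format = f)
  }
  out
}

parse_any("24.04.2017")
parse_any(c("24.04.2017", "2017-04-23"))  # order of the inputs no longer matters
```

Because val2[1] is NA for the single-date input, gsub() then returns NA for the whole string, which matches the behaviour described in the question.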

1

First you need to specify all the formats of your dates. Then convert to Date and apply the format that gives your desired output, i.e.

#Note that I don't use any delimiter in the formatting simply because 
#I will use gsub to replace all except the numbers with '' from the string 
v1 <- c('%Y%m%d', '%d%m%Y') 

format(as.Date(gsub('\\D+', '', x), format = v1), "%B %d, %Y") 
#[1] "April 23, 2017" "April 24, 2017" 

You can use a (rather ugly) regex with str_replace_all from stringr, i.e.

stringr::str_replace_all(x, '\\d+-\\d+-\\d+|\\d+\\.\\d+\\.\\d+', 
         format(as.Date(gsub('\\D+', '', x), format = v1), "%B %d, %Y")) 

#[1] "On the evening of April 23, 2017, I was too tired"  
#[2] "to complete my homework that was due on April 24, 2017."
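For readers who want a base R variant that accepts one date, several dates per string, or none, the following is a minimal sketch rather than a definitive implementation. It assumes the two date styles from the question (YYYY-MM-DD and DD.MM.YYYY); the function name replace_dates and its arguments are hypothetical. It uses the replacement form of regmatches() so that each match is converted on its own instead of by position.

```r
# Hypothetical helper, base R only
replace_dates <- function(x,
                          pattern = "\\d{4}-\\d{2}-\\d{2}|\\d{2}\\.\\d{2}\\.\\d{4}",
                          formats = c("%Y-%m-%d", "%d.%m.%Y"),
                          out_format = "%B %d, %Y") {
  m <- gregexpr(pattern, x)
  regmatches(x, m) <- lapply(regmatches(x, m), function(hits) {
    # Try each input format on the matches that are still unparsed
    parsed <- as.Date(rep(NA_character_, length(hits)))
    for (f in formats) {
      todo <- is.na(parsed)
      parsed[todo] <- as.Date(hits[todo], format = f)
    }
    # Keep the original text where a match is not a valid date
    out <- hits
    ok <- !is.na(parsed)
    out[ok] <- format(parsed[ok], out_format)
    out
  })
  x
}

x <- c("On the evening of 2017-04-23, I was too tired",
       "to complete my homework that was due on 24.04.2017.")
replace_dates(x)
replace_dates(x[2])  # a single DD.MM.YYYY input is handled as well
```

The month names come from format() with %B, so they follow the session locale.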