2014-10-27 74 views

I am trying to extract the matching parts of a text paragraph with a regular expression, using stringr in R. The text is:

if returnValue is not null then 
1. if instrument type is "Bond" then 
     Status is equals to 138 if the instrument is sensible coupon, 
     coupon type is not null and not equals to "ZERO COUPON" and previous value 
     is not equals to current value, and iinstrument creation date is not D 
- Status is equals to 137 if the instrument is sensible bbg, previous value 
     is not equals to current value, and iinstrument creation date is not D or D-1 
- Status is equals to the previous status if the value is not manual 
     and previous status is 138, or 137 

2. if attribute SEC_PAYT_DTE is not null then 
    if attribute SEC_PAYT_DTE (typed as date) is fresher than 
     returnValue (typed as date) then 
    set status to 136 that is "Functional Error" 
3. if acrual date (DEBT_STRT_ACRL_DTE) is not null and instrument 
     category is "Structured Product", and acrual date is different 
     frorm return value then 
    set status to 150 that is "Non blocking functional error". 

What I want to extract is 'Status 138', 'Status 137', 'Status 136', and 'Status 150'. What I tried is str_extract_all(x, '(S|s)tatus[a-z\\s]{1,10}[0-9]{1,3}[^\\.]'), but it does not work.

What are the rules here? Please define clearly what you want the regex to do. – 2014-10-27 20:37:11

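I want the regex to find the string 'S(s)tatus' followed by 0-3 digits. For example, for 'Status is equals to 138' the regex should find 138. However, the 1 in 'is not D or D-1' should not be returned. – 2014-10-27 20:44:25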

So should the line 'and previous status is 138, or 137' be returned? – 2014-10-27 20:56:06

Answer


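Regex matching in str_extract_all follows the POSIX standard, which does not continue searching past a newline, so you need to split the text into lines yourself.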

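library(stringr)

# `val` is assumed to hold the text from the question as a single string.
# Split it into lines and extract "Status is [equals to] <digits>" from each line.
matches <- sapply(strsplit(val, "\n")[[1]],
    str_extract_all, "[Ss]tatus is(?: equals to)? [0-9]+")
# Drop lines with no match, then strip " equals to" and "is " from the hits.
matches <- gsub(fixed = TRUE, "is ", "", gsub(fixed = TRUE, " equals to", "",
    Filter(length, matches)))
# [1] "Status 138" "Status 137" "status 138"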
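
Note that the pattern above only covers "status is ...", so the "set status to 136" and "set status to 150" lines are not returned, while the "previous status is 138" line is. Below is a minimal sketch of one possible variant, assuming the only connecting phrases of interest are "is equals to", "is", and "to". The names `txt`, `pattern`, and `codes` are illustrative, and the sample text is abridged from the question.

```r
library(stringr)

# Abridged sample from the question (illustrative; not the full text).
txt <- paste(
  "Status is equals to 138 if the instrument is sensible coupon,",
  "is not equals to current value, and iinstrument creation date is not D or D-1",
  "- Status is equals to 137 if the instrument is sensible bbg, previous value",
  "set status to 136 that is \"Functional Error\"",
  "set status to 150 that is \"Non blocking functional error\".",
  sep = "\n"
)

# Work line by line, as in the answer above.
lines <- strsplit(txt, "\n", fixed = TRUE)[[1]]

# "Status"/"status", then one of the assumed connecting phrases, then 1-3 digits.
# The digits are captured so the connecting phrase can be dropped afterwards.
pattern <- "[Ss]tatus (?:is equals to|is|to) ([0-9]{1,3})"

# str_match returns a matrix: column 1 is the full match, column 2 the capture.
# Lines without a match (e.g. the "D-1" line) yield NA and are filtered out.
codes <- str_match(lines, pattern)[, 2]
codes <- codes[!is.na(codes)]

result <- paste("Status", codes)
print(result)
# [1] "Status 138" "Status 137" "Status 136" "Status 150"
```

Capturing only the digits avoids the nested gsub calls. Run on the full text, this sketch would also pick up the "previous status is 138" line, so whether that line should count still has to be decided, as the comments point out.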