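Split strings at regular intervals

I would like to split strings at regular intervals. My question is essentially the same as this one: How to split a string into substrings of a given length?, except that I have a column of strings in a data set rather than a single string.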

Here is an example data set:

df = read.table(text = " 
my.id X1  
010101 1 
010102 1 
010103 1 
010104 1 
020101 1 
020112 1 
021701 0 
021802 0 
133301 0 
133302 0 
241114 0 
241215 0 
", header = TRUE, colClasses=c('character', 'numeric'), stringsAsFactors = FALSE)

Below is the desired result. I would prefer to remove the leading zeros, as shown:

desired.result = read.table(text = " 
A1 A2 A3 X1 
1 1 1 1 
1 1 2 1 
1 1 3 1 
1 1 4 1 
2 1 1 1 
2 1 12 1 
2 17 1 0 
2 18 2 0 
13 33 1 0 
13 33 2 0 
24 11 14 0 
24 12 15 0 
", header = TRUE, colClasses=c('numeric', 'numeric', 'numeric', 'numeric'), stringsAsFactors = FALSE)

Here is a loop that seems to come close, and perhaps I can use it. However, I suspect there is a more efficient approach.

for(i in 1:nrow(df)) { 
    print(substring(df$my.id[i], seq(1, 5, 2), seq(2, 6, 2))) 
}

This apply statement does not work:

apply(df$my.id, 1, function(x) substring(df$my.id[x], seq(1, 5, 2), seq(2, 6, 2)) )

Thank you for any suggestions. I would prefer a solution in base R.

Source

2013-02-19 Mark Miller

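Having tried this alongside str_extract_all, I find that read.fwf applied to a textConnection is the most efficient and most easily understood of the several ways one might approach this. It has the advantage of the automatic class detection that is built into the read.* functions.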

cbind(read.fwf(file=textConnection(df$my.id), 
       widths=c(2,2,2), col.names=paste0("A", 1:3)), 
    X1=df$X1) 
#----------- 
   A1 A2 A3 X1
1   1  1  1  1
2   1  1  2  1
3   1  1  3  1
4   1  1  4  1
5   2  1  1  1
6   2  1 12  1
7   2 17  1  0
8   2 18  2  0
9  13 33  1  0
10 13 33  2  0
11 24 11 14  0
12 24 12 15  0

(I believe I learned this from Gabor Grothendieck on R-help about six years ago.)
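The class detection mentioned above can be checked directly. The following is a minimal sketch on a small subset of the question's ids; the names ids, con and parts are illustrative only and do not come from the original answer.

```r
# A few ids from the question, kept as character so the leading zeros survive.
ids <- c("010101", "020112", "133302", "241215")

# Open the text connection explicitly so it can be closed afterwards.
con <- textConnection(ids)

# widths = c(2, 2, 2) cuts every line into three 2-character fields.
parts <- read.fwf(file = con,
                  widths = c(2, 2, 2),
                  col.names = paste0("A", 1:3))
close(con)

# Each field is converted like any other read.table column,
# so "01" becomes the integer 1 and the leading zero disappears.
sapply(parts, class)
parts
```

Keeping the connection in a variable and closing it is a tidiness choice in this sketch; the one-liner above works without it.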

If you prefer a regex strategy, here is one that inserts a tab after every two characters and runs the result through read.table. Very compact:

read.table(text=gsub('(.{2})','\\1\t',df$my.id)) 
#--------- 
   V1 V2 V3
1   1  1  1
2   1  1  2
3   1  1  3
4   1  1  4
5   2  1  1
6   2  1 12
7   2 17  1
8   2 18  2
9  13 33  1
10 13 33  2
11 24 11 14
12 24 12 15
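To get from this output to the layout asked for in the question, the columns still need names and X1 has to be attached. A minimal sketch, assuming a three-row stand-in for the question's df; tabbed, parts and result are illustrative names:

```r
# Three-row stand-in for the data frame in the question.
df <- data.frame(my.id = c("010101", "020112", "133302"),
                 X1 = c(1, 1, 0),
                 stringsAsFactors = FALSE)

# Insert a tab after every two characters: "010101" -> "01\t01\t01\t".
tabbed <- gsub("(.{2})", "\\1\t", df$my.id)

# read.table's default separator is any whitespace, so the trailing tab
# does not create an extra empty column.
parts <- read.table(text = tabbed, col.names = paste0("A", 1:3))

# Attach the remaining column from the original data.
result <- cbind(parts, X1 = df$X1)
result
```

Leaving sep at its default matters here: with sep = "\t" the trailing tab would be read as a fourth, empty field.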

Source

2013-02-19 01:14:00

You were almost there. Change apply to sapply or vapply, and change what substring operates on:

splt <- sapply(df$my.id, function(x) substring(x, seq(1, 5, 2), seq(2, 6, 2)) ) 
#this will produce the same thing 
splt <- vapply(df$my.id, function(x) substring(x, seq(1, 5, 2), seq(2, 6, 2)),c("","","") ) 
#     010101 010102 010103 010104 020101 020112 021701 021802 133301 133302 241114 241215
#[1,] "01"   "01"   "01"   "01"   "02"   "02"   "02"   "02"   "13"   "13"   "24"   "24"
#[2,] "01"   "01"   "01"   "01"   "01"   "01"   "17"   "18"   "33"   "33"   "11"   "12"
#[3,] "01"   "02"   "03"   "04"   "01"   "12"   "01"   "02"   "01"   "02"   "14"   "15"

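You want these to be numeric. The matrix should also be transposed so that it lines up with the data frame. Because apply over rows returns each row's result as a column, we can do both steps at once: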

splt <- apply(splt,1,as.numeric) 
#       [,1] [,2] [,3]
#  [1,]    1    1    1
#  [2,]    1    1    2
#  [3,]    1    1    3
#  [4,]    1    1    4
#  [5,]    2    1    1
#  [6,]    2    1   12
#  [7,]    2   17    1
#  [8,]    2   18    2
#  [9,]   13   33    1
# [10,]   13   33    2
# [11,]   24   11   14
# [12,]   24   12   15
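The "two steps at once" claim can be verified by doing the transpose and the conversion separately and comparing. This is only a sketch on three of the question's ids; one_step and two_step are illustrative names:

```r
ids <- c("010101", "021701", "241215")

# 3 x 3 character matrix: one column per id, one row per 2-character piece.
splt <- vapply(ids,
               function(x) substring(x, seq(1, 5, 2), seq(2, 6, 2)),
               character(3))

# apply() over rows returns each row's result as a column,
# which transposes the matrix while as.numeric() converts it.
one_step <- apply(splt, 1, as.numeric)

# The same thing spelled out: transpose first, then change the storage type.
two_step <- t(splt)
storage.mode(two_step) <- "double"

# Identical apart from the row names that t() keeps.
identical(one_step, unname(two_step))
```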

Now you need to put it together with your old data frame, perhaps with something like the following.

df <- cbind(splt,df) 
#    1  2  3  my.id X1
#1   1  1  1 010101  1
#2   1  1  2 010102  1
#3   1  1  3 010103  1
#4   1  1  4 010104  1
#5   2  1  1 020101  1
#6   2  1 12 020112  1
#7   2 17  1 021701  0
#8   2 18  2 021802  0
#9  13 33  1 133301  0
#10 13 33  2 133302  0
#11 24 11 14 241114  0
#12 24 12 15 241215  0

You can change the column names as needed, e.g. names(df)[1:3] <- c("A1","A2","A3").
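Putting the steps of this answer together, a base R sketch that ends with the A1, A2, A3, X1 layout from the question might look like this. It assumes a three-row stand-in for df; starts, ends, parts and result are illustrative names, and dropping my.id is a choice made here to match the desired result.

```r
# Three-row stand-in for the data frame in the question.
df <- data.frame(my.id = c("010101", "020112", "241215"),
                 X1 = c(1, 1, 0),
                 stringsAsFactors = FALSE)

# Start and end positions of the three 2-character pieces.
starts <- seq(1, 5, 2)
ends   <- seq(2, 6, 2)

# One column per id, one row per piece (character matrix).
splt <- vapply(df$my.id, function(x) substring(x, starts, ends), character(3))

# Convert to numeric and transpose in one go, then name the columns.
parts <- apply(splt, 1, as.numeric)
colnames(parts) <- paste0("A", 1:3)

# Keep only the split columns and X1, as in the desired result.
result <- cbind(as.data.frame(parts), X1 = df$X1)
result
```

With a single-row data frame, apply() would return a plain vector rather than a matrix, so the sketch assumes at least two rows.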

Source

2013-02-19 01:00:07

Use gsub and some regular expressions. I would do something like this (not very elegant, but it does the job):

cbind(
as.numeric(gsub('([0-9]{2})([0-9]{2})([0-9]{2})','\\1',df$my.id)), 
as.numeric(gsub('([0-9]{2})([0-9]{2})([0-9]{2})','\\2',df$my.id)), 
as.numeric(gsub('([0-9]{2})([0-9]{2})([0-9]{2})','\\3',df$my.id)), 
df$X1) 

      [,1] [,2] [,3] [,4]
 [1,]    1    1    1    1
 [2,]    1    1    2    1
 [3,]    1    1    3    1
 [4,]    1    1    4    1
 [5,]    2    1    1    1
 [6,]    2    1   12    1
 [7,]    2   17    1    0
 [8,]    2   18    2    0
 [9,]   13   33    1    0
[10,]   13   33    2    0
[11,]   24   11   14    0
[12,]   24   12   15    0

Edit

I said this was not very elegant, so I am adding @mnel's suggestion:

x <- gsub('([0-9]{2})([0-9]{2})([0-9]{2})','\\1-\\2-\\3',df$my.id) 
do.call(rbind, lapply(strsplit(x,'-'), as.numeric))

Source

2013-02-19 01:01:44 agstudy

I would suggest perhaps `x <- gsub('([0-9]{2})([0-9]{2})([0-9]{2})', '\\1-\\2-\\3', df$my.id); do.call(rbind, lapply(strsplit(x, '-'), as.numeric))` to avoid having to write out and execute the regex multiple times. – mnel 2013-02-19 01:11:59

Very nice! I added one keystroke: strsplit(x, '-') – 2013-02-19 01:34:37

@mnel Thanks. I have updated my answer. – agstudy 2013-02-19 14:42:17

You can also use a regex to extract each two-digit part.

I use str_extract_all from stringr:

library(stringr)
do.call(rbind,lapply(str_extract_all(as.character(df[['my.id']]), pattern = '[[:digit:]]{2}'), as.numeric))

Source

2013-02-19 01:05:44 mnel

If you want a base solution, you can replace `str_extract_all` with `regmatches(x, gregexpr(pattern, x))`. – 2013-02-19 01:09:28
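Spelled out, the base R variant suggested in this comment could look like the sketch below. It uses three of the question's ids; ids, pattern, pieces and parts are illustrative names.

```r
ids <- c("010101", "020112", "241215")

# Same pattern as in the stringr answer: any run of exactly two digits.
pattern <- "[[:digit:]]{2}"

# gregexpr() locates every non-overlapping match; regmatches() extracts them,
# giving one character vector of pieces per id.
pieces <- regmatches(ids, gregexpr(pattern, ids))

# Convert each vector of pieces to numeric and stack them as matrix rows.
parts <- do.call(rbind, lapply(pieces, as.numeric))
parts
```

Note that regmatches takes the original strings as its first argument and the match data as its second.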
