Separate different name combinations into first and last names using dplyr, tidyr, and regex
Sample data frame:
name <- c("Smith John Michael","Smith, John Michael","Smith John, Michael","Smith-John Michael","Smith-John, Michael")
df <- data.frame(name)
df
                 name
1  Smith John Michael
2 Smith, John Michael
3 Smith John, Michael
4  Smith-John Michael
5 Smith-John, Michael
I need to produce the following output:
                 name first.name  last.name
1  Smith John Michael       John      Smith
2 Smith, John Michael       John      Smith
3 Smith John, Michael    Michael Smith John
4  Smith-John Michael    Michael Smith-John
5 Smith-John, Michael    Michael Smith-John
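The rules are as follows: if the string contains a comma, everything before the comma is the last name, and the first word after the comma is the first name. If the string contains no comma, the first word is the last name and the second word is the first name. A hyphenated word counts as a single word. I'd prefer to do this with dplyr and regex, but I'll take any solution. Thanks for your help.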
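A minimal sketch of one way to apply these rules with dplyr::mutate() and base R regular expressions (grepl(), sub()); it is one possible approach, not the only one. It assumes words are separated by whitespace and that a hyphen always joins parts of a single name. It also builds df with stringsAsFactors = FALSE, which the original construction does not do, so `name` stays character on R versions before 4.0. The helper column has_comma is a hypothetical name introduced here and dropped at the end.

```r
library(dplyr)

# Sample data from the question; stringsAsFactors = FALSE (an addition) keeps `name` as character
name <- c("Smith John Michael", "Smith, John Michael", "Smith John, Michael",
          "Smith-John Michael", "Smith-John, Michael")
df <- data.frame(name, stringsAsFactors = FALSE)

df <- df %>%
  mutate(
    # hypothetical helper column: does the string contain a comma?
    has_comma = grepl(",", name, fixed = TRUE),
    # comma: everything before the comma; no comma: the first word.
    # \\S+ keeps "Smith-John" together because "-" is not whitespace.
    last.name = ifelse(has_comma,
                       sub("\\s*,.*$", "", name, perl = TRUE),
                       sub("^(\\S+).*$", "\\1", name, perl = TRUE)),
    # comma: the first word after the comma; no comma: the second word
    first.name = ifelse(has_comma,
                        sub("^[^,]*,\\s*(\\S+).*$", "\\1", name, perl = TRUE),
                        sub("^\\S+\\s+(\\S+).*$", "\\1", name, perl = TRUE))
  ) %>%
  # drop the helper column and order the columns as in the desired output
  select(name, first.name, last.name)

df
```

One caveat with this sketch: if a string without a comma contains only one word, the second-word pattern does not match and sub() returns the whole string unchanged, so first.name would repeat the last name. Add a check if such rows can occur in your data.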
See http://stackoverflow.com/questions/7069076/split-column-at-delimiter-in-data-frame