2017-02-11
2

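Remove repeated values within the same data frame column in R

I have an odd data frame whose Player column holds the players' names. The problem is that each first name appears twice, so Roy Sievers shows up as RoyRoy Sievers when the name should obviously be just Roy Sievers.

Would anyone know how to fix this?

Here is the full data frame; it isn't very long: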

Year     Player     Team  Position 
1 1949   RoyRoy Sievers  St. Louis Browns  OF 
2 1950   WaltWalt Dropo   Boston Red Sox  1B 
3 1951   GilGil McDougald  New York Yankees  3B 
4 1952   HarryHarry Byrd Philadelphia Athletics  P 
5 1953  HarveyHarvey Kuenn   Detroit Tigers  SS 
6 1954    BobBob Grim  New York Yankees  P 
7 1955   HerbHerb Score  Cleveland Indians  P 
8 1956  LuisLuis Aparicio  Chicago White Sox  SS 
9 1957   TonyTony Kubek  New York Yankees  SS 
10 1958  AlbieAlbie Pearson Washington Senators  OF 
11 1959   BobBob Allison Washington Senators  OF 
12 1960   RonRon Hansen  Baltimore Orioles  SS 
13 1961   DonDon Schwall   Boston Red Sox  P 
14 1962    TomTom Tresh  New York Yankees  SS 
15 1963   GaryGary Peters  Chicago White Sox  P 
16 1964   TonyTony Oliva  Minnesota Twins  OF 
17 1965   CurtCurt Blefary  Baltimore Orioles  OF 
18 1966  TommieTommie Agee  Chicago White Sox  OF 
19 1967    RodRod Carew  Minnesota Twins  2B 
20 1968   StanStan Bahnsen  New York Yankees  P 
21 1969   LouLou Piniella  Kansas City Royals  OF 
22 1970 ThurmanThurman Munson  New York Yankees  C 
23 1971  ChrisChris Chambliss  Cleveland Indians  1B 
24 1972  CarltonCarlton Fisk   Boston Red Sox  C 
25 1973    AlAl Bumbry  Baltimore Orioles  OF 
26 1974  MikeMike Hargrove   Texas Rangers  1B 
27 1975   FredFred Lynn   Boston Red Sox  OF 
28 1976   MarkMark Fidrych   Detroit Tigers  P 
29 1977  EddieEddie Murray  Baltimore Orioles  DH 
30 1978   LouLou Whitaker   Detroit Tigers  2B 
31 1979*   JohnJohn Castino  Minnesota Twins  3B 
32 1979* AlfredoAlfredo Griffin  Toronto Blue Jays  SS 
33 1980  JoeJoe Charboneau  Cleveland Indians  OF 
34 1981  DaveDave Righetti  New York Yankees  P 
35 1982   CalCal Ripken  Baltimore Orioles  SS 
36 1983   RonRon Kittle  Chicago White Sox  OF 
37 1984   AlvinAlvin Davis  Seattle Mariners  1B 
38 1985  OzzieOzzie Guillén  Chicago White Sox  SS 
39 1986   JoseJose Canseco  Oakland Athletics  OF 
40 1987   MarkMark McGwire  Oakland Athletics  1B 
41 1988   WaltWalt Weiss  Oakland Athletics  SS 
42 1989   GreggGregg Olson  Baltimore Orioles  P 
43 1990   Sandy Alomar Jr  Cleveland Indians  C 
44 1991  ChuckChuck Knoblauch  Minnesota Twins  2B 
45 1992   PatPat Listach  Milwaukee Brewers  SS 
46 1993   TimTim Salmon  California Angels  OF 
47 1994   BobBob Hamelin  Kansas City Royals  DH 
48 1995  MartyMarty Cordova  Minnesota Twins  OF 
49 1996   DerekDerek Jeter  New York Yankees  SS 
50 1997 NomarNomar Garciaparra   Boston Red Sox  SS 
51 1998   BenBen Grieve  Oakland Athletics  OF 
52 1999  CarlosCarlos Beltrán  Kansas City Royals  OF 
53 2000 KazuhiroKazuhiro Sasaki  Seattle Mariners  P 
54 2001  IchiroIchiro Suzuki  Seattle Mariners  OF 
55 2002   EricEric Hinske  Toronto Blue Jays  3B 
56 2003  ÁngelÁngel Berroa  Kansas City Royals  SS 
57 2004  BobbyBobby Crosby  Oakland Athletics  SS 
58 2005  HustonHuston Street  Oakland Athletics  P 
59 2006 JustinJustin Verlander   Detroit Tigers  P 
60 2007  DustinDustin Pedroia   Boston Red Sox  2B 
61 2008  EvanEvan Longoria   Tampa Bay Rays  3B 
62 2009   Andrew Bailey  Oakland Athletics  P 
63 2010  NeftalíNeftalí Feliz   Texas Rangers  P 
64 2011 JeremyJeremy Hellickson   Tampa Bay Rays  P 
65 2012   MikeMike Trout  Los Angeles Angels  OF 
66 2013    WilWil Myers   Tampa Bay Rays  OF 
67 2014   JoséJosé Abreu  Chicago White Sox  1B 
68 2015  CarlosCarlos Correa   Houston Astros  SS 
69 2016 MichaelMichael Fulmer   Detroit Tigers  P 
+0

Are there any middle names? Can we always expect? –

Answers

3

You can solve this by finding a repeated pattern of at least three letters and replacing it with a single copy:

gsub("(\\w{3,})\\1", "\\1", Players$Player) 

If you want to overwrite the old column, just assign the result back:

Players$Player = gsub("(\\w{3,})\\1", "\\1", Players$Player) 
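
As a quick sanity check, the substitution can be tried on a few rows copied from the table in the question. This is only a sketch: the three-row `Players` data frame here is a cut-down stand-in for the real one, and `stringsAsFactors = FALSE` is set so that the column is plain character.

```r
# Small stand-in for the full data frame (rows taken from the question)
Players <- data.frame(
  Year   = c(1949, 1952, 2009),
  Player = c("RoyRoy Sievers", "HarryHarry Byrd", "Andrew Bailey"),
  stringsAsFactors = FALSE
)

# Collapse any run of 3+ word characters that is immediately repeated
Players$Player <- gsub("(\\w{3,})\\1", "\\1", Players$Player)

print(Players$Player)
```

Names that were never doubled, such as Andrew Bailey, should pass through unchanged.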
+0

Why is the "at least 3" rule necessary? – jdobres

+0

Just to eliminate spurious matches. For example, the name "Harry" appears above. You don't want it changed to "Hary". – G5W
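
To see what the minimum length guards against, here is a small comparison, assuming the default regex engine used by `gsub`. The vector `x` is made up of names from the table; `loose` drops the length requirement and `strict` keeps it (both names are illustrative).

```r
x <- c("LouLou Piniella", "HarryHarry Byrd")

# No minimum length: any doubled letter also counts as a "repeat"
loose  <- gsub("(\\w+)\\1", "\\1", x)

# At least three characters: only longer repeats are collapsed
strict <- gsub("(\\w{3,})\\1", "\\1", x)

print(loose)
print(strict)
```

With the loose pattern the double l in Piniella is collapsed as well, which is the kind of spurious match the rule avoids.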

+0

But if capitalization matters, wouldn't you be better off relying on it? For example: `gsub('([A-Z][a-z]+)\\1', '\\1', myData$Player)` – jdobres

2

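G5W's answer gets you most of the way there, but it will miss two-letter names such as "Al". This version relies on capitalization rather than character count: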

myData$Player <- gsub('([A-Z][a-z]+)\\1', '\\1', myData$Player) 
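
One caveat worth checking: `[A-Z]` and `[a-z]` are ASCII ranges, so doubled first names that contain accented letters (José, Ángel and Neftalí in the table) may be left untouched. A possible variant, assuming your R build's PCRE has Unicode property support and the strings are UTF-8, swaps in `\\p{Lu}` and `\\p{Ll}` with `perl = TRUE`. The vector `x` and the name `fixed` below are illustrative only.

```r
# \u escapes keep the example independent of the source file's encoding
x <- c("Jos\u00e9Jos\u00e9 Abreu", "\u00c1ngel\u00c1ngel Berroa", "AlAl Bumbry")

# \p{Lu} = any uppercase letter, \p{Ll} = any lowercase letter (PCRE syntax)
fixed <- gsub("(\\p{Lu}\\p{Ll}+)\\1", "\\1", x, perl = TRUE)

print(fixed)
```

The capitalization idea is unchanged; only the character classes are widened so that accented letters count as letters.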
1

For those less comfortable with regular expressions:

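library(stringr)

# Halve the first word of each name when it is the same string written twice;
# names that are not doubled (e.g. "Andrew Bailey") are returned unchanged.
fun1 <- function(string) {
  first   <- str_extract(string, "^\\S+")               # first whitespace-delimited word
  half    <- str_sub(first, 1, str_length(first) %/% 2) # first half of that word
  doubled <- first == str_c(half, half)                 # TRUE only for an exact doubling
  rest    <- str_sub(string, str_length(first) + 1)     # everything after the first word
  ifelse(doubled, str_c(half, rest), string)
}

fun1(df$Player)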