2017-04-24

Converting to sentence case in R

I am trying to clean up the data below in R. I have a character vector that looks like this:

    /organization/-fame
    /ORGANIZATION/-QOUNTER 
    /organization/-qounter 
    /ORGANIZATION/-THE-ONE-OF-THEM-INC- 
    /organization/0-6-com 
    /ORGANIZATION/004-TECHNOLOGIES 
    /organization/01games-technology 
    /ORGANIZATION/0NDINE-BIOMEDICAL-INC 
    /organization/0ndine-biomedical-inc 
    /ORGANIZATION/0XDATA 
    /organization/0xdata 
    /ORGANIZATION/0XDATA 
    /organization/0xdata 
    /ORGANIZATION/1 
    /organization/1 
    /ORGANIZATION/1 
    /organization/1-2-3-listo 
    /ORGANIZATION/1-4-ALL 
    /organization/1-618-technology 
    /ORGANIZATION/1-800-DENTIST 
    /organization/1-800-doctors 
    /ORGANIZATION/1-800-PUBLICRELATIONS-INC- 
    /organization/1-mainstream 
    /ORGANIZATION/1-OF-99 
    /organization/10-20-media 
    /ORGANIZATION/10-20-MEDIA 

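I want to change the case of every word in each string so that it starts with a capital letter and the rest is lowercase. After the change, the data should look like this: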

    /Organization/-Fame
    /Organization/-Qounter 
    /Organization/-The-One-Of-Them-Inc- 
    /Organization/0-6-Com 
    /Organization/004-Technologies 
    /Organization/01Games-Technology 
    /Organization/0Ndine-Biomedical-Inc 
    /Organization/0Xdata 
    /Organization/1 
    /Organization/1-2-3-Listo 
    /Organization/1-4-All 
    /Organization/1-618-Technology 
    /Organization/1-800-Dentist 
    /Organization/1-800-Doctors 
    /Organization/1-800-Publicrelations-Inc- 
    /Organization/1-Mainstream 
    /Organization/1-Of-99 
    /Organization/10-20-Media 

Answers

You can use a regular expression. With the sample input

x<-c("/organization/-fame", "/ORGANIZATION/-QOUNTER", "/organization/-qounter", 
"/ORGANIZATION/-THE-ONE-OF-THEM-INC-", "/organization/0-6-com", 
"/ORGANIZATION/004-TECHNOLOGIES", "/organization/01games-technology", 
"/ORGANIZATION/0NDINE-BIOMEDICAL-INC", "/organization/0ndine-biomedical-inc", 
"/ORGANIZATION/0XDATA", "/organization/0xdata", "/ORGANIZATION/0XDATA", 
"/organization/0xdata", "/ORGANIZATION/1", "/organization/1", 
"/ORGANIZATION/1", "/organization/1-2-3-listo", "/ORGANIZATION/1-4-ALL", 
"/organization/1-618-technology", "/ORGANIZATION/1-800-DENTIST", 
"/organization/1-800-doctors", "/ORGANIZATION/1-800-PUBLICRELATIONS-INC-", 
"/organization/1-mainstream", "/ORGANIZATION/1-OF-99", "/organization/10-20-media", 
"/ORGANIZATION/10-20-MEDIA") 

you can run

gsub("([[:alpha:]])([[:alpha:]]+)", "\\U\\1\\L\\2", x, perl=TRUE) 

to get

[1] "/Organization/-Fame"      
[2] "/Organization/-Qounter"     
[3] "/Organization/-Qounter"     
[4] "/Organization/-The-One-Of-Them-Inc-"  
[5] "/Organization/0-6-Com"     
[6] "/Organization/004-Technologies"   
[7] "/Organization/01Games-Technology"   
[8] "/Organization/0Ndine-Biomedical-Inc"  
[9] "/Organization/0Ndine-Biomedical-Inc"  
[10] "/Organization/0Xdata"      
[11] "/Organization/0Xdata"      
[12] "/Organization/0Xdata"      
[13] "/Organization/0Xdata"      
[14] "/Organization/1"       
[15] "/Organization/1"       
[16] "/Organization/1"       
[17] "/Organization/1-2-3-Listo"    
[18] "/Organization/1-4-All"     
[19] "/Organization/1-618-Technology"   
[20] "/Organization/1-800-Dentist"    
[21] "/Organization/1-800-Doctors"    
[22] "/Organization/1-800-Publicrelations-Inc-" 
[23] "/Organization/1-Mainstream"    
[24] "/Organization/1-Of-99"     
[25] "/Organization/10-20-Media"    
[26] "/Organization/10-20-Media"   
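
To spell out how the substitution works: the first group captures one letter and the second captures the letters that follow it, and with `perl=TRUE` the replacement `\\U\\1` upper-cases the first group while `\\L\\2` lower-cases the second. The result still contains duplicates, whereas the desired output in the question lists each path once. The sketch below is one way to handle that, assuming deduplication with `unique()` is what is wanted; the helper name `to_title` and the vector `sample_paths` are made up for illustration.

```r
# Hypothetical helper wrapping the gsub() call from the answer.
# \\U upper-cases the first captured letter, \\L lower-cases the rest of the run.
to_title <- function(s) {
  gsub("([[:alpha:]])([[:alpha:]]+)", "\\U\\1\\L\\2", s, perl = TRUE)
}

# A few entries taken from the sample input.
sample_paths <- c("/ORGANIZATION/1-OF-99",
                  "/organization/10-20-media",
                  "/ORGANIZATION/10-20-MEDIA")

# Convert the case, then drop the duplicates that appear once case is unified.
cleaned <- unique(to_title(sample_paths))
print(cleaned)
```

Because the second group uses `+`, a word consisting of a single letter is left unchanged; the sample data contains no such word.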

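The code above did not give me the answer. It gave me a warning message: "Warning message: In `[<-.factor`(`*tmp*`, 1, value = c(NA, 3L, 2L, 4L, 5L, 6L, 7L, : invalid factor level, NA generated" and converted the string to NA. Many of the other strings were converted to NA as well. – snk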

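Did you try it with the data I provided? That error sounds like you have a factor vector rather than a character vector, and you are trying to assign back into that vector. That information was not included in your original post. – MrFlick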

So if I convert the factor vector to a character vector, will your solution work? Let me try. – snk
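
For the factor case discussed in these comments, a minimal sketch follows. It assumes the data is a factor (for example, a column read with `stringsAsFactors = TRUE`); the vector `org` is invented for illustration and is not the poster's actual data. The factor is converted with `as.character()` first, and the same substitution as in the answer is applied to the result.

```r
# Hypothetical factor input standing in for the poster's data.
org <- factor(c("/ORGANIZATION/0XDATA",
                "/organization/0xdata",
                "/organization/1-mainstream"))

# Convert to a character vector before substituting.
org_chr <- as.character(org)

# Same substitution as in the answer.
org_clean <- gsub("([[:alpha:]])([[:alpha:]]+)", "\\U\\1\\L\\2", org_chr, perl = TRUE)

print(org_clean)
```

Assigning the converted strings back into elements of the original factor is what would produce the "invalid factor level, NA generated" warning, because the new strings are not among the factor's existing levels. Storing the result in a character vector avoids that.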