2017-07-14
13-JUL-17                  


Bank User                      Space Occupied(GB)
------------------------------ ------------------
CKYC_MNSB                              .004211426
CORE_AMARNATH_ASP                      8.75262451
CORE_AMBUJA                            6.80389404
CORE_AMBUJA_ASP                        10.0085449
CORE_ANAND_MERC_ASP                    18.9866333
CORE_BALOTRA                           17.8280029
CORE_BASODA                            4.55432129
CORE_CHHAPI_ASP                        11.9767456
CORE_DHANGDHRA_ASP                     13.1849976
CORE_IDAR_ASP                          13.3209229
CORE_JANTA_HALOL_ASP                   12.7955933

Bank User                      Space Occupied(GB)
------------------------------ ------------------
CORE_JHALOD_URBAN_ASP                  9.19219971
CORE_MANINAGAR                         5.36090088
CORE_MANINAGAR_ASP                     6.31414795
CORE_SANKHEDA                          20.4329834
CORE_SMCB_ANAND_ASP                    11.3191528
CORE_TARAPUR_ASP                       8.24627686
CORE_VUCB                              .000610352
TBA_TEMP                               5.39910889
TEST_DUNIA                             4.15698242

20 rows selected.


TABLESPACE NAME                Free Space in GB
------------------------------ ----------------
TBAPROJ                              33.2736816

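I have the text file above. How can I convert it and store it as a CSV file?

How can I store the CSV file so that the values are separated into columns?

I have loaded the file, but I am struggling to remove the whitespace from it.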

Answers

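You want to match lines made of a word of uppercase letters and underscores, then whitespace, then a number containing a decimal point. This grep keeps only those lines (strictly, its regex accepts any line that starts with a capital letter or underscore and contains a dot somewhere after it, which is enough for this file):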

> file_raw <- readLines('file.txt')
> read.table(
+   text = paste(
+     file_raw[grep("^[A-Z_].*\\s*\\.", file_raw)],
+     collapse = "\n"),
+   sep = "", header = FALSE)
                      V1           V2
 1             CKYC_MNSB  0.004211426
 2     CORE_AMARNATH_ASP  8.752624510
 3           CORE_AMBUJA  6.803894040
 4       CORE_AMBUJA_ASP 10.008544900
 5   CORE_ANAND_MERC_ASP 18.986633300
 6          CORE_BALOTRA 17.828002900
 7           CORE_BASODA  4.554321290
 8       CORE_CHHAPI_ASP 11.976745600
 9    CORE_DHANGDHRA_ASP 13.184997600
10         CORE_IDAR_ASP 13.320922900
11  CORE_JANTA_HALOL_ASP 12.795593300
12 CORE_JHALOD_URBAN_ASP  9.192199710
13        CORE_MANINAGAR  5.360900880
14    CORE_MANINAGAR_ASP  6.314147950
15         CORE_SANKHEDA 20.432983400
16   CORE_SMCB_ANAND_ASP 11.319152800
17      CORE_TARAPUR_ASP  8.246276860
18             CORE_VUCB  0.000610352
19              TBA_TEMP  5.399108890
20            TEST_DUNIA  4.156982420
21               TBAPROJ 33.273681600

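Note that if you expect first tokens that don't follow this pattern, for example CORE_999lower_case, you will need to adjust the pattern. Without a formal specification, though, we can only go on what you have provided.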
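A minimal sketch of a tighter variant, assuming every data row looks exactly like the sample (a name made only of capitals and underscores, whitespace, then a decimal number). The regex is anchored at both ends, and it also writes the CSV the question asks for. The sample lines are inlined instead of read from file.txt so the snippet runs on its own; the column names bank_user and space_gb and the output path are hypothetical choices:

```r
# Hypothetical sample lines standing in for readLines('file.txt')
file_raw <- c(
  "13-JUL-17",
  "Bank User      Space Occupied(GB)",
  "------------------------------ ------------------",
  "CKYC_MNSB        .004211426",
  "CORE_AMARNATH_ASP      8.75262451",
  "20 rows selected.",
  "TABLESPACE NAME    Free Space in GB",
  "TBAPROJ        33.2736816"
)

# Capitals/underscores, whitespace, a number containing a decimal point,
# optional trailing whitespace; anchored so header and footer lines fail
pattern <- "^[A-Z_]+\\s+[0-9]*\\.[0-9]+\\s*$"
data_lines <- grep(pattern, file_raw, value = TRUE)

# sep = "" (the default) splits each line on any run of whitespace
df <- read.table(text = data_lines, header = FALSE,
                 col.names = c("bank_user", "space_gb"),
                 stringsAsFactors = FALSE)

# Hypothetical output path; row.names = FALSE keeps the CSV to two columns
out_file <- file.path(tempdir(), "space_report.csv")
write.csv(df, out_file, row.names = FALSE)
```

Because the pattern has to match the whole line, a name such as CORE_999lower_case would be dropped rather than silently mangled, so widen the character class if such names can occur.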

There may be a more elegant way, but this does the trick:

# read raw file in lines 
file_raw <- readLines('file.txt') 

# trim leading and trailing whitespace from each line
file_trim <- trimws(file_raw, which = 'both')

# remove empty lines
file_trim <- file_trim[file_trim != '']

# replace runs of 2 or more whitespace characters with a comma separator
file_csv <- gsub('\\s{2,}', ',', file_trim)
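To make the effect of the \\s{2,} regex concrete, here is a small check on three lines modeled on the question's file; the sample strings and the any_run contrast are illustrative additions, not part of the answer:

```r
# Sample lines modeled on the question's file (trailing spaces included)
file_raw <- c(
  "Bank User      Space Occupied(GB)        ",
  "CORE_AMBUJA       6.80389404        ",
  "TABLESPACE NAME    Free Space in GB         "
)

file_trim <- trimws(file_raw, which = "both")

# Only runs of two or more whitespace characters become a comma,
# so single spaces inside "Bank User" or "Free Space in GB" survive
two_plus <- gsub("\\s{2,}", ",", file_trim)

# For contrast: replacing every run of whitespace also splits those labels
any_run <- gsub("\\s+", ",", file_trim)
```

This relies on the columns in the SQL*Plus output always being separated by at least two spaces; a value that happened to sit exactly one space from the next column would not be split.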

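Some leftovers, such as the ---- separator lines and "20 rows selected.", will still be there, but they are easy to filter out if you want, either before writing or after reading: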

file_clean <- file_csv[!grepl('(-){3,}|rows selected',file_csv)] 

write.csv(file_clean,'file_cleaned.csv') 

> read.csv('file_cleaned.csv')
    X                                x
 1  1                        13-JUL-17
 2  2     Bank User,Space Occupied(GB)
 3  3             CKYC_MNSB,.004211426
 4  4     CORE_AMARNATH_ASP,8.75262451
 5  5           CORE_AMBUJA,6.80389404
 6  6       CORE_AMBUJA_ASP,10.0085449
 7  7   CORE_ANAND_MERC_ASP,18.9866333
 8  8          CORE_BALOTRA,17.8280029
 9  9           CORE_BASODA,4.55432129
10 10       CORE_CHHAPI_ASP,11.9767456
11 11    CORE_DHANGDHRA_ASP,13.1849976
12 12         CORE_IDAR_ASP,13.3209229
13 13  CORE_JANTA_HALOL_ASP,12.7955933
14 14     Bank User,Space Occupied(GB)
15 15 CORE_JHALOD_URBAN_ASP,9.19219971
16 16        CORE_MANINAGAR,5.36090088
17 17    CORE_MANINAGAR_ASP,6.31414795
18 18         CORE_SANKHEDA,20.4329834
19 19   CORE_SMCB_ANAND_ASP,11.3191528
20 20      CORE_TARAPUR_ASP,8.24627686
21 21             CORE_VUCB,.000610352
22 22              TBA_TEMP,5.39910889
23 23            TEST_DUNIA,4.15698242
24 24 TABLESPACE NAME,Free Space in GB
25 25               TBAPROJ,33.2736816

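Thanks Val. When I save the file, the bank name and the space occupied in GB are both stored in a single column. How can I separate them? – Ree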

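@Ree See my edit: I just changed the regex so that it only replaces runs of 2 or more consecutive spaces, leaving the single spaces between words so that multi-word labels stay in one column – Val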
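The single-column result Ree describes appears to come from the final step rather than the regex: write.csv() on a character vector writes each element as one quoted field plus a row-name column, which matches the X and x columns shown by read.csv() above. A minimal sketch of one way around it, assuming the comma-joined lines in file_clean are what you want to keep: drop the date and repeated header lines, let read.csv() split each remaining line at the comma, and write a real data frame. The sample vector, the filtering regex, the column names and the output path are all hypothetical:

```r
# Hypothetical stand-in for file_clean from the answer above
file_clean <- c(
  "13-JUL-17",
  "Bank User,Space Occupied(GB)",
  "CKYC_MNSB,.004211426",
  "CORE_AMBUJA,6.80389404",
  "Bank User,Space Occupied(GB)",
  "TABLESPACE NAME,Free Space in GB",
  "TBAPROJ,33.2736816"
)

# Keep only "NAME,number" lines; this drops the date and every header line
data_lines <- file_clean[grepl("^[A-Z_]+,[0-9]*\\.[0-9]+$", file_clean)]

# Parse the comma-separated lines into two typed columns
df <- read.csv(text = data_lines, header = FALSE,
               col.names = c("bank_user", "space_gb"),
               stringsAsFactors = FALSE)

# row.names = FALSE stops write.csv adding the extra index column
out_file <- file.path(tempdir(), "file_cleaned.csv")
write.csv(df, out_file, row.names = FALSE)
```

If you would rather keep every cleaned line as-is, writeLines(file_clean, 'file_cleaned.csv') writes the lines without quoting them, so the commas act as column separators when the file is read back.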