How to create a column description (CD) file for CatBoost

pool = Pool(features_file, CDfile)

where CDfile contains the following text, with <\t> as the delimiter:

0 Target 
1 Categ cat_reg 
97 Categ cat_dow 
98 Categ cat_nweek 
99 Categ cat_month 
100 Categ cat_hour 
101 Categ cat_is_month_start 
102 Categ cat_is_year_end 
103 Categ cat_is_year_start 
104 Categ cat_anomaly2016

I get this result: Factor False in column 102 and row 1 is declared as numeric and cannot be parsed as float. Try correcting column description file.

Here features is the full DataFrame:

len(cat_features), len(features.columns)
9 105

cat_columns          cat_positions  values
cat_reg                          1  1075
cat_dow                         97  5
cat_nweek                       98  17
cat_month                       99  4
cat_hour                       100  1
cat_is_month_start             101  False
cat_is_year_end                102  False
cat_is_year_start              103  False
cat_anomaly2016                104  0

What is wrong with column 102? Why is it declared as a numeric (rather than categorical) feature?
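One way to narrow down an error like this is to look at the raw bytes of the CD file, since characters such as a carriage return or a byte order mark do not show up in most editors. The following is a minimal sketch using only the standard library. The helper name find_suspicious_lines and the rule "anything other than printable ASCII or a tab is suspicious" are assumptions made for illustration; they are not part of CatBoost.

```python
import string

# Assumption: a clean CD file holds only printable ASCII plus tab delimiters.
# Carriage returns, vertical tabs and form feeds are treated as suspicious.
ALLOWED = set(string.printable) - set("\r\n\x0b\x0c")


def find_suspicious_lines(path):
    """Return (line_number, repr_of_line) for lines with unexpected characters."""
    with open(path, "rb") as fin:
        raw = fin.read()
    # latin-1 maps every byte to exactly one character, so nothing is hidden
    # or rejected during decoding.
    text = raw.decode("latin-1")
    suspicious = []
    for number, line in enumerate(text.split("\n"), start=1):
        if any(ch not in ALLOWED for ch in line):
            suspicious.append((number, repr(line)))
    return suspicious
```

Calling find_suspicious_lines on the CD file path and printing each returned pair shows the offending line with its escape sequences visible, which may make a stray character easier to spot.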

2017-12-27 Sergey Novozhilov

The bug was a redundant non-printable character in the CD file. I did not catch which one. Here is the code that generates the CD file.

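def catboostCD(fname, cat_features, cat_features_names, sep='\t'):
    # Write a CatBoost column description file: the target in column 0,
    # then one "index<sep>Categ<sep>name" line per categorical feature.
    with open(fname, "w") as fout:
        fout.write('0{0}Target'.format(sep))
        for idx, name in zip(cat_features, cat_features_names):
            fout.write('\n{0}{1}Categ{1}{2}'.format(idx, sep, name))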

where

import numpy as np

# Positions and names of all columns whose name starts with 'cat_'
cat_features = np.ravel(np.where(np.char.startswith(list(features.columns), prefix='cat_')))
cat_features_names = features.columns.values[cat_features]
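For illustration, the function can also be called with plain Python lists instead of the NumPy arrays above, and the written file printed with repr to confirm that only tabs and newlines separate the fields. This is a sketch under assumptions: the file name example.cd and the three-feature subset are made up for the example, and the function body is repeated only so the snippet runs on its own.

```python
import os
import tempfile


def catboostCD(fname, cat_features, cat_features_names, sep='\t'):
    # Same logic as the function above, repeated to keep the snippet runnable.
    with open(fname, "w") as fout:
        fout.write('0{0}Target'.format(sep))
        for idx, name in zip(cat_features, cat_features_names):
            fout.write('\n{0}{1}Categ{1}{2}'.format(idx, sep, name))


# Hypothetical subset of the categorical columns from the question.
cat_features = [1, 97, 102]
cat_features_names = ['cat_reg', 'cat_dow', 'cat_is_year_end']

cd_path = os.path.join(tempfile.mkdtemp(), 'example.cd')
catboostCD(cd_path, cat_features, cat_features_names)

# newline='' disables newline translation on read, so the line endings
# appear exactly as they were stored on disk.
with open(cd_path, newline='') as fin:
    cd_text = fin.read()
print(repr(cd_text))
```

Because the file is opened in text mode for writing, the line endings on disk depend on the platform, which is one reason to inspect the repr rather than the rendered text. The resulting path can then be given to Pool through its column_description parameter.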

2017-12-27 22:26:56
