No rows inserted into table when importing from CSV in Cassandra

Asked 2015-08-28

I'm trying to import a CSV file into a Cassandra table, but I've run into a problem. The insert succeeds, or at least that's what Cassandra reports, yet I still can't see any records. Here are the details:

qlsh:recommendation_engine> COPY row_historical_game_outcome_data FROM '/home/adelin/workspace/docs/re_raw_data2.csv' WITH DELIMITER='|'; 

2 rows imported in 0.216 seconds. 
cqlsh:recommendation_engine> select * from row_historical_game_outcome_data; 

customer_id | game_id | time | channel | currency_code | game_code | game_name | game_type | game_vendor | progressive_winnings | stake_amount | win_amount 
-------------+---------+------+---------+---------------+-----------+-----------+-----------+-------------+----------------------+--------------+------------ 

(0 rows) 
cqlsh:recommendation_engine> 

This is what my data looks like:

'SomeName'|673|'SomeName'|'SomeName'|'TYPE'|'M'|123123|0.20000000000000001|0.0|'GBP'|2015-07-01 00:01:42.19700|0.0| 
'SomeName'|673|'SomeName'|'SomeName'|'TYPE'|'M'|456456|0.20000000000000001|0.0|'GBP'|2015-07-01 00:01:42.19700|0.0| 

The Cassandra version is apache-cassandra-2.2.0.

EDIT:

CREATE TABLE row_historical_game_outcome_data (
    customer_id int, 
    game_id int, 
    time timestamp, 
    channel text, 
    currency_code text, 
    game_code text, 
    game_name text, 
    game_type text, 
    game_vendor text, 
    progressive_winnings double, 
    stake_amount double, 
    win_amount double, 
    PRIMARY KEY ((customer_id, game_id, time)) 
) WITH bloom_filter_fp_chance = 0.01 
    AND caching = '{"keys":"ALL", "rows_per_partition":"NONE"}' 
    AND comment = '' 
    AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy'} 
    AND compression = {'sstable_compression': 'org.apache.cassandra.io.compress.LZ4Compressor'} 
    AND dclocal_read_repair_chance = 0.1 
    AND default_time_to_live = 0 
    AND gc_grace_seconds = 864000 
    AND max_index_interval = 2048 
    AND memtable_flush_period_in_ms = 0 
    AND min_index_interval = 128 
    AND read_repair_chance = 0.0 
    AND speculative_retry = '99.0PERCENTILE'; 

I also tried what uri2x suggested, but still nothing:

select * from row_historical_game_outcome_data; 

customer_id | game_id | time | channel | currency_code | game_code | game_name | game_type | game_vendor | progressive_winnings | stake_amount | win_amount 
-------------+---------+------+---------+---------------+-----------+-----------+-----------+-------------+----------------------+--------------+------------ 

(0 rows) 
cqlsh:recommendation_engine> COPY row_historical_game_outcome_data ("game_vendor","game_id","game_code","game_name","game_type","channel","customer_id","stake_amount","win_amount","currency_code","time","progressive_winnings") FROM '/home/adelin/workspace/docs/re_raw_data2.csv' WITH DELIMITER='|'; 

2 rows imported in 0.192 seconds. 
cqlsh:recommendation_engine> select * from row_historical_game_outcome_data; 

customer_id | game_id | time | channel | currency_code | game_code | game_name | game_type | game_vendor | progressive_winnings | stake_amount | win_amount 
-------------+---------+------+---------+---------------+-----------+-----------+-----------+-------------+----------------------+--------------+------------ 

(0 rows) 
Can you show us your DESCRIBE TABLE output? – uri2x

Here you go, I've added the table description. – Adelin

It seems the order of your columns doesn't match the order of the columns in your CSV file (the first column is not an int, the third is not a date, etc.). Try specifying column names in COPY to match the order in the CSV file. – uri2x

Answers

Ok, I had to change a few things about your data file to make this work:

SomeName|673|SomeName|SomeName|TYPE|M|123123|0.20000000000000001|0.0|GBP|2015-07-01 00:01:42|0.0 
SomeName|673|SomeName|SomeName|TYPE|M|456456|0.20000000000000001|0.0|GBP|2015-07-01 00:01:42|0.0 
  • Removed the trailing pipes.
  • Truncated the times down to seconds.
  • Removed all the single quotes.
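The three fixes above can be sketched as a small preprocessing step in Python (a minimal sketch; the function name and the inline sample line are mine, not from the original post):

```python
import re

def clean_line(line: str) -> str:
    """Apply the three fixes so cqlsh's COPY can parse the row:
    strip the trailing pipe, drop all single quotes, and truncate
    fractional seconds from the timestamp."""
    line = line.rstrip("\n").rstrip("|")   # remove the trailing pipe
    line = line.replace("'", "")           # remove all single quotes
    # truncate e.g. "2015-07-01 00:01:42.19700" to "2015-07-01 00:01:42"
    return re.sub(r"(\d{2}:\d{2}:\d{2})\.\d+", r"\1", line)

raw = "'SomeName'|673|'SomeName'|'SomeName'|'TYPE'|'M'|123123|0.20000000000000001|0.0|'GBP'|2015-07-01 00:01:42.19700|0.0|\n"
print(clean_line(raw))
```

Running this over each line of the file produces rows in the form shown above.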

Once I did that, I ran:

aploetz@cqlsh:stackoverflow> COPY row_historical_game_outcome_data 
(game_vendor,game_id,game_code,game_name,game_type,channel,customer_id,stake_amount, 
win_amount,currency_code , time , progressive_winnings) 
FROM '/home/aploetz/cassandra_stack/re_raw_data3.csv' WITH DELIMITER='|'; 

Improper COPY command. 

This one was a bit tricky. I finally figured out that COPY doesn't like the column name time. I altered the table to use the name game_time instead, and re-ran the COPY:

aploetz@cqlsh:stackoverflow> DROP TABLE row_historical_game_outcome_data ; 
aploetz@cqlsh:stackoverflow> CREATE TABLE row_historical_game_outcome_data (
      ...  customer_id int, 
      ...  game_id int, 
      ...  game_time timestamp, 
      ...  channel text, 
      ...  currency_code text, 
      ...  game_code text, 
      ...  game_name text, 
      ...  game_type text, 
      ...  game_vendor text, 
      ...  progressive_winnings double, 
      ...  stake_amount double, 
      ...  win_amount double, 
      ...  PRIMARY KEY ((customer_id, game_id, game_time)) 
      ...); 

aploetz@cqlsh:stackoverflow> COPY row_historical_game_outcome_data 
(game_vendor,game_id,game_code,game_name,game_type,channel,customer_id,stake_amount, 
win_amount,currency_code , game_time , progressive_winnings) 
FROM '/home/aploetz/cassandra_stack/re_raw_data3.csv' WITH DELIMITER='|'; 

3 rows imported in 0.738 seconds. 
aploetz@cqlsh:stackoverflow> SELECT * FROM row_historical_game_outcome_data ; 

customer_id | game_id | game_time    | channel | currency_code | game_code | game_name | game_type | game_vendor | progressive_winnings | stake_amount | win_amount 
-------------+---------+--------------------------+---------+---------------+-----------+-----------+-----------+-------------+----------------------+--------------+------------ 
     123123 |  673 | 2015-07-01 00:01:42-0500 |  M |   GBP | SomeName | SomeName |  TYPE | SomeName |     0 |   0.2 |   0 
     456456 |  673 | 2015-07-01 00:01:42-0500 |  M |   GBP | SomeName | SomeName |  TYPE | SomeName |     0 |   0.2 |   0 

(2 rows) 
  • I have no idea why it said "3 rows imported"; my guess is that it's counting the header row.
  • All of your keys are partition keys. I'm not sure whether you really intended that. I only point it out because I can't think of a reason to specify multiple partition keys without also specifying one or more clustering keys.
  • I couldn't find anything in the DataStax documentation indicating that "time" is a reserved word. It may be a bug in cqlsh. But seriously, you should probably name your time-based column something other than "time" anyway.
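The point about partition keys can be illustrated in CQL (a sketch, assuming rows should be grouped per customer/game pair and ordered by time within it; the original post never states the intended query pattern):

```sql
-- PRIMARY KEY ((customer_id, game_id, time)) makes ALL three columns
-- the partition key: every combination lands in its own partition, and
-- a query must supply all three.  A composite partition key plus a
-- clustering key keeps each customer/game pair in one partition, with
-- rows ordered by game_time inside it:
CREATE TABLE row_historical_game_outcome_data (
    customer_id int,
    game_id int,
    game_time timestamp,
    channel text,
    currency_code text,
    game_code text,
    game_name text,
    game_type text,
    game_vendor text,
    progressive_winnings double,
    stake_amount double,
    win_amount double,
    PRIMARY KEY ((customer_id, game_id), game_time)
);
```

With this layout, `SELECT ... WHERE customer_id = ? AND game_id = ?` returns that pair's rows in time order.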
What you found is true: the problem was with the CSV generated by Informix DB. Still, Cassandra should be more verbose about its errors. – Adelin


There are two things in your CSV file that bother cqlsh:

  1. Remove the trailing | at the end of each CSV line.
  2. Remove the microseconds from your time values (the precision should be milliseconds at most).

One other comment: COPY in CQL accepts WITH HEADER=TRUE, which causes the header (first) line of the CSV file to be ignored. "time" is not a reserved word in CQL (trust me on this, as I just updated the CQL reserved words myself in the DataStax documentation). However, you do show spaces around the column name "time" in your COPY command, and I believe that's the problem. No spaces, just commas; and do the same for all the rows in the CSV file. (http://docs.datastax.com/en/cql/3.3/cql/cql_reference/keywords_r.html)

Good point; the COPY command in cqlsh can definitely be tricky. – Aaron