No rows inserted into table when importing from CSV in Cassandra

Asked 2015-08-28

I'm trying to import a CSV file into a Cassandra table, but I've run into a problem. The insert succeeds, or at least that's what Cassandra reports, yet I still can't see any records. Here are the details:

qlsh:recommendation_engine> COPY row_historical_game_outcome_data FROM '/home/adelin/workspace/docs/re_raw_data2.csv' WITH DELIMITER='|'; 

2 rows imported in 0.216 seconds. 
cqlsh:recommendation_engine> select * from row_historical_game_outcome_data; 

customer_id | game_id | time | channel | currency_code | game_code | game_name | game_type | game_vendor | progressive_winnings | stake_amount | win_amount 
-------------+---------+------+---------+---------------+-----------+-----------+-----------+-------------+----------------------+--------------+------------ 

(0 rows) 
cqlsh:recommendation_engine> 

This is what my data looks like:

'SomeName'|673|'SomeName'|'SomeName'|'TYPE'|'M'|123123|0.20000000000000001|0.0|'GBP'|2015-07-01 00:01:42.19700|0.0| 
'SomeName'|673|'SomeName'|'SomeName'|'TYPE'|'M'|456456|0.20000000000000001|0.0|'GBP'|2015-07-01 00:01:42.19700|0.0| 

The Cassandra version is apache-cassandra-2.2.0.

EDIT:

CREATE TABLE row_historical_game_outcome_data (
    customer_id int, 
    game_id int, 
    time timestamp, 
    channel text, 
    currency_code text, 
    game_code text, 
    game_name text, 
    game_type text, 
    game_vendor text, 
    progressive_winnings double, 
    stake_amount double, 
    win_amount double, 
    PRIMARY KEY ((customer_id, game_id, time)) 
) WITH bloom_filter_fp_chance = 0.01 
    AND caching = '{"keys":"ALL", "rows_per_partition":"NONE"}' 
    AND comment = '' 
    AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy'} 
    AND compression = {'sstable_compression': 'org.apache.cassandra.io.compress.LZ4Compressor'} 
    AND dclocal_read_repair_chance = 0.1 
    AND default_time_to_live = 0 
    AND gc_grace_seconds = 864000 
    AND max_index_interval = 2048 
    AND memtable_flush_period_in_ms = 0 
    AND min_index_interval = 128 
    AND read_repair_chance = 0.0 
    AND speculative_retry = '99.0PERCENTILE'; 

I also tried what uri2x suggested, but still nothing:

select * from row_historical_game_outcome_data; 

customer_id | game_id | time | channel | currency_code | game_code | game_name | game_type | game_vendor | progressive_winnings | stake_amount | win_amount 
-------------+---------+------+---------+---------------+-----------+-----------+-----------+-------------+----------------------+--------------+------------ 

(0 rows) 
cqlsh:recommendation_engine> COPY row_historical_game_outcome_data ("game_vendor","game_id","game_code","game_name","game_type","channel","customer_id","stake_amount","win_amount","currency_code","time","progressive_winnings") FROM '/home/adelin/workspace/docs/re_raw_data2.csv' WITH DELIMITER='|'; 

2 rows imported in 0.192 seconds. 
cqlsh:recommendation_engine> select * from row_historical_game_outcome_data; 

customer_id | game_id | time | channel | currency_code | game_code | game_name | game_type | game_vendor | progressive_winnings | stake_amount | win_amount 
-------------+---------+------+---------+---------------+-----------+-----------+-----------+-------------+----------------------+--------------+------------ 

(0 rows) 
Can you show us your DESCRIBE TABLE output? – uri2x

Here you go, I've added the table description. – Adelin

It seems the order of your columns doesn't match the order of the columns in your CSV file (the first column is not an int, the third is not a date, etc.). Try specifying column names in COPY to match the order in the CSV file. – uri2x

Answers

Ok, I had to change a few things about your data file to make this work:

SomeName|673|SomeName|SomeName|TYPE|M|123123|0.20000000000000001|0.0|GBP|2015-07-01 00:01:42|0.0 
SomeName|673|SomeName|SomeName|TYPE|M|456456|0.20000000000000001|0.0|GBP|2015-07-01 00:01:42|0.0 
  • Removed the trailing pipes.
  • Truncated the times down to seconds.
  • Removed all the single quotes.
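The three fixes above can be sketched as a small preprocessing step in Python (a minimal sketch; the function name and the inline sample line are mine, not from the original post):

```python
import re

def clean_line(line: str) -> str:
    """Apply the three fixes so cqlsh's COPY can parse the row:
    strip the trailing pipe, drop all single quotes, and truncate
    fractional seconds from the timestamp."""
    line = line.rstrip("\n").rstrip("|")   # remove the trailing pipe
    line = line.replace("'", "")           # remove all single quotes
    # truncate e.g. "2015-07-01 00:01:42.19700" to "2015-07-01 00:01:42"
    return re.sub(r"(\d{2}:\d{2}:\d{2})\.\d+", r"\1", line)

raw = "'SomeName'|673|'SomeName'|'SomeName'|'TYPE'|'M'|123123|0.20000000000000001|0.0|'GBP'|2015-07-01 00:01:42.19700|0.0|\n"
print(clean_line(raw))
```

Running this over each line of the file produces rows in the form shown above.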

Once I did that, I ran:

aploetz@cqlsh:stackoverflow> COPY row_historical_game_outcome_data 
(game_vendor,game_id,game_code,game_name,game_type,channel,customer_id,stake_amount, 
win_amount,currency_code , time , progressive_winnings) 
FROM '/home/aploetz/cassandra_stack/re_raw_data3.csv' WITH DELIMITER='|'; 

Improper COPY command. 

This one was a bit tricky. I finally figured out that COPY doesn't like the column name time. I altered the table to use the name game_time instead, and re-ran the COPY:

aploetz@cqlsh:stackoverflow> DROP TABLE row_historical_game_outcome_data ; 
aploetz@cqlsh:stackoverflow> CREATE TABLE row_historical_game_outcome_data (
      ...  customer_id int, 
      ...  game_id int, 
      ...  game_time timestamp, 
      ...  channel text, 
      ...  currency_code text, 
      ...  game_code text, 
      ...  game_name text, 
      ...  game_type text, 
      ...  game_vendor text, 
      ...  progressive_winnings double, 
      ...  stake_amount double, 
      ...  win_amount double, 
      ...  PRIMARY KEY ((customer_id, game_id, game_time)) 
      ...); 

aploetz@cqlsh:stackoverflow> COPY row_historical_game_outcome_data 
(game_vendor,game_id,game_code,game_name,game_type,channel,customer_id,stake_amount, 
win_amount,currency_code , game_time , progressive_winnings) 
FROM '/home/aploetz/cassandra_stack/re_raw_data3.csv' WITH DELIMITER='|'; 

3 rows imported in 0.738 seconds. 
aploetz@cqlsh:stackoverflow> SELECT * FROM row_historical_game_outcome_data ; 

customer_id | game_id | game_time    | channel | currency_code | game_code | game_name | game_type | game_vendor | progressive_winnings | stake_amount | win_amount 
-------------+---------+--------------------------+---------+---------------+-----------+-----------+-----------+-------------+----------------------+--------------+------------ 
     123123 |  673 | 2015-07-01 00:01:42-0500 |  M |   GBP | SomeName | SomeName |  TYPE | SomeName |     0 |   0.2 |   0 
     456456 |  673 | 2015-07-01 00:01:42-0500 |  M |   GBP | SomeName | SomeName |  TYPE | SomeName |     0 |   0.2 |   0 

(2 rows) 
  • I have no idea why it said "3 rows imported"; my guess is that it's counting the header row.
  • All of your keys are partition keys. I'm not sure whether you really intended that. I only point it out because I can't think of a reason to specify multiple partition keys without also specifying one or more clustering keys.
  • I couldn't find anything in the DataStax documentation indicating that "time" is a reserved word. It may be a bug in cqlsh. But seriously, you should probably name your time-based column something other than "time" anyway.
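The point about partition keys can be illustrated in CQL (a sketch, assuming rows should be grouped per customer/game pair and ordered by time within it; the original post never states the intended query pattern):

```sql
-- PRIMARY KEY ((customer_id, game_id, time)) makes ALL three columns
-- the partition key: every combination lands in its own partition, and
-- a query must supply all three.  A composite partition key plus a
-- clustering key keeps each customer/game pair in one partition, with
-- rows ordered by game_time inside it:
CREATE TABLE row_historical_game_outcome_data (
    customer_id int,
    game_id int,
    game_time timestamp,
    channel text,
    currency_code text,
    game_code text,
    game_name text,
    game_type text,
    game_vendor text,
    progressive_winnings double,
    stake_amount double,
    win_amount double,
    PRIMARY KEY ((customer_id, game_id), game_time)
);
```

With this layout, `SELECT ... WHERE customer_id = ? AND game_id = ?` returns that pair's rows in time order.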
What you found is true: the problem was with the CSV generated by Informix DB. Still, Cassandra should be more verbose about its errors. – Adelin


There are two things in your CSV file that bother cqlsh:

  1. Remove the trailing | at the end of each CSV line.
  2. Remove the microseconds from your time values (the precision should be milliseconds at most).

One other comment: COPY in CQL accepts WITH HEADER=TRUE, which causes the header (first) line of the CSV file to be ignored. "time" is not a reserved word in CQL (trust me on this, as I just updated the CQL reserved words myself in the DataStax documentation). However, you do show spaces around the column name "time" in your COPY command, and I believe that's the problem. No spaces, just commas; and do the same for all the rows in the CSV file. (http://docs.datastax.com/en/cql/3.3/cql/cql_reference/keywords_r.html)

Good point; the COPY command in cqlsh can definitely be tricky. – Aaron