String ordering in Cassandra CQL

When querying on a text primary key in Cassandra CQL, string comparison works in the opposite direction from what one would expect, i.e.:

 
cqlsh:test> select * from sl; 

name      | data 
--------------------------+------ 
000000020000000000000003 | null 
000000010000000000000005 | null 
000000010000000000000003 | null 
000000010000000000000002 | null 
000000010000000000000001 | null 

cqlsh:test> select name from sl where token(name) < token('000000010000000000000005'); 
name 
-------------------------- 
000000020000000000000003 

(1 rows) 

cqlsh:test> select name from sl where token(name) > token('000000010000000000000005'); 
name 
-------------------------- 
000000010000000000000003 
000000010000000000000002 
000000010000000000000001 

(3 rows)

In contrast, this is what I get from string comparison in Python (and, I think, in most other languages):

>>>'000000020000000000000003' < '000000010000000000000005' 
False

If I query without the token function, I get the following error:

 
cqlsh:test> select name from sl where name < '000000010000000000000005'; 
Bad Request: Only EQ and IN relation are supported on the partition key (unless you use the token() function)

The table description is:

CREATE TABLE sl (
    name text, 
    data blob, 
    PRIMARY KEY (name) 
) WITH 
    bloom_filter_fp_chance=0.010000 AND 
    caching='KEYS_ONLY' AND 
    comment='' AND 
    dclocal_read_repair_chance=0.000000 AND 
    gc_grace_seconds=864000 AND 
    index_interval=128 AND 
    read_repair_chance=0.100000 AND 
    replicate_on_write='true' AND 
    populate_io_cache_on_flush='false' AND 
    default_time_to_live=0 AND 
    speculative_retry='99.0PERCENTILE' AND 
    memtable_flush_period_in_ms=0 AND 
    compaction={'class': 'SizeTieredCompactionStrategy'} AND 
    compression={'sstable_compression': 'LZ4Compressor'};

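Is there an explanation, in the documentation I have missed or somewhere else, of why such a strange string comparison order was chosen? Or does the string comparison simply not do what I expect (i.e. it returns rows in some unrelated order, such as the order in which they were written to the database)? I am using the Murmur3Partitioner, in case that matters.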

2014-09-30 alexk

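In Cassandra, rows are ordered by the hash of their key values. With the Random and Murmur3 partitioners, the hash values are effectively random with respect to the key text, so the resulting order is A) meaningless and B) designed to be evenly distributed around the ring.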

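Therefore, querying for tokens less than token('000000010000000000000005') does not compare against the string value '000000010000000000000005'. It compares the hashed token values. Based on the results you are seeing, the token of the string '000000020000000000000003' is less than the token of '000000010000000000000005'.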
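To illustrate the point above, here is a minimal Python sketch. It assumes MD5 as a stand-in hash (it is not Murmur3, and the helper name `toy_token` is made up for this example), so the actual token order on a real cluster will differ. It only shows that ordering by a hash of the key is unrelated to ordering by the key's text.

```python
import hashlib

def toy_token(key: str) -> int:
    """Hypothetical stand-in for a partitioner token: the MD5 digest as an integer.
    This is NOT the Murmur3 hash that Murmur3Partitioner uses."""
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest(), "big")

# The five keys from the question, listed in plain string order.
names = [
    "000000010000000000000001",
    "000000010000000000000002",
    "000000010000000000000003",
    "000000010000000000000005",
    "000000020000000000000003",
]
bound = "000000010000000000000005"

# Comparing the strings themselves: what Python's < does.
below_by_string = [n for n in names if n < bound]

# Comparing hashed values: analogous to token(name) < token(bound).
below_by_token = [n for n in names if toy_token(n) < toy_token(bound)]

print("string order:", sorted(names))
print("token order: ", sorted(names, key=toy_token))
print("below bound by string:", below_by_string)
print("below bound by token: ", below_by_token)
```

The two "below bound" lists generally do not match, which is the same effect as the token() queries in the question.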

For more information, check this document from DataStax: Paging Through Unordered Partitioner Results.

Assuming you want to be able to query your data by the value of name, you could build a table somewhat like this:

CREATE TABLE sl (
    type text, 
    name text, 
    data blob, 
    PRIMARY KEY (type, name) 
)

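I created type as the partition key. I'm not sure whether it makes sense for your data to be partitioned by "type" (or by anything else), so this is more for the sake of example than anything. Either way, with name as the clustering key (which determines the on-disk sort order within a partition), this query will work: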

select * from sl where type='sometype' AND name < '000000010000000000000005';
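As a rough mental model of why that range query works, here is a small Python sketch. It is a toy model under my own assumptions, not how Cassandra is implemented: each partition key maps to a list of clustering keys kept in sorted order, so a range predicate on name becomes a slice. The class and method names are made up for illustration.

```python
import bisect
from collections import defaultdict

class ToyTable:
    """Toy model: partition key -> clustering keys kept in sorted order."""

    def __init__(self):
        self._partitions = defaultdict(list)

    def insert(self, type_, name):
        # Keep each partition's names sorted, skipping duplicates.
        names = self._partitions[type_]
        i = bisect.bisect_left(names, name)
        if i == len(names) or names[i] != name:
            names.insert(i, name)

    def names_less_than(self, type_, bound):
        # Analogous to: WHERE type = ? AND name < ?
        names = self._partitions.get(type_, [])
        return names[:bisect.bisect_left(names, bound)]

table = ToyTable()
for n in ["000000020000000000000003", "000000010000000000000005",
          "000000010000000000000003", "000000010000000000000001",
          "000000010000000000000002"]:
    table.insert("sometype", n)

result = table.names_less_than("sometype", "000000010000000000000005")
print(result)
# ['000000010000000000000001', '000000010000000000000002', '000000010000000000000003']
```

Within a single partition the comparison is on the name values themselves, so the result matches ordinary string ordering.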

Again, it's just an example, but I hope it helps point you in the right direction.

2014-09-30 14:51:54 Aaron

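Thank you. I was confused because the rows seemed to be sorted in DESC order, but it looks like that was pure coincidence. The way the project is going I don't need much partitioning, so I will probably either use an ordered partitioner or do the sorting and comparison entirely at the application level. – alexk 2014-09-30 15:14:43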

@alexk Just a warning: the byte-ordered partitioner has been deprecated and should *not* be used. http://www.datastax.com/documentation/cassandra/2.1/cassandra/architecture/architecturePartitionerBOP_c.html – Aaron 2014-09-30 15:22:05

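Here are some links to documentation about the token function and related paging. Apologies for the broad range of topics; I don't know exactly which ones might help: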

http://www.datastax.com/documentation/cql/3.1/cql/cql_using/paging_c.html Paging through unordered partitioner results. This implies that using the Murmur3Partitioner does matter.
http://www.datastax.com/documentation/cql/3.1/cql/cql_reference/select_r.html?scroll=reference_ds_d35_v2q_xj__paging-through-unordered-results This section says that paging with the RandomPartitioner will not give you meaningful results. RandomPartitioner is equivalent to Murmur3Partitioner in this respect; the docs should mention both.
http://www.datastax.com/dev/blog/client-side-improvements-in-cassandra-2-0 See automatic paging.
http://datastax.github.io/python-driver/query_paging.html
http://www.datastax.com/drivers/java/2.0/index.html See ResultSet.
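
For what it's worth, the manual token-based paging pattern that the first two links describe can be sketched in plain Python without a cluster. This is only a simulation under my own assumptions: MD5 stands in for the partitioner hash (it is not Murmur3), and `toy_token`, `fetch_page` and `page_through` are made-up names. Each page asks for rows whose token is greater than the token of the last row already seen.

```python
import hashlib

def toy_token(key):
    """Hypothetical stand-in for token(): MD5 digest as an integer (not Murmur3)."""
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest(), "big")

def fetch_page(rows, last_name, page_size):
    """Mimics: SELECT name FROM sl WHERE token(name) > token(last_name) LIMIT page_size."""
    ordered = sorted(rows, key=toy_token)  # rows come back in token order
    if last_name is not None:
        ordered = [r for r in ordered if toy_token(r) > toy_token(last_name)]
    return ordered[:page_size]

def page_through(rows, page_size=2):
    """Yield successive pages until the token range is exhausted."""
    last = None
    while True:
        page = fetch_page(rows, last, page_size)
        if not page:
            break
        yield page
        last = page[-1]  # resume after the last row seen

names = [
    "000000010000000000000001",
    "000000010000000000000002",
    "000000010000000000000003",
    "000000010000000000000005",
    "000000020000000000000003",
]

pages = list(page_through(names, page_size=2))
for page in pages:
    print(page)
```

Every row is visited exactly once, but the order in which rows appear has nothing to do with the name values, which is why token ranges are useful for paging and not for sorting.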

2014-09-30 14:44:14 catpaws
