2010-07-10 91 views
8

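JavaScript sort to match SQL Server sort

Can anyone point me toward a sorting algorithm in JavaScript that sorts the same way SQL Server does (for nvarchar/Unicode columns)?

For reference, my previous question about this behavior can be found here: SQL Server 2008 - different sort orders on VARCHAR vs NVARCHAR values

Rather than trying to change the sort behavior on the server side, is there a way I can match it on the client side? My previous question dealt specifically with dashes in the sort order, but I assume there is a bit more to it than simply ignoring dashes.

I've added some extra use cases here to demonstrate the problem better.

Sample data as sorted by SQL Server (2008):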

?test 
^&$Grails Found 
bags of Garbage 
Brochures distributed 
Calls Received 
exhibit visitors 
Exhibit Visitors 
-Exhibit Visitors 
--Exhibit Visitors 
Ëxhibit Visitors 
Grails Found 
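
For comparison, here is a minimal sketch of what JavaScript's built-in sort does with the same values. It assumes the accented character is the precomposed code point U+00CB; the variable names are only illustrative.

```javascript
// Values from the sample above, in the order SQL Server returned them.
var serverOrder = [
    '?test',
    '^&$Grails Found',
    'bags of Garbage',
    'Brochures distributed',
    'Calls Received',
    'exhibit visitors',
    'Exhibit Visitors',
    '-Exhibit Visitors',
    '--Exhibit Visitors',
    '\u00CBxhibit Visitors',   // "Ëxhibit Visitors" (precomposed Ë assumed)
    'Grails Found'
];

// With no comparator, Array.prototype.sort() compares UTF-16 code units.
var defaultOrder = serverOrder.slice().sort();

console.log(defaultOrder.join('\n'));
// --Exhibit Visitors
// -Exhibit Visitors
// ?test
// Brochures distributed
// Calls Received
// Exhibit Visitors
// Grails Found
// ^&$Grails Found
// bags of Garbage
// exhibit visitors
// Ëxhibit Visitors
```

The hyphenated entries jump to the top and every uppercase letter sorts before every lowercase one, which is the mismatch described above.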

How can I get JavaScript to sort the same values in the same way?

Please let me know if I can clarify anything further.

+0

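So, going by that question, you want JavaScript to sort 'A' before '-A', now that the column is Unicode? – 2010-07-11 00:42:50

+0

@Bock - Correct, although more specifically, I'd like a JavaScript sort algorithm that matches the server side (I imagine there is more to consider than just the "-" character) – DanP 2010-07-11 01:16:00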

Answers

6

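First, what is your database collation? I'm going to assume that it is SQL_Latin1_General_CP1_CS_AS or SQL_Latin1_General_CP1_CI_AS. If so, then the following should work (not fully tested yet).

It looks like writing a true Unicode sorter is a major undertaking. I've seen tax codes that were more straightforward than the specs. ;-) It always seems to involve lookup tables and at least a 3-level sort, with modifying characters and contractions to account for.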
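
To make the "lookup table plus 3-level sort" idea concrete, here is a toy sketch. The weights are hypothetical and cover only five characters; they are not SQL Server's or Unicode's real tables, and `toyWeights`, `compareAtLevel` and `toyCompare` are names invented for this illustration.

```javascript
// Hypothetical toy weights: each character maps to
// [primary (base letter), secondary (accent), tertiary (case)].
var toyWeights = {
    'a': [1, 0, 0], 'A': [1, 0, 1], '\u00E1': [1, 1, 0],   // a, A, á
    'b': [2, 0, 0], 'B': [2, 0, 1]
};

// Compare the common prefix of x and y using one weight level only.
// Characters missing from toyWeights are not handled in this sketch.
function compareAtLevel(x, y, level) {
    var n = Math.min(x.length, y.length);
    for (var i = 0; i < n; i++) {
        var d = toyWeights[x.charAt(i)][level] - toyWeights[y.charAt(i)][level];
        if (d !== 0) return d < 0 ? -1 : 1;
    }
    return 0;
}

function toyCompare(x, y) {
    // Level 1: base letters decide first, across the whole string.
    var result = compareAtLevel(x, y, 0);
    if (result !== 0) return result;
    if (x.length !== y.length) return x.length < y.length ? -1 : 1;

    // Level 2 (accents) and level 3 (case) only break remaining ties.
    result = compareAtLevel(x, y, 1);
    if (result !== 0) return result;
    return compareAtLevel(x, y, 2);
}

var sorted = ['B', 'b', '\u00E1b', 'Ab', 'ab'].sort(toyCompare);
console.log(sorted.join(' '));   // ab Ab áb b B
```

A real collation also needs ignorable characters and contractions on top of this, which is where most of the complexity comes from.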

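I have limited the following to the Latin 1, Latin Extended-A, and Latin Extended-B tables/collation. The algorithm should work fairly well on those sets, but I have not fully tested it, nor have I properly accounted for modifying characters (to save speed and complexity).

See it in action at jsbin.com.

The function: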

function bIgnoreForPrimarySort (iCharCode) 
{ 
    /*--- A bunch of characters get ignored for the primary sort weight. 
     The most important ones are the hyphen and apostrophe characters. 
     A bunch of control characters and a couple of odds and ends, make up 
     the rest. 
    */ 
    if (iCharCode < 9)             return true; 

    if (iCharCode >= 14 && iCharCode <= 31)       return true; 

    if (iCharCode >= 127 && iCharCode <= 159)       return true; 

    if (iCharCode == 39 || iCharCode == 45 || iCharCode == 173) return true; 

    return false; 
} 


function SortByRoughSQL_Latin1_General_CP1_CS_AS (sA, sB) 
{ 
    /*--- This Sorts Latin1 and extended Latin1 unicode with an approximation 
     of SQL's SQL_Latin1_General_CP1_CS_AS collation. 
     Certain modifying characters or contractions may be off (not tested), we trade-off 
     perfect accuracy for speed and relative simplicity. 

     True unicode sorting is devilishly complex and we're not getting paid enough to 
     fully implement it in Javascript. ;-) 

     It looks like a definitive sort would require painstaking exegesis of documents 
     such as: http://unicode.org/reports/tr10/ 
    */ 
    //--- This is the master lookup table for Latin1 code-points. Here through the extended set \u02AF 
    //--- Make this static? 
    var aSortOrder = [ 
        -1, 151, 152, 153, 154, 155, 156, 157, 158, 2, 3, 4, 5, 6, 159, 160, 161, 162, 163, 164, 
        165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175, 176, 0, 7, 8, 9, 10, 11, 12, 210, 
        13, 14, 15, 41, 16, 211, 17, 18, 65, 69, 71, 74, 76, 77, 80, 81, 82, 83, 19, 20, 
        42, 43, 44, 21, 22, 214, 257, 266, 284, 308, 347, 352, 376, 387, 419, 427, 438, 459, 466, 486, 
        529, 534, 538, 559, 576, 595, 636, 641, 647, 650, 661, 23, 24, 25, 26, 27, 28, 213, 255, 265, 
        283, 307, 346, 350, 374, 385, 418, 426, 436, 458, 464, 485, 528, 533, 536, 558, 575, 594, 635, 640, 
        646, 648, 660, 29, 30, 31, 32, 177, 178, 179, 180, 181, 182, 183, 184, 185, 186, 187, 188, 189, 
        190, 191, 192, 193, 194, 195, 196, 197, 198, 199, 200, 201, 202, 203, 204, 205, 206, 207, 208, 209, 
         1, 33, 53, 54, 55, 56, 34, 57, 35, 58, 215, 46, 59, 212, 60, 36, 61, 45, 72, 75, 
        37, 62, 63, 64, 38, 70, 487, 47, 66, 67, 68, 39, 219, 217, 221, 231, 223, 233, 250, 276, 
        312, 310, 316, 318, 392, 390, 395, 397, 295, 472, 491, 489, 493, 503, 495, 48, 511, 599, 597, 601, 
        603, 652, 590, 573, 218, 216, 220, 230, 222, 232, 249, 275, 311, 309, 315, 317, 391, 389, 394, 396, 
        294, 471, 490, 488, 492, 502, 494, 49, 510, 598, 596, 600, 602, 651, 589, 655, 229, 228, 227, 226, 
        235, 234, 268, 267, 272, 271, 270, 269, 274, 273, 286, 285, 290, 287, 324, 323, 322, 321, 314, 313, 
        326, 325, 320, 319, 358, 357, 362, 361, 356, 355, 364, 363, 378, 377, 380, 379, 405, 404, 403, 402, 
        401, 400, 407, 406, 393, 388, 417, 416, 421, 420, 432, 431, 428, 440, 439, 447, 446, 444, 443, 442, 
        441, 450, 449, 468, 467, 474, 473, 470, 469, 477, 484, 483, 501, 500, 499, 498, 507, 506, 527, 526, 
        540, 539, 544, 543, 542, 541, 561, 560, 563, 562, 567, 566, 565, 564, 580, 579, 578, 577, 593, 592, 
        611, 610, 609, 608, 607, 606, 613, 612, 617, 616, 615, 614, 643, 642, 654, 653, 656, 663, 662, 665, 
        664, 667, 666, 574, 258, 260, 262, 261, 264, 263, 281, 278, 277, 304, 292, 289, 288, 297, 335, 337, 
        332, 348, 349, 369, 371, 382, 415, 409, 434, 433, 448, 451, 462, 476, 479, 509, 521, 520, 524, 523, 
        531, 530, 552, 572, 571, 569, 570, 583, 582, 581, 585, 632, 631, 634, 638, 658, 657, 669, 668, 673, 
        677, 676, 678, 73, 79, 78, 680, 644, 50, 51, 52, 40, 303, 302, 301, 457, 456, 455, 482, 481, 
        480, 225, 224, 399, 398, 497, 496, 605, 604, 626, 625, 620, 619, 624, 623, 622, 621, 334, 241, 240, 
        237, 236, 254, 253, 366, 365, 360, 359, 430, 429, 505, 504, 515, 514, 675, 674, 422, 300, 299, 298, 
        354, 353, 84, 85, 86, 87, 239, 238, 252, 251, 513, 512, 243, 242, 245, 244, 328, 327, 330, 329, 
        411, 410, 413, 412, 517, 516, 519, 518, 547, 546, 549, 548, 628, 627, 630, 629, 88, 89, 90, 91, 
        92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 
        112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 
        132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 246, 247, 248, 259, 279, 280, 293, 291, 
        339, 336, 338, 331, 340, 341, 342, 423, 367, 373, 351, 370, 372, 383, 381, 384, 408, 414, 386, 445, 
        453, 452, 454, 461, 463, 460, 475, 478, 465, 508, 522, 525, 532, 550, 553, 554, 555, 545, 556, 557, 
        537, 551, 568, 333, 424, 343, 344, 586, 584, 618, 633, 637, 639, 645, 659, 649, 670, 671, 672, 679, 
        681, 682, 683, 282, 686, 256, 345, 368, 375, 425, 435, 437, 535, 684, 685, 305, 296, 306, 591, 587, 
        588, 144, 145, 146, 147, 148, 149, 150 
        ]; 

    var iLenA   = sA.length, iLenB   = sB.length; 
    var jA    = 0,   jB    = 0; 
    var sIgnoreBuff_A = [],   sIgnoreBuff_B = []; 


    function iSortIgnoreBuff() 
    { 
     var iIgLenA = sIgnoreBuff_A.length, iIgLenB = sIgnoreBuff_B.length; 
     var kA  = 0,     kB  = 0; 

     while (kA < iIgLenA && kB < iIgLenB) 
     { 
      var igA = sIgnoreBuff_A [kA++], igB = sIgnoreBuff_B [kB++]; 

      if (aSortOrder[igA] > aSortOrder[igB]) return 1; 
      if (aSortOrder[igA] < aSortOrder[igB]) return -1; 
     } 
     //--- All else equal, longest string loses 
     if (iIgLenA > iIgLenB)  return 1; 
     if (iIgLenA < iIgLenB)  return -1; 

     return 0; 
    } 


    while (jA < iLenA && jB < iLenB) 
    { 
     var cA = sA.charCodeAt (jA++); 
     var cB = sB.charCodeAt (jB++); 

     if (cA == cB) 
     { 
      continue; 
     } 

     while (bIgnoreForPrimarySort (cA)) 
     { 
      sIgnoreBuff_A.push (cA); 
      if (jA < iLenA) 
       cA = sA.charCodeAt (jA++); 
      else 
       break; 
     } 
     while (bIgnoreForPrimarySort (cB)) 
     { 
      sIgnoreBuff_B.push (cB); 
      if (jB < iLenB) 
       cB = sB.charCodeAt (jB++); 
      else 
       break; 
     } 

     /*--- Have we reached the end of one or both strings, ending on an ignore char? 
      The strings were equal, up to that point. 
      If one of the strings is NOT an ignore char, while the other is, it wins. 
     */ 
     if (bIgnoreForPrimarySort (cA)) 
     { 
      if (! bIgnoreForPrimarySort (cB)) return -1; 
     } 
     else if (bIgnoreForPrimarySort (cB)) 
     { 
      return 1; 
     } 
     else 
     { 
      if (aSortOrder[cA] > aSortOrder[cB]) 
       return 1; 

      if (aSortOrder[cA] < aSortOrder[cB]) 
       return -1; 

      //--- We are equal, so far, on the main chars. Were there ignore chars? 
      var iBuffSort = iSortIgnoreBuff(); 
      if (iBuffSort) return iBuffSort; 

      //--- Still here? Reset the ignore arrays. 
      sIgnoreBuff_A = []; 
      sIgnoreBuff_B = []; 
     } 

    } //-- while (jA < iLenA && jB < iLenB) 

    /*--- We have gone through all of at least one string and they are still both 
     equal barring ignore chars or unequal lengths. 
    */ 
    var iBuffSort = iSortIgnoreBuff(); 
    if (iBuffSort) return iBuffSort; 

    //--- All else equal, longest string loses 
    if (iLenA > iLenB)  return 1; 
    if (iLenA < iLenB)  return -1; 

    return 0; 

} //-- function SortByRoughSQL_Latin1_General_CP1_CS_AS 

Test:

var aPhrases = [ 
        'Grails Found', 
        '--Exhibit Visitors', 
        '-Exhibit Visitors', 
        'Exhibit Visitors', 
        'Calls Received', 
        'Ëxhibit Visitors', 
        'Brochures distributed', 
        'exhibit visitors', 
        'bags of Garbage', 
        '^&$Grails Found', 
        '?test' 
       ]; 

aPhrases.sort (SortByRoughSQL_Latin1_General_CP1_CS_AS); 

console.log (aPhrases.join ('\n')); 

Results:

?test 
^&$Grails Found 
bags of Garbage 
Brochures distributed 
Calls Received 
exhibit visitors 
Exhibit Visitors 
-Exhibit Visitors 
--Exhibit Visitors 
Ëxhibit Visitors 
Grails Found 
+0

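I have verified that the server collation is set to SQL_Latin1_General_CP1_CI_AS; I'll look into your approach to see how it pans out. By the way, I think my bounty was a little cheap... if this works, I'll let it expire before accepting your answer so that I can award you a higher one (does that seem fair/reasonable?) – DanP 2010-07-16 16:18:29

+0

@DanP: Don't worry about the bounty (unless you don't get a satisfactory answer). I like the points, but I also do these things to help and for the challenge, instead of doing, say, sudoku or crossword puzzles. – 2010-07-16 22:37:48

+0

This worked great for me! – Patricia 2010-07-19 20:33:34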

2

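Sorry, JavaScript has no collation features. The only string comparison is directly on the UTF-16 code units in a String, as returned by charCodeAt().

For characters in the Basic Multilingual Plane, this is the same as a binary collation, so if you need JS and SQL Server to agree (ignoring the astral planes anyway), I think that's the only way you're going to do it. (Short of building a string collator in JS and meticulously copying SQL Server's collation rules, anyway. Not much fun there.)

(What's the use case? Why do they need to match?)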
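
As a sketch of the code-unit comparison described in this answer: the comparator below spells out what JavaScript's `<` and `>` already do for strings. It would only agree with the server if the query also ordered by a binary collation (for example a `_BIN2` collation), which is an assumption here and not something the question's database does. `codeUnitCompare` is a name made up for this example.

```javascript
// Compare two strings by UTF-16 code unit, position by position.
function codeUnitCompare(a, b) {
    var n = Math.min(a.length, b.length);
    for (var i = 0; i < n; i++) {
        var d = a.charCodeAt(i) - b.charCodeAt(i);
        if (d !== 0) return d < 0 ? -1 : 1;
    }
    // Equal over the common prefix: the shorter string sorts first.
    if (a.length === b.length) return 0;
    return a.length < b.length ? -1 : 1;
}

var items = ['exhibit', 'Exhibit', '-Exhibit', '\u00CBxhibit'];
items.sort(codeUnitCompare);
console.log(items.join(' '));   // -Exhibit Exhibit exhibit Ëxhibit
```

Passing this comparator to `sort()` gives the same order as calling `sort()` with no arguments on an array of strings; writing it out just makes the rule visible.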

+1

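Thanks for your insight; the use case is simple: I'm sending sorted data back from SQL Server, and the table has client-side sorting capabilities. When the two don't agree, I run into problems with paging, etc. – DanP 2010-07-11 01:14:41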

2

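@BrockAdams' answer is great, but I had a few edge cases with hyphens in the strings that didn't match SQL Server, and I couldn't quite figure out where it was going wrong. So I wrote a more functional version that just filters out the ignored characters and then compares arrays based on the Latin code points.

It is probably less performant, but there is less code to understand, and it works for the matching SQL test cases I've added below.

I was using a SQL Server database with Latin1_General_100_CI_AS, so it was case-insensitive, but I've kept the code here case-sensitive. It is easy enough to switch to a case-insensitive check by creating a wrapper function that applies toLowerCase to the variables.

There was no difference in the sorting between the two collations with my test cases.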
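
The case-insensitive wrapper mentioned above could look roughly like the sketch below. `caseInsensitive` and `plainCompare` are names invented for this illustration; `plainCompare` is only a stand-in so the snippet runs on its own, and in practice the comparator from this answer would be passed in instead.

```javascript
// Wrap any (a, b) comparator so it sees lower-cased copies of its inputs.
function caseInsensitive(compareFn) {
    return function (a, b) {
        return compareFn(a.toLowerCase(), b.toLowerCase());
    };
}

// Stand-in comparator (plain code-unit order), used only for this demo.
function plainCompare(a, b) {
    return a === b ? 0 : a > b ? 1 : -1;
}

var compareIgnoringCase = caseInsensitive(plainCompare);

console.log(plainCompare('exhibit', 'Exhibit'));          // 1
console.log(compareIgnoringCase('exhibit', 'Exhibit'));   // 0
```

Strings that differ only by case then compare as equal, so their relative order is whatever the engine's sort leaves it as.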

/**
 * This is a modified version of sortByRoughSQL_Latin1_General_CP1_CS_AS
 * This has a more functional approach, it is more basic
 * It simply does a character filter and then sort
 * @link https://stackoverflow.com/a/3266430/327074
 *
 * @param {String} a
 * @param {String} b
 * @returns {Number} -1,0,1
 */
function latinSqlSort(a, b) {
    'use strict';
    //--- This is the master lookup table for Latin1 code-points.
    // Here through the extended set \u02AF
    var latinLookup = [
        -1,151,152,153,154,155,156,157,158, 2, 3, 4, 5, 6,159,160,161,162,163,164,
        165,166,167,168,169,170,171,172,173,174,175,176, 0, 7, 8, 9, 10, 11, 12,210,
        13, 14, 15, 41, 16,211, 17, 18, 65, 69, 71, 74, 76, 77, 80, 81, 82, 83, 19, 20,
        42, 43, 44, 21, 22,214,257,266,284,308,347,352,376,387,419,427,438,459,466,486,
        529,534,538,559,576,595,636,641,647,650,661, 23, 24, 25, 26, 27, 28,213,255,265,
        283,307,346,350,374,385,418,426,436,458,464,485,528,533,536,558,575,594,635,640,
        646,648,660, 29, 30, 31, 32,177,178,179,180,181,182,183,184,185,186,187,188,189,
        190,191,192,193,194,195,196,197,198,199,200,201,202,203,204,205,206,207,208,209,
         1, 33, 53, 54, 55, 56, 34, 57, 35, 58,215, 46, 59,212, 60, 36, 61, 45, 72, 75,
        37, 62, 63, 64, 38, 70,487, 47, 66, 67, 68, 39,219,217,221,231,223,233,250,276,
        312,310,316,318,392,390,395,397,295,472,491,489,493,503,495, 48,511,599,597,601,
        603,652,590,573,218,216,220,230,222,232,249,275,311,309,315,317,391,389,394,396,
        294,471,490,488,492,502,494, 49,510,598,596,600,602,651,589,655,229,228,227,226,
        235,234,268,267,272,271,270,269,274,273,286,285,290,287,324,323,322,321,314,313,
        326,325,320,319,358,357,362,361,356,355,364,363,378,377,380,379,405,404,403,402,
        401,400,407,406,393,388,417,416,421,420,432,431,428,440,439,447,446,444,443,442,
        441,450,449,468,467,474,473,470,469,477,484,483,501,500,499,498,507,506,527,526,
        540,539,544,543,542,541,561,560,563,562,567,566,565,564,580,579,578,577,593,592,
        611,610,609,608,607,606,613,612,617,616,615,614,643,642,654,653,656,663,662,665,
        664,667,666,574,258,260,262,261,264,263,281,278,277,304,292,289,288,297,335,337,
        332,348,349,369,371,382,415,409,434,433,448,451,462,476,479,509,521,520,524,523,
        531,530,552,572,571,569,570,583,582,581,585,632,631,634,638,658,657,669,668,673,
        677,676,678, 73, 79, 78,680,644, 50, 51, 52, 40,303,302,301,457,456,455,482,481,
        480,225,224,399,398,497,496,605,604,626,625,620,619,624,623,622,621,334,241,240,
        237,236,254,253,366,365,360,359,430,429,505,504,515,514,675,674,422,300,299,298,
        354,353, 84, 85, 86, 87,239,238,252,251,513,512,243,242,245,244,328,327,330,329,
        411,410,413,412,517,516,519,518,547,546,549,548,628,627,630,629, 88, 89, 90, 91,
        92, 93, 94, 95, 96, 97, 98, 99,100,101,102,103,104,105,106,107,108,109,110,111,
        112,113,114,115,116,117,118,119,120,121,122,123,124,125,126,127,128,129,130,131,
        132,133,134,135,136,137,138,139,140,141,142,143,246,247,248,259,279,280,293,291,
        339,336,338,331,340,341,342,423,367,373,351,370,372,383,381,384,408,414,386,445,
        453,452,454,461,463,460,475,478,465,508,522,525,532,550,553,554,555,545,556,557,
        537,551,568,333,424,343,344,586,584,618,633,637,639,645,659,649,670,671,672,679,
        681,682,683,282,686,256,345,368,375,425,435,437,535,684,685,305,296,306,591,587,
        588,144,145,146,147,148,149,150
    ];

    /**
     * A bunch of characters get ignored for the primary sort weight.
     * The most important ones are the hyphen and apostrophe characters.
     * A bunch of control characters and a couple of odds and ends, make up
     * the rest.
     *
     * @param {Number}
     * @returns {Boolean}
     * @link https://stackoverflow.com/a/3266430/327074
     */
    function ignoreForPrimarySort(iCharCode) {
        if (iCharCode < 9) {
            return true;
        }

        if (iCharCode >= 14 && iCharCode <= 31) {
            return true;
        }

        if (iCharCode >= 127 && iCharCode <= 159) {
            return true;
        }

        if (iCharCode == 39 || iCharCode == 45 || iCharCode == 173) {
            return true;
        }

        return false;
    }

    // normal sort
    function compare(a, b) {
        return a === b ? 0 : a > b ? 1 : -1;
    }

    // compare two arrays return first compare difference
    function arrayCompare(a, b) {
        return a.reduce(function (acc, x, i) {
            return acc === 0 && i < b.length ? compare(x, b[i]) : acc;
        }, 0);
    }

    /**
     * convert a string to array of latin code point ordering
     * @param {String} x
     * @returns {Array} integer array
     */
    function toLatinOrder(x) {
        return x.split('')
            // convert to char codes
            .map(function(x){return x.charCodeAt(0);})
            // filter out ignored characters
            .filter(function(x){return !ignoreForPrimarySort(x);})
            // convert to latin order
            .map(function(x){return latinLookup[x];});
    }

    // convert inputs
    var charA = toLatinOrder(a),
        charB = toLatinOrder(b);

    // compare the arrays
    var charsCompare = arrayCompare(charA, charB);
    if (charsCompare !== 0) {
        return charsCompare;
    }

    // fallback to the filtered array length
    var charsLenCompare = compare(charA.length, charB.length);
    if (charsLenCompare !== 0) {
        return charsLenCompare;
    }

    // Final fallback to a basic length comparison
    return compare(a.length, b.length);
}

var tests = [
    'Grails Found',
    '--Exhibit Visitors',
    '-Exhibit Visitors',
    'Exhibit Visitors',
    'Calls Received',
    'Ëxhibit Visitors',
    'Brochures distributed',
    'exhibit visitors',
    'bags of Garbage',
    '^&$Grails Found',
    '?test',
    '612C-520',
    '612-C-122',
    '612C-122 I',
    '612-C-126 L',
    '612C-301 B',
    '612C-304 B',
    '612C-306',
    '612-C-306',
    '612-C-306 2',
    '612-C-403 H',
    '612C403 O',
    '612-C-403(V)',
    '612E-306A/B I',
    '612E-306A/B O',
    '612C-121 O',
    '612C-111 B',
    '- -612C-111 B'
].sort(latinSqlSort).join('<br>');

document.write(tests);

+0

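Not sure that the '- -612C-111 B' value is sorted correctly, but overall this answer seems fine (I don't feel like revisiting this question right now). – 2018-02-03 18:35:06

+1

@BrockAdams That was actually one of the cases that dragged me down this rabbit hole. I checked the sort against SQL Server - here's a [SQL Fiddle](http://sqlfiddle.com/#!18/3195a/2). – icc97 2018-02-03 18:54:24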
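
One way to run that kind of check in the browser is to take a list already sorted by SQL Server and confirm that a client-side comparator never puts a later row before an earlier one. The helper below is a minimal sketch; `firstOrderMismatch` and `plainCompare` are hypothetical names, and the stand-in comparator (plain code-unit order) is used only so the snippet is self-contained.

```javascript
// Return the index of the first row that the comparator would place
// before its predecessor, or -1 if the comparator accepts the order.
function firstOrderMismatch(serverSorted, compareFn) {
    for (var i = 1; i < serverSorted.length; i++) {
        if (compareFn(serverSorted[i - 1], serverSorted[i]) > 0) {
            return i;
        }
    }
    return -1;
}

// Stand-in comparator; swap in the comparator under test.
function plainCompare(a, b) {
    return a === b ? 0 : a > b ? 1 : -1;
}

// The SQL Server ordering quoted in the question.
var fromServer = [
    '?test', '^&$Grails Found', 'bags of Garbage', 'Brochures distributed',
    'Calls Received', 'exhibit visitors', 'Exhibit Visitors',
    '-Exhibit Visitors', '--Exhibit Visitors', '\u00CBxhibit Visitors',
    'Grails Found'
];

console.log(firstOrderMismatch(fromServer, plainCompare));   // 3
```

Here the plain comparator first disagrees at index 3, because it wants 'Brochures distributed' ahead of 'bags of Garbage'. A comparator that matches the server for this data would yield -1.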