2010-08-03

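Check if a column value is contained in another column value (TSQL)?

Hey, I have 2 tables with many columns, and I want to find the rows where the value of table1.somecolumn is contained in table2.someothercolumn. For example: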

table1.somecolumn has Smith, Peter
table2.someothercolumn has peter.smith

This should be a match. How would I do such a search?

Thanks :)

Answers


There are several possible solutions, depending on exactly what you need:

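  1. Use a helper table that stores the keywords for each record (or for each record and field), e.g. table_helper(id int primary key, record_id int, keyword varchar), where record_id links to the source table. Populate this table from triggers on table1 and table2. Finding the common rows is then a simple intersection of table_helper with itself. You can use one helper table for both table1 and table2 (adding a column that records which table each row came from) or a separate one for each.
  2. Use full-text indexing.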
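The helper-table idea in option 1 can be sketched outside SQL Server. This is a minimal Python sketch that uses the standard-library sqlite3 module as a stand-in database, not T-SQL. The names `keywords`, `find_matches`, and the `source` column are hypothetical; keywords are assumed to be lowercase runs of letters and digits, and the helper table is filled directly instead of by triggers. A pair matches when every keyword of the table1 value also appears in the table2 value.

```python
import re
import sqlite3


def keywords(text):
    """Hypothetical tokenizer: the set of lowercase runs of letters and digits."""
    return set(re.findall(r"[0-9a-z]+", text.lower()))


def find_matches(table1_rows, table2_rows):
    """Return (table1 id, table2 id) pairs where every keyword of the
    table1 value also occurs in the table2 value."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE table_helper ("
        " id INTEGER PRIMARY KEY,"
        " source TEXT NOT NULL,"        # 't1' or 't2': one shared helper table
        " record_id INTEGER NOT NULL,"  # links back to the source row
        " keyword TEXT NOT NULL)"
    )
    # In SQL Server, triggers on table1/table2 would keep this table current;
    # here it is filled once, directly from Python.
    for source, rows in (("t1", table1_rows), ("t2", table2_rows)):
        for record_id, value in rows:
            con.executemany(
                "INSERT INTO table_helper (source, record_id, keyword)"
                " VALUES (?, ?, ?)",
                [(source, record_id, kw) for kw in keywords(value)],
            )
    # Intersect the helper table with itself on keyword, then keep pairs whose
    # shared-keyword count equals the table1 record's keyword count.
    query = """
        SELECT a.record_id, b.record_id
        FROM table_helper AS a
        JOIN table_helper AS b ON a.keyword = b.keyword
        WHERE a.source = 't1' AND b.source = 't2'
        GROUP BY a.record_id, b.record_id
        HAVING COUNT(*) = (SELECT COUNT(*) FROM table_helper AS c
                           WHERE c.source = 't1' AND c.record_id = a.record_id)
        ORDER BY a.record_id, b.record_id
    """
    pairs = con.execute(query).fetchall()
    con.close()
    return pairs


print(find_matches([(1, "Smith, Peter"), (2, "Green, Peter")],
                   [(10, "peter.smith"), (11, "peter.green"), (12, "peter.abbay")]))
```

Requiring all table1 keywords to be present is one reading of "contained in"; relaxing the HAVING clause to a minimum shared count gives a looser match.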

You can try the SOUNDEX and DIFFERENCE functions to help match the strings.

Example:

select difference('peter.green', 'Green, Peter') 

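returns 2, where:

The integer returned is the number of characters in the SOUNDEX values that are the same. The return value ranges from 0 through 4: 0 indicates weak or no similarity, and 4 indicates strong similarity or the same values.

See the SOUNDEX and DIFFERENCE topics on MSDN.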

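Update:

SOUNDEX and DIFFERENCE don't work well when the words appear in a different order. However, if you have Full-Text Search installed, you don't need to create a full-text index to use the full-text engine's word-breaking and parsing capabilities. Assuming you are using SQL Server 2008, the following function returns a list of normalized terms: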

SELECT * FROM sys.dm_fts_parser('"Peter Green"', 1033, 0, 0) 

You can then CROSS APPLY this function against the rest of your query.

For more information, see the sys.dm_fts_parser topic and section K, "Using APPLY", of the FROM topic.
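To make the CROSS APPLY step concrete, here is a rough Python analogue (not T-SQL). `split_terms` is a hypothetical stand-in for sys.dm_fts_parser that only lowercases and splits on non-alphanumeric characters; the real word breaker applies language-specific rules and stoplists. `cross_apply` and `outer_apply` mimic the two APPLY forms: CROSS APPLY drops a left row when the function returns no rows, while OUTER APPLY keeps it with NULLs.

```python
import re


def split_terms(text):
    """Hypothetical stand-in for sys.dm_fts_parser: lowercase alphanumeric tokens."""
    return re.findall(r"[0-9a-z]+", text.lower())


def cross_apply(rows, fn):
    """Like CROSS APPLY: one output row per (left row, result row) pair.
    Left rows for which fn returns nothing are dropped."""
    for row_id, value in rows:
        for term in fn(value):
            yield (row_id, term)


def outer_apply(rows, fn):
    """Like OUTER APPLY: same as cross_apply, but a left row with no
    results is kept once, paired with None (NULL)."""
    for row_id, value in rows:
        results = fn(value)
        if not results:
            yield (row_id, None)
        for term in results:
            yield (row_id, term)


rows = [(0, "Green, Peter"), (200, "peter.green"), (99, "...")]
print(list(cross_apply(rows, split_terms)))
print(list(outer_apply(rows, split_terms)))
```

Because "Green, Peter" and "peter.green" yield the same terms, comparing terms instead of whole strings sidesteps the word-order problem that defeats SOUNDEX and DIFFERENCE.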

Example (SQL Server 2008 Enterprise with the full-text search engine enabled):

-- drop the tables if they already exist so the script can be rerun
if OBJECT_ID('Names1', 'U') is not null drop table Names1
if OBJECT_ID('Names2', 'U') is not null drop table Names2

create table Names1 
(
    id int identity(0, 1), 
    name nvarchar(128) 
) 
insert into Names1 (name) values ('Green, Peter') 
insert into Names1 (name) values ('Smith, Peter') 
insert into Names1 (name) values ('Aadland, Beverly') 
insert into Names1 (name) values ('Aalda, Mariann') 
insert into Names1 (name) values ('Aaliyah') 
insert into Names1 (name) values ('Aames, Angela') 
insert into Names1 (name) values ('Aames, Willie') 
insert into Names1 (name) values ('Aaron, Caroline') 
insert into Names1 (name) values ('Aaron, Quinton') 
insert into Names1 (name) values ('Aaron, Victor') 
insert into Names1 (name) values ('Abbay, Peter') 
insert into Names1 (name) values ('Abbott, Dorothy') 
insert into Names1 (name) values ('Abbott, Bruce') 
insert into Names1 (name) values ('Abbott, Bud') 
insert into Names1 (name) values ('Abbott, Philip') 
insert into Names1 (name) values ('Abdoo, Rose') 
insert into Names1 (name) values ('Abdul, Paula') 
insert into Names1 (name) values ('Abel, Jake') 
insert into Names1 (name) values ('Abel, Walter') 
insert into Names1 (name) values ('Abeles, Edward') 
insert into Names1 (name) values ('Abell, Tim') 
insert into Names1 (name) values ('Aber, Chuck') 

create table Names2 
(
    id int identity(200, 1), 
    name nvarchar(128) 
) 
insert into Names2 (name) values (LOWER('Peter.Green')) 
insert into Names2 (name) values (LOWER('Peter.Smith')) 
insert into names2 (name) values (LOWER('Beverly.Aadland')) 
insert into names2 (name) values (LOWER('Mariann.Aalda')) 
insert into names2 (name) values (LOWER('Aaliyah')) 
insert into names2 (name) values (LOWER('Angela.Aames')) 
insert into names2 (name) values (LOWER('Willie.Aames')) 
insert into names2 (name) values (LOWER('Caroline.Aaron')) 
insert into names2 (name) values (LOWER('Quinton.Aaron')) 
insert into names2 (name) values (LOWER('Victor.Aaron')) 
insert into names2 (name) values (LOWER('Peter.Abbay')) 
insert into names2 (name) values (LOWER('Dorothy.Abbott')) 
insert into names2 (name) values (LOWER('Bruce.Abbott')) 
insert into names2 (name) values (LOWER('Bud.Abbott')) 
insert into names2 (name) values (LOWER('Philip.Abbott')) 
insert into names2 (name) values (LOWER('Rose.Abdoo')) 
insert into names2 (name) values (LOWER('Paula.Abdul')) 
insert into names2 (name) values (LOWER('Jake.Abel')) 
insert into names2 (name) values (LOWER('Walter.Abel')) 
insert into names2 (name) values (LOWER('Edward.Abeles')) 
insert into names2 (name) values (LOWER('Tim.Abell')) 
insert into names2 (name) values (LOWER('Chuck.Aber')); 

with ftsNamesFirst (id, term) as 
(
    select id, terms.display_term 
     from names1 cross apply sys.dm_fts_parser('"' + name + '"', 1033, 0, 0) terms 
), ftsNamesSecond (id, term) as 
(
    select id, terms.display_term
     from names2 cross apply sys.dm_fts_parser('"' + name + '"', 1033, 0, 0) terms 
) 
select * from 
(
    select 
    ROW_NUMBER() over (partition by nfirst.id order by sum(DIFFERENCE(ftsNamesFirst.term, ftsNamesSecond.term)) desc) ranking, 
    sum(DIFFERENCE(ftsNamesFirst.term, ftsNamesSecond.term)) Confidence, 
    nFirst.id Names1ID, 
    nFirst.name Names1Name, 
    nSecond.id Names2ID, 
    nSecond.name Names2Name 
    from 
    ftsNamesFirst cross join ftsNamesSecond 
    left outer join names1 nFirst on nFirst.id = ftsNamesFirst.id 
    left outer join names2 nSecond on nSecond.id = ftsNamesSecond.id 
    where DIFFERENCE(ftsNamesFirst.term, ftsNamesSecond.term) = 4 
    group by 
     nFirst.id, nFirst.name, nSecond.id, nSecond.name 
) MatchedNames 
where ranking = 1 

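Output (only the highest-confidence match for each Names1 row is kept; all other candidates are filtered out by the windowed ranking query):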

Confidence  Names1ID  Names1Name        Names2ID  Names2Name
8           0         Green, Peter      200       peter.green
8           1         Smith, Peter      201       peter.smith
8           2         Aadland, Beverly  202       beverly.aadland
8           3         Aalda, Mariann    203       mariann.aalda
4           4         Aaliyah           204       aaliyah
8           5         Aames, Angela     205       angela.aames
8           6         Aames, Willie     206       willie.aames

It isn't perfect, but it's a good starting point that you can tune to improve the match rate.
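The ranking step of the query (score every Names1/Names2 pair, then keep only ranking = 1 per Names1 row) can be sketched in Python as below. This is an illustration under stated assumptions, not the query itself: exact term equality is swapped in for DIFFERENCE(...) = 4, which is stricter than SOUNDEX matching, and `terms` and `best_matches` are hypothetical names.

```python
import re


def terms(text):
    """Hypothetical word breaker: lowercase runs of letters and digits."""
    return re.findall(r"[0-9a-z]+", text.lower())


def best_matches(names1, names2):
    """For each (id, name) in names1, return the best-scoring row from names2.

    Each pair of identical terms adds 4 points, standing in for summing
    DIFFERENCE(term1, term2) over pairs where it equals 4. Names1 rows with
    no matching term are omitted, as the WHERE clause omits them. On ties the
    first names2 row wins; ROW_NUMBER() gives no such guarantee.
    """
    results = []
    for id1, name1 in names1:
        t1 = terms(name1)
        best = None
        for id2, name2 in names2:
            t2 = terms(name2)
            # Cross join of the two term lists, counting identical pairs.
            score = sum(4 for a in t1 for b in t2 if a == b)
            if score > 0 and (best is None or score > best[0]):
                best = (score, id1, name1, id2, name2)
        if best is not None:
            results.append(best)
    return results


names1 = [(0, "Green, Peter"), (1, "Smith, Peter"), (4, "Aaliyah")]
names2 = [(200, "peter.green"), (201, "peter.smith"), (204, "aaliyah")]
for row in best_matches(names1, names2):
    print(row)
```

On these rows the sketch gives confidences of 8, 8, and 4, in line with the output above; a phonetic comparison in place of `a == b` would also catch near-spellings such as "abel" and "abell", at the cost of more false positives.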
