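When comparing dm() output I use the function below, which allows a further level of fuzziness. A direct equality check fails for a lot of names, including common misspellings of my own: for example, dm('smith') != dm('schmitt').
The function produces a match weight between 0.0 and 1.0 (I hope), which lets me rank each returned row and pick the good ones. 0.3 is a fairly good value for catching odd pronunciations; 0.5 is more usual.
i.e. dmcompare(dm("boothroyd"), dm("boofreed")) = 0.3
dmcompare(dm("smith"), dm("schmitt")) = 0.5
Note that the comparison is made on the double metaphone strings, not on the original strings. This is for performance reasons: my database holds metaphone columns alongside the original strings.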
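As a rough sketch of the storage side mentioned above, the metaphone column could be filled once so that lookups do not recompute it for every row. The table `people` and the columns `surname` and `surname_dm` are hypothetical names, and `dm()` is assumed to be installed already and to return the 'primary;secondary' string format used in this answer.

```sql
-- Hypothetical table and column names, for illustration only.
-- dm() is assumed to exist and to return a 'primary;secondary' string
-- that fits in 55 characters (the parameter size used by dmcompare).
ALTER TABLE people
    ADD COLUMN surname_dm VARCHAR(55) NOT NULL DEFAULT '';

-- Fill the metaphone column once. COALESCE guards against dm() returning
-- NULL for a NULL surname. New or changed rows need the same treatment,
-- for example from the application or a trigger.
UPDATE people
SET surname_dm = COALESCE(dm(surname), '');
```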
CREATE FUNCTION `dmcompare`(leftValue VARCHAR(55), rightValue VARCHAR(55))
RETURNS DECIMAL(2,1)
NO SQL
BEGIN
    -- -------------------------------------------------------------------------------------
    -- Compare two (double) metaphone strings for potential similarity, i.e.
    -- dm('smith') != dm('schmitt') :: 'SM0;XMT' != 'XMT;SMT'
    -- dmcompare(dm('smith'), dm('schmitt')) returns 0.5
    -- @author: P.Boothroyd
    -- @version: 0.9, 08/01/2013
    -- The values here can still be played with
    -- (c) GNU P L - feel free to share and adapt, but please acknowledge the original code
    -- -------------------------------------------------------------------------------------
    DECLARE leftPri, leftSec, rightPri, rightSec VARCHAR(55) DEFAULT '';
    DECLARE sepPos INT;
    DECLARE retValue DECIMAL(2,1);
    DECLARE partMatch BOOLEAN;

    -- Extract the metaphone tags: primary before the ';', secondary after it.
    -- Without a ';' the whole value is the primary and the secondary is ''.
    SET sepPos = LOCATE(';', leftValue);
    IF sepPos = 0 THEN
        SET sepPos = LENGTH(leftValue) + 1;
    END IF;
    SET leftPri = LEFT(leftValue, sepPos - 1);
    SET leftSec = MID(leftValue, sepPos + 1, LENGTH(leftValue) - sepPos);

    SET sepPos = LOCATE(';', rightValue);
    IF sepPos = 0 THEN
        SET sepPos = LENGTH(rightValue) + 1;
    END IF;
    SET rightPri = LEFT(rightValue, sepPos - 1);
    SET rightSec = MID(rightValue, sepPos + 1, LENGTH(rightValue) - sepPos);

    -- Calculate likeness factor
    SET retValue = 0;
    SET partMatch = FALSE;

    -- Primaries equal: 50% match; primaries that only share a SOUNDEX code: 30%
    IF leftPri = rightPri THEN
        SET retValue = retValue + 0.5;
        SET partMatch = TRUE;
    ELSE
        IF SOUNDEX(leftPri) = SOUNDEX(rightPri) THEN
            SET retValue = retValue + 0.3;
            SET partMatch = TRUE;
        END IF;
    END IF;

    -- Left secondary against right primary, worth 30% match.
    -- The nested SOUNDEX test always succeeds for equal strings,
    -- so a cross match adds 0.5 in total.
    IF leftSec = rightPri THEN
        SET retValue = retValue + 0.3;
        SET partMatch = TRUE;
        IF SOUNDEX(leftSec) = SOUNDEX(rightPri) THEN
            SET retValue = retValue + 0.2;
            SET partMatch = TRUE;
        END IF;
    END IF;

    -- Left primary against right secondary, scored the same way
    IF leftPri = rightSec THEN
        SET retValue = retValue + 0.3;
        SET partMatch = TRUE;
        IF SOUNDEX(leftPri) = SOUNDEX(rightSec) THEN
            SET retValue = retValue + 0.2;
            SET partMatch = TRUE;
        END IF;
    END IF;

    -- Are the secondary values the same, or both empty?
    IF leftSec = rightSec THEN
        -- No secondaries ...
        IF leftSec = '' THEN
            -- If there is prior matching then no secondaries is 40%
            IF partMatch = TRUE THEN
                SET retValue = retValue + 0.4;
            END IF;
        ELSE
            -- If the secondaries match then 50% match
            SET retValue = retValue + 0.5;
        END IF;
    ELSE
        -- Secondaries differ but share a SOUNDEX code
        IF SOUNDEX(leftSec) = SOUNDEX(rightSec) THEN
            IF leftSec = '' THEN
                IF partMatch = TRUE THEN
                    SET retValue = retValue + 0.3;
                END IF;
            END IF;
        END IF;
    END IF;

    RETURN (retValue);
END
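A minimal usage sketch, first checking the function against the metaphone strings quoted in its header comment and then ranking rows by match weight. The table `people` and its columns `surname` and `surname_dm` are hypothetical; `surname_dm` is assumed to hold the stored dm() value, and the 0.3 cut-off is the threshold suggested above.

```sql
-- Sanity check with the strings from the header comment: the left secondary
-- (XMT) equals the right primary (XMT), which scores 0.3 + 0.2.
SELECT dmcompare('SM0;XMT', 'XMT;SMT') AS score;   -- 0.5

-- Compute the search term's metaphone string once, then compare it with the
-- stored metaphone column of each row (hypothetical table and columns).
SET @needle = dm('smith');

SELECT surname, dmcompare(surname_dm, @needle) AS score
FROM people
HAVING score >= 0.3        -- MySQL allows HAVING to reference a select alias
ORDER BY score DESC;
```

When the function is created from the mysql command-line client, the CREATE FUNCTION statement usually has to be wrapped in a DELIMITER change so that the semicolons inside the body do not end the statement early.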
Feel free to use this code anywhere, but please credit P.Boothroyd as the source whatever you use it for, e.g. if you change the values.
Cheers, Paul
The link is broken. – 2015-11-16 20:39:06
The MySQL (and Python) code is now on GitHub: https://github.com/AtomBoy/double-metaphone – Andrew 2016-05-23 20:30:30