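Normalization makes joins across multiple tables difficult

I have a store table containing store names and addresses. After some discussion, we are now normalizing the table and moving the addresses into separate tables. This is done for two reasons:

Increase search speed for stores by location/address
Improve execution time when checking for misspelled street names with the Levenshtein algorithm while importing stores.

The new structure looks like this (ignore typos):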

country; 
+--------------------+--------------+------+-----+---------+-------+ 
| Field    | Type   | Null | Key | Default | Extra | 
+--------------------+--------------+------+-----+---------+-------+ 
| id     | varchar(2) | NO | PRI | NULL |  | 
| name    | varchar(45) | NO |  | NULL |  | 
| prefix    | varchar(5) | NO |  | NULL |  | 
+--------------------+--------------+------+-----+---------+-------+ 

city; 
+--------------------+--------------+------+-----+---------+-------+ 
| Field    | Type   | Null | Key | Default | Extra | 
+--------------------+--------------+------+-----+---------+-------+ 
| id     | int(11)  | NO | PRI | NULL |  | 
| city    | varchar(50) | NO |  | NULL |  | 
+--------------------+--------------+------+-----+---------+-------+ 

street; 
+--------------------+--------------+------+-----+---------+-------+ 
| Field    | Type   | Null | Key | Default | Extra | 
+--------------------+--------------+------+-----+---------+-------+ 
| id     | int(11)  | NO | PRI | NULL |  | 
| street    | varchar(50) | YES |  | NULL |  | 
| fk_cityID   | int(11)  | NO |  | NULL |  | 
+--------------------+--------------+------+-----+---------+-------+ 

address; 
+--------------------+--------------+------+-----+---------+-------+ 
| Field    | Type   | Null | Key | Default | Extra | 
+--------------------+--------------+------+-----+---------+-------+ 
| id     | int(11)  | NO | PRI | NULL |  | 
| streetNum   | varchar(10) | NO |  | NULL |  | 
| street2   | varchar(50) | NO |  | NULL |  | 
| zipcode   | varchar(10) | NO |  | NULL |  | 
| fk_streetID  | int(11)  | NO |  | NULL |  | 
| fk_countryID  | int(11)  | NO |  | NULL |  | 
+--------------------+--------------+------+-----+---------+-------+ 
*street2 is for secondary reference or secondary address in e.g. the US. 

store; 
+--------------------+--------------+------+-----+---------+-------+ 
| Field    | Type   | Null | Key | Default | Extra | 
+--------------------+--------------+------+-----+---------+-------+ 
| id     | int(11)  | NO | PRI | NULL |  | 
| name    | varchar(50) | YES |  | NULL |  | 
| street    | varchar(50) | YES |  | NULL |  |  
| fk_addressID  | int(11)  | NO |  | NULL |  | 
+--------------------+--------------+------+-----+---------+-------+ 
*I've left out address columns in this table to shorten code

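The new tables have been populated with correct data, and the only thing left is to add the foreign key address.id to the store table.

The following code lists all street names correctly: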

select a.id, b.street, a.street2, a.zipcode, c.city, a.fk_countryID 
from address a 
left join street b on a.fk_streetID = b.id 
left join city c on b.fk_cityID = c.id

How can I update fk_addressID in the store table?
How can I list all stores with the correct address?
Given the reasons above, is this bad normalization?

UPDATE

It seems the following code lists the correct address for all stores, but it is a bit slow (I have about 2000 stores):

select a.id, a.name, b.id, c.street 
from sl_store a, sl_address b, sl_street c 
where b.fk_streetID = c.id 
and a.street1 = c.street 
group by a.name 
order by a.id
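
One way the lookup above might be turned into the missing update is sketched below, in MySQL syntax and using the table names from the schema listing rather than the sl_ prefixed ones. It assumes the store table still carries the old address columns: street appears in the schema, while streetNum and zipcode on store are hypothetical names for columns the listing leaves out. The index is also an assumption; joining on an unindexed varchar column is a plausible cause of the slowness, but that has not been measured here.

```sql
-- Assumption: no index exists yet on street.street; adding one may speed up
-- the name-based join below.
create index idx_street_street on street (street);

-- List stores whose street name has no exact match, so they can be fixed
-- (for example with a Levenshtein check) before the update runs.
select s.id, s.name, s.street
from store s
left join street st on st.street = s.street
where st.id is null;

-- Fill in the foreign key. store.streetNum and store.zipcode are
-- hypothetical column names; adjust them to the real ones.
update store s
join street st on st.street = s.street
join address a on a.fk_streetID = st.id
              and a.streetNum = s.streetNum
              and a.zipcode = s.zipcode
set s.fk_addressID = a.id;
```

Matching on street name alone would be ambiguous, because one street usually has many address rows; that is why the sketch also compares the street number and zip code.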


2011-11-22 Steven

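I think Jeff said something like "Normalize until it hurts, denormalize until it works" on his blog... so I guess that's how they do it.

That's true :) Do you have a link to his blog? – Steven

http://www.codinghorror.com/blog/2008/07/maybe-normalizing-isnt-normal.html

I'm not going to speak to misspellings. Since you're importing the data, misspellings are better handled in a staging table.

Let's look at this slightly simplified version.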

create table stores 
(
    store_name varchar(50) primary key, 
    street_num varchar(10) not null, 
    street_name varchar(50) not null, 
    city varchar(50) not null, 
    state_code char(2) not null, 
    zip_code char(5) not null, 
    iso_country_code char(2) not null, 
    -- Depending on what kind of store you're talking about, you *could* have 
    -- two of them at the same address. If so, drop this constraint. 
    unique (street_num, street_name, city, state_code, zip_code, iso_country_code) 
); 

insert into stores values 
('Dairy Queen #212', '232', 'N 1st St SE', 'Castroville', 'CA', '95012', 'US'), 
('Dairy Queen #213', '177', 'Broadway Ave', 'Hartsdale', 'NY', '10530', 'US'), 
('Dairy Queen #214', '7640', 'Vermillion St', 'Seneca Falls', 'NY', '13148', 'US'), 
('Dairy Queen #215', '1014', 'Handy Rd',  'Olive Hill', 'KY', '41164', 'US'), 
('Dairy Mart #101', '145', 'N 1st St SE', 'Castroville', 'CA', '95012', 'US'), 
('Dairy Mart #121', '1042', 'Handy Rd',  'Olive Hill', 'KY', '41164', 'US');

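Although a lot of people firmly believe that ZIP codes determine city and state in the US, that's not the case. ZIP codes have to do with how carriers drive their routes, not with geography. Some cities straddle the border between states; a single ZIP code route can cross state lines. Even Wikipedia knows this, although their examples might be out of date. (Delivery routes change constantly.)

So we have a table that has two candidate keys,

{store_name}, and
{street_num, street_name, city, state_code, zip_code, iso_country_code}

It has no non-key attributes. I think this table is in 5NF. What do you think?

If I wanted to increase data integrity for street names, I might start with something like this.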

create table street_names 
(
    street_name varchar(50) not null, 
    city varchar(50) not null, 
    state_code char(2) not null, 
    iso_country_code char(2) not null, 
    primary key (street_name, city, state_code, iso_country_code) 
); 

insert into street_names 
select distinct street_name, city, state_code, iso_country_code 
from stores; 

alter table stores 
add constraint streets_from_street_names 
foreign key    (street_name, city, state_code, iso_country_code) 
references street_names (street_name, city, state_code, iso_country_code); 
-- I don't cascade updates or deletes, because in my experience 
-- with addresses, that's almost never the right thing to do when a 
-- street name changes.
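
As a quick illustration of what that constraint buys you, here is a minimal sketch that tries to add a store with a misspelled street name. It assumes the tables and rows defined above and a storage engine that enforces foreign keys (InnoDB in MySQL, for instance). The store name and street number are made up for the example.

```sql
-- 'Hnady Rd' is a deliberate misspelling of 'Handy Rd'.
-- This insert should be rejected with a foreign key violation, because
-- street_names has no ('Hnady Rd', 'Olive Hill', 'KY', 'US') row.
insert into stores values
('Dairy Mart #122', '1050', 'Hnady Rd', 'Olive Hill', 'KY', '41164', 'US');

-- With the street spelled as it appears in street_names, the row is accepted.
insert into stores values
('Dairy Mart #122', '1050', 'Handy Rd', 'Olive Hill', 'KY', '41164', 'US');
```

A genuinely new street still has to be added to street_names first, which is the point: new street names become a deliberate step instead of a side effect of a typo.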

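You could (and probably should) repeat this process for city names, state names (state codes), and country names.

Some problems with your approach

You can apparently enter a street id number for a street that's in the US, along with the country id for Croatia. (The "full name" of a city, so to speak, is the kind of fact you probably want to store in order to increase data integrity. That's probably true for the "full name" of a street, too.)

Using id numbers for every bit of data greatly increases the number of joins required. Using id numbers doesn't have anything to do with normalization. Using id numbers without corresponding unique constraints on the natural keys (an utterly commonplace mistake) allows duplicate data.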
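
If you do keep the id numbers, a minimal sketch of the missing unique constraints on the question's tables might look like this. The choice of columns is an assumption about what identifies a street and an address in your data, and the constraint names are made up.

```sql
-- Assumption: a street name should appear only once per city.
alter table street
add constraint street_natural_key unique (street, fk_cityID);

-- Assumption: these five columns together identify one address.
alter table address
add constraint address_natural_key
unique (streetNum, street2, zipcode, fk_streetID, fk_countryID);
```

Either statement fails if the table already contains duplicates, so those have to be merged first. Note also that street.street is nullable in the question's schema, and a unique constraint treats NULLs as distinct, so rows with a NULL street are not protected.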


2011-12-07 07:23:04

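Thanks for your feedback. We ended up using just one table containing all addresses. This was to avoid multiple joins and to speed up search results. Spell-checking misspelled street names with e.g. Levenshtein will only be done when adding new stores. Using only one table for street addresses will produce duplicates, but not so many that it cannot be indexed and give fast results when searching. – Steven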
