Surrogate vs natural key: hard numbers on performance differences?

Question

There's a healthy debate out there between surrogate and natural keys:

SO Post 1

SO Post 2

My opinion, which seems to be in line with the majority (it's a slim majority), is that you should use surrogate keys unless a natural key is completely obvious and guaranteed not to change. When you do use a surrogate key, you should still enforce uniqueness on the natural key. In practice, that means surrogate keys almost all of the time.

Example of the two approaches, starting with a Company table:

1: Surrogate key: Table has an ID field which is the PK (and an identity). Company names are required to be unique by state, so there's a unique constraint there.

2: Natural key: Table uses CompanyName and State as the PK -- satisfies both the PK and uniqueness.
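To make the two designs concrete, here is a minimal sketch of both as schemas. It uses Python's built-in sqlite3 module only so that it runs anywhere; SQLite is an assumption here (the question does not name an engine), `INTEGER PRIMARY KEY` stands in for an identity column, and the Office* child tables are hypothetical, standing in for one of the "10 other tables".

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

conn.executescript("""
-- Approach 1: surrogate key; the natural key is kept unique by a constraint.
CREATE TABLE CompanySurrogate (
    ID          INTEGER PRIMARY KEY,   -- stands in for an identity column
    CompanyName TEXT NOT NULL,
    State       TEXT NOT NULL,
    UNIQUE (CompanyName, State)
);
CREATE TABLE OfficeSurrogate (         -- hypothetical child table
    ID        INTEGER PRIMARY KEY,
    CompanyID INTEGER NOT NULL REFERENCES CompanySurrogate (ID)
);

-- Approach 2: natural composite key doubles as the PK.
CREATE TABLE CompanyNatural (
    CompanyName TEXT NOT NULL,
    State       TEXT NOT NULL,
    PRIMARY KEY (CompanyName, State)
);
CREATE TABLE OfficeNatural (           -- hypothetical child table
    ID          INTEGER PRIMARY KEY,
    CompanyName TEXT NOT NULL,
    State       TEXT NOT NULL,
    FOREIGN KEY (CompanyName, State)
        REFERENCES CompanyNatural (CompanyName, State)
);
""")


def duplicate_rejected(table):
    """Insert the same (CompanyName, State) twice; True if the second insert fails."""
    sql = f"INSERT INTO {table} (CompanyName, State) VALUES (?, ?)"
    conn.execute(sql, ("Acme", "TX"))
    try:
        conn.execute(sql, ("Acme", "TX"))
    except sqlite3.IntegrityError:
        return True
    return False


surrogate_rejects = duplicate_rejected("CompanySurrogate")
natural_rejects = duplicate_rejected("CompanyNatural")
print(surrogate_rejects, natural_rejects)
```

Both designs reject the duplicate company. The visible difference is in the child tables: the surrogate design carries one integer column per reference, while the natural design repeats both CompanyName and State in every referencing table.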

Let's say that the Company PK is used in 10 other tables. My hypothesis, with no numbers to back it up, is that the surrogate key approach would be much faster here.

The only convincing argument I've seen for a natural key is for a many-to-many table that uses the two foreign keys as a natural key. I think in that case it makes sense. But you can get into trouble if you need to refactor; that's out of scope for this post, I think.

Has anyone seen an article that compares performance differences on a set of tables that use surrogate keys vs. the same set of tables using natural keys? Looking around on SO and Google hasn't yielded anything worthwhile, just a lot of theorycrafting.


Important Update: I've started building a set of test tables that answer this question. It looks like this:

  • PartNatural - parts table that uses the unique PartNumber as a PK
  • PartSurrogate - parts table that uses an ID (int, identity) as PK and has a unique index on the PartNumber
  • Plant - ID (int, identity) as PK
  • Engineer - ID (int, identity) as PK

Every part is joined to a plant and every instance of a part at a plant is joined to an engineer. If anyone has an issue with this testbed, now's the time.

Recommended answer

Use both! Natural keys prevent database corruption (inconsistency might be a better word). When the "right" natural key (the one that eliminates duplicate rows) would perform badly because of its length or the number of columns involved, you can add a surrogate key as well, purely for performance, and use it as the foreign key in other tables instead of the natural key. But the natural key should remain as an alternate key or unique index to prevent data corruption and enforce database consistency.

Much of the hoohah (in the "debate" on this issue) may be due to a false assumption: that you have to use the Primary Key for joins and for Foreign Keys in other tables. THIS IS FALSE. You can use ANY key as the target for foreign keys in other tables. It can be the Primary Key, an alternate key, or any unique index or unique constraint, as long as it is unique in the target relation (table). And as for joins, you can use anything at all for a join condition; it doesn't even have to be a key, or an index, or even unique! (Although if it is not unique, you will get multiple matching rows in the join result.) You can even create a join using non-equality criteria (like >, <, or LIKE) as the join condition.

Indeed, you can create a join using any valid SQL expression that evaluates to a boolean.
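A small sketch of both points, again using Python's sqlite3 purely as a convenient way to run the SQL (an assumption; the answer is engine-agnostic). The Part and PlantPart tables are hypothetical and loosely modelled on the asker's test bed: the foreign key targets a UNIQUE column rather than the primary key, and the final query joins on `<` instead of equality.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

conn.executescript("""
CREATE TABLE Part (
    ID         INTEGER PRIMARY KEY,    -- surrogate key
    PartNumber TEXT NOT NULL UNIQUE    -- natural key, kept as an alternate key
);
-- The foreign key targets the UNIQUE column, not the primary key.
CREATE TABLE PlantPart (
    Plant      TEXT NOT NULL,
    PartNumber TEXT NOT NULL REFERENCES Part (PartNumber)
);
INSERT INTO Part (PartNumber) VALUES ('A-100'), ('B-200');
INSERT INTO PlantPart VALUES ('Austin', 'A-100');
""")

# A child row pointing at a part number that does not exist is rejected,
# even though the referenced column is not the primary key.
try:
    conn.execute("INSERT INTO PlantPart VALUES ('Austin', 'Z-999')")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

# A join condition is just a boolean expression; here it is an inequality.
pairs = conn.execute("""
    SELECT a.PartNumber, b.PartNumber
    FROM   Part a
    JOIN   Part b ON a.PartNumber < b.PartNumber
    ORDER  BY a.PartNumber, b.PartNumber
""").fetchall()

print(orphan_rejected, pairs)
```

Engines differ in what they accept as a foreign key target, so check the documentation for yours; the common requirement is that the referenced columns are covered by a primary key or unique constraint.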

Other recommended answer

Natural keys differ from surrogate keys in value, not type.

Any type can be used for a surrogate key, such as a VARCHAR for a system-generated slug.

However, the most commonly used types for surrogate keys are INTEGER and RAW(16) (or whatever type your RDBMS uses for GUIDs).

Comparing surrogate integers and natural integers (like an SSN) takes exactly the same time.

Comparing VARCHARs may take collation into account, and they are generally longer than integers, which makes them less efficient.

Comparing a pair of INTEGERs is probably also less efficient than comparing a single INTEGER.

On small datatypes, this difference is probably a fraction of a percent of the time required to fetch pages, traverse indexes, acquire database latches, etc.

And here are the numbers (in MySQL):

CREATE TABLE aint (id INT NOT NULL PRIMARY KEY, value VARCHAR(100));
CREATE TABLE adouble (id1 INT NOT NULL, id2 INT NOT NULL, value VARCHAR(100), PRIMARY KEY (id1, id2));
CREATE TABLE bint (id INT NOT NULL PRIMARY KEY, aid INT NOT NULL);
CREATE TABLE bdouble (id INT NOT NULL PRIMARY KEY, aid1 INT NOT NULL, aid2 INT NOT NULL);

INSERT
INTO    aint
SELECT  id, RPAD('', FLOOR(RAND(20090804) * 100), '*')
FROM    t_source;

INSERT
INTO    bint
SELECT  id, id
FROM    aint;

INSERT
INTO    adouble
SELECT  id, id, value
FROM    aint;

INSERT
INTO    bdouble
SELECT  id, id, id
FROM    aint;

SELECT  SUM(LENGTH(value))
FROM    bint b
JOIN    aint a
ON      a.id = b.aid;

SELECT  SUM(LENGTH(value))
FROM    bdouble b
JOIN    adouble a
ON      (a.id1, a.id2) = (b.aid1, b.aid2);

t_source is just a dummy table with 1,000,000 rows.

aint and adouble, and bint and bdouble, contain exactly the same data, except that aint has a single integer as its PRIMARY KEY, while adouble has a pair of two identical integers.

On my machine, both queries run for 14.5 seconds, +/- 0.1 second.

The performance difference, if any, is within the range of fluctuation.
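For anyone who wants to repeat the comparison without a MySQL server, the following is a rough, scaled-down re-creation using Python's sqlite3. Several things here are assumptions and not part of the original test: the engine is SQLite, the row count `N` is 100,000 instead of 1,000,000, the value lengths are deterministic (`i % 100`) instead of drawn from RAND, and the row-value comparison is written as two ANDed equalities. Timings from this sketch are not comparable to the 14.5 seconds above; it only shows how to measure both joins side by side.

```python
import sqlite3
import time

N = 100_000  # assumption: scaled down from the 1,000,000 rows in t_source

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE aint (id INT NOT NULL PRIMARY KEY, value VARCHAR(100));
CREATE TABLE adouble (id1 INT NOT NULL, id2 INT NOT NULL, value VARCHAR(100),
                      PRIMARY KEY (id1, id2));
CREATE TABLE bint (id INT NOT NULL PRIMARY KEY, aid INT NOT NULL);
CREATE TABLE bdouble (id INT NOT NULL PRIMARY KEY, aid1 INT NOT NULL, aid2 INT NOT NULL);
""")

# Deterministic filler instead of RAND(): value length cycles through 0..99.
rows = [(i, "*" * (i % 100)) for i in range(1, N + 1)]
conn.executemany("INSERT INTO aint VALUES (?, ?)", rows)
conn.execute("INSERT INTO bint SELECT id, id FROM aint")
conn.execute("INSERT INTO adouble SELECT id, id, value FROM aint")
conn.execute("INSERT INTO bdouble SELECT id, id, id FROM aint")
conn.commit()


def timed(sql):
    """Run an aggregate query and return (result, elapsed seconds)."""
    start = time.perf_counter()
    (total,) = conn.execute(sql).fetchone()
    return total, time.perf_counter() - start


single_sum, single_secs = timed(
    "SELECT SUM(LENGTH(value)) FROM bint b JOIN aint a ON a.id = b.aid"
)
double_sum, double_secs = timed(
    "SELECT SUM(LENGTH(value)) FROM bdouble b JOIN adouble a "
    "ON a.id1 = b.aid1 AND a.id2 = b.aid2"
)

print(f"single-column key: {single_secs:.3f}s, two-column key: {double_secs:.3f}s")
```

Both queries must return the same sum, which is a quick check that the two joins are doing equivalent work. Run each query several times and compare the spread before drawing any conclusion, as the answer does with its +/- 0.1 second.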