6NF中参考完整性的复合键与代理键的比较[英] Composite vs Surrogate keys for Referential Integrity in 6NF

本文是小编为大家收集整理的关于6NF中参考完整性的复合键与代理键的比较的处理/解决方法,可以参考本文帮助大家快速定位并解决问题,中文翻译不准确的可切换到English标签页查看源文。

问题描述

获取三层信息:

第1层:信息

该层包含具有UNIQUE天然索引和替代键的数据,可容易转移.

Table Surnames:

+-----------------------------+--------------+
|    ID (Auto Increment, PK)  |    Surname   |
+-----------------------------+--------------+
|               1             |     Smith    |
|               2             |    Edwards   |
|               3             |     Brown    |
+-----------------------------+--------------+

Table FirstNames

+-----------------------------+--------------+
|   ID (Auto Increment, PK)   |   FirstName  |
+-----------------------------+--------------+
|               1             |     John     |
|               2             |     Bob      |
|               3             |     Mary     |
|               4             |     Kate     |
+-----------------------------+--------------+

天然键

另外,如Mike Sherrill所解释的那样,上面的两个表可以没有ID,并将姓氏和名称作为天然主键.在这种情况下,假设以下参考varchar而不是int.

第2层:人

在此层中使用了复合索引.该值可以是UNIQUE或PRIMARY,具体取决于是否使用替代密钥作为主要键.

+-----------------+--------------+
|    FirstName    |    LastName  |
+-----------------+--------------+
|        1        |       2      |
|        1        |       3      |
|        2        |       3      |
|        3        |       1      |
|        4        |       2      |
|       ...       |      ...     |
+-----------------+--------------+

第3层:父母

在此层中,通过ParentsOf表探索了人们之间的关系.

ParentsOf

+-----------------+-----------------+
|      Person     |   PersonParent  |
+-----------------+-----------------+

 OR

+-----------------+-----------------+-----------------+-----------------+
| PersonFirstName |  PersonSurname  | ParentFirstName |  ParentSurname  |
+-----------------+-----------------+-----------------+-----------------+

问题

假设参考完整性对我的核心非常重要,并且我将在这些索引上拥有FOREIGN KEYS,以便我让数据库负责监视其自身的完整性,如果我要去使用一个ORM,它将像 inctrine 具有对复合主键的本机支持...

请帮助我了解:

  • 使用代理密钥与第一层的天然键进行的权衡列表.

  • 使用复合键与第二层的代理密钥进行的权衡列表,该键可以转移到第三层.

我对听觉不感兴趣,这更好,因为我知道专业人士在这个话题上存在重大分歧,这将引发宗教战争.取而代之的是,我非常简单地和客观地要求人类的可能性,通过将代孕键传递到维护主键(天然/复合或替代/复合材料)的每一层与维护的代孕键,您将采取哪些权衡.任何人都可以找到有人说 从不 或 始终 在SO和其他网站上使用替代密钥.相反,我最喜欢您的答案的合理分析.

编辑:已经指出,姓氏示例是使用6NF的一个糟糕示例.为了完整地保持问题,我要离开它.如果您在想象此用例中遇到困难,那么更好的列表可能是"杂货项目"的列表.又名:

+-----------------------------+--------------+
|   ID (Auto Increment, PK)   |   Grocery    |
+-----------------------------+--------------+
|               1             | Sponges      |
|               2             | Tomato Soup  |
|               3             | Ice Cream    |
|               4             | Lemons       |
|               5             | Strawberries |
|               6             | Whipped Cream|
+-----------------------------+--------------+

+-----------------------------+--------------+
|   ID (Auto Increment, PK)   |   Brand      |
+-----------------------------+--------------+
|               1             | Bright       |
|               2             | Ben & Jerry's|
|               3             | Store Brand  |
|               4             | Campbell's   |
|               5             | Cool Whip    |
+-----------------------------+--------------+    

天然复合关键示例:

+-----------------------------+--------------+
|           Grocery           |   Brand      |
+-----------------------------+--------------+
|           Sponges           | Bright       |
|           Ice Cream         | Ben & Jerry's|
|           Ice Cream         | Store Brand  |
|           Tomato Soup       | Campbell's   |
|           Tomato Soup       | Store Brand  |
|           Lemons            | Store Brand  |
|           Whipped Cream     | Cool Whip    |
+-----------------------------+--------------+ 

推荐配对

+-----------------+-----------------+-----------------+-----------------+
|     Grocery1     |  Brand1        | Grocery2        |  Brand2         |
+-----------------+-----------------+-----------------+-----------------+

重申,这也是一个例子.这不是我建议继续进行的方式,但应该有助于说明我的问题.

这种方法缺口.我要重申,这个问题是要求浏览以下每种方法的好处和缺点,而不是要强调一个比另一种更好.我相信大多数人能够超越这个特定示例的可疑性,以回答核心问题.此编辑是针对那些不能的.

下面有一些很好的答案,如果您对要走哪个方向感到好奇,请阅读它们.

结束编辑

谢谢!

推荐答案

以下是一些权衡:

单个替代(人为创建):

  • 所有子表外键都只需要一个列来引用主键.

  • 非常容易更新表中的天然键,而无需使用外国钥匙更新每个子桌

  • 较小的主/外键索引(即,不是宽),这可以使数据库运行速度,例如,当在父表中删除记录时,需要搜索子表以确保这样做不会创建孤儿.狭窄的索引更快地扫描(只是视力).

  • 您将拥有更多索引,因为您很可能还希望索引数据中存在的任何天然键.

天然复合键盘表:

  • 数据库中的索引

  • 数据库中的列

  • 更容易/更快地插入大量记录,因为您无需抓取序列生成器

  • 更新化合物中的一个键,要求每个子表都可以更新.

然后还有另一个类别:人工复合主键

我只找到一个实例,这很有意义.当您需要标记每个表格中的每个记录以进行行级别安全.

例如,假设您有一个数据库,该数据库为50,000个客户端存储数据,并且每个客户端不应该看到其他客户端的数据 - 在Web应用程序开发中非常普遍.

如果每个记录都用client_id字段标记,则您正在创建一个行级安全环境.大多数数据库都有正确设置时强制执行行级安全性的工具.

第一件事是设置初级和外键.通常,具有id字段的表作为主键.通过添加client_id,密钥现在是复合键.而且有必要将client_id携带到所有子桌上.

复合键基于2个替代键,是确保客户端和数据库中整体中数据完整性的防弹方法.

之后,您将创建视图(或者使用Oracle EE设置虚拟私有数据库)和其他各种结构,以允许数据库执行行级别安全性(这是其拥有的主题).

鉴于该数据结构不再标准化为n度.每个pk/fk中的client_id字段构成原本正常模型.该模型的好处是在数据库级别上强制执行行级别安全性(这是数据库应该做什么).每个选择,插入,更新,删除都仅限于当前设置会话的任何client_id.数据库具有会话意识.

摘要

代理密钥始终是安全的赌注.他们需要更多的工作来进行设置,需要更多的存储空间.

我认为最大的好处是:

  • 能够在一个表中更新PK,并且所有其他子表都不会瞬间更改而不会被触摸.

  • 当数据弄乱时 - 在某个时候,由于编程错误,替代键使清洁变得更加容易,在某些情况下只能做到,因为有代孕键.

  • 查询性能得到改进,因为数据库能够搜索属性以找到s.key,然后通过单个数字键加入所有子表.

天然钥匙尤其是复合nkeys使编写代码变得痛苦.当您需要加入4张表时,与使用单键相比," WHERE子句"将更长(更容易弄乱).

代理密钥是"安全"路线.天然钥匙在几个地方是有益的,我会说大约1%的桌子.

其他推荐答案

首先,您的第二层至少可以表达四种不同的方式,它们都与您的问题相关.在下面,我使用的是伪-SQL,主要使用PostgreSQL语法.无论结构如何,某些类型的查询都需要递归和多个额外的索引,因此我不会再说再说了.使用支持聚类索引的DBM在这里可能会影响某些决策,但不要以为群集索引上的六个连接要比简单地读取单个索引索引的值要快.测试,测试,测试.

第二,第一层的权衡实际上并不多.外国键可以以完全相同的方式引用声明not null unique的列来引用primary key的列.替代钥匙将表的宽度增加4个字节;对于大多数但不是全部数据库应用程序来说都是微不足道的.

第三,正确的外键和独特的约束将在这四个设计中保持参考完整性. (但请参见下面的"关于级联".)

a.外国钥匙替代密钥

create table people (
  FirstName integer not null
    references FirstNames (ID),
  LastName integer not null
    references Surnames (ID),
  primary key (FirstName, LastName)
);

b.天然键的外键

create table people (
  FirstName varchar(n) not null
    references FirstNames (FirstName),
  LastName varchar(n) not null
    references Surnames (Surname),
  primary key (FirstName, Surname)
);

c.外国钥匙替代钥匙,附加替代键

create table people (
  ID serial primary key,
  FirstName integer not null
    references FirstNames (ID),
  LastName integer not null
    references Surnames (ID),
  unique (FirstName, LastName)
);

d.外国天然钥匙的钥匙,附加替代键

create table people (
  ID serial primary key,
  FirstName varchar(n) not null
    references FirstNames (FirstName),
  LastName varchar(n) not null
    references Surnames (Surname),
  unique (FirstName, Surname)
);

现在让我们看一下桌子的父母.

a.外国钥匙在A上方

中代孕键
create table ParentsOf (
  PersonFirstName integer not null,
  PersonSurname integer not null,
  foreign key (PersonFirstName, PersonSurname)
    references people (FirstName, LastName),

  ParentFirstName integer not null,
  ParentSurname integer not null,
  foreign key (ParentFirstName, ParentSurname)
    references people (FirstName, LastName),

  primary key (PersonFirstName, PersonSurname, ParentFirstName, ParentSurname)
);

要检索给定行的名称,您需要四个连接.您可以直接加入" firstNames"和"姓氏"表;您无需通过加入 "人"表来获取名字.

b.

的外国天然键的外国钥匙
create table ParentsOf (
  PersonFirstName varchar(n) not null,
  PersonSurname varchar(n) not null,
  foreign key (PersonFirstName, PersonSurname)
    references people (FirstName, LastName),

  ParentFirstName varchar(n) not null,
  ParentSurname varchar(n) not null,
  foreign key (ParentFirstName, ParentSurname)
    references people (FirstName, LastName),

  primary key (PersonFirstName, PersonSurname, ParentFirstName, ParentSurname)
);

此设计需要零连接才能检索给定行的名称.许多SQL平台根本不需要读取表,因为它们可以从主键上的索引获取所有数据.

c.外国量替代键,C中的附加替代密钥,上方

create table ParentsOf (
  Person integer not null
    references People (ID),
  PersonParent integer not null
    references People (ID),
  primary key (Person, PersonParent)
);

要检索名称,您必须通过"人"表加入.您总共需要六个加入.

d.外国天然钥匙的钥匙,d上方的其他代理密钥

此设计具有与上面C的C中相同的结构.因为D中的"人"表在上面较远,具有自然键引用表" firstNames"和" surnames",所以您只需要两个加入桌子"人"才能获取名称.

关于ORMS

orms不会像SQL开发人员写SQL那样构建SQL.如果SQL开发人员编写了需要六个连接以获取名称的选择语句,则ORM可能会执行七个简单的查询以获取相同的数据.这可能是一个问题;可能不是.

关于级联

替代ID号使每个外国密钥参考成为隐式,未宣布的"更新级联".例如,如果您在姓氏表上运行此更新语句. . .

update surnames
set surname = 'Smythe'
where surname = 'Smith';

那么所有的史密斯人都会变成史密斯.防止这种情况的唯一方法是撤销"姓氏"的更新权限.隐性,未宣布的"在更新级联"并不总是一件好事.撤销权限仅仅是为了防止不必要的隐性"级联"并不总是一件好事.

其他推荐答案

使用天然密钥可以启用更简单,更快的查询,因为一个人不需要一直加入国外钥匙链即可找到"自然"值,例如,例如.在屏幕上显示.

本文地址:https://www.itbaoku.cn/post/597612.html

问题描述

Take three layers of information:

Layer 1: Information

This layer contains data with UNIQUE natural indexes and a surrogate key that is easily transferrable.

Table Surnames:

+-----------------------------+--------------+
|    ID (Auto Increment, PK)  |    Surname   |
+-----------------------------+--------------+
|               1             |     Smith    |
|               2             |    Edwards   |
|               3             |     Brown    |
+-----------------------------+--------------+

Table FirstNames

+-----------------------------+--------------+
|   ID (Auto Increment, PK)   |   FirstName  |
+-----------------------------+--------------+
|               1             |     John     |
|               2             |     Bob      |
|               3             |     Mary     |
|               4             |     Kate     |
+-----------------------------+--------------+

Natural Keys

Alternatively, the two tables above can be without ID and utilize Surname and FirstName as Natural Primary Keys, as explained by Mike Sherrill. In this instance, assume the layer below references varchar rather than int.

Layer 2: People

In this layer a composite index is used. This value can be UNIQUE or PRIMARY, depending on whether a surrogate key is utilized as the Primary Key.

+-----------------+--------------+
|    FirstName    |    LastName  |
+-----------------+--------------+
|        1        |       2      |
|        1        |       3      |
|        2        |       3      |
|        3        |       1      |
|        4        |       2      |
|       ...       |      ...     |
+-----------------+--------------+

Layer 3: Parents

In this layer, relationships between people are explored through a ParentsOf table.

ParentsOf

+-----------------+-----------------+
|      Person     |   PersonParent  |
+-----------------+-----------------+

 OR

+-----------------+-----------------+-----------------+-----------------+
| PersonFirstName |  PersonSurname  | ParentFirstName |  ParentSurname  |
+-----------------+-----------------+-----------------+-----------------+

The Question

Assuming that referential integrity is VERY important to me at its very core, and I will have FOREIGN KEYS on these indexes so that I keep the database responsible for monitoring its own integrity on this front, and that, if I were to use an ORM, it would be one like Doctrine which has native support for Compound Primary Keys...

Please help me to understand:

  • The list of trade-offs that take place with utilizing surrogate keys vs. natural keys on the 1st Layer.

  • The list of trade-offs that take place with utilizing compound keys vs. surrogate keys on the 2nd Layer which can be transferred over to the 3rd Layer.

I am not interested in hearing which is better, because I understand that there are significant disagreements among professionals on this topic and it would be sparking a religious war. Instead, I am asking, very simply and as objectively as is humanly possible, what trade-offs will you be taking by passing surrogate keys to each Layer vs maintaining Primary keys (natural/composite, or surrogate/composite). Anyone will be able to find someone saying NEVER or ALWAYS use surrogate keys on SO and other websites. Instead, a reasoned analyses of trade-offs is what I will most appreciate in your answers.

EDIT: It has been pointed out that a surname example is a poor example for a use of 6NF. For the sake of keeping the question intact, I am going to leave it be. If you are having trouble imagining the use case for this, a better one might be a list of "Grocery Items". AKA:

+-----------------------------+--------------+
|   ID (Auto Increment, PK)   |   Grocery    |
+-----------------------------+--------------+
|               1             | Sponges      |
|               2             | Tomato Soup  |
|               3             | Ice Cream    |
|               4             | Lemons       |
|               5             | Strawberries |
|               6             | Whipped Cream|
+-----------------------------+--------------+

+-----------------------------+--------------+
|   ID (Auto Increment, PK)   |   Brand      |
+-----------------------------+--------------+
|               1             | Bright       |
|               2             | Ben & Jerry's|
|               3             | Store Brand  |
|               4             | Campbell's   |
|               5             | Cool Whip    |
+-----------------------------+--------------+    

Natural Composite Key Example:

+-----------------------------+--------------+
|           Grocery           |   Brand      |
+-----------------------------+--------------+
|           Sponges           | Bright       |
|           Ice Cream         | Ben & Jerry's|
|           Ice Cream         | Store Brand  |
|           Tomato Soup       | Campbell's   |
|           Tomato Soup       | Store Brand  |
|           Lemons            | Store Brand  |
|           Whipped Cream     | Cool Whip    |
+-----------------------------+--------------+ 

Recommended Pairings

+-----------------+-----------------+-----------------+-----------------+
|     Grocery1     |  Brand1        | Grocery2        |  Brand2         |
+-----------------+-----------------+-----------------+-----------------+

To reiterate, this is also just an example. This is not how I would recommend proceeding, but it should help to illustrate my question.

There ARE shortfalls to this method. I'll reiterate that this question was to request walking through the benefits and drawbacks of each method below, not to highlight one as better than another. I believe most people were able to look past the questionable nature of this specific example to answer the core question. This edit is for those that cannot.

There are some very good answers below and if you are curious about which direction to go, please read them.

END EDIT

Thank you!

推荐答案

Here's some trade-offs:

Single Surrogate (artificially created):

  • All child tables foreign keys only need a single column to reference the primary key.

  • Very easy to update the natural keys in table without needing to update every child table with foreign keys

  • Smaller primary/foreign key indexes (ie. not a wide) This can make the database run faster, for example when a record is deleted in a parent table, the child tables need to be searched to make sure this will not create orphans. Narrow indexes are faster to scan (just sightly).

  • you will have more indexes because you most likely will also want to index whatever natural keys exists in the data.

Natural composite keyed tables:

  • fewer indexes in the database

  • less columns in the database

  • easier/faster to insert a ton of records as you will not need to grab the sequence generator

  • updating one of the keys in the compound requires that every child table also be updated.

Then there is another category: artificial composite primary keys

I've only found one instance where this makes sense. When you need to tag every record in every table for row level security.

For example, suppose you had an database which stored data for 50,000 clients and each client was not supposed to see other client's data--very common in web application development.

If each record was tagged with a client_id field, you are creating a row level security environment. Most databases have the tools to enforce row level security when setup correctly.

First thing to do is setup primary and foreign keys. Normally a table with have an id field as the primary key. By adding client_id the key is now composite key. And it is necessary to carry client_id to all child table.

The composite key is based on 2 surrogate keys and is a bulletproof way to ensure data integrity among clients and within the database a whole.

After this you would create views (or if using Oracle EE setup Virtual Private Database) and other various structures to allow the database to enforce row level security (which is a topic all it own).

Granted that this data structure is no longer normalized to the nth degree. The client_id field in each pk/fk denormalizes an otherwise normal model. The benefit of the model is the ease of enforcing row level security at the database level (which is what databases should do). Every select, insert, update, delete is restricted to whatever client_id your session is currently set. The database has session awareness.

Summary

Surrogate keys are always the safe bet. They require a little more work to setup and require more storage.

The biggest benefit in my opinion is:

  • Being able to update the PK in one table and all other child tables are instantaneously changed without ever being touched.

  • When data gets messed up--and it will at some point due to a programming mistake, surrogate keys make the clean up much much easier and in some cases only possible to do because there are surrogate keys.

  • Query performance is improved as the db is able to search attributes to locate the s.key and then join all child table by a single numeric key.

Natural Keys especially composite NKeys make writing code a pain. When you need to join 4 tables the "where clause" will be much longer (and easier to mess up) than when single SKeys were used.

Surrogate keys are the "safe" route. Natural keys are beneficial in a few places, I'd say around 1% of the tables in a db.

其他推荐答案

First of all, your second layer can be expressed at least four different ways, and they're all relevant to your question. Below I'm using pseudo-SQL, mainly with PostgreSQL syntax. Certain kinds of queries will require recursion and more than one additional index regardless of the structure, so I won't say any more about that. Using a dbms that supports clustered indexes can affect some decisions here, but don't assume that six joins on clustered indexes will be faster than simply reading values from a single, covering index; test, test, test.

Second, there really aren't many tradeoffs at the first layer. Foreign keys can reference a column declared not null unique in exactly the same way they can reference a column declared primary key. The surrogate key increases the width of the table by 4 bytes; that's trivial for most, but not all, database applications.

Third, correct foreign keys and unique constraints will maintain referential integrity in all four of these designs. (But see below, "About Cascades".)

A. Foreign keys to surrogate keys

create table people (
  FirstName integer not null
    references FirstNames (ID),
  LastName integer not null
    references Surnames (ID),
  primary key (FirstName, LastName)
);

B. Foreign keys to natural keys

create table people (
  FirstName varchar(n) not null
    references FirstNames (FirstName),
  LastName varchar(n) not null
    references Surnames (Surname),
  primary key (FirstName, Surname)
);

C. Foreign keys to surrogate keys, additional surrogate key

create table people (
  ID serial primary key,
  FirstName integer not null
    references FirstNames (ID),
  LastName integer not null
    references Surnames (ID),
  unique (FirstName, LastName)
);

D. Foreign keys to natural keys, additional surrogate key

create table people (
  ID serial primary key,
  FirstName varchar(n) not null
    references FirstNames (FirstName),
  LastName varchar(n) not null
    references Surnames (Surname),
  unique (FirstName, Surname)
);

Now let's look at the ParentsOf table.

A. Foreign keys to surrogate keys in A, above

create table ParentsOf (
  PersonFirstName integer not null,
  PersonSurname integer not null,
  foreign key (PersonFirstName, PersonSurname)
    references people (FirstName, LastName),

  ParentFirstName integer not null,
  ParentSurname integer not null,
  foreign key (ParentFirstName, ParentSurname)
    references people (FirstName, LastName),

  primary key (PersonFirstName, PersonSurname, ParentFirstName, ParentSurname)
);

To retrieve the names for a given row, you'll need four joins. You can join directly to the "FirstNames" and "Surnames" tables; you don't need to join through the "People" table to get the names.

B. Foreign keys to natural keys in B, above

create table ParentsOf (
  PersonFirstName varchar(n) not null,
  PersonSurname varchar(n) not null,
  foreign key (PersonFirstName, PersonSurname)
    references people (FirstName, LastName),

  ParentFirstName varchar(n) not null,
  ParentSurname varchar(n) not null,
  foreign key (ParentFirstName, ParentSurname)
    references people (FirstName, LastName),

  primary key (PersonFirstName, PersonSurname, ParentFirstName, ParentSurname)
);

This design needs zero joins to retrieve the names for a given row. Many SQL platforms won't need to read the table at all, because they can get all the data from the index on the primary key.

C. Foreign keys to surrogate keys, additional surrogate key in C, above

create table ParentsOf (
  Person integer not null
    references People (ID),
  PersonParent integer not null
    references People (ID),
  primary key (Person, PersonParent)
);

To retrieve names, you must join through the "people" table. You'll need a total of six joins.

D. Foreign keys to natural keys, additional surrogate key in D, above

This design has the same structure as in C immediately above. Because the "people" table in D, farther above, has natural keys referencing the tables "FirstNames" and "Surnames", you'll only need two joins to the table "people" to get the names.

About ORMs

ORMs don't build SQL the way a SQL developer writes SQL. If a SQL developer writes a SELECT statement that needs six joins to get the names, an ORM is liable to execute seven simpler queries to get the same data. This might be a problem; it might not.

About Cascades

Surrogate ID numbers make every foreign key reference an implicit, undeclared "ON UPDATE CASCADE". For example, if you run this update statement against your table of surnames . . .

update surnames
set surname = 'Smythe'
where surname = 'Smith';

then all the Smiths will become Smythes. The only way to prevent that is to revoke update permissions on "surnames". Implicit, undeclared "ON UPDATE CASCADE" is not always a Good Thing. Revoking permissions solely to prevent unwanted implicit "cascades" is not always a Good Thing.

其他推荐答案

Using natural keys can enable simpler, faster queries since one needn't join all the way up the foreign key chain to find the "natural" value e.g. for display on-screen.