过渡性依存关系有什么问题?[英] What is wrong with a transitive dependency?

本文是小编为大家收集整理的关于过渡性依存关系有什么问题?的处理/解决方法,可以参考本文帮助大家快速定位并解决问题,中文翻译不准确的可切换到English标签页查看源文。

问题描述

我的数据库设计中有一些传递依赖性.我的上司告诉我这些会导致错误.我发现很难找到资源,这些资源会告诉我如何拥有这些依赖性会导致错误.它们会引起什么样的问题?

我没有提出这一事实的争议,只是渴望了解它们会引起什么样的问题.

编辑以获取更多详细信息:

来自Wikipedia:

传递依赖性
传递依赖性是一种间接的函数依赖性,其中x→z仅通过x→y和y→z.

推荐答案

我将通过一个示例解释:

-------------------------------------------------------------------
|  Course  |    Field     |   Instructor   |  Instructor Phone    |
-------------------------------------------------------------------
|  English |  Languages   |  John Doe      |     0123456789       |
|  French  |  Languages   |  John Doe      |     0123456789       |
|  Drawing |  Art         |  Alan Smith    |     9856321158       |
|  PHP     |  Programming |  Camella Ford  |     2225558887       |
|  C++     |  Programming |  Camella Ford  |     2225558887       |
-------------------------------------------------------------------
  • 如果您有Course,则可以轻松地获得其Instructor Course->Instructor.
  • 如果您有一个Instructor,则无法获得他的Course,因为他可能正在教不同的课程.
  • 如果您有一个Instructor,您可以轻松获得他的Phone Instructor->Phone.

这意味着如果您有Course,则可以获得Instructor Phone,这意味着Course->Instructor Phone(即及时依赖性)

现在解决问题:

  1. 如果您同时删除French和English课程,那么您也会删除他们的教练John Doe,他的电话号码将永远丢失.
  2. 除非您首先为他添加Course,否则无法在数据库中添加新的Instructor,或者您可以在Instructors table中复制数据.
  3. 如果教练John Doe更改了他的电话号码,那么您将必须更新他教他的所有课程,并使用新信息,这可能非常容易出错.
  4. 除非您删除他教过的所有课程或将所有字段设置为null.
  5. 如果您决定保留教练的出生日期怎么办?您将必须在Courses表中添加Birth Date字段.这听起来甚至合乎逻辑吗?为什么首先将讲师信息保留在课程表中?

其他推荐答案

表达3NF的一种方法是:

所有属性都应取决于密钥,整个密钥,而除了键.

传递依赖性x-> y-> z违反了该原理,导致数据冗余和潜在的修改异常.


让我们分解:

  1. 按定义,对于a functional 依赖依赖项x-> y - > z也为 the the the x <-y必须 not 保持.
  2. 如果y是钥匙,则x <-y将保持,因此y不能成为钥匙. (footNote1)
  3. 由于y不是钥匙,因此可以在多行中重复任何给定的y.
  4. y-> z意味着所有保持相同y的行也必须保持相同的z.(footNote2)
  5. 在几行中重复相同(y,z)元组不会为系统提供任何有用的信息.它是冗余.
简而

冗余导致修改异常(例如,更新一些,但不是全部zs"连接"到同一y的全部本质上会损坏数据,因为您不再知道哪个副本是正确的).通常,通过将原始表分为两个表,一个包含{x,y}和另一个包含{y,z}的另一个表来解决这一问题,以这种方式,y可以是第二个表中的键,而不重复z./p>

另一方面,如果x <-y确实保留(即x-> y-> z不是传递的),则我们可以保留一个表,其中x和y都是键.在这种情况下,Z不会不必要地重复.

(footNote1)键是一组(最小)属性集,在功能上确定关系中的所有属性.理由:如果k是键,则不能有相同值的多行,因此k的任何给定值始终与其他每个属性的一个值(假设1NF)的一个值相关.根据定义(请参阅脚注2),"与一个依赖性相关"是相同的.

(footNote2)定义,y-> z,仅当时,只有当时才是,每个y值与一个z值相关联.


示例:

假设每条消息完全有一个作者,每个作者都有一个主要的电子邮件,试图在同一表中表示消息和用户将导致重复电子邮件:

MESSAGE                         USER    EMAIL
-------                         ----    -----
Hello.                          Jon     jon@gmail.com
Hi, how are you?                Rob     rob@gmail.com
Doing fine, thanks for asking.  Jon     jon@gmail.com

(实际上,这些是MESSAGE_ID s,但让我们在这里保持简单.)

现在,如果乔恩决定将电子邮件更改为" jon2@gmail.com",会发生什么?我们需要更新乔恩行的.如果我们只更新一个,那么我们就会有以下情况...

MESSAGE                         USER    EMAIL
-------                         ----    -----
Hello.                          Jon     jon2@gmail.com
Hi, how are you?                Rob     rob@gmail.com
Doing fine, thanks for asking.  Jon     jon@gmail.com

...而且我们不再知道乔恩的电子邮件中的哪一封正确.我们本质上丢失了数据!

情况尤其糟糕,因为没有声明性的约束我们可以用来强迫DBMS为我们执行这两个更新.客户端代码 will 有错误,并且可能没有太多考虑在并发环境中可能发生的复杂交互.

但是,如果您拆分表...

MESSAGE                         USER
-------                         ----
Hello.                          Jon 
Hi, how are you?                Rob 
Doing fine, thanks for asking.  Jon 

USER    EMAIL
----    -----
Jon     jon@gmail.com
Rob     rob@gmail.com

...现在只有一排了解乔恩的电子邮件,所以歧义是不可能的.

顺便说一句,所有这些都可以看作是 dry Princtiple "> dry Princtiple .

其他推荐答案

如果表中存在及其依赖性,则不符合3NF;因此,表中有很高的可能性.检查 this 以澄清这个概念.

本文地址:https://www.itbaoku.cn/post/597530.html

问题描述

I have some transitive dependencies in my database design. I have been told by my superiors that these can cause bugs. I am finding it difficult to find resources that will tell me how having these dependencies will cause bugs. What kind of problems will they cause?

I am not disputing the fact, just eager to learn what kind of problems they can cause.

Edit for more details:

From wikipedia :

Transitive dependency
A transitive dependency is an indirect functional dependency, one in which X→Z only by virtue of X→Y and Y→Z.

推荐答案

I'll explain by an example:

-------------------------------------------------------------------
|  Course  |    Field     |   Instructor   |  Instructor Phone    |
-------------------------------------------------------------------
|  English |  Languages   |  John Doe      |     0123456789       |
|  French  |  Languages   |  John Doe      |     0123456789       |
|  Drawing |  Art         |  Alan Smith    |     9856321158       |
|  PHP     |  Programming |  Camella Ford  |     2225558887       |
|  C++     |  Programming |  Camella Ford  |     2225558887       |
-------------------------------------------------------------------
  • If you have a Course you can easily get its Instructor so Course->Instructor.
  • If you have an Instructor you can't get his Course as he might be teaching different courses.
  • If you have an Instructor you can easily get his Phone so Instructor->Phone.

That means the if you have a Course then you can get the Instructor Phone which means Course->Instructor Phone (i.e. Transitive dependency)

Now for the problems:

  1. If you delete both the French and English courses then you will delete their instructor John Doe as well and his phone number will be lost forever.
  2. There is no way to add a new Instructor to your database unless you add a Course for him first, or you can duplicate the data in an Instructors table which is even worse.
  3. If Instructor John Doe changes his phone number then you will have to update all Courses that he teaches with the new info which can be very prone to mistakes.
  4. You can't delete an Instructor from your database unless you delete all the courses he teaches or set all his fields to null.
  5. What if you decide to keep the birth date of your instructors? You will have to add a Birth Date field to the Courses table. Does this even sound logical? Why keep an instructor information in the courses table in the first place?

其他推荐答案

One way to express the 3NF is:

All attributes should depend on the key, whole key and nothing but the key.

The transitive dependency X->Y->Z violates that principle, leading to data redundancy and potential modification anomalies.


Let us break this down:

  1. By definition, for a functional dependency X->Y->Z to also be transitive, the X<-Y must not hold.
  2. If Y was a key, the X<-Y would hold, so Y cannot be a key. (FOOTNOTE1)
  3. Since Y is not a key, any given Y can be repeated in multiple rows.
  4. The Y->Z implies that all rows holding the same Y must also hold the same Z. (FOOTNOTE2)
  5. Repeating the same (Y, Z) tuple in several rows does not contribute any useful information to the system. It is redundant.

In short, since Y is not a key and Y->Z, we have violated the 3NF.

Redundancies lead to modification anomalies (e.g. updating some but not all of the Zs "connected" to the same Y essentially corrupts the data, since you no longer know which copy is correct). This is typically resolved by splitting the original table into two tables, one containing {X, Y} and the other other containing {Y, Z}, This way, Y can be a key in the second table and Z is not repeated.

On the other hand, if the X<-Y does hold (i.e. X->Y->Z is not transitive), then we can retain a single table, where both X and Y are keys. Z won't be unnecessarily repeated in this scenario.

(FOOTNOTE1) A key is a (minimal) set of attributes that functionally determine all of the attributes in a relation. Rationale: If K is a key, there cannot be multiple rows with the same value of K, so any given value of K is always associated to precisely one value of every other attribute (assuming 1NF). By definition (see FOOTNOTE2), "being associated to precisely one" is the same thing as "being in a functional dependency".

(FOOTNOTE2) By definition, Y->Z if, and only if, each Y value is associated with precisely one Z value.


Example:

Assuming each message has exactly one author and each author has exactly one primary e-mail, attempting to represent messages and users in the same table would lead to repeating e-mails:

MESSAGE                         USER    EMAIL
-------                         ----    -----
Hello.                          Jon     jon@gmail.com
Hi, how are you?                Rob     rob@gmail.com
Doing fine, thanks for asking.  Jon     jon@gmail.com

(In reality, these would be MESSAGE_IDs, but let us keep things simple here.)

Now, what happens if Jon decides to change his e-mail to, say, "jon2@gmail.com"? We would need to update both of Jon's rows. If we only update one, then we have the following situation...

MESSAGE                         USER    EMAIL
-------                         ----    -----
Hello.                          Jon     jon2@gmail.com
Hi, how are you?                Rob     rob@gmail.com
Doing fine, thanks for asking.  Jon     jon@gmail.com

...and we no longer know which one of the Jon's e-mails is correct. We have essentially lost the data!

The situation is especially bad since there is no declarative constraint we could use to coerce the DBMS into enforcing both updates for us. The client code will have bugs and is probably written without much regard for complex interactions that can happen in the concurrent environment.

However, if you split the table...

MESSAGE                         USER
-------                         ----
Hello.                          Jon 
Hi, how are you?                Rob 
Doing fine, thanks for asking.  Jon 

USER    EMAIL
----    -----
Jon     jon@gmail.com
Rob     rob@gmail.com

...there is now only one row that knows about Jon's e-mail, so ambiguity is impossible.

BTW, all this can be viewed as just another expression of the DRY principle.

其他推荐答案

If there's transitive dependencies in your table then it's not compliant with 3NF; so there's a high probability that there is redundant data in your table. Check this to clarify this concept.