数据建模。超类型/次类型[英] Data Modeling: Supertype / Subtype

本文是小编为大家收集整理的关于数据建模。超类型/次类型的处理/解决方法,可以参考本文帮助大家快速定位并解决问题,中文翻译不准确的可切换到English标签页查看源文。

问题描述

希望找出适当的方式来建模以下要求.

  1. 有3种类型的"聚会",一个粉丝,乐队和乐队成员.
  2. 乐队成员将始终与乐队相关联,也可以成为任何乐队的粉丝.
  3. 风扇,乐队和乐队成员之间都有共同的属性,但是这3个也将具有自己的独特属性.
  4. 粉丝可以是任何乐队的粉丝,也不是根本不喜欢

这是更大思想的一小部分,但它在扩展模型时会引起混乱.我相信它必须是图2或其他选项,因为我看不出第一个模型中的乐队成员如何与乐队相关联.

我感谢任何输入.

 alt text

 alt text

推荐答案

警告

  1. 首先要有一些警告,以理解局限性.所有使用或存储的数据都需要一起考虑/建模.例如.无论如何,您都发现了"在扩展模型中产生混乱".从我的身边,不知道Parties(子类型)与其他实体有何关系,限制了我提供了完全正确的答案(这不会改变).

    由于您将数据提供两个分组,因此答案将分为两个分组,第二个批次将需要更改第一个模型.不是一个投诉,只是事先建议您,因为如果我在前看到所有数据,就可以避免.

  2. 您非常感谢(a)对数据进行建模以及(b)进行概念,逻辑,然后是物理的科学(已记录了30多年)的必要性.那正是我要做的.您无法想象正式过程节省的时间和精力.

    • 但是,这在我的答案中并没有遇到,因为这是一个"问答"网站,我必须在发问者的层面上回答. (我比其他人更充分地回答问题,甚至会引起负面评论!).请放心,我要经过正式序列.
  3. 必须提到的是,关系建模是行业中巨人的作品,1980年代的E F Codd博士和R Brown.该方法基于他们的工作. IDEF1X于1993年成为NIST标准.当我回答数据建模问题时,我没有提供一些我写过一本书的个人方法,我站在巨人的肩膀上.

    • 我遵守关系模型

    • 我拒绝日期的工作;达尔文; fagin;等,因为它与关系模型相矛盾,所以它是反关系的.

    • erd(p chen)是经过统一的,idef1x和原始的.它没有标识符的概念,它无法处理关系密钥.令人惊讶的是,ERD仍被教导为"关系",而IDEF1X被抑制了. 30年.

数据建模

  1. 是的,在这里,Supertype-Subtype结构是正确的.不幸的是,它并不常见,因此通常不了解.
  • 亚型在关系模型之前就存在,并继续存在. IDEF1X是关系数据建模的唯一标准,它具有特定的符号. Erd一无所有.
  1. 分别,即使在实施亚型的情况下,它们也以非常有限的方式实施.亚型具有更改超构型的角色的作用.确实,正确的实施确实很少见,我还没有看到任何地方(当然我自己的数据库实施给客户,还有一些高端供应商).

  2. 点是,看起来"复杂",但实际上并不是很简单.

    这是肯·唐斯(Ken Downs)和克里斯·贝伦斯(Chris Behrens)将建模的简单性(高度扩展)与未建模的实现(不正确和不可扩展性)混淆,这是由于矮人(Martin Fowler)所建议的简单方法.没有犯罪,我知道人们对他们所知道的,并会捍卫他们所知道的,无论可能有什么限制.

    • 请注意,每个子类型也是一个完全有效的实体(当我们到达该阶段时表中的表格),并且可以自行站立.

    • 对于与这些亚型有关系的较低级别或交易或功能表,诀窍是使用正确的亚型(角色).常见的错误是他们使用Party,然后使用子类型或角色的含义,而正确参考完整性丢失了.

    • 分别从Party> 派生的所有rolenames都是从Party中得出的,但这不是使用Party而不是正确的角色的有效理由.

    • 在这里您非常了解数据,但是(没有人教给您这一点,并且您的角色和子类型混淆了.

    • BandMember和Fan不是Parties.它们是Persons,第一(和Person是Party,第二)

  3. 为了在这个概念层面上提供这些要点,我们需要与实体和标识符(不是属性)合作,而不仅仅是实体.因此,我也提供了.

    • 看来您有Erwin(最好!);它使您可以非常方便地在该级别查看单个模型.即使在此抽象层面上,也可以在实体中实现标识符.

障碍

我在介绍模型之前指出这些内容,因为您似乎对学习IDEF1X(用于建模关系数据库的标准方法),以便将来放松模型.因此,或任何网站都不是正规互动教育的好媒介,但我们将为我们提供最好的机会.

  1. 在模型(1)中,Band不能独立(方形角):由于它被确定为依赖于Party>;它是Party的亚型;它具有相同的标识符.

  2. 缺失的基数至关重要.将其放入实际上将有助于解决模型.我并没有大惊小怪的是IDEF1X(圆圈)vs ieee(乌鸦脚),但我总是将它们放进去,并随着模型的进展而继续更改.

    • 您的模型不显示Band由一对多 Members组成. et cetera.
      .
      虽然编程可以逐步进展(一旦定义稳定),但建模却不会.例如.您不能对实体进行建模,而不能建立关系;关系但不是基数.这就是为什么他们是不同的科学,程序员不制作好的建模者,反之亦然.
      .
  3. 在此阶段,规则也非常重要.实际上,建模是建模规则.因此,纠正或调制规则是建模过程的一部分.

    • 一个风扇可以成为任何频段的粉丝,也不是完全不合理的.如果Person根本没有 ,那么他们是公众的成员,并且与任何Band无关.普通Person.

    • a Fan与至少一个Band有关系.实际上,与Band建立关系是从该领域中取出Person的原因,并导致Fan的详细信息或带频段的细节.

    • 如果有一个实体,例如Fan with no Band(即.便宜!).

  4. 动词短语在此阶段也很重要.不少于我的观点RE规则和基数,它是建模过程的一部分,随着模型的进行,它需要更改/调制.您不会相信正确的动词短语是多么重要.将它们放在可能的情况下,可以帮助您澄清子类型与角色.这是每个数据模板都知道的定义.

    • 实体是模型中的名词

    • 关系是动词,在名词之间发生的动作

    • 动词短语定义了这些动作(这就是为什么它们被准确地称为动词短语,这不是一个有趣的名字).

  • a Person制作一对多Bands,因此是Member

  • a Band由一对多People组成,谁是Members

  • a Person光顾Band,这使它们成为Fan(不仅是Person谁在Fan表中有一排)

  • a Band取决于People,谁是Fans

提出最短,最有意义的动词短语;不使用简单的单词(要避免"包含"),对于建模者来说是一个挑战.请随时改善我提供的动词短语.

这是您的 在IDEF1X中的实体和密钥级别.

不熟悉建模关系数据库的读者可能会发现我的 idef1x符号 有用.

注意

我所做的就是解决子类型;角色;关系的基础性,如上所述.

  1. 当您评估第二个批次或交易实体时,子类型与角色的相关性将更加清晰(如您所述,在这里我们只有识别实体).

    .

  2. 标识符.这是值得注意的,不仅是为了澄清模型,还因为它是使用IDEF1X标识符的一个很好的例子,以及它们部署的功能.您已经在模型中指出,您了解(实线),我只是给它提供完整的治疗.

    • Person和Band是Party的亚型.它们也是Party的角色.因此,我们从那以下是PersonId和BandId,而不是PartyId(即使是PartyId).

    • Person扮演Member的角色时,我们使用MemberId(PersonId,PartyId).

    • Person扮演Fan的角色时,我们使用FanId(PersonId,PartyId).

假设您列出了Band的Fans,您的查询以Fan为中心.如果您在每个表上都有这些Id替代密钥,则将被迫加入Person,然后加入Party.但是有了您拥有的关系标识符,您可以直接转到Party:

SELECT  ...,
        Name  -- Party.Name
        ...
    FROM Party
    JOIN Fan
        ON PartyId = FanId

并跳过Band之间的表格.是的,事实是,归一化的关系数据库需要更少的加入,更少的资源(处理,缓存,磁盘I/O),这就是它们执行得更好的原因之一.神话没有科学的基础.

请评估并提出具体问题.

更新

有关亚型的通用处理和真实SQL平台的实现详细信息,请参阅我的 subtype文档 .

其他推荐答案

我认为这比您想象的要简单.您有两个对象 - 乐队和人,它们可以通过两种不同的方式作为风扇或成员进行连接.这是一个没有外键或其他任何内容的Quickie DB脚本:

CREATE TABLE [dbo].[XREFBandMembers](
    [MemberID] [int] NOT NULL,
    [BandId] [int] NOT NULL,
 CONSTRAINT [PK_XREFBandMembers] PRIMARY KEY CLUSTERED 
(
    [MemberID] ASC,
    [BandId] ASC
)WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]
) ON [PRIMARY]
GO

CREATE TABLE [dbo].[XREFBandFans](
    [FanId] [int] NOT NULL,
    [BandId] [int] NOT NULL,
 CONSTRAINT [PK_XREFBandFans] PRIMARY KEY CLUSTERED 
(
    [FanId] ASC,
    [BandId] ASC
)WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]
) ON [PRIMARY]

CREATE TABLE [dbo].[People](
    [Id] [int] IDENTITY(1,1) NOT NULL,
    [Name] [nvarchar](100) NOT NULL,
 CONSTRAINT [PK_People] PRIMARY KEY CLUSTERED 
(
    [Id] ASC
)WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]
) ON [PRIMARY]

CREATE TABLE [dbo].[Bands](
    [Id] [int] IDENTITY(1,1) NOT NULL,
    [Name] [nvarchar](50) NOT NULL,
 CONSTRAINT [PK_Bands] PRIMARY KEY CLUSTERED 
(
    [Id] ASC
)WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]
) ON [PRIMARY]
GO

至于特定于关系的属性,您可以将它们放在Xref表中,例如,FanclubMembershipNumber在Xrefbandfans中.

本文地址:https://www.itbaoku.cn/post/597521.html

问题描述

Looking to figure out the proper way to model the below requirements.

  1. There are 3 types of “parties” to be concerned with, a Fan, a Band, and a BandMember.
  2. That BandMember will always be associated with a Band and can also be a Fan of any band.
  3. There are common attributes between a Fan, a Band, and a BandMember, but each of these 3 will also have their own unique attributes.
  4. A Fan can be a fan of of any Band or none at all

This is a small part of a bigger thought but it is creating confusion in expanding the model. I believe it would have to be diagram 2 or some other option since I don't see how a BandMember can be associated with a Band in the first model.

I appreciate any input.

alt text

alt text

推荐答案

Caveat

  1. First a couple of caveats for understanding the limitations. All the data being used or stored needs to be considered/modelled together. Eg. you have found that out anyway in your "creating confusion in expanding the model". From my side, not knowing how the Parties (subtypes) are related to other entities, limits me from provided a totally correct answer (that will not change).

    Since you are providing the data in two tranches, therefore the answer will be in two tranches, and the second tranche will require changes to the first model. Not a complaint, just advising you beforehand, as that could be avoided if I saw all the data up front.

  2. It is really great that you appreciate the need for (a) modelling the data and (b) going through the science (documented for over 30 years) of Conceptual, Logical, then physical. That is exactly what I do. You cannot imagine the time and effort that is saved by the formal process.

    • However, that does not come across in my answers at SO, because it is a "question and answer" site, I have to answer at the level of the questioner. (I answer questions more fully than others, and even that, causes negative commentary !). Be assured that I go through the formal sequence.
  3. It must be mentioned that Relation Modelling is the work of giants in the industry, Dr E F Codd and R Brown in the 1980's. The Methodology is based on their work. IDEF1X became a NIST Standard in 1993. When I answer Data Modelling questions, I am not supplying some personal method that I wrote a book about, I stand on the shoulders of giants.

    • I adhere to the Relational Model by Dr E F Codd.

    • I reject the work of Date; Darwen; Fagin; etc, because it contradicts the Relational Model, it is anti-Relational.

    • ERD (P Chen) is pre-relational, pre-IDEF1X, and primitive in comparison. It has not concept of the Identifier, it cannot handle a Relational Key. It is staggering that ERD is still being taught as "relational", and IDEF1X is suppressed. For 30 years.

Data Modelling

  1. Yes, a Supertype-Subtype structure is correct here. Unfortunately it is not common and therefore not commonly understood.
  • Subtypes existed long before the Relational Model, and continues to exist. IDEF1X, the only Standard for Relational Data Modelling, has specific symbols for it. ERD has nothing.
  1. Separately, even where Subtypes are implemented, they are implemented in very limited fashion. Subtypes have the effect of changing the Role of the Supertype. The correct implementation of that is very rare indeed, I have not seen this anywhere (except of course my own database implementations for customers, and a few high-end suppliers).

  2. Point is, that may look "complex" but it isn't, it is actually very simple.

    This is where Ken Downs and Chris Behrens confuse modelled simplicity (highly extensible) with unmodelled implementation (incorrect and un-extensible), due to the simplistic approach advised by dwarves such as Martin Fowler. No offence, I understand that people are attached to, and will defend, what they know, however limited as that may be.

    • notice that each Subtype is also a perfectly valid Entity (Table in the Physical, when we get to that stage) in its own right, and can stand on its own.

    • for the lower levels or transaction or function tables, which have Relations to these Subtypes, the trick is to use the correct Subtype (Role). The common mistake is they use Party, and then the meaning of the Subtype or Role, and the correct Referential Integrity is lost.

    • separately all the RoleNames are derived from Party, but that is not a valid reason to useParty instead of the correct Role.

    • Here you understand the data really well, but (no one has taught you this and) you have confused Roles and Subtypes.

    • BandMember and Fan are not Parties. They are Persons, first (and Person is a Party, second)

  3. In order to provide clarity re those points, at this Conceptual level, we need to work with Entities and Identifiers (not Attributes), rather than just Entities. Therefore I have provided that as well.

    • It appears you have ERwin (the best!); it allows you to view the single model at that level very conveniently. Do implement the Identifiers in the Entities, even at this abstract level.

Obstacle

I point these out before presenting the model, because you appear to be seriously interested in learning IDEF1X, the Standard methodolgy for modelling Relational Databases, with a view to easing your models in the future. SO or any website, is not a good medium for formal interactive education, but we will give it our best shot.

  1. In model (1), Band cannot be Independent (square corners): since it is identified as being Dependent on Party; it is a Subtype of Party; and it has the same Identifier.

  2. The missing Cardinality is critical. Putting it in will actually assist in resolving the model. I am not fussed re IDEF1X (circles) vs IEEE (crows feet), but I always put them in as I go, and keep changing them as the model progresses.

    • your model does not show that a Band is made up of one-to-many Members. Et cetera.
      .
      While programming can progress incrementally (once the definition is stable)), modelling does not. Eg. you can't model the Entities but not the Relations; the Relations but not the Cardinality. That is why they are different sciences, programmers do not make good modellers, and vice versa.
      .
  3. At this stage the Rules are also very important. Modelling is, in fact, modelling the Rules. Therefore correcting or modulating the Rules is part of the Modelling process.

    • A Fan can be a fan of of any Band or none at all is not reasonable. If a Person is none at all then they are a member of the general public, and they have no Relation to any Band. An ordinary Person.

    • A Fan has a Relation to at least one Band. In fact, having a Relation to a Band is what takes a Person out of that realm and causes storage of Fan details or specific fan-of-band details.

    • If there are is such an Entity as Fan with no Band (ie. you are storing details of that, separate to Fan as per my model), please advise, and I will change the model (paper is cheap!).

  4. Verb Phrases are also important at this stage; no less than my point re Rules and Cardinality above, it is a part of the modelling process, and it needs change/modulation as the model progresses. You will not believe how important it is to gett the Verb Phrases right. Putting them in may well have assisted you in clarifying Subtypes vs Roles. Here is a definition that every Data Modeller knows by heart.

    • The Entities are the Nouns in the model

    • The Relations are the Verbs, the actions that take place between the Nouns

    • The Verb Phrases define those actions (that's why they are accurately called Verb Phrases, it is not a funny name).

As described in the IDEF1X Notation document, for Associative tables, read the Verb Phrase "through" them, to the parent on the other side of the association.

  • A Person makes one-to-many Bands, and is thus a Member

  • A Band is made up of one-to-many People, who are Members

  • A Person patronises a Band, which makes them a Fan (not merely a Person who has a row in the Fan table)

  • A Band depends on People, who are Fans

Coming up with the shortest, most meaningful, Verb Phrase; not using simplistic words ("comprises" is to be avoided"), is a challenge for Modellers. Feel free to ameliorate the Verb phrases I supplied.

Here is your Party Data Model at the Entity and Key levels in IDEF1X.

Readers who are unfamiliar with the Standard for Modelling Relational Databases may find my IDEF1X Notation useful.

Note

All I have done is resolve the Subtypes; Roles; the Cardinality of the Relations, as identified above.

  1. The relevance of the Subtypes vs Roles will be more clear to you when you evaluate your second tranche or your transaction Entities (as you have stated, here we have only Identifying Entities).

  2. Identifiers. This is worth spelling out, not only for the purpose of clarifying the model, also because it is a good example of IDEF1X Identifiers being used, and the power they deploy. You have already indicated in your model that you understand that (solid lines), I am merely giving it the full treatment.

    • Person and Band are Subtypes of Party. They are also Roles of Party. Therefore we use PersonId and BandId from that point downward, not PartyId (even though it is PartyId).

    • When a Person plays the Role of Member, we use MemberId (which is PersonId, which is PartyId).

    • When a Person plays the Role of Fan, we use FanId (which is PersonId, which is PartyId).

Let's say you were listing the Fans of a Band, your query is centred on Fan. If you had those Id Surrogate Keys on every table, you would be forced to join Person, then join Party. But with the Relational Identifiers you have, you can go directly to Party:

SELECT  ...,
        Name  -- Party.Name
        ...
    FROM Party
    JOIN Fan
        ON PartyId = FanId

and skip the Band table in-between. Yes, the truth is, Normalised Relational Database require fewer joins, less resources (processing, cache, disk I/O), and that is one reason why they perform so much better. The myth has no scientific basis.

Please evaluate and ask specific questions.

Update

For a generic treatment of Subtypes, and implementation details for genuine SQL platforms, refer to my Subtype Document.

其他推荐答案

I think this is simpler than you think. You've got two objects - Band and Person, and they can be connected in two different ways, either as a fan or a member. Here is a quickie db script with no foreign keys or anything:

CREATE TABLE [dbo].[XREFBandMembers](
    [MemberID] [int] NOT NULL,
    [BandId] [int] NOT NULL,
 CONSTRAINT [PK_XREFBandMembers] PRIMARY KEY CLUSTERED 
(
    [MemberID] ASC,
    [BandId] ASC
)WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]
) ON [PRIMARY]
GO

CREATE TABLE [dbo].[XREFBandFans](
    [FanId] [int] NOT NULL,
    [BandId] [int] NOT NULL,
 CONSTRAINT [PK_XREFBandFans] PRIMARY KEY CLUSTERED 
(
    [FanId] ASC,
    [BandId] ASC
)WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]
) ON [PRIMARY]

CREATE TABLE [dbo].[People](
    [Id] [int] IDENTITY(1,1) NOT NULL,
    [Name] [nvarchar](100) NOT NULL,
 CONSTRAINT [PK_People] PRIMARY KEY CLUSTERED 
(
    [Id] ASC
)WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]
) ON [PRIMARY]

CREATE TABLE [dbo].[Bands](
    [Id] [int] IDENTITY(1,1) NOT NULL,
    [Name] [nvarchar](50) NOT NULL,
 CONSTRAINT [PK_Bands] PRIMARY KEY CLUSTERED 
(
    [Id] ASC
)WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]
) ON [PRIMARY]
GO

As for the relationship specific attributes, you can place them in the XREF tables, e.g., FanClubMembershipNumber goes in XREFBandFans.