Database Structure Advice Needed

    Problem description

    I'm currently working on a site that will contain a product catalog. I am a little new to database design, so I'm looking for advice on how best to do this. I am familiar with relational database design, so I understand "many to many", "one to many", etc. (I took a good DB class in college). Here is an example of how an item might be categorized:

    Propeller -> aircraft -> wood -> brand -> product.
    

    Instead of trying to write what I have so far, just take a quick look at this image I created from the phpmyadmin designer feature.

    [Image: http://www.usfultimate.com/temp/db_design.jpg]

    Now, this all seemed fine and dandy until I realized that the category "wood" would also be used under propeller -> airboat -> (wood). This would mean that "wood" would have to be recreated every time I want to use it under a different parent. This isn't the end of the world, but I wanted to know if there is a better way to go about this.

    Also, I am trying to keep this thing as dynamic as possible so the client can organize his catalog as his needs change.

    *Edit: I was thinking about just creating a "tags" table, so I could assign the tag "wood", "metal", or "50inch" to one or many items. I would still keep a parenting type of thing for the main categories, but this way the categories wouldn't have to go so deep and there wouldn't be the repetition.
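
The tags idea from the edit above could be sketched roughly as follows. This is a minimal illustration using Python's built-in sqlite3 module so that it runs anywhere; the table and column names (item, tag, item_tag) and the sample rows are assumptions made for the example, not the schema shown in the image.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical tables; names are illustrative only.
    CREATE TABLE item (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE tag  (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);

    -- Junction table: one row per (item, tag) pair, so a tag such as
    -- 'wood' exists once and can be attached to any number of items.
    CREATE TABLE item_tag (
        item_id INTEGER NOT NULL REFERENCES item(id),
        tag_id  INTEGER NOT NULL REFERENCES tag(id),
        PRIMARY KEY (item_id, tag_id)
    );

    INSERT INTO item VALUES (1, 'Aircraft propeller A'), (2, 'Airboat propeller B');
    INSERT INTO tag  VALUES (1, 'wood'), (2, 'metal'), (3, '50inch');
    INSERT INTO item_tag VALUES (1, 1), (1, 3), (2, 1);
""")


def items_with_tag(tag_name):
    """Return the names of all items carrying the given tag, ordered by item id."""
    rows = conn.execute(
        """
        SELECT item.name
        FROM item
        JOIN item_tag ON item_tag.item_id = item.id
        JOIN tag      ON tag.id = item_tag.tag_id
        WHERE tag.name = ?
        ORDER BY item.id
        """,
        (tag_name,),
    )
    return [name for (name,) in rows]


wood_items = items_with_tag("wood")
```

Here "wood" is stored once and shared by the aircraft and airboat items, which is the repetition the edit was trying to avoid.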

    Recommended answer

    First, the user interface: as a user, I hate searching for a product in a catalog organized in a strictly hierarchical way. I never remember which sub-sub-sub-sub...-category an "exotic" product is in, and this forces me to waste time exploring "promising" categories just to discover that it is categorized in a (for me, at least) strange way.

    What Kevin Peno suggests is good advice and is known as faceted browsing. As Marcia Bates wrote in After the Dot-Bomb: Getting Web Information Retrieval Right This Time, " .. faceted classification is to hierarchical classification as relational databases are to hierarchical databases. .. ".

    In essence, faceted search lets users search your catalog starting from whatever "facet" they prefer and then filter the results by choosing other facets along the way. Note that, contrary to how tag systems are usually conceived, nothing prevents you from organizing some of these facets hierarchically.

    To quickly understand what faceted search is all about, there are some demos to explore at The Flamenco Search Interface Project - Search Interfaces that Flow.

    Second, the application logic: what Manitra proposes is also good advice (as I understand it), i.e. separating the nodes and links of a tree/graph into different relations. What he calls an "ancestor table" (which is a much better intuitive name, however) is known as the transitive closure of a directed acyclic graph (DAG) (the reachability relation). Beyond performance, it simplifies queries greatly, as Manitra said.

    But I suggest a view for such an "ancestor table" (transitive closure), so that updates are real-time and incremental rather than performed periodically by a batch job. There is SQL code (though I think it needs to be adapted a little to specific DBMSes) in the papers I mentioned in my answer to the question "query language for graph sets: data modeling question". In particular, look at Maintaining Transitive Closure of Graphs in SQL (.ps - postscript).
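
As a rough sketch of the "ancestor table as a view" idea, the example below keeps nodes and links in separate tables and defines the transitive closure as a view over them with a recursive common table expression. It uses Python's sqlite3 module; the names category, category_link and category_ancestor are assumptions made for the example. Note that this is a plain view that is recomputed on each query, so it is always current, but it is not the incremental maintenance technique described in the cited paper.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Nodes and links live in separate relations (hypothetical names).
    CREATE TABLE category (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE category_link (
        parent_id INTEGER NOT NULL REFERENCES category(id),
        child_id  INTEGER NOT NULL REFERENCES category(id),
        PRIMARY KEY (parent_id, child_id)
    );

    -- Transitive closure (reachability) of the link graph, as a view.
    -- UNION (not UNION ALL) removes pairs reached via more than one path.
    CREATE VIEW category_ancestor AS
    WITH RECURSIVE closure(ancestor_id, descendant_id) AS (
        SELECT parent_id, child_id FROM category_link
        UNION
        SELECT closure.ancestor_id, category_link.child_id
        FROM closure
        JOIN category_link ON category_link.parent_id = closure.descendant_id
    )
    SELECT ancestor_id, descendant_id FROM closure;

    INSERT INTO category VALUES
        (1, 'Propeller'), (2, 'Aircraft'), (3, 'Airboat'), (4, 'Wood');
    -- 'Wood' is a single node with two parents, so the graph is a DAG.
    INSERT INTO category_link VALUES (1, 2), (1, 3), (2, 4), (3, 4);
""")


def ancestors_of(name):
    """Return the names of all ancestors of the named category, ordered by id."""
    rows = conn.execute(
        """
        SELECT a.name
        FROM category_ancestor
        JOIN category AS a ON a.id = category_ancestor.ancestor_id
        JOIN category AS d ON d.id = category_ancestor.descendant_id
        WHERE d.name = ?
        ORDER BY a.id
        """,
        (name,),
    )
    return [n for (n,) in rows]


wood_ancestors = ancestors_of("Wood")

# Adding a node and a link is immediately visible through the view.
conn.execute("INSERT INTO category VALUES (5, 'Boat')")
conn.execute("INSERT INTO category_link VALUES (5, 4)")
wood_ancestors_after = ancestors_of("Wood")
```

With the links in their own table, giving "Wood" a second parent is one extra row rather than a duplicated category.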

    Products-Categories relationship

    Manitra's first point is also worth emphasizing.

    What he is saying is that there is a many-to-many relationship between products and categories, i.e. each product can be in one or more categories, and each category can contain zero or more products.

    Given relation variables (relvars) Products and Categories, such a relationship can be represented, for example, as a relvar PC with at least the attributes P# and C#, i.e. product and category numbers (identifiers) in foreign-key relationships with the corresponding Products and Categories numbers.

    This is complementary to management of categories' hierarchies. Of course, this is only a design sketch.

    On faceted browsing in SQL

    A useful concept for implementing "faceted browsing" is relational division or even relational comparisons (see the bottom of the linked page). That is, by dividing PC (Products-Categories) by a (growing) list of categories chosen by the user (facet navigation), one obtains only the products that are in all of those categories (of course, the categories are presumed not to be all mutually exclusive; otherwise choosing two categories would yield zero products).
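
Since SQL has no division operator, one common way to emulate it is to count matches per product with GROUP BY and HAVING. The sketch below does this with Python's sqlite3 module; the product_category table stands in for the PC relvar, and its column names, the numeric category codes and the sample rows are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Stand-in for the PC relvar (P#, C#); names are illustrative.
    CREATE TABLE product_category (
        product_id  INTEGER NOT NULL,
        category_id INTEGER NOT NULL,
        PRIMARY KEY (product_id, category_id)
    );
    -- Hypothetical category codes: 10 = aircraft, 20 = airboat, 30 = wood.
    INSERT INTO product_category VALUES (1, 10), (1, 30), (2, 20), (2, 30), (3, 10);
""")


def products_in_all(category_ids):
    """Divide product_category by the chosen categories.

    Returns the ids of products that belong to every chosen category.
    """
    chosen = sorted(set(category_ids))
    if not chosen:
        # No facet selected yet: every categorized product qualifies.
        rows = conn.execute(
            "SELECT DISTINCT product_id FROM product_category ORDER BY product_id"
        )
        return [p for (p,) in rows]

    placeholders = ",".join("?" * len(chosen))
    # The primary key guarantees one row per (product, category) pair, so a
    # product matches all chosen categories exactly when its count of
    # matching rows equals the number of chosen categories.
    rows = conn.execute(
        f"""
        SELECT product_id
        FROM product_category
        WHERE category_id IN ({placeholders})
        GROUP BY product_id
        HAVING COUNT(*) = ?
        ORDER BY product_id
        """,
        (*chosen, len(chosen)),
    )
    return [p for (p,) in rows]


wood_only = products_in_all([30])
aircraft_and_wood = products_in_all([10, 30])
```

Each additional facet the user picks just extends the list passed to the function, which narrows the result; picking two mutually exclusive categories returns an empty list.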

    SQL-based DBMSes usually lack these operators (division and comparisons), so I give below some interesting papers that implement/discuss them:

    and so on...

    I will not go into detail here, but the interaction between category hierarchies and facet browsing needs special care.

    A digression on "flatness"

    I briefly looked at the article linked by Pras, Managing Hierarchical Data in MySQL, but I stopped reading after these few lines in the introduction:

    Introduction

    Most users at one time or another have dealt with hierarchical data in a SQL database and no doubt learned that the management of hierarchical data is not what a relational database is intended for. The tables of a relational database are not hierarchical (like XML), but are simply a flat list. Hierarchical data has a parent-child relationship that is not naturally represented in a relational database table. ...

    To understand why this insistence on the flatness of relations is just nonsense, imagine a cube in a three-dimensional Cartesian coordinate system: it is identified by 8 coordinates (triplets), say P1(x1,y1,z1), P2(x2,y2,z2), ..., P8(x8,y8,z8) [here we are not concerned with the constraints on these coordinates that make them really represent a cube].

    Now, we put this set of coordinates (points) into a relation variable, which we name Points. We represent the relation value of Points as the table below:

    Points |  x |  y |  z |
    =======+====+====+====+
           | x1 | y1 | z1 |
           +----+----+----+
           | x2 | y2 | z2 |
           +----+----+----+
           | .. | .. | .. |
           | .. | .. | .. |
           +----+----+----+
           | x8 | y8 | z8 |
           +----+----+----+
    

    Is this cube being "flattened" by the mere act of representing it in a tabular way? Is a relation (value) the same thing as its tabular representation?

    A relation variable takes as its values sets of points in an n-dimensional discrete space, where n is the number of relation attributes ("columns"). What does it mean for an n-dimensional discrete space to be "flat"? Just nonsense, as I wrote above.
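
To make the distinction between a relation value and its tabular layout concrete, here is a small Python sketch. It models the Points relation as a set of (x, y, z) tuples; the unit-cube coordinates are an assumption chosen for the example, since the text leaves x1..z8 unspecified.

```python
from itertools import product

# One relation value: the 8 corners of a unit cube as a set of (x, y, z) tuples.
points = frozenset(product((0, 1), repeat=3))

# Two different tabular listings (row orders) of that same relation value.
table_a = sorted(points)
table_b = sorted(points, reverse=True)

# The listings differ as tables, but they denote the same set of points.
same_relation = frozenset(table_a) == frozenset(table_b) == points

# The number of attributes is the dimension of the space the points live in.
degree = len(next(iter(points)))
```

The two listings are different "flat" layouts, yet the relation they represent is the same set of points in a three-dimensional space.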

    Don't get me wrong: it is certainly true that SQL is a badly designed language and that SQL-based DBMSes are full of idiosyncrasies and shortcomings (NULLs, redundancy, ...), especially the bad ones, the DBMS-as-dumb-store type (no referential constraints, no integrity constraints, ...). But that has nothing to do with the fantasized limitations of the relational data model; on the contrary, the more they turn away from it, the worse the outcome.

    In particular, the relational data model, once you understand it, poses no problem in representing any structure, even hierarchies and graphs, as I detailed with the references to published papers mentioned above. Even SQL can do it, if you gloss over its deficiencies, for lack of something better.

    On "The Nested Set Model"

    I skimmed the rest of that article and I'm not particularly impressed by this logical design: it suggests muddling two different entities, nodes and links, into one relation, and this will probably cause awkwardness. But I'm not inclined to analyze that design more thoroughly, sorry.


    EDIT: Stephan Eggermont objected, in comments below, that " The flat list model is a problem. It is an abstraction of the implementation that makes performance difficult to achieve. ... ".

    Now, my point is, precisely, that:

    1. this "flat list model" is a fantasy: just because one lays out (represents) relations as tables ("flat lists") does not mean that relations are "flat lists" (an "object" and its representations are not the same thing);
    2. a logical representation (relation) and physical storage details (horizontal or vertical decompositions, compression, indexes (hashes, b+tree, r-tree, ...), clustering, partitioning, etc.) are distinct; one of the points of the relational data model (RDM) is to decouple the logical from the "physical" model (with advantages to both users and implementors of DBMSes);
    3. performance is a direct consequence of physical storage details (implementation) and not of logical representation (Eggermont's comment is a classic example of logical-physical confusion).

    The RDM does not constrain implementations in any way; one is free to implement tuples and relations as one sees fit. Relations are not necessarily files, and tuples are not necessarily records of a file. Such a correspondence is a dumb direct-image implementation.

    Unfortunately, SQL-based DBMS implementations are, too often, dumb direct-image implementations, and they suffer from poor performance in a variety of scenarios; OLAP/ETL products exist to cover these shortcomings.

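    This is slowly changing. There are commercial and free software/open source implementations that finally avoid this fundamental pitfall:

    • Vertica, which is the commercial successor of ..
    • C-Store: a column-oriented DBMS;
    • MonetDB;
    • LucidDB;
    • KDB, in its way;
    • and so on ...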

    Of course, the point is not that there must exist an "optimal" physical storage design, but that any physical storage design can be abstracted away by a nice declarative language based on relational algebra/calculi (and SQL is a bad example) or, more directly, on a logic programming language (like Prolog, for example; see my answer to the "prolog to SQL converter" question). A good DBMS should change its physical storage design on the fly, based on data access statistics (and/or user hints).

    Finally, in Eggermont's comment, the statement " The relational model is getting squeeezed between the cloud and prevayler. " is more nonsense, but I cannot give a rebuttal here; this comment is already too long.

    Other recommended answer

    Before you create a hierarchical category model in your database, take a look at this article, which explains the problems and the solution (using nested sets).

    To summarize, using a simple parent_category_id doesn't scale very well, and you'll have a hard time writing performant SQL queries. The answer is to use nested sets, which let you visualize your many-to-many category model as sets that are nested inside other sets.
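
For readers unfamiliar with nested sets, here is a minimal sketch of the idea using Python's sqlite3 module. Each category stores a left and right bound, and a category's descendants are the rows whose bounds fall strictly inside its own. The table layout, the lft/rgt column names and the sample numbering are assumptions made for the example. Note that nested sets encode a tree, so in this sketch "Wood" appears once under each parent.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE category (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        lft  INTEGER NOT NULL,   -- left bound of the nested interval
        rgt  INTEGER NOT NULL    -- right bound of the nested interval
    );
    -- Propeller(1,10) contains Aircraft(2,5) and Airboat(6,9);
    -- each of those contains its own Wood category.
    INSERT INTO category VALUES
        (1, 'Propeller', 1, 10),
        (2, 'Aircraft',  2, 5),
        (3, 'Wood',      3, 4),
        (4, 'Airboat',   6, 9),
        (5, 'Wood',      7, 8);
""")


def descendants(category_id):
    """Return (id, name) for every category nested inside the given one."""
    rows = conn.execute(
        """
        SELECT child.id, child.name
        FROM category AS parent
        JOIN category AS child
          ON child.lft > parent.lft AND child.rgt < parent.rgt
        WHERE parent.id = ?
        ORDER BY child.lft
        """,
        (category_id,),
    )
    return rows.fetchall()


under_propeller = descendants(1)
under_aircraft = descendants(2)
```

The whole subtree comes back from one query with no recursion, which is the main attraction of the model; the cost is that inserting or moving a category means renumbering the bounds of other rows.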

    Other recommended answer

    If you want categories to have multiple parent categories, then it's just a "many to many" relationship instead of a "one to many" relationship. You'll need to put a bridging table between category and itself.

    However, I doubt this is what you want. If I'm looking in the category Aircraft > Wood then I wouldn't want to see items from Boating > Wood. There are two Wood categories because they contain different items.