Database structure for storing historical data

Article URL: https://www.itbaoku.cn/post/597495.html

Question

Preface: I was thinking the other day about a database structure for a new application and realized that we need an efficient way to store historical data. I would like someone else to take a look and see whether there are any problems with this structure. I realize that this method of storing data may very well have been invented before (I am almost certain it has), but I have no idea whether it has a name, and the Google searches I tried didn't yield anything.

Problem: Let's say you have a table for orders, and each order is related to a row in a customers table for the customer that placed it. In a normal database structure you might expect something like this:

orders
------
orderID
customerID


customers
---------
customerID
address
address2
city
state
zip

Pretty straightforward: the orders table has a foreign key, customerID, which references the primary key of the customers table. But if we run a report over the orders table, we join the customers table to the orders table, which brings back the current record for that customer ID. What if the customer's address was different when the order was placed and has subsequently been changed? Now our order no longer reflects the customer's address as it was at the time the order was placed. Basically, by changing the customer record, we just changed all history for that customer.

Now there are several ways around this, one of which would be to copy the record when an order is created. What I have come up with, though, is what I think would be an easier and perhaps slightly more elegant way to do this, with the added bonus of logging any time a change is made.
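
To make the problem concrete, here is a minimal sketch using Python's built-in sqlite3 module. The sample addresses and IDs are made up for illustration; only the two-table layout comes from the question.

```python
import sqlite3

# In-memory database with the "normal" structure from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (customerID INTEGER PRIMARY KEY, address TEXT);
    CREATE TABLE orders (
        orderID    INTEGER PRIMARY KEY,
        customerID INTEGER REFERENCES customers(customerID)
    );
    INSERT INTO customers VALUES (1, '1 Old Street');
    INSERT INTO orders VALUES (100, 1);
""")

# The report joins each order to the customer's *current* row.
REPORT = """
    SELECT o.orderID, c.address
    FROM orders o
    JOIN customers c ON c.customerID = o.customerID
"""

before = conn.execute(REPORT).fetchall()

# The customer moves after the order was placed.
conn.execute("UPDATE customers SET address = '2 New Avenue' WHERE customerID = 1")

after = conn.execute(REPORT).fetchall()

print(before)  # the order shows the address it was placed with
print(after)   # the same order now shows the new address
```

The second report shows the new address for order 100, even though the order itself was never touched.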

What if I did a structure like this instead:

orders
------
orderID
customerID
customerHistoryID


customers
---------
customerID
customerHistoryID


customerHistory
--------
customerHistoryID
customerID
address
address2
city
state
zip
updatedBy
updatedOn

Please forgive the formatting, but I think you can see the idea. Basically, any time a customer is changed, whether by insert or update, a new customerHistory row is written with an incremented customerHistoryID, and the customers table is updated with the latest customerHistoryID. The orders table now points not only to the customerID (which allows you to see all revisions of the customer record) but also to the customerHistoryID, which points to a specific revision of the record. Now the order reflects the state of the data at the time the order was created.

By adding updatedBy and updatedOn columns to the customerHistory table, you also get an "audit log" of the data, so you can see who made each change and when.

One potential downside could be deletes, but I am not really worried about that for this need, as nothing should ever be deleted. Even so, the same effect could be achieved by using an activeFlag or something like it, depending on the domain of the data.

My thought is that all tables would use this structure. Any time historical data is retrieved, the order would be joined against the history table using the customerHistoryID to show the state of the data for that particular order.

Retrieving a list of current customers is easy: it just takes a join from the customers table to the customerHistory table on the customerHistoryID.

Can anyone see any problems with this approach, either from a design standpoint or for performance reasons? Remember, no matter what I do, I need to make sure that the historical data is preserved so that subsequent updates to records do not change history. Is there a better way? Is this a known idea that has a name, or is there any documentation on it?
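
A rough sketch of how this might look in practice, again with Python's sqlite3. The helper names `save_customer` and `place_order`, the trimmed column list, and the sample data are assumptions made for illustration, not part of the proposed design.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customerHistory (
        customerHistoryID INTEGER PRIMARY KEY AUTOINCREMENT,
        customerID        INTEGER NOT NULL,
        address           TEXT,
        city              TEXT,
        updatedBy         TEXT,
        updatedOn         TEXT DEFAULT CURRENT_TIMESTAMP
    );
    CREATE TABLE customers (
        customerID        INTEGER PRIMARY KEY,
        customerHistoryID INTEGER REFERENCES customerHistory(customerHistoryID)
    );
    CREATE TABLE orders (
        orderID           INTEGER PRIMARY KEY,
        customerID        INTEGER REFERENCES customers(customerID),
        customerHistoryID INTEGER REFERENCES customerHistory(customerHistoryID)
    );
""")


def save_customer(conn, customer_id, address, city, user):
    """Hypothetical helper: write a new revision and repoint the customer at it."""
    with conn:  # one transaction for both statements
        cur = conn.execute(
            "INSERT INTO customerHistory (customerID, address, city, updatedBy) "
            "VALUES (?, ?, ?, ?)",
            (customer_id, address, city, user),
        )
        revision = cur.lastrowid
        updated = conn.execute(
            "UPDATE customers SET customerHistoryID = ? WHERE customerID = ?",
            (revision, customer_id),
        )
        if updated.rowcount == 0:  # first revision: the customer row does not exist yet
            conn.execute(
                "INSERT INTO customers (customerID, customerHistoryID) VALUES (?, ?)",
                (customer_id, revision),
            )
    return revision


def place_order(conn, order_id, customer_id):
    """Hypothetical helper: pin the order to the customer's current revision."""
    with conn:
        conn.execute(
            "INSERT INTO orders (orderID, customerID, customerHistoryID) "
            "SELECT ?, customerID, customerHistoryID FROM customers WHERE customerID = ?",
            (order_id, customer_id),
        )


save_customer(conn, 1, "1 Old Street", "Springfield", "alice")
place_order(conn, 100, 1)
save_customer(conn, 1, "2 New Avenue", "Shelbyville", "bob")
place_order(conn, 101, 1)

# Each order joins to the revision that was current when it was placed.
rows = conn.execute("""
    SELECT o.orderID, h.address
    FROM orders o
    JOIN customerHistory h ON h.customerHistoryID = o.customerHistoryID
    ORDER BY o.orderID
""").fetchall()
print(rows)
```

Order 100 keeps the old address because it still points at the first customerHistory row, while order 101 picks up the second.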
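
As a sketch of the two read paths described above (the current customer list and the per-customer audit trail), assuming the same table layout; the names, addresses and dates below are invented sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customerHistory (
        customerHistoryID INTEGER PRIMARY KEY,
        customerID        INTEGER NOT NULL,
        address           TEXT,
        updatedBy         TEXT,
        updatedOn         TEXT
    );
    CREATE TABLE customers (
        customerID        INTEGER PRIMARY KEY,
        customerHistoryID INTEGER REFERENCES customerHistory(customerHistoryID)
    );
    -- Illustrative data: customer 1 has two revisions, customer 2 has one.
    INSERT INTO customerHistory VALUES
        (1, 1, '1 Old Street', 'alice', '2024-01-05'),
        (2, 2, '9 Elm Road',   'alice', '2024-01-06'),
        (3, 1, '2 New Avenue', 'bob',   '2024-03-01');
    INSERT INTO customers VALUES (1, 3), (2, 2);
""")

# Current customers: join customers to the one revision each row points at.
current = conn.execute("""
    SELECT c.customerID, h.address
    FROM customers c
    JOIN customerHistory h ON h.customerHistoryID = c.customerHistoryID
    ORDER BY c.customerID
""").fetchall()

# Audit trail: every revision of a single customer, oldest first.
audit = conn.execute("""
    SELECT updatedOn, updatedBy, address
    FROM customerHistory
    WHERE customerID = ?
    ORDER BY customerHistoryID
""", (1,)).fetchall()

print(current)
print(audit)
```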

Thanks for any help.

Update: This is a very simple example of what I am really going to have. My real application will have "orders" with several foreign keys to other tables. Origin/destination location information, customer information, facility information, user information, etc. It has been suggested a couple of times that I could copy the information into the order record at that point, and I have seen it done this way many times, but this would result in a record with hundreds of columns, which really isn't feasible in this case.

Recommended answer

When I've encountered such problems, one alternative is to make the order the history table. It functions the same, but it's a little easier to follow:

orders
------
orderID
customerID
address
City
state
zip



customers
---------
customerID
address
City
state
zip
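
A minimal sketch of this alternative with Python's sqlite3: the address columns are copied from customers into the order when it is placed. The `place_order` helper and the sample data are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (
        customerID INTEGER PRIMARY KEY,
        address TEXT, city TEXT, state TEXT, zip TEXT
    );
    CREATE TABLE orders (
        orderID    INTEGER PRIMARY KEY,
        customerID INTEGER REFERENCES customers(customerID),
        address TEXT, city TEXT, state TEXT, zip TEXT
    );
    INSERT INTO customers VALUES (1, '1 Old Street', 'Springfield', 'IL', '62701');
""")


def place_order(conn, order_id, customer_id):
    """Hypothetical helper: snapshot the customer's address into the order row."""
    with conn:
        conn.execute(
            "INSERT INTO orders (orderID, customerID, address, city, state, zip) "
            "SELECT ?, customerID, address, city, state, zip "
            "FROM customers WHERE customerID = ?",
            (order_id, customer_id),
        )


place_order(conn, 100, 1)
conn.execute("UPDATE customers SET address = '2 New Avenue' WHERE customerID = 1")
place_order(conn, 101, 1)

# No join is needed: each order carries the address it was placed with.
snapshots = conn.execute(
    "SELECT orderID, address FROM orders ORDER BY orderID"
).fetchall()
print(snapshots)
```

The trade-off is that the address is stored once per order, but reports on orders never need to look at the customers table for historical values.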

EDIT: If the number of columns gets too high for your liking, you can separate it out however you like.

If you do go with the other option and use history tables, you should consider using bitemporal data, since you may have to deal with the possibility that historical data needs to be corrected. For example, a customer changes his current address from A to B, but you also have to correct the address on an existing order that is currently being fulfilled.

Also, if you are using MS SQL Server, you might want to consider using indexed views. That will allow you to trade a small incremental insert/update performance decrease for a large select performance increase. If you're not using MS SQL Server, you can replicate this using triggers and tables.
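
For readers unfamiliar with the term, bitemporal data tracks two time ranges per row: valid time (when the fact was true in the real world) and transaction time (when the database believed it). The following is only a sketch of that idea under assumed names (`address_history`, `valid_from`, `recorded_from`, and so on) with invented dates; it is not a complete bitemporal implementation.

```python
import sqlite3

FOREVER = "9999-12-31"  # sentinel for "still valid" / "still believed"

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE address_history (
        customerID    INTEGER NOT NULL,
        address       TEXT NOT NULL,
        valid_from    TEXT NOT NULL,  -- when the address became true in reality
        valid_to      TEXT NOT NULL,
        recorded_from TEXT NOT NULL,  -- when the database started believing this row
        recorded_to   TEXT NOT NULL
    );
""")

# Illustrative story: the customer moved on 2024-05-01, but the change was only
# entered on 2024-06-01. The original belief is closed off rather than overwritten.
conn.executemany(
    "INSERT INTO address_history VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, "A Street", "2024-01-01", FOREVER,      "2024-01-01", "2024-06-01"),
        (1, "A Street", "2024-01-01", "2024-05-01", "2024-06-01", FOREVER),
        (1, "B Street", "2024-05-01", FOREVER,      "2024-06-01", FOREVER),
    ],
)


def address_as_of(conn, customer_id, valid_on, known_on):
    """Address that was true on `valid_on`, according to what was recorded by `known_on`."""
    row = conn.execute(
        "SELECT address FROM address_history "
        "WHERE customerID = ? "
        "AND valid_from <= ? AND ? < valid_to "
        "AND recorded_from <= ? AND ? < recorded_to",
        (customer_id, valid_on, valid_on, known_on, known_on),
    ).fetchone()
    return row[0] if row else None


# What the database believed on 2024-05-20 about 2024-05-15, versus what it knows now.
believed_then = address_as_of(conn, 1, "2024-05-15", "2024-05-20")
known_now = address_as_of(conn, 1, "2024-05-15", "2024-07-01")
print(believed_then, known_now)
```

ISO-8601 date strings are used so that plain string comparison orders them correctly in SQLite.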

Other recommended answer

When you are designing your data structures, be very careful to store the correct relationships, not something that is merely similar to the correct relationships. If the address for an order needs to be maintained, then that is because the address is part of the order, not the customer. Likewise, unit prices are part of the order, not the product, and so on.

Try an arrangement like this:

Customer
--------
CustomerId (PK)
Name
AddressId (FK)
PhoneNumber
Email

Order
-----
OrderId (PK)
CustomerId (FK)
ShippingAddressId (FK)
BillingAddressId (FK)
TotalAmount

Address
-------
AddressId (PK)
AddressLine1
AddressLine2
City
Region
Country
PostalCode

OrderLineItem
-------------
OrderId (PK) (FK)
OrderItemSequence (PK)
ProductId (FK)
UnitPrice
Quantity

Product
-------
ProductId (PK)
Price

etc.
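
A sketch of how this arrangement behaves, using Python's sqlite3 and a trimmed-down version of the tables above. It assumes that Address rows are never edited in place (a move inserts a new row) and that UnitPrice is copied from Product.Price when the line item is created; the answer implies both but does not spell them out. The sample values are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Address (AddressId INTEGER PRIMARY KEY, AddressLine1 TEXT, City TEXT);
    CREATE TABLE Customer (
        CustomerId INTEGER PRIMARY KEY,
        Name       TEXT,
        AddressId  INTEGER REFERENCES Address(AddressId)
    );
    CREATE TABLE Product (ProductId INTEGER PRIMARY KEY, Price REAL);
    CREATE TABLE "Order" (
        OrderId           INTEGER PRIMARY KEY,
        CustomerId        INTEGER REFERENCES Customer(CustomerId),
        ShippingAddressId INTEGER REFERENCES Address(AddressId)
    );
    CREATE TABLE OrderLineItem (
        OrderId           INTEGER REFERENCES "Order"(OrderId),
        OrderItemSequence INTEGER,
        ProductId         INTEGER REFERENCES Product(ProductId),
        UnitPrice         REAL,
        Quantity          INTEGER,
        PRIMARY KEY (OrderId, OrderItemSequence)
    );

    INSERT INTO Address VALUES (1, '1 Old Street', 'Springfield');
    INSERT INTO Customer VALUES (1, 'Pat', 1);
    INSERT INTO Product VALUES (7, 10.0);

    -- Place order 100: reference the customer's address row and copy the price.
    INSERT INTO "Order"
        SELECT 100, CustomerId, AddressId FROM Customer WHERE CustomerId = 1;
    INSERT INTO OrderLineItem
        SELECT 100, 1, ProductId, Price, 3 FROM Product WHERE ProductId = 7;

    -- Later: the customer moves (new Address row) and the product price rises.
    INSERT INTO Address VALUES (2, '2 New Avenue', 'Shelbyville');
    UPDATE Customer SET AddressId = 2 WHERE CustomerId = 1;
    UPDATE Product SET Price = 12.5 WHERE ProductId = 7;
""")

# The order still resolves to the address and price it was placed with.
row = conn.execute("""
    SELECT a.AddressLine1, li.UnitPrice, li.Quantity
    FROM "Order" o
    JOIN Address a        ON a.AddressId = o.ShippingAddressId
    JOIN OrderLineItem li ON li.OrderId = o.OrderId
    WHERE o.OrderId = 100
""").fetchone()
print(row)
```

Note that `Order` is a reserved word in SQL, so the table name has to be quoted here; a real schema might prefer a different name.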

If you truly need to store history for something, like tracking changes to an order over time, then you should do that with a log or audit table, not with your transaction tables.
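
One possible way to keep such an audit table is a database trigger. The sketch below uses SQLite via Python's sqlite3; the `status` column, the `orders_audit` table and the trigger name are all hypothetical, chosen only to show the shape of the idea.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (orderID INTEGER PRIMARY KEY, status TEXT);

    -- Separate audit table: the transaction table keeps only the current state.
    CREATE TABLE orders_audit (
        auditID    INTEGER PRIMARY KEY AUTOINCREMENT,
        orderID    INTEGER,
        old_status TEXT,
        new_status TEXT,
        changedOn  TEXT DEFAULT CURRENT_TIMESTAMP
    );

    -- Log a row whenever the status actually changes.
    CREATE TRIGGER orders_status_audit
    AFTER UPDATE OF status ON orders
    FOR EACH ROW
    WHEN OLD.status IS NOT NEW.status
    BEGIN
        INSERT INTO orders_audit (orderID, old_status, new_status)
        VALUES (OLD.orderID, OLD.status, NEW.status);
    END;
""")

conn.execute("INSERT INTO orders VALUES (100, 'placed')")
# The repeated 'shipped' update changes nothing and is not logged.
for status in ("shipped", "shipped", "delivered"):
    conn.execute("UPDATE orders SET status = ? WHERE orderID = 100", (status,))

audit = conn.execute(
    "SELECT orderID, old_status, new_status FROM orders_audit ORDER BY auditID"
).fetchall()
print(audit)
```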

Other recommended answer

Normally orders simply store the information as it is at the time of the order. This is especially true of things like part numbers, part names and prices, as well as customer address and name. Then you don't have to join five or six tables to get the information that can be stored in one. This is not denormalization, as you actually need the information as it existed at the time of the order. I think having this information in the order and order detail tables (the latter stores the individual items ordered) is also less risky in terms of accidental changes to the data.

Your order table would not have hundreds of columns. You would have an order table and an order detail table, because of the one-to-many relationship. The order table would include the order number, customer ID (so you can search for everything this customer has ever ordered, even if the name changed), customer name, customer address (note that you don't need separate city, state, zip, etc.; put the address in one field), order date and possibly a few other fields that relate directly to the order at the top level. Then you have an order detail table that has order number, detail_id, part number, part description (this can be a consolidation of a bunch of fields like size, color, etc., or you can separate out the most common ones), number of items, unit type, price per unit, taxes, total price, ship date and status. You put in one entry for each item ordered.
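
A small sketch of the order / order detail split described above, using Python's sqlite3. Column names are adapted from the description, only a subset of the listed fields is shown, prices are stored as integer cents by assumption, and the data is invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (
        order_no         INTEGER PRIMARY KEY,
        customer_id      INTEGER,
        customer_name    TEXT,   -- copied at order time
        customer_address TEXT,   -- whole address in one field
        order_date       TEXT
    );
    CREATE TABLE order_detail (
        order_no             INTEGER REFERENCES orders(order_no),
        detail_id            INTEGER,
        part_number          TEXT,
        part_description     TEXT,    -- consolidated size, color, etc.
        quantity             INTEGER,
        price_per_unit_cents INTEGER, -- copied at order time
        PRIMARY KEY (order_no, detail_id)
    );

    INSERT INTO orders VALUES
        (100, 1, 'Pat Doe', '1 Old Street, Springfield, IL 62701', '2024-01-05');
    -- One row per item ordered.
    INSERT INTO order_detail VALUES
        (100, 1, 'P-7', 'Widget, large, blue', 3, 1000),
        (100, 2, 'P-9', 'Gasket, small',      10,   25);
""")

# Everything needed to reprint the order lives in these two tables.
total_cents = conn.execute(
    "SELECT SUM(quantity * price_per_unit_cents) FROM order_detail WHERE order_no = ?",
    (100,),
).fetchone()[0]
print(total_cents)
```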