History rows management in database

Source: https://www.itbaoku.cn/post/597615.html

Question

As in many databases, I am designing a database that should keep a record of previous versions of the rows changed in each table.

The standard solution to this problem is to keep a history table for each data table: whenever a row needs to be updated in the data table, a copy of the current row is inserted into the history table, and then the row in the data table is updated.

The disadvantages of this solution for me:

  • maintenance of 2 tables instead of 1 (in case the structure of the table needs to change)
  • the application needs to know about both tables instead of one
  • names of the tables might need to be short to keep a naming convention for the table and its history table (SOME_TABLE, SOME_TABLE_HIST for example)

I am considering a different solution and would like to know if it's OK. For each table, we add the column IS_LAST:

  • when a row gets inserted into the table, it is inserted with IS_LAST=1.
  • when a row gets updated, a copy of the original row is written to the same table with IS_LAST=0, and the original row is updated as needed (still keeping IS_LAST=1).
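A minimal sketch of the IS_LAST scheme above, using Python's built-in `sqlite3` in place of Oracle. The columns `row_id`, `business_key` and `payload`, and the helpers `insert_row` and `update_row`, are hypothetical names chosen for illustration. The copy needs its own key, so the sketch assumes a surrogate key per stored version plus a separate key for the logical row.

```python
import sqlite3


def create_schema(conn):
    conn.execute(
        """CREATE TABLE some_table (
               row_id       INTEGER PRIMARY KEY,  -- surrogate key, one per stored version
               business_key TEXT NOT NULL,        -- identifies the logical row
               payload      TEXT,
               is_last      INTEGER NOT NULL DEFAULT 1 CHECK (is_last IN (0, 1))
           )"""
    )


def insert_row(conn, business_key, payload):
    # New rows always start as the current version.
    conn.execute(
        "INSERT INTO some_table (business_key, payload, is_last) VALUES (?, ?, 1)",
        (business_key, payload),
    )


def update_row(conn, business_key, new_payload):
    with conn:  # one transaction: commit on success, roll back on error
        # 1. Copy the current version into the same table as history.
        conn.execute(
            """INSERT INTO some_table (business_key, payload, is_last)
               SELECT business_key, payload, 0
               FROM some_table
               WHERE business_key = ? AND is_last = 1""",
            (business_key,),
        )
        # 2. Update the original row, which keeps is_last = 1.
        conn.execute(
            "UPDATE some_table SET payload = ? WHERE business_key = ? AND is_last = 1",
            (new_payload, business_key),
        )


conn = sqlite3.connect(":memory:")
create_schema(conn)
insert_row(conn, "A", "v1")
update_row(conn, "A", "v2")
update_row(conn, "A", "v3")

current = conn.execute(
    "SELECT payload FROM some_table WHERE is_last = 1"
).fetchall()
history = conn.execute(
    "SELECT payload FROM some_table WHERE is_last = 0 ORDER BY row_id"
).fetchall()
```

Every query for current data has to carry the `is_last = 1` filter, and that filter is what partitioning on IS_LAST would be meant to make cheap.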

Assume that in my case, rows are updated 10 times on average. Also assume that at least 90% of the actions performed by the application happen only on the most recent version of the rows.

My database is Oracle 10g, so to keep the "active" table slim, we can split the table into 2 partitions: the IS_LAST=1 partition and the IS_LAST=0 partition.

Is partitioning a good way of solving the history data keeping problem?

Does this solution limit other partitioning options for these tables?

Thanks!

Recommended answer

The first question should be: what would you do with that data? If you don't have a clear business requirement, don't do it.

I did something similar, and after 3 years of running there is about 20% "valid data" and the rest is "previous versions". That is 10 million + 40 million records. In the last three years we had 2 (two) requests to investigate the history of changes, and both times the requests were silly: we record the time stamp of each record change, and we were asked to check whether people worked overtime (after 5pm).

Now we are stuck with an oversized database in which 80% of the data is something nobody needs.

EDIT:

Since you asked for possible solutions, I'll describe what we did. It's a bit different from the solution you are considering.

  1. All tables have a surrogate primary key.
  2. All primary keys are generated from a single sequence. This works fine because Oracle can generate and cache numbers, so there are no performance problems here. We use an ORM, and we wanted each object in memory (and the corresponding record in the database) to have a unique identifier.
  3. We use an ORM, and the mapping information between database table and class is in the form of attributes.

We record all changes in a single archive table with the following columns:

  • id (surrogate primary key)
  • time stamp
  • original table
  • id of original record
  • user id
  • transaction type (insert, update, delete)
  • record data as a varchar2 field
    • this is the actual data in the form of fieldname/value pairs.

It works this way:

  • The ORM has insert/update and delete commands.
  • We created one base class for all our business objects that overrides the insert/update and delete commands.
    • The insert/update/delete commands create a string of fieldname/value pairs using reflection. The code looks up the mapping information and reads the field name, associated value and field type. Then we create something similar to JSON (we added some modifications). Once the string representing the current state of the object is created, it is inserted into the archive table.
  • When a new or updated object is saved, it is saved to its target table, and at the same time we insert one record with the current values into the archive table.
  • When an object is deleted, we delete it from its target table, and at the same time we insert one record into the archive table with transaction type = "DELETE".
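The workflow above can be sketched roughly as follows. This is not the answerer's C# code: it is a Python illustration that uses `sqlite3` instead of Oracle, dataclass introspection instead of ORM mapping attributes, and plain JSON instead of their modified format. All names (`archive`, `ArchivedObject`, `Invoice`, `tx_type`, and so on) are hypothetical, and the shared sequence for primary keys is left out, so the caller supplies the id.

```python
import json
import sqlite3
from dataclasses import asdict, dataclass

ARCHIVE_DDL = """
CREATE TABLE archive (
    id             INTEGER PRIMARY KEY,
    changed_at     TEXT DEFAULT CURRENT_TIMESTAMP,
    original_table TEXT NOT NULL,
    original_id    INTEGER NOT NULL,
    user_id        TEXT NOT NULL,
    tx_type        TEXT NOT NULL CHECK (tx_type IN ('INSERT', 'UPDATE', 'DELETE')),
    record_data    TEXT
)"""


@dataclass
class ArchivedObject:
    """Base class: every save/delete also writes one archive record."""
    id: int
    table_name = ""  # plain class attribute (no annotation), set by subclasses

    def _archive(self, conn, user_id, tx_type):
        # Serialize the current state as fieldname/value pairs.
        data = json.dumps(asdict(self), sort_keys=True)
        conn.execute(
            "INSERT INTO archive (original_table, original_id, user_id, tx_type, record_data)"
            " VALUES (?, ?, ?, ?, ?)",
            (self.table_name, self.id, user_id, tx_type, data),
        )

    def save(self, conn, user_id):
        fields = asdict(self)
        # table_name and field names come from the class, never from user input.
        exists = conn.execute(
            f"SELECT 1 FROM {self.table_name} WHERE id = ?", (self.id,)
        ).fetchone()
        with conn:  # target table and archive change in one transaction
            if exists:
                sets = ", ".join(f"{n} = :{n}" for n in fields if n != "id")
                conn.execute(f"UPDATE {self.table_name} SET {sets} WHERE id = :id", fields)
            else:
                cols = ", ".join(fields)
                params = ", ".join(f":{n}" for n in fields)
                conn.execute(
                    f"INSERT INTO {self.table_name} ({cols}) VALUES ({params})", fields
                )
            self._archive(conn, user_id, "UPDATE" if exists else "INSERT")

    def delete(self, conn, user_id):
        with conn:
            conn.execute(f"DELETE FROM {self.table_name} WHERE id = ?", (self.id,))
            self._archive(conn, user_id, "DELETE")


@dataclass
class Invoice(ArchivedObject):
    customer: str
    amount: float
    table_name = "invoice"


conn = sqlite3.connect(":memory:")
conn.execute(ARCHIVE_DDL)
conn.execute("CREATE TABLE invoice (id INTEGER PRIMARY KEY, customer TEXT, amount REAL)")

inv = Invoice(id=1, customer="ACME", amount=100.0)
inv.save(conn, "smithg")      # archived as INSERT
inv.amount = 120.0
inv.save(conn, "smithg")      # archived as UPDATE
inv.delete(conn, "smithg")    # archived as DELETE

log = conn.execute("SELECT tx_type, record_data FROM archive ORDER BY id").fetchall()
```

Because the archive row is built from whatever fields the object has at the time, a schema change in `invoice` needs no matching change in `archive`.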

Pros:

  • We don't have an archive table for each table in the database. We also don't need to worry about updating archive tables when the schema changes.
  • The complete archive is separated from the "current data", so the archive does not impose any performance hit on the database. We put it in a separate tablespace on a separate disk and it works fine.
  • We created 2 forms for viewing the archive:
    • A general viewer that lists the archive table according to a filter. The user enters the filter data on the form (time span, user, ...). We show each record as fieldname/value pairs, and each change is color coded. Users can see all versions of each record, and they can see who made changes and when.
    • An invoice viewer. This one was complex: we created a form that shows the invoice much like the original invoice entry form, but with some additional buttons that can show different generations. It took considerable effort to create this form. It was used a few times and then forgotten because it was not needed in the current workflow.
  • The code for creating archive records is located in a single C# class. There is no need for triggers on every table in the database.
  • Performance is very good. At peak times, the system is used by around 700-800 users. This is an ASP.NET application. Both ASP.NET and Oracle are running on one dual Xeon with 8 GB RAM.

Cons:

  • The single-table archive format is harder to read than a solution with one archive table for each of the data tables.
  • Searching on a non-id field in the archive table is hard: we can only use the LIKE operator on the string.
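To illustrate the second drawback, here is a small sketch of searching serialized record data with LIKE, again using Python's `sqlite3` as a stand-in for Oracle. The `archive` table is reduced to two columns, and the sample rows are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE archive (id INTEGER PRIMARY KEY, record_data TEXT)")
conn.executemany(
    "INSERT INTO archive (record_data) VALUES (?)",
    [
        ('{"amount": 100.0, "customer": "ACME"}',),
        ('{"amount": 100.05, "customer": "Initech"}',),
        ('{"amount": 20.0, "customer": "ACME"}',),
    ],
)

# Intended meaning: "all records where amount is 100.0".
# LIKE only matches substrings, so 100.05 is returned as well.
pattern = '%"amount": 100.0%'
matches = conn.execute(
    "SELECT id FROM archive WHERE record_data LIKE ? ORDER BY id", (pattern,)
).fetchall()
```

The pattern has no notion of field boundaries or types, so the query returns a false positive, and a leading `%` wildcard generally prevents an ordinary index from being used.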

So, again, check the requirements for the archive. It is not a trivial task, and the gains and actual use can be minimal.

Other recommended answer

I'd create two tables: one for the IsLast kind of values and one for the historical ones. Then I'd set up a trigger that inserts the old values into the historical table every time a row in the IsLast table is updated.
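A minimal sketch of this two-table-plus-trigger idea, written against SQLite through Python's `sqlite3` rather than Oracle (Oracle trigger syntax differs, for example `:OLD.column` instead of `OLD.column`). The table, column and trigger names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE some_table (
        id      INTEGER PRIMARY KEY,
        payload TEXT
    );

    CREATE TABLE some_table_hist (
        hist_id     INTEGER PRIMARY KEY,
        id          INTEGER NOT NULL,          -- id of the row in some_table
        payload     TEXT,
        replaced_at TEXT DEFAULT CURRENT_TIMESTAMP
    );

    -- Copy the old version into the history table before each update.
    CREATE TRIGGER some_table_keep_history
    BEFORE UPDATE ON some_table
    FOR EACH ROW
    BEGIN
        INSERT INTO some_table_hist (id, payload) VALUES (OLD.id, OLD.payload);
    END;
    """
)

conn.execute("INSERT INTO some_table (id, payload) VALUES (1, 'v1')")
conn.execute("UPDATE some_table SET payload = 'v2' WHERE id = 1")
conn.execute("UPDATE some_table SET payload = 'v3' WHERE id = 1")

current = conn.execute("SELECT payload FROM some_table WHERE id = 1").fetchone()[0]
history = [
    row[0]
    for row in conn.execute("SELECT payload FROM some_table_hist ORDER BY hist_id")
]
```

With the trigger in place the application only ever writes to `some_table`, at the cost of one trigger per audited table.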

Other recommended answer

If I had 1 or 2 tables of history to keep, I would do it exactly as Tuinstoel has suggested. But if you have dozens of tables to do this on, I would move more toward the solution described by zendar. The reason is this.

How do you answer questions like:

  • What changed since yesterday when everything was fine?

  • Has user SMITHG made any changes?

Those questions require one query per table, whether it's a separate _hist table or a partition inside the table. Either way, it's a huge list of queries. If you have a central table that looks like this, then it's a piece of cake.

table_name, Column_name, PK, Before_value, After_value, User, timestamp

Inserts have only after values,

Deletes have only before values,

Updates have both, but only for the columns that changed.
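A rough sketch of such a central table and of the two questions above, using Python's `sqlite3` instead of Oracle. The `User` and `timestamp` columns are renamed to `user_name` and `changed_at` here to avoid reserved words. The helper `log_change`, the sample tables, the user JONESB and all dates are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE change_log (
           table_name   TEXT NOT NULL,
           column_name  TEXT NOT NULL,
           pk           TEXT NOT NULL,
           before_value TEXT,
           after_value  TEXT,
           user_name    TEXT NOT NULL,
           changed_at   TEXT NOT NULL
       )"""
)


def log_change(conn, table_name, pk, before, after, user_name, changed_at):
    """Write one row per changed column.

    before is None for an insert, after is None for a delete.
    """
    before = before or {}
    after = after or {}
    for column in sorted(set(before) | set(after)):
        old, new = before.get(column), after.get(column)
        if old == new:
            continue  # unchanged columns are not logged
        conn.execute(
            "INSERT INTO change_log VALUES (?, ?, ?, ?, ?, ?, ?)",
            (
                table_name,
                column,
                str(pk),
                None if old is None else str(old),
                None if new is None else str(new),
                user_name,
                changed_at,
            ),
        )


# Insert: only after values.
log_change(conn, "invoice", 1, None, {"customer": "ACME", "amount": 100},
           "JONESB", "2009-03-01 09:00:00")
# Update: before and after, only for the changed column.
log_change(conn, "invoice", 1, {"customer": "ACME", "amount": 100},
           {"customer": "ACME", "amount": 120}, "SMITHG", "2009-03-02 17:30:00")
# Delete: only before values.
log_change(conn, "customer", 7, {"name": "Initech"}, None,
           "SMITHG", "2009-03-02 18:00:00")

# "What changed since yesterday?" is one query across all tables.
changed_since = conn.execute(
    """SELECT table_name, column_name, pk, before_value, after_value
       FROM change_log
       WHERE changed_at >= ?
       ORDER BY changed_at, table_name, column_name""",
    ("2009-03-02 00:00:00",),
).fetchall()

# "Has user SMITHG made any changes?" is one query as well.
smithg_changes = conn.execute(
    "SELECT COUNT(*) FROM change_log WHERE user_name = ?", ("SMITHG",)
).fetchone()[0]
```

Storing every value as text keeps the table generic, but it means type information has to be recovered from the source table's definition when a logged value is read back.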

Some variations

You can include a column for I/U/D if you prefer. You can exclude column values for inserts and just record the PK and I, since the correct values are still in the table.

Since this is Oracle you could partition on table_name, so in essence you actually do have one hist "table" per real table.

You can easily answer the above questions, which I believe are, quite simply, the most frequently asked ones. And it handles every question you can answer with partitions or _hist tables.