Smart (?) Database Cache


Question

I've seen several database cache engines; all of them are pretty dumb (i.e.: keep this query cached for X minutes) and require that you manually delete the whole cache repository after an INSERT / UPDATE / DELETE query has been executed.

About 2 or 3 years ago I developed an alternative DB cache system for a project I was working on; the idea was basically to use regular expressions to find the table(s) involved in a particular SQL query:

$query_patterns = array
(
    'INSERT' => '/INTO\s+(\w+)\s+/i',
    'SELECT' => '/FROM\s+((?:[\w]|,\s*)+)(?:\s+(?:[LEFT|RIGHT|OUTER|INNER|NATURAL|CROSS]\s*)*JOIN\s+((?:[\w]|,\s*)+)\s*)*/i',
    'UPDATE' => '/UPDATE\s+(\w+)\s+SET/i',
    'DELETE' => '/FROM\s+((?:[\w]|,\s*)+)/i',
    'REPLACE' => '/INTO\s+(\w+)\s+/i',
    'TRUNCATE' => '/TRUNCATE\s+(\w+)/i',
    'LOAD' => '/INTO\s+TABLE\s+(\w+)/i',
);

I know that these regexes probably have some flaws (my regex skills were pretty green back then) and obviously don't match nested queries, but since I never use them that isn't a problem for me.
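For illustration, the matching step could be sketched roughly like this; the extract_tables() helper below is hypothetical, not the project's actual code:

// Hypothetical helper (not the original code): apply the pattern that matches
// the query's leading keyword and collect every captured table name.
function extract_tables($sql, array $query_patterns)
{
    $tables = array();

    foreach ($query_patterns as $keyword => $pattern) {
        // Only try the pattern that corresponds to the statement type.
        if (stripos(ltrim($sql), $keyword) !== 0) {
            continue;
        }

        if (preg_match_all($pattern, $sql, $matches)) {
            // Captured groups may hold comma-separated lists such as "table_a, table_b".
            foreach (array_slice($matches, 1) as $group) {
                foreach ($group as $match) {
                    foreach (preg_split('/\s*,\s*/', trim($match), -1, PREG_SPLIT_NO_EMPTY) as $table) {
                        $tables[] = $table;
                    }
                }
            }
        }
    }

    return array_unique($tables);
}

// extract_tables('SELECT * FROM table_a, table_b WHERE id = 1', $query_patterns)
// would return array('table_a', 'table_b').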

Anyway, after finding the involved tables I would alphabetically sort them and create a new folder in the cache repository with the following naming convention:

+table_a+table_b+table_c+table_...+
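In code, building that folder name could look roughly like this (just a sketch; the helper name and the /cache root are assumptions):

// Hypothetical sketch: sorted table names become the cache folder name.
function cache_folder(array $tables, $cache_root = '/cache')
{
    sort($tables);                                // alphabetical order
    $folder = '+' . implode('+', $tables) . '+';  // e.g. "+table_a+table_b+"
    return rtrim($cache_root, '/') . '/' . $folder;
}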

In case of a SELECT query, I would fetch the results from the database, serialize() them and store them in the appropriate cache folder, so for instance the results of the following query:

SELECT `table_a`.`title`, `table_b`.`description` FROM `table_a`, `table_b` WHERE `table_a`.`id` <= 10 ORDER BY `table_a`.`id` ASC;

Would be stored in:

/cache/+table_a+table_b+/079138e64d88039ab9cb2eab3b6bdb7b.md5

The MD5 in the file name is the hash of the query itself. Upon a subsequent identical SELECT query, the results would be trivial to fetch.
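Put together, the read path could be sketched roughly like this (PDO, the helper names and the file layout here are illustrative assumptions, not the actual implementation):

// Hypothetical sketch of the SELECT path: return the cached result if present,
// otherwise run the query, serialize the rows and write them to the cache file.
function cached_select(PDO $pdo, $sql, array $tables)
{
    $dir  = cache_folder($tables);
    $file = $dir . '/' . md5($sql) . '.md5';

    if (is_file($file)) {
        return unserialize(file_get_contents($file));   // cache hit
    }

    $rows = $pdo->query($sql)->fetchAll(PDO::FETCH_ASSOC);

    if (!is_dir($dir)) {
        mkdir($dir, 0777, true);   // create the +table_a+table_b+ folder on first use
    }
    file_put_contents($file, serialize($rows));

    return $rows;
}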

In case of any other type of write query (INSERT, REPLACE, UPDATE, DELETE and so on) I would glob() all the folders that had +matched_table(s)+ in their name and delete all of their file contents. This way it wouldn't be necessary to delete the whole cache, just the cache used by the affected and related tables.
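The invalidation step, as described, could then be sketched like this (again only an illustration; the helper name is an assumption):

// Hypothetical sketch of invalidation: empty every cache folder whose name
// mentions one of the tables touched by the write query.
function invalidate_tables(array $tables, $cache_root = '/cache')
{
    foreach ($tables as $table) {
        foreach (glob(rtrim($cache_root, '/') . '/*+' . $table . '+*', GLOB_ONLYDIR) as $dir) {
            foreach (glob($dir . '/*.md5') as $file) {
                unlink($file);   // drop the stale serialized result
            }
        }
    }
}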

The system worked pretty well and the difference in performance was noticeable - although the project had many more read queries than write queries. Since then I started using transactions, FK CASCADE UPDATES / DELETES, and never had the time to perfect the system to make it work with these features.

I've used the MySQL Query Cache in the past, but I must say the performance doesn't even compare.

I'm wondering: am I the only one who sees beauty in this system? Are there any bottlenecks I may not be aware of? Why do popular frameworks like CodeIgniter and Kohana (I'm not familiar with Zend Framework) have such rudimentary DB cache systems?

More importantly, do you see this as a feature worth pursuing? If yes, is there anything I could do / use to make it even faster (my main concerns are disk I/O and (de)serialization of query results)?

I appreciate all input, thanks.

Recommended Answer

I can see the beauty in this solution; however, I believe it only works for a very specific set of applications. Scenarios where it is not applicable include:

  • Databases which utilize cascading deletes/updates or any kind of triggers. E.g., your DELETE on table A may cause a DELETE from table B. The regex will never catch this.

  • Accessing the database from points which do not go through your cache invalidation scheme, e.g. crontab scripts. If you ever decide to implement replication across machines (introduce read-only slaves), it may also disturb the cache (because it does not go through cache invalidation, etc.).

Even if these scenarios are not realistic for your case, it still answers the question of why frameworks do not implement this kind of cache.

Regarding whether this is worth pursuing, it all depends on your application. Maybe you'd care to supply more information?

Other Recommended Answer

The solution, as you describe it, is at risk of concurrency issues. When you're receiving hundreds of queries per second, you're bound to hit a case where an UPDATE statement runs, but before you can clear your cache, a SELECT reads from it and gets stale data. Additionally, you may run into issues when several UPDATEs hit the same set of rows in a short time period.

In a broader sense, best practice with caching is to cache the largest objects possible. E.g., rather than having a bunch of "user"-related rows cached all over the place, it's better to just cache the "user" object itself.

Better still, if you can cache whole pages (e.g., you show the same homepage to everyone; a profile page appears identical to almost everyone, etc.), do that. One cache fetch for a whole, pre-rendered page will dramatically outperform dozens of cache fetches for row/query-level caches followed by re-rendering the page.
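To make the contrast concrete, here is a rough sketch of full-page caching (APCu is used purely as an example key-value store, and render_homepage() is a hypothetical function):

// One coarse-grained cache fetch for the whole rendered page,
// instead of dozens of row/query-level fetches plus re-rendering.
$html = apcu_fetch('page:home');
if ($html === false) {
    $html = render_homepage();            // hypothetical: build the page from the DB
    apcu_store('page:home', $html, 60);   // cache the full page for 60 seconds
}
echo $html;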

Long story short: profile. If you take the time to do some measurements, you'll likely find that caching large objects, or even whole pages, rather than the small queries used to build them, is a huge performance win.

Other Recommended Answer

While I do see the beauty in this - especially for environments where resources are limited and cannot easily be extended, like shared hosting - I personally would fear complications in the future: what if somebody, newly hired and unaware of the caching mechanism, starts using nested queries? What if some external service starts updating the table without the cache noticing?

For a specialized, defined project that urgently needs a speedup that cannot be helped by adding processor power or RAM, this looks like a great solution. As a general component, I find it too shaky, and would fear subtle problems in the long run that stem from people forgetting that there is a cache to be aware of.