Implementing Model-level caching

Question

I was posting some comments in a related question about MVC caching and some questions about actual implementation came up. How does one implement a Model-level cache that works transparently without the developer needing to manually cache, yet still remains efficient?

I would keep my caching responsibilities firmly within the model. It is none of the controller's or view's business where the model is getting data. All they care about is that when data is requested, data is provided - this is how the MVC paradigm is supposed to work.

(Source: Post by Jarrod)

The reason I am skeptical is because caching should usually not be done unless there is a real need, and shouldn't be done for things like search results. So somehow the Model itself has to know whether or not the SELECT statement being issued to it is worthy of being cached. Wouldn't the Model have to be astronomically smart, and/or store statistics of what is being most often queried over a long period of time in order to accurately make a decision? And wouldn't the overhead of all this make the caching useless anyway?

How would you uniquely identify a query from another query (or more accurately, a result set from another result set)? What about if you're using prepared statements, with only the parameters changing according to user input?

Another poster said this:

I would suggest using the md5 hash of your query combined with a serialized version of your input arguments.

Is the minuscule chance of collision worth worrying about?

Conceptually, caching in the Model seems like a good idea to me, but in practice, due to the nature of caching, it seems the developer should have direct control over it and explicitly code it into the controller logic.


Update for Bounty

I am indeed using an extremely lightweight ORM, somewhat similar to ActiveRecord but capable of doing complex joins and subqueries without the n^2 problem. I built it myself, so it is flexible and isn't restrictive in terms of relations or column names, and I just want to understand how I should implement the caching mechanism.

Following the advice of the helpful people, I would take a hash (probably md5) of the query concatenated with a list of its parameters, and use this as the key for that particular data store. Should I implement the caching individually in the Model classes that require it, or should it be part of the ORM layer?

How do I know when it should be invalidated? Would I have to parse the UPDATE/DELETE/INSERT queries and sub in parameters manually to find out which records are being modified? Or worse, do additional queries whenever data is modified to keep track of which things have changed and what should be invalidated?

I will award the bounty to whoever can give me a clear conceptual explanation (whether or not this is really necessary/efficient to be done transparently), and if so, has some implementation details for the Model caching. I am using PHP and MySQL if that helps to narrow your focus.

Recommended answer

Your post only makes any sense if the model is a trivial ORM. And there are lots of reasons why that's a bad thing. Try thinking about the model as if it were a web service.

Caching is the responsibility of the model.

How would you uniquely identify a query from another query (or more accurately, a result set from another result set)? What about if you're using prepared statements, with only the parameters changing according to user input?

But the inputs to the model uniquely define its output.

If you're using the same model to retrieve the contents of a shopping basket and to run a search on your product catalog then there's something wrong with your code.

Even in the case of the shopping basket, there may be merit in caching data with a TTL shorter than the time taken to process a transaction that would change its contents. In the case of the catalog search, caching the list of matching products for a few hours will probably have no measurable impact on sales, but will trade off well by reducing database load.

The fact that you are using a trivial ORM out of the box does not exclude you from wrapping it in your own code.

Wouldn't the Model have to be astronomically smart, and/or store statistics

No. You make the determination on whether to cache, and if you can't ensure that the cache is consistent then enforce a TTL based on the type of request.

As a general rule of thumb, you should be able to predict appropriate TTLs based on the SELECT query before binding any variables and this needs to be implemented at design time - but obviously the results should be indexed based on the query after binding.
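A minimal sketch of that rule of thumb, under stated assumptions: the names `TTL_BY_TEMPLATE`, `ttlFor` and `cacheKey`, the SQL strings and the TTL values are all hypothetical. The TTL is looked up from the query text before binding (a design-time decision), while the cache key is derived from the query together with its bound parameters.

```php
<?php
// Hypothetical design-time policy: TTL in seconds per unbound SELECT template.
// The SQL strings and TTL values are illustrative assumptions.
const TTL_BY_TEMPLATE = [
    'SELECT * FROM products WHERE name LIKE ?'       => 7200, // catalog search: hours
    'SELECT * FROM basket_items WHERE basket_id = ?' => 5,    // basket: seconds
];

// The TTL depends only on the query text before any variables are bound.
function ttlFor(string $template): int
{
    return TTL_BY_TEMPLATE[$template] ?? 0; // 0 = do not cache unknown queries
}

// The cache key depends on the query after binding: template plus parameters.
function cacheKey(string $template, array $params): string
{
    return md5($template . serialize($params));
}
```

Note that `serialize()` distinguishes the integer `1` from the string `'1'`, so parameters may need normalizing before they are hashed if both forms reach the model.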

Should I implement the caching individually in the Model classes that require it, or should it be part of the ORM layer?

For preference I would implement this as a decorator on the model class - that way you can easily port it to models which implement a factory rather than trivial ORM.
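One way the decorator idea might look, as a sketch rather than a definitive implementation: `CatalogModel` and `CachedModel` are hypothetical names, and a plain array stands in for a real cache backend. This version caches every forwarded call; a real decorator would presumably restrict itself to read-only methods and add a TTL.

```php
<?php
// Hypothetical model; in practice findByCategory() would run a database query.
class CatalogModel
{
    public $calls = 0; // counts real lookups, for demonstration only

    public function findByCategory(string $category): array
    {
        $this->calls++;
        return ["{$category}-1", "{$category}-2"]; // stand-in for a result set
    }
}

// Decorator: forwards any method call to the wrapped model and caches the
// result in a plain array (a stand-in for memcached or similar).
class CachedModel
{
    private $model;
    private $cache = [];

    public function __construct(object $model)
    {
        $this->model = $model;
    }

    public function __call(string $method, array $args)
    {
        $key = md5(get_class($this->model) . '::' . $method . serialize($args));
        if (!array_key_exists($key, $this->cache)) {
            $this->cache[$key] = $this->model->$method(...$args);
        }
        return $this->cache[$key];
    }
}

$inner = new CatalogModel();
$catalog = new CachedModel($inner);
$catalog->findByCategory('books'); // miss: reaches the wrapped model
$catalog->findByCategory('books'); // hit: served from the array
```

Because the decorator only relies on `__call`, the same wrapper can be placed around any model object, whether it comes from a trivial ORM or from a factory.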

C.

Other recommended answer

There are quite a few factors to consider with caching, such as hashing, invalidation, etc. But the goal of caching is always the same: to reduce response times and resource consumption.

Here are a couple of quick thoughts off the top of my head for systems that do not use ORM:

  • It never hurts to cache something using memcache if you have the memory for it
  • You should only ever cache SELECT queries since other types affect data
  • All cached queries should be parameterized
  • The cache key should be an md5 of the query concatenated with a serialize()'d version of the parameters (this identifies unique queries; serializing parameters is not an issue because the parameters generally passed to SELECT queries are usually quite small). Serializing isn't as expensive as you think. And because you hash your static query concatenated with your dynamic params, collisions are unlikely enough that you should not have to worry about them.
  • Modifications (INSERT/UPDATE/DELETE) to rows in a model should invalidate (or set a TTL on) all items cached for that model
  • The model should be extended to allow for cache TTL values to be sent along with a query
  • Your model should have support for skipping the cache (probably by passing TTL of 0 along with the query)
  • Even though a base query may be cached, it is generally more efficient to apply ORDER BY / LIMIT type operations in a new (modified) query rather than to pull an entire rowset from cache and manipulate it through PHP to achieve the same thing (unless there is very high latency between your web and database servers).
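The points above can be sketched roughly as follows. Everything here is an assumption made for illustration: `QueryCache` is a hypothetical class, an array stands in for memcache, and the `$loader` callback stands in for the real database call. Invalidating "all items cached for that model" is done by bumping a per-model version number that is part of every key, which is one possible technique and not something the answer prescribes.

```php
<?php
// In-memory stand-in for memcache; class and method names are hypothetical.
class QueryCache
{
    private $items = [];    // key => [expiresAt, rows]
    private $versions = []; // model name => version counter

    // Key = md5(query . serialize(params)), namespaced by model and version.
    private function key(string $model, string $sql, array $params): string
    {
        $version = $this->versions[$model] ?? 0;
        return $model . ':' . $version . ':' . md5($sql . serialize($params));
    }

    // $loader runs the real SELECT; a TTL of 0 skips the cache entirely.
    public function select(string $model, string $sql, array $params, int $ttl, callable $loader): array
    {
        if ($ttl <= 0) {
            return $loader($sql, $params);
        }
        $key = $this->key($model, $sql, $params);
        if (isset($this->items[$key]) && $this->items[$key][0] > time()) {
            return $this->items[$key][1];
        }
        $rows = $loader($sql, $params);
        $this->items[$key] = [time() + $ttl, $rows];
        return $rows;
    }

    // Call after INSERT/UPDATE/DELETE: bumping the version orphans every key
    // cached for that model, so stale entries are never read again.
    public function invalidate(string $model): void
    {
        $this->versions[$model] = ($this->versions[$model] ?? 0) + 1;
    }
}

$hits = 0;
$loader = function (string $sql, array $params) use (&$hits): array {
    $hits++; // counts trips to the "database"
    return [['id' => $params[0]]];
};

$cache = new QueryCache();
$sql = 'SELECT * FROM users WHERE id = ?';
$cache->select('users', $sql, [7], 60, $loader); // miss
$cache->select('users', $sql, [7], 60, $loader); // hit
$cache->invalidate('users');                     // e.g. after an UPDATE
$cache->select('users', $sql, [7], 60, $loader); // miss again
$cache->select('users', $sql, [7], 0, $loader);  // TTL 0 bypasses the cache
```

In this array-backed sketch the orphaned entries are never freed; with a real cache server they would be expected to expire through their TTL or be evicted.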

Attempting to manage cache validation for an ORM system is a completely different beast (due to relations), and should probably be handled on a case-by-case basis (in the controller). But if you're truly concerned with performance, chances are you wouldn't be using an ORM to begin with.

UPDATE:

If you find yourself using multiple instances of the same model class within a single thread, I would suggest also potentially memcaching your instantiated model (depending on your constructor, deserializing and waking an object is sometimes more efficient than constructing an object). Once you have an initialized object (whether constructed or deserialized), it is worlds more efficient to clone a basic instance of the object and set its new state than to reconstruct the object in PHP.
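A small sketch of the clone-a-prototype idea, with hypothetical names (`UserModel`, `newUser`). It only shows the mechanics; whether cloning actually beats construction depends on what the constructor does and is worth measuring.

```php
<?php
// Hypothetical model whose constructor does expensive setup work.
class UserModel
{
    public static $constructed = 0; // counts constructor runs, for demonstration
    public $id = null;

    public function __construct()
    {
        self::$constructed++; // imagine schema loading or config parsing here
    }
}

// Keeps one initialized prototype and hands out clones of it.
function newUser(int $id): UserModel
{
    static $prototype = null;
    if ($prototype === null) {
        $prototype = new UserModel(); // the constructor runs only once
    }
    $user = clone $prototype; // copies the object without calling __construct()
    $user->id = $id;          // set the new state on the copy
    return $user;
}

$a = newUser(1);
$b = newUser(2);
```

PHP's `clone` makes a shallow copy, so any objects held in properties are shared between clones unless the class defines `__clone()` to copy them.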

Other recommended answer

The reason I am skeptical is because caching should usually not be done unless there is a real need, and shouldn't be done for things like search results. So somehow the Model itself has to know whether or not the SELECT statement being issued to it is worthy of being cached. Wouldn't the Model have to be astronomically smart, and/or store statistics of what is being most often queried over a long period of time in order to accurately make a decision? And wouldn't the overhead of all this make the caching useless anyway?

Who else is better suited to track any of that? Multiple controllers will be using the same model to fetch the data they need. So how in the world would a controller be able to make a rational decision?

There are no hard and fast rules -- a smart caching strategy is almost completely driven by context. The business logic (again, models!) is going to dictate what sorts of things ought to be in the cache, when the cache needs to be invalidated, etc.

You're absolutely right that caching search results seems like a bad idea. I'm sure it usually is. It's possible that if your search results are very expensive to generate, and you're doing something like pagination, you might want a per-user cache that holds the most recent results, along with the search parameters. But I think that's a fairly special case.

It's difficult to give more specific advice without the context, but here are a couple of scenarios:

1) You have business objects that can have a category assigned. The categories rarely change. Your Category model ought to cache the full set of categories for read operations. When the infrequent write operations occur, they can invalidate the cache. Every view script in the system can now query the model and get the current categories back (for rendering select boxes, let's say) without concerning itself with the cache. Any controller in the system can now add/update/delete categories without knowing about the cache.

2) You have some complex formula that consumes multiple inputs and creates a popularity rating for some kind of "products". Some widget in your page layout shows the 5 most popular objects in summary form. Your Product model would provide a getPopular() method, which would rely on the cache. The model could invalidate the cache every X minutes, or some background process could run at regular intervals to invalidate/rebuild. No matter what part of the system wants the popular products, they request it via the model, which transparently manages the cache.
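The two scenarios might be sketched like this. All names other than `getPopular()` are hypothetical, the data and the scoring formula are made up, and private properties stand in for both the database and the cache. The point is only that callers use `all()`, `add()` and `getPopular()` without knowing a cache exists.

```php
<?php
// Scenario 1: hypothetical Category model. $rows stands in for the database
// table and $cache for an external cache; callers never see either.
class CategoryModel
{
    private $rows = [1 => 'Books', 2 => 'Music'];
    private $cache = null; // full category list, or null when invalid
    public $dbReads = 0;   // counts "database" reads, for demonstration only

    public function all(): array
    {
        if ($this->cache === null) {
            $this->dbReads++;
            $this->cache = $this->rows; // would be a SELECT in practice
        }
        return $this->cache;
    }

    public function add(int $id, string $name): void
    {
        $this->rows[$id] = $name; // would be an INSERT in practice
        $this->cache = null;      // the infrequent write invalidates the cache
    }
}

// Scenario 2: hypothetical Product model with a time-based cache. The clock is
// injected so that expiry can be demonstrated without waiting.
class ProductModel
{
    private $clock;
    private $ttl;
    private $cached = null;
    private $expiresAt = 0;
    public $computations = 0; // counts ranking runs, for demonstration only

    public function __construct(callable $clock, int $ttlSeconds)
    {
        $this->clock = $clock;
        $this->ttl = $ttlSeconds;
    }

    public function getPopular(int $limit = 5): array
    {
        $now = ($this->clock)();
        if ($this->cached === null || $now >= $this->expiresAt) {
            $this->cached = $this->computeRanking();
            $this->expiresAt = $now + $this->ttl;
        }
        return array_slice($this->cached, 0, $limit);
    }

    private function computeRanking(): array
    {
        $this->computations++;
        $stats = ['lamp' => [10, 2], 'desk' => [40, 1], 'chair' => [25, 5]]; // [views, sales]
        $scores = [];
        foreach ($stats as $name => [$views, $sales]) {
            $scores[$name] = $views + 10 * $sales; // made-up popularity formula
        }
        arsort($scores); // highest score first
        return array_keys($scores);
    }
}

$categories = new CategoryModel();
$categories->all();           // first read populates the cache
$categories->all();           // served from the cache
$categories->add(3, 'Games'); // the write invalidates it
$list = $categories->all();   // re-read picks up the new category

$now = 1000;
$products = new ProductModel(function () use (&$now) { return $now; }, 300);
$top = $products->getPopular(2); // computes the ranking
$products->getPopular();         // within 300 seconds: cached
$now += 301;
$products->getPopular();         // expired: recomputed
```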

The exact caching implementation is highly dependent on the sort of data you're manipulating, combined with the typical use cases.

The caveat here is that if you're abusing ActiveRecord, and/or composing SQL queries (or equivalents) in your controllers, you're probably going to have issues. Doing smart caching is a lot easier if you've got a nice, rich, model layer that accurately models your domain, instead of flimsy models that just wrap database tables.

It's not about the Models being smart, it's about the developer being smart.