Most efficient way to store nested categories (or hierarchical data) in Mongo?

Problem description

We have nested categories for several products (e.g., Sports -> Basketball -> Men's, Sports -> Tennis -> Women's) and are using Mongo instead of MySQL.

We know how to store nested categories in a SQL database like MySQL, but would appreciate any advice on what to do for Mongo. The operation we need to optimize for is quickly finding all products in one category or subcategory, which could be nested several layers below a root category (e.g., all products in the Men's Basketball category or all products in the Women's Tennis category).

This Mongo doc suggests one approach, but it says it doesn't work well when operations are needed for subtrees, which we need (since categories can reach multiple levels).

Any suggestions on the best way to efficiently store and search nested categories of arbitrary depth?

Recommended answer

The first thing you want to decide is exactly what kind of tree you will use.

The big thing to consider is your data and access patterns. You have already stated that 90% of all your work will be querying, and since this sounds like e-commerce, updates will probably be run only by administrators, and rarely.

So you want a schema that lets you quickly query the children under a path, e.g. Sports -> Basketball -> Men's or Sports -> Tennis -> Women's, and that doesn't really need to scale for updates.

As you rightly pointed out, MongoDB has a good documentation page for this: https://docs.mongodb.com/manual/applications/data-models-tree-structures/ where 10gen (now MongoDB, Inc.) lays out different schema models for trees and describes the main pros and cons of each.

The one that should catch the eye if you are looking to query easily is materialised paths: https://docs.mongodb.com/manual/tutorial/model-tree-structures-with-materialized-paths/

This is a very interesting way to build trees because, to query the example you gave above ("Womens" under "Tennis"), you can simply use a left-anchored prefix regex (which can use an index: http://docs.mongodb.org/manual/reference/operator/regex/ ) like so:

db.products.find({category: /^Sports,Tennis,Womens[,]/})

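to find all products listed under a certain path of your tree. Note that the trailing [,] requires a comma after "Womens", so this assumes paths are stored with a trailing comma (e.g. "Sports,Tennis,Womens,"); otherwise products filed directly under Women's would not match.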
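A minimal sketch of how that prefix regex might be built from path segments, assuming paths are stored with a trailing comma. The helpers escapeRegex and subtreeRegex are hypothetical, and the products array is an in-memory stand-in for the collection so the logic runs without a server. Escaping matters because a category name could contain regex metacharacters such as "." or "+".

```javascript
// Hypothetical helpers; not part of any MongoDB API.
// Escape characters that have special meaning inside a regular expression.
function escapeRegex(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Build an anchored prefix regex for a subtree, assuming paths are stored
// as comma-separated segments with a trailing comma ("Sports,Tennis,Womens,").
// The leading "^" keeps it a prefix expression, which an index can serve.
function subtreeRegex(segments) {
  return new RegExp("^" + segments.map(escapeRegex).join(",") + ",");
}

// In-memory stand-in for db.products (illustrative data only).
const products = [
  { name: "Racket A", category: "Sports,Tennis,Womens," },
  { name: "Shoe B", category: "Sports,Tennis,Womens,Shoes," },
  { name: "Racket C", category: "Sports,Tennis,Mens," },
  { name: "Ball D", category: "Sports,Basketball,Mens," },
];

const re = subtreeRegex(["Sports", "Tennis", "Womens"]);
// Shell equivalent: db.products.find({category: re})
const womensTennis = products.filter((p) => re.test(p.category));
console.log(womensTennis.map((p) => p.name).join(", ")); // Racket A, Shoe B
```

Because every descendant path starts with its ancestor's path, one prefix match returns the whole subtree, including deeper levels such as "Shoes" under Women's.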

Unfortunately this model is really bad at updates: if you move a category or change its name, you have to update every product beneath it, and there could be thousands of products under one category.

A better method is to store a cat_id on each product and move the categories into a separate collection with this schema:

{
    _id: ObjectId(),
    name: 'Women\'s',
    path: 'Sports,Tennis,Womens',
    normed_name: 'all_special_chars_and_spaces_and_case_sensitive_letters_taken_out_like_this'
}
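The answer does not spell out how normed_name is derived. One plausible reading, sketched here purely as an assumption, is to lowercase the display name and drop everything that is not a letter or digit, so lookups no longer depend on apostrophes, spaces, or case. toNormedName is a hypothetical helper.

```javascript
// Hypothetical normalizer for normed_name (an assumption, not a rule given
// in the answer): lowercase the name, then drop every character that is not
// an ASCII letter or digit.
function toNormedName(name) {
  return name.toLowerCase().replace(/[^a-z0-9]/g, "");
}

// A category document in the shape shown above.
const womens = {
  name: "Women's",
  path: "Sports,Tennis,Womens",
  normed_name: toNormedName("Women's"),
};

console.log(womens.normed_name); // womens
console.log(toNormedName("Men's Basketball")); // mensbasketball
```

This version silently discards accented or non-Latin characters, so a real site would need its own policy for those names.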

So now your update queries only involve the categories collection, which is much smaller, so they should be far more performant. The exception is deleting a category: the products that reference it will still need updating.

So here is an example of renaming "Tennis" to "Badmin":
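With cat_id on products, a subtree read becomes two steps: find the category ids under a path prefix, then find products whose cat_id is in that set. The sketch below simulates both collections with arrays (illustrative data only) and shows the shell equivalent of each step in comments; the trailing-comma path format and the productsUnder helper are assumptions.

```javascript
// In-memory stand-ins for db.categories and db.products.
const categories = [
  { _id: 1, name: "Sports", path: "Sports," },
  { _id: 2, name: "Tennis", path: "Sports,Tennis," },
  { _id: 3, name: "Women's", path: "Sports,Tennis,Womens," },
  { _id: 4, name: "Men's", path: "Sports,Tennis,Mens," },
  { _id: 5, name: "Basketball", path: "Sports,Basketball," },
];
const products = [
  { name: "Racket A", cat_id: 3 },
  { name: "Racket C", cat_id: 4 },
  { name: "Net E", cat_id: 2 },
  { name: "Ball D", cat_id: 5 },
];

// Hypothetical helper returning product names under a path prefix.
function productsUnder(pathPrefix) {
  // Step 1, shell equivalent: db.categories.find({path: /^Sports,Tennis,/}, {_id: 1})
  // startsWith plays the role of the anchored prefix regex.
  const ids = new Set(
    categories.filter((c) => c.path.startsWith(pathPrefix)).map((c) => c._id)
  );
  // Step 2, shell equivalent: db.products.find({cat_id: {$in: [...ids]}})
  return products.filter((p) => ids.has(p.cat_id)).map((p) => p.name);
}

console.log(productsUnder("Sports,Tennis,").join(", ")); // Racket A, Racket C, Net E
console.log(productsUnder("Sports,Tennis,Womens,").join(", ")); // Racket A
```

With indexes on categories.path and products.cat_id, both steps should be index-backed, and a rename only rewrites category documents.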

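db.categories.find({path: /^Sports,Tennis[,]/}).forEach(function(doc){
    // Rewrite only the leading "Sports,Tennis" segment of each matching path.
    db.categories.updateOne(
        {_id: doc._id},
        {$set: {path: doc.path.replace(/^Sports,Tennis/, "Sports,Badmin")}}
    );
});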

Unfortunately, at the time of writing MongoDB provides no way for an update to read a document's own field values (for example, to rewrite part of path in place), so you do have to pull the documents out to the client. That is a little annoying, but hopefully it shouldn't result in too many categories being brought back.
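The client-side part of a rename can be kept as a pure function and its results sent in one batch instead of one round trip per document. This is a sketch under the trailing-comma assumption; renamePrefixOps and the sample documents are hypothetical, while the operation objects follow the shape that db.collection.bulkWrite() accepts.

```javascript
// Given category documents already fetched with
// db.categories.find({path: /^Sports,Tennis,/}), compute each new path and
// wrap it as a bulkWrite updateOne operation.
function renamePrefixOps(docs, oldPrefix, newPrefix) {
  return docs
    .filter((d) => d.path.startsWith(oldPrefix))
    .map((d) => ({
      updateOne: {
        filter: { _id: d._id },
        update: { $set: { path: newPrefix + d.path.slice(oldPrefix.length) } },
      },
    }));
}

// Illustrative documents standing in for the fetched cursor.
const fetched = [
  { _id: 2, path: "Sports,Tennis," },
  { _id: 3, path: "Sports,Tennis,Womens," },
  { _id: 4, path: "Sports,Tennis,Mens," },
];

const ops = renamePrefixOps(fetched, "Sports,Tennis,", "Sports,Badmin,");
// In the shell the batch would be sent with: db.categories.bulkWrite(ops)
console.log(ops.map((op) => op.updateOne.update.$set.path).join(" | "));
// Sports,Badmin, | Sports,Badmin,Womens, | Sports,Badmin,Mens,
```

Note that the Tennis document's own name field would also need a $set; the path rewrite alone does not change it.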

That is basically how it works. Updates are a bit of a pain, but I believe being able to query instantly on any path using an index fits your scenario better.

Of course, an added benefit is that this schema is compatible with nested set models: http://en.wikipedia.org/wiki/Nested_set_model which I have found time and time again to be great for e-commerce sites. For example, Tennis might be under both "Sports" and "Leisure", and you want multiple paths depending on where the user came from.

The materialised path schema supports this easily: just add another path. It's that simple.

Hope that makes sense; it was quite a long one.
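One way to "just add another path" is to turn path into an array. In this sketch the field name paths is a hypothetical choice. In MongoDB a regex condition on an array field matches when any element matches, so db.categories.find({paths: /^Leisure,/}) would return the document below, and an index on paths would be a multikey index. The function reproduces that any-element match in memory.

```javascript
// A category that lives under two parents (illustrative data).
const tennis = {
  _id: 2,
  name: "Tennis",
  paths: ["Sports,Tennis,", "Leisure,Tennis,"],
};

// In-memory equivalent of matching a prefix regex against an array field:
// true if at least one stored path matches.
function inSubtree(category, prefixRegex) {
  return category.paths.some((p) => prefixRegex.test(p));
}

console.log(inSubtree(tennis, /^Sports,/)); // true
console.log(inSubtree(tennis, /^Leisure,/)); // true
console.log(inSubtree(tennis, /^Outdoor,/)); // false
```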

Other recommended answer

If all categories are distinct, then think of them as tags. The hierarchy doesn't need to be encoded in the items, because you don't need it when you query for items; the hierarchy is a presentational thing. Tag each item with all the categories in its path, so "Sport > Baseball > Shoes" could be saved as {..., categories: ["sport", "baseball", "shoes"], ...}. If you want all items in the "Sport" category, search for {categories: "sport"}; if you want just the shoes, search for {categories: "shoes"}.

This doesn't capture the hierarchy, but if you think about it, that doesn't matter. If the categories are distinct, the hierarchy doesn't help you when you query for items. There will be no other "baseball", so when you search for it you will only get things below the "baseball" level in the hierarchy.
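A small in-memory sketch of the tag approach, with illustrative items. In MongoDB, {categories: "sport"} matches any document whose categories array contains "sport", and an index on categories would be multikey; findByCategory is a hypothetical helper reproducing that match.

```javascript
// Each item carries every category on its path (illustrative data).
const items = [
  { name: "Cleats", categories: ["sport", "baseball", "shoes"] },
  { name: "Glove", categories: ["sport", "baseball", "gloves"] },
  { name: "Sneakers", categories: ["sport", "tennis", "shoes"] },
];

// In-memory equivalent of db.items.find({categories: tag}).
function findByCategory(tag) {
  return items.filter((item) => item.categories.includes(tag)).map((i) => i.name);
}

console.log(findByCategory("sport").join(", ")); // Cleats, Glove, Sneakers
console.log(findByCategory("baseball").join(", ")); // Cleats, Glove
console.log(findByCategory("shoes").join(", ")); // Cleats, Sneakers
```

The last query returns shoes from two different branches, which is exactly why this approach relies on category names being distinct.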

My suggestion relies on categories being distinct, and I guess they aren't in your current model. However, there's no reason you can't make them distinct. You've probably chosen to use the strings you display on the page as the category names in the database. If you instead use symbolic names like "sport" or "womens_shoes", and use a lookup table to find the string to display on the page, you can easily make sure the names are distinct, because they have nothing to do with what is displayed. (The lookup table will also save you hours of work if a category's name ever changes, and it will make translating the site easier if you ever need to.) So if you have two "Shoes" in the hierarchy (for example "Tennis > Women's > Shoes" and "Tennis > Men's > Shoes"), you can just add a qualifier to make them distinct (for example "womens_shoes" and "mens_shoes", or "tennis_womens_shoes"). The symbolic names are arbitrary and can be anything; you could even use numbers and take the next number in the sequence every time you add a category.
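A minimal sketch of the symbolic-name lookup table. The names and labels are illustrative, and building a breadcrumb from array order assumes the categories are stored in path order, which the answer does not require. Two distinct symbols can share the display label "Shoes" without colliding in queries.

```javascript
// Lookup table from symbolic category names to display labels.
// A translated site could swap in a different table per language.
const labels = {
  sport: "Sport",
  tennis: "Tennis",
  tennis_womens: "Women's",
  tennis_mens: "Men's",
  womens_shoes: "Shoes",
  mens_shoes: "Shoes",
};

// Turn symbolic categories (assumed to be in path order) into a breadcrumb,
// falling back to the raw symbol when no label exists.
function breadcrumb(symbols) {
  return symbols.map((s) => labels[s] ?? s).join(" > ");
}

const item = {
  name: "Court Shoe",
  categories: ["sport", "tennis", "tennis_womens", "womens_shoes"],
};
console.log(breadcrumb(item.categories)); // Sport > Tennis > Women's > Shoes
```

Renaming a category for display then means editing one entry in the table, while every item keeps its symbolic tag.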