How do you query DynamoDB?

Article URL: https://www.itbaoku.cn/post/597442.html

Question

I'm looking at Amazon's DynamoDB as it looks like it takes away all of the hassle of maintaining and scaling your database server. I'm currently using MySQL, and maintaining and scaling the database is a complete headache.

I've gone through the documentation and I'm having a hard time trying to wrap my head around how you would structure your data so it could be easily retrieved.

I'm totally new to NoSQL and non-relational databases.

From the DynamoDB documentation, it sounds like you can only query a table by its primary hash key, plus the primary range key with a limited set of comparison operators.

Or you can run a full table scan and apply a filter to it. The catch is that it will only scan 1 MB at a time, so you'd likely have to repeat your scan to find X number of results.

I realize these limitations allow them to provide predictable performance, but it seems like it makes it really difficult to get your data out. And performing full table scans seems like it would be really inefficient, and would only become less efficient over time as your table grows.

For instance, say I have a Flickr clone. My Images table might look something like:

  • Image ID (Number, Primary Hash Key)
  • Date Added (Number, Primary Range Key)
  • User ID (String)
  • Tags (String Set)
  • etc

So using query I would be able to list all images from the last 7 days and limit it to X number of results pretty easily.

But if I wanted to list all images from a particular user I would need to do a full table scan and filter by username. Same would go for tags.

And because you can only scan 1 MB at a time, you may need to do multiple scans to find X number of images. I also don't see a way to easily stop at X number of images. If you're trying to grab 30 images, your first scan might find 5, and your second may find 40.
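The "5 then 40" situation can be handled with a small loop on the client side. Below is a minimal sketch, assuming a boto3-style `scan` callable (for example `table.scan` on a boto3 `Table` resource) that accepts `ExclusiveStartKey` and returns `Items` plus an optional `LastEvaluatedKey`. The names `scan_until` and `make_fake_scan` are hypothetical, and the fake stands in for a real table so the example runs without AWS.

```python
from typing import Any, Callable, Dict, List


def scan_until(scan: Callable[..., Dict[str, Any]], wanted: int,
               **scan_kwargs: Any) -> List[Dict[str, Any]]:
    """Scan page by page until `wanted` items are collected or the table ends."""
    found: List[Dict[str, Any]] = []
    kwargs = dict(scan_kwargs)
    while len(found) < wanted:
        page = scan(**kwargs)
        found.extend(page.get("Items", []))
        last_key = page.get("LastEvaluatedKey")
        if last_key is None:
            break  # no more pages to read
        kwargs["ExclusiveStartKey"] = last_key  # resume where the last page stopped
    return found[:wanted]  # trim the overshoot from the final page


def make_fake_scan(pages: List[List[Dict[str, Any]]]) -> Callable[..., Dict[str, Any]]:
    """Hypothetical stand-in for a real scan call: serves pre-built pages in order."""
    def fake_scan(**kwargs: Any) -> Dict[str, Any]:
        index = kwargs.get("ExclusiveStartKey", {}).get("page", 0)
        response: Dict[str, Any] = {"Items": pages[index]}
        if index + 1 < len(pages):
            response["LastEvaluatedKey"] = {"page": index + 1}
        return response
    return fake_scan


# The first page matches 5 images and the second matches 40; we still stop at 30.
pages = [[{"image_id": i} for i in range(5)],
         [{"image_id": i} for i in range(5, 45)]]
images = scan_until(make_fake_scan(pages), wanted=30)
print(len(images))  # prints 30
```

Against a real table this would be called as something like `scan_until(table.scan, 30, FilterExpression=...)`. Note that a filter is applied after each page is read, so a page can come back with few or no matches, and the loop still pays for every item scanned.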

Do I have this right? Is it basically a trade-off? You get really fast predictable database performance that is virtually maintenance free. But the trade-off is that you need to build way more logic to deal with the results?

Or am I totally off base here?

Recommended answer

Yes, you are correct about the trade-off between performance and query flexibility.

But there are a few tricks to reduce the pain - secondary indexes/denormalising probably being the most important.

You would have another table keyed on user ID, listing all their images, for example. When you add an image, you update this table as well as adding a row to the table keyed on image ID.
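The double write described here might look like the sketch below. It assumes table objects exposing a boto3-style `put_item(Item=...)` method; the function `add_image`, the attribute names, and the `InMemoryTable` stand-in are hypothetical and only illustrate the pattern.

```python
from typing import Any, Dict, List, Protocol


class TableLike(Protocol):
    """Anything with a boto3-style put_item method (e.g. a boto3 Table resource)."""
    def put_item(self, *, Item: Dict[str, Any]) -> Any: ...


def add_image(images: TableLike, images_by_user: TableLike,
              image: Dict[str, Any]) -> None:
    """Write the full item to the main table and a slim copy to the per-user table."""
    images.put_item(Item=image)
    images_by_user.put_item(Item={
        "user_id": image["user_id"],        # hash key of the per-user table
        "date_added": image["date_added"],  # range key of the per-user table
        "image_id": image["image_id"],      # pointer back to the main table
    })


class InMemoryTable:
    """Hypothetical stand-in so the example runs without AWS."""
    def __init__(self) -> None:
        self.items: List[Dict[str, Any]] = []

    def put_item(self, *, Item: Dict[str, Any]) -> None:
        self.items.append(Item)


images = InMemoryTable()
images_by_user = InMemoryTable()
add_image(images, images_by_user, {
    "image_id": 42,
    "date_added": 1700000000,
    "user_id": "alice",
    "tags": {"sunset"},
})
print(images_by_user.items)
```

The two writes are separate calls, so a failure between them can leave the tables out of sync; the application has to retry or reconcile.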

You have to decide what queries you need, then design the data model around them.

Other recommended answer

I think you need to create your own secondary index, using another table.

This table "schema" could be:

    User ID (String, Hash Key)
    Date Added (Number, Range Key)
    Image ID (Number)

That way you can query by User ID and filter by date as well.
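A query against such a table could be sketched as follows, assuming a boto3-style `query` callable (such as `table.query` on a boto3 `Table` resource) and the present-day parameters `KeyConditionExpression`, `ExpressionAttributeValues`, `ScanIndexForward`, and `Limit`. The attribute names `user_id`, `date_added`, and `image_id`, and the helpers `images_for_user` and `make_fake_query`, are hypothetical.

```python
from typing import Any, Callable, Dict, List


def images_for_user(query: Callable[..., Dict[str, Any]], user_id: str,
                    start: int, end: int, limit: int) -> List[Dict[str, Any]]:
    """Return up to `limit` rows for one user within a date range, newest first."""
    kwargs: Dict[str, Any] = {
        "KeyConditionExpression":
            "user_id = :u AND date_added BETWEEN :start AND :end",
        "ExpressionAttributeValues": {":u": user_id, ":start": start, ":end": end},
        "ScanIndexForward": False,  # descending order on the range key
    }
    found: List[Dict[str, Any]] = []
    while len(found) < limit:
        kwargs["Limit"] = limit - len(found)  # only ask for what is still missing
        page = query(**kwargs)
        found.extend(page.get("Items", []))
        if "LastEvaluatedKey" not in page:
            break
        kwargs["ExclusiveStartKey"] = page["LastEvaluatedKey"]
    return found


def make_fake_query(rows: List[Dict[str, Any]]) -> Callable[..., Dict[str, Any]]:
    """Hypothetical stand-in that mimics the key condition above in memory."""
    def fake_query(**kwargs: Any) -> Dict[str, Any]:
        v = kwargs["ExpressionAttributeValues"]
        matches = [r for r in rows
                   if r["user_id"] == v[":u"]
                   and v[":start"] <= r["date_added"] <= v[":end"]]
        matches.sort(key=lambda r: r["date_added"],
                     reverse=not kwargs.get("ScanIndexForward", True))
        return {"Items": matches[:kwargs["Limit"]]}
    return fake_query


rows = [
    {"user_id": "alice", "date_added": 100, "image_id": 1},
    {"user_id": "alice", "date_added": 200, "image_id": 2},
    {"user_id": "alice", "date_added": 300, "image_id": 3},
    {"user_id": "bob", "date_added": 150, "image_id": 9},
]
recent = images_for_user(make_fake_query(rows), "alice", start=150, end=400, limit=10)
print([r["image_id"] for r in recent])  # prints [3, 2]
```

Because the date is part of the key, the range condition is evaluated by the index itself rather than as a filter after the read, which is what makes this cheaper than a scan.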

Other recommended answer

You can use a composite hash-range key as the primary index.

From the DynamoDB page:

A primary key can either be a single-attribute hash key or a composite hash-range key. A single attribute hash primary key could be, for example, “UserID”. This would allow you to quickly read and write data for an item associated with a given user ID.

A composite hash-range key is indexed as a hash key element and a range key element. This multi-part key maintains a hierarchy between the first and second element values. For example, a composite hash-range key could be a combination of “UserID” (hash) and “Timestamp” (range). Holding the hash key element constant, you can search across the range key element to retrieve items. This would allow you to use the Query API to, for example, retrieve all items for a single UserID across a range of timestamps.
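For illustration, a table with the “UserID”/“Timestamp” composite key from the quoted passage could be declared roughly as below. This is a sketch assuming the parameter shape of the present-day `CreateTable` API (`AttributeDefinitions`, `KeySchema`, `BillingMode`); the function name and the table name `UserEvents` are hypothetical, and the attribute types (string and number) are an assumption.

```python
from typing import Any, Dict


def composite_key_table_definition(table_name: str) -> Dict[str, Any]:
    """Build CreateTable parameters for a hash-range (partition/sort) primary key."""
    return {
        "TableName": table_name,
        # Only key attributes are declared; other attributes are schemaless.
        "AttributeDefinitions": [
            {"AttributeName": "UserID", "AttributeType": "S"},     # assumed string
            {"AttributeName": "Timestamp", "AttributeType": "N"},  # assumed number
        ],
        "KeySchema": [
            {"AttributeName": "UserID", "KeyType": "HASH"},      # held constant in a query
            {"AttributeName": "Timestamp", "KeyType": "RANGE"},  # searched across
        ],
        "BillingMode": "PAY_PER_REQUEST",
    }


definition = composite_key_table_definition("UserEvents")
print(definition["KeySchema"])
```

With boto3 installed and credentials configured, the dictionary could be passed as `boto3.client("dynamodb").create_table(**definition)`.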