Delivering activity feed items in a moderately scalable way

Question

The application I'm working on has an activity feed where each user can see their friends' activity (much like Facebook). I'm looking for a moderately scalable way to show a given user's activity stream on the fly. I say 'moderately' because I'm looking to do this with just a database (PostgreSQL) and maybe memcached. For instance, I want this solution to scale to 200k users, each with 100 friends.

Currently, there is a master activity table that stores the rendered HTML for the given activity (Jim added a friend, George installed an application, etc.). This master activity table keeps the source user, the HTML, and a timestamp.

Then, there's a separate ('join') table that simply keeps a pointer to the person who should see this activity in their friend feed, and a pointer to the object in the main activity table.

So, if I have 100 friends and I do 3 activities, the join table grows by 300 rows.

Clearly this table will grow very quickly. It has the nice property, though, that fetching activity to show to a user takes a single (relatively) inexpensive query.
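For illustration, the join-table design described above could be sketched roughly like this. It is a minimal sketch, not the asker's actual schema: SQLite (in memory) stands in for PostgreSQL so the example runs on its own, and the names feed_item, publish and feed are hypothetical.

```python
import sqlite3

# Assumption: SQLite stands in for PostgreSQL; all table, column and
# function names below are hypothetical, not taken from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE activity (
        id          INTEGER PRIMARY KEY,
        source_user INTEGER NOT NULL,
        html        TEXT    NOT NULL,
        ts          INTEGER NOT NULL
    );
    -- The 'join' table: one row per (viewer, activity) pair.
    CREATE TABLE feed_item (
        viewer      INTEGER NOT NULL,
        activity_id INTEGER NOT NULL REFERENCES activity(id)
    );
    CREATE INDEX feed_item_viewer ON feed_item (viewer);
""")


def publish(source_user, html, ts, friend_ids):
    """Store one activity and write one join row per friend."""
    cur = conn.execute(
        "INSERT INTO activity (source_user, html, ts) VALUES (?, ?, ?)",
        (source_user, html, ts),
    )
    activity_id = cur.lastrowid
    conn.executemany(
        "INSERT INTO feed_item (viewer, activity_id) VALUES (?, ?)",
        [(friend, activity_id) for friend in friend_ids],
    )
    return activity_id


def feed(viewer, limit=20):
    """Read one viewer's feed, newest first, with a single query."""
    rows = conn.execute(
        """
        SELECT a.html
        FROM feed_item AS f
        JOIN activity AS a ON a.id = f.activity_id
        WHERE f.viewer = ?
        ORDER BY a.ts DESC, a.id DESC
        LIMIT ?
        """,
        (viewer, limit),
    )
    return [html for (html,) in rows]


# User 1 has 100 friends (ids 2..101) and performs 3 activities.
friends_of_1 = list(range(2, 102))
for i in range(3):
    publish(1, f"User 1 did thing {i}", ts=1000 + i, friend_ids=friends_of_1)

join_rows = conn.execute("SELECT COUNT(*) FROM feed_item").fetchone()[0]
print(join_rows)  # number of join rows written for the 3 activities
print(feed(2))    # what friend 2 sees, newest first
```

The write side does the expensive work (one insert per friend), which is why the read side stays a single lookup by viewer.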

The other option is to just keep the main activity table and query it by saying something like:

select * from activity where source_user in (1, 2, 44, 2423, ... my friend list)

This has the disadvantage that you're querying for users who may never be active, and as your friend list grows, this query can get slower and slower.
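To make the second option concrete, here is a hedged sketch of the same IN-list query with the friend ids passed as bound parameters rather than pasted into the SQL string. SQLite stands in for PostgreSQL again, and friends_activity is a hypothetical helper name.

```python
import sqlite3

# Assumption: SQLite stands in for PostgreSQL; names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE activity ("
    " id INTEGER PRIMARY KEY, source_user INTEGER NOT NULL,"
    " html TEXT NOT NULL, ts INTEGER NOT NULL)"
)
conn.executemany(
    "INSERT INTO activity (source_user, html, ts) VALUES (?, ?, ?)",
    [
        (2, "Jim added a friend", 100),
        (44, "George installed an application", 200),
        (7, "Someone outside my friend list did something", 300),
    ],
)


def friends_activity(friend_ids):
    """Fetch every activity whose source_user is one of friend_ids."""
    if not friend_ids:
        return []  # an empty IN () list is a syntax error in PostgreSQL
    placeholders = ", ".join("?" for _ in friend_ids)
    sql = (
        "SELECT source_user, html FROM activity "
        f"WHERE source_user IN ({placeholders}) "
        "ORDER BY ts DESC"
    )
    return conn.execute(sql, list(friend_ids)).fetchall()


# Friends 1 and 2423 have no activity; they cost a lookup but return nothing.
print(friends_activity([1, 2, 44, 2423]))
```

Nothing is written per friend here, so the table stays small, but every read has to probe the whole friend list, including friends who never post.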

I see the pros and the cons of both sides, but I'm wondering if some SO folks might help me weigh the options and suggest one way or the other. I'm also open to other solutions, though I'd like to keep it simple and not install something like CouchDB, etc.

Many thanks!

Recommended answer

I'm leaning towards just having the master activity table. If you go with that, this is what I would consider implementing:

  1. You can create several activity tables and do a UNION ALL when fetching the data from the database. For example, roll them over monthly: activity_2010_02, etc. Just going by your example, 200K users x 100 friends x 3 activities = 60 million rows in the join table; the master activity table alone would hold 200K users x 3 activities = 600K rows. Neither is a concern performance-wise for PostgreSQL, but you might consider this purely for convenience now and eventually for effortless future expansion.

  2. "This has the disadvantage that you're querying for users who may never be active, and as your friend list grows, this query can get slower and slower."

Are you going to display the entire activity feed, going back to the beginning of time? You haven't provided much detail in the original question, but I'd hazard a guess that you'd be showing the last 10/20/100 items sorted by timestamp. A couple of indexes and the LIMIT clause should be enough to provide an instant response (as I've just tested on a table with about 20 million rows). It can be slower on a busy server, but that is something that should be worked out with hardware and caching solutions; Postgres is not going to be the bottleneck there.
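As a rough sketch of "a couple of indexes and the LIMIT clause": the answer does not say which indexes were used, so the two below (one on source_user plus ts, one on ts alone) are an assumption, as is the helper name latest. SQLite stands in for PostgreSQL so the example is self-contained.

```python
import sqlite3

# Assumption: SQLite stands in for PostgreSQL, and these two indexes are
# a guess at what "a couple of indexes" could mean for this table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE activity (
        id INTEGER PRIMARY KEY, source_user INTEGER NOT NULL,
        html TEXT NOT NULL, ts INTEGER NOT NULL
    );
    CREATE INDEX activity_user_ts ON activity (source_user, ts);
    CREATE INDEX activity_ts ON activity (ts);
""")

# 50 users with 40 activities each; every row gets a unique timestamp.
conn.executemany(
    "INSERT INTO activity (source_user, html, ts) VALUES (?, ?, ?)",
    [
        (user, f"user {user} activity {n}", n * 50 + user)
        for n in range(40)
        for user in range(1, 51)
    ],
)


def latest(friend_ids, limit=10):
    """Return the newest `limit` activities from the given friends."""
    if not friend_ids:
        return []
    placeholders = ", ".join("?" for _ in friend_ids)
    sql = (
        "SELECT source_user, ts FROM activity "
        f"WHERE source_user IN ({placeholders}) "
        "ORDER BY ts DESC LIMIT ?"
    )
    return conn.execute(sql, [*friend_ids, limit]).fetchall()


page = latest([1, 2, 44], limit=5)
print(page)
```

Whatever the friend list looks like, the database only has to hand back `limit` rows, which is what keeps the response size bounded.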

Even if you do provide activity feeds going back to the dawn of time, paginate the output! The LIMIT clause will save you there. If the basic query with a LIMIT on it is not enough, or if your users have a long tail of friends that are no longer active, you could consider limiting the lookup to the last day/week/month first and then provide the list of friend ids:

select * from activity
  where ts >= 123456789
    and source_user in (1, 2, 44, 2423, ... my friend list)

If you've got a table spanning months or years back, the search for the friend ids will only be performed within the rows selected by the first condition in the WHERE clause.
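Putting the time window and the pagination advice together, one possible shape is sketched below. The since_ts cutoff restricts the lookup to recent rows, and each following page asks for rows older than the last timestamp already shown. This is only a sketch under assumptions: SQLite stands in for PostgreSQL, feed_page is a hypothetical helper, and paging by timestamp alone assumes timestamps are unique.

```python
import sqlite3

# Assumption: SQLite stands in for PostgreSQL; names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE activity ("
    " id INTEGER PRIMARY KEY, source_user INTEGER NOT NULL,"
    " html TEXT NOT NULL, ts INTEGER NOT NULL)"
)
conn.execute("CREATE INDEX activity_ts ON activity (ts)")

# Ten activities at ts = 10, 20, ..., 100, cycling through users 2, 3, 1.
conn.executemany(
    "INSERT INTO activity (source_user, html, ts) VALUES (?, ?, ?)",
    [((i % 3) + 1, f"activity {i}", i * 10) for i in range(1, 11)],
)


def feed_page(friend_ids, since_ts, before_ts=None, limit=20):
    """One page of friends' activity no older than since_ts.

    Pass the ts of the last row of the previous page as before_ts to
    fetch the next (older) page.
    """
    if not friend_ids:
        return []
    placeholders = ", ".join("?" for _ in friend_ids)
    sql = (
        "SELECT source_user, ts FROM activity "
        f"WHERE ts >= ? AND source_user IN ({placeholders})"
    )
    params = [since_ts, *friend_ids]
    if before_ts is not None:
        sql += " AND ts < ?"
        params.append(before_ts)
    sql += " ORDER BY ts DESC LIMIT ?"
    params.append(limit)
    return conn.execute(sql, params).fetchall()


page1 = feed_page([1, 2], since_ts=35, limit=2)
page2 = feed_page([1, 2], since_ts=35, before_ts=page1[-1][1], limit=2)
page3 = feed_page([1, 2], since_ts=35, before_ts=page2[-1][1], limit=2)
print(page1, page2, page3)
```

If timestamps can collide, the page boundary would need a tie-breaker such as the row id; that detail is left out here to keep the sketch short.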

That's only if I had to choose between the two solutions you are considering now. I would also look at things like:

  1. Reconsidering your denormalisation of the table. Is storing pre-generated HTML output really the best way? Would you be better off performance-wise with a lookup table of activities instead, generating templated output on the fly? Pre-generated HTML can seem better at the outset, but once you consider things like disk storage, APIs and future layout changes, storing HTML may not be that attractive after all. The lookup table could contain your possible activities (added a friend, changed status, etc.), and the activity log would reference that, plus the friend's id if another user is involved in the activity.

  2. Pre-generating the HTML, but not storing it in the database. Save the stuff on disk as pre-generated pages. This is not a silver bullet, however, and largely depends on the ratio of writes to reads on your site. For example, a typical discussion thread on a public forum could have a dozen messages but be viewed hundreds of times, which makes it a good candidate for caching. Whereas if your application is more tuned to immediate status updates and you'd have to regenerate the HTML page and save it again on disk after every couple of views, then there's little value in this approach.
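The lookup-table idea from point 1 could look roughly like the sketch below: the log stores only ids, and the text is produced from a per-type template at read time. All names here (users, activity_type, activity_log, render_feed) and the template wording are hypothetical, and SQLite again stands in for PostgreSQL.

```python
import sqlite3
from html import escape

# Assumption: SQLite stands in for PostgreSQL; schema and names are
# hypothetical illustrations of the lookup-table idea.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    -- Lookup table: one row per kind of activity, with its template.
    CREATE TABLE activity_type (
        id INTEGER PRIMARY KEY, template TEXT NOT NULL
    );
    -- The log references the type and, optionally, a second user.
    CREATE TABLE activity_log (
        id          INTEGER PRIMARY KEY,
        source_user INTEGER NOT NULL REFERENCES users(id),
        type_id     INTEGER NOT NULL REFERENCES activity_type(id),
        other_user  INTEGER REFERENCES users(id),
        ts          INTEGER NOT NULL
    );
    INSERT INTO users (id, name) VALUES (1, 'Jim'), (2, 'George');
    INSERT INTO activity_type (id, template) VALUES
        (1, '{actor} added {other} as a friend'),
        (2, '{actor} changed status');
    INSERT INTO activity_log (source_user, type_id, other_user, ts) VALUES
        (1, 1, 2, 100),
        (2, 2, NULL, 200);
""")


def render_feed():
    """Render every logged activity from its template, newest first."""
    rows = conn.execute("""
        SELECT t.template, u.name, o.name
        FROM activity_log AS l
        JOIN activity_type AS t ON t.id = l.type_id
        JOIN users AS u ON u.id = l.source_user
        LEFT JOIN users AS o ON o.id = l.other_user
        ORDER BY l.ts DESC
    """)
    # Names are HTML-escaped at render time; other_user may be NULL.
    return [
        template.format(actor=escape(actor), other=escape(other or ""))
        for template, actor, other in rows
    ]


print(render_feed())
```

A layout change then means editing a template row rather than rewriting stored HTML, at the cost of a few joins on every read.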

Hope this helps.