Question
I'm developing my own social network, and I haven't found any examples on the web of how to implement a stream of users' actions. For example, how do I filter actions for each user? How do I store the action events? Which data model and object model can I use for the action stream and for the actions themselves?
Recommended answer
Summary: For about 1 million active users and 150 million stored activities, I keep it simple:
- Use a relational database for storage of unique activities (1 record per activity / "thing that happened"). Make the records as compact as you can. Structure them so that you can quickly grab a batch of activities by activity ID or by using a set of friend IDs with time constraints.
- Publish the activity IDs to Redis whenever an activity record is created, adding the ID to an "activity stream" list for every user who is a friend/subscriber that should see the activity.
- Query Redis to get the activity stream for any user and then grab the related data from the db as needed. Fall back to querying the db by time if the user needs to browse far back in time (if you even offer this).
I use a plain old MySQL table for dealing with about 15 million activities.
It looks something like this:
- id
- user_id (int)
- activity_type (tinyint)
- source_id (int)
- parent_id (int)
- parent_type (tinyint)
- time (datetime, but a smaller type like int would be better)
activity_type tells me the type of activity, and source_id tells me the record that the activity is related to. So if the activity type means "added favorite", then I know that the source_id refers to the ID of a favorite record.
parent_id and parent_type are useful for my app - they tell me what the activity is related to. If a book was favorited, then parent_id/parent_type would tell me that the activity relates to a book (type) with a given primary key (id).
I index on (user_id, time) and query for activities where user_id IN (...friends...) AND time > some-cutoff-point. Ditching the id and choosing a different clustered index might be a good idea - I haven't experimented with that.
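As a rough illustration of the table, index, and friends query described above, here is a minimal sketch. It uses Python's built-in sqlite3 instead of MySQL so that it runs anywhere; the table name `activity`, the index name, the helper `friends_activity`, and the sample rows are my own assumptions, not part of the original setup.

```python
import sqlite3
import time

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE activity (
        id            INTEGER PRIMARY KEY,
        user_id       INTEGER NOT NULL,
        activity_type INTEGER NOT NULL,  -- e.g. 1 = "added favorite" (made-up code)
        source_id     INTEGER NOT NULL,
        parent_id     INTEGER,
        parent_type   INTEGER,
        time          INTEGER NOT NULL   -- Unix seconds rather than DATETIME
    );
    CREATE INDEX idx_activity_user_time ON activity (user_id, time);
""")

def friends_activity(conn, friend_ids, cutoff, limit=50):
    """Return activities by the given users newer than `cutoff`, newest first."""
    if not friend_ids:
        return []
    placeholders = ",".join("?" * len(friend_ids))
    sql = (
        "SELECT id, user_id, activity_type, source_id, parent_id, parent_type, time "
        f"FROM activity WHERE user_id IN ({placeholders}) AND time > ? "
        "ORDER BY time DESC LIMIT ?"
    )
    return conn.execute(sql, [*friend_ids, cutoff, limit]).fetchall()

now = int(time.time())
rows = [
    (1, 10, 1, 500, 42, 1, now - 30),
    (2, 11, 1, 501, 43, 1, now - 20),
    (3, 12, 1, 502, 44, 1, now - 10),         # user 12 is not a friend below
    (4, 10, 1, 503, 45, 1, now - 7 * 86400),  # a week old, before the cutoff
]
conn.executemany("INSERT INTO activity VALUES (?, ?, ?, ?, ?, ?, ?)", rows)

recent = friends_activity(conn, [10, 11], cutoff=now - 3600)
print([r[0] for r in recent])  # [2, 1]
```

The placeholders are generated per call so the friend IDs are always bound as parameters rather than pasted into the SQL string.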
Pretty basic stuff, but it works, it's simple, and it is easy to work with as your needs change. Also, if you aren't using MySQL you might be able to do better index-wise.
For faster access to the most recent activities, I've been experimenting with Redis. Redis stores all of its data in-memory, so you can't put all of your activities in there, but you could store enough for most of the commonly-hit screens on your site. The most recent 100 for each user or something like that. With Redis in the mix, it might work like this:
- Create your MySQL activity record
- For each friend of the user who created the activity, push the ID onto their activity list in Redis.
- Trim each list to the last X items
Redis is fast and offers a way to pipeline commands across one connection - so pushing an activity out to 1000 friends takes milliseconds.
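To make the push-and-trim steps concrete, here is a small in-memory model of them. It deliberately does not talk to a real Redis server: a plain dict of lists stands in for one Redis list per user, and the comments name the Redis commands each line corresponds to. The key pattern `feed:<id>`, the function names, and the limit of 100 are assumptions for illustration.

```python
from collections import defaultdict

MAX_FEED_LENGTH = 100  # the "last X items"; 100 is an arbitrary choice

# user_id -> activity IDs, newest first (stand-in for one Redis list per user)
feeds = defaultdict(list)

def fan_out(activity_id, friend_ids, max_len=MAX_FEED_LENGTH):
    """Push a new activity ID onto each friend's feed, then trim the feed."""
    for friend_id in friend_ids:
        feed = feeds[friend_id]
        feed.insert(0, activity_id)  # Redis: LPUSH feed:<friend_id> <activity_id>
        del feed[max_len:]           # Redis: LTRIM feed:<friend_id> 0 <max_len - 1>

def read_feed(user_id, count=20):
    """Return the newest `count` activity IDs for a user."""
    return feeds[user_id][:count]    # Redis: LRANGE feed:<user_id> 0 <count - 1>

# Five activities pushed to two friends, keeping only the newest three.
for activity_id in range(1, 6):
    fan_out(activity_id, friend_ids=[7, 8], max_len=3)

print(read_feed(7))  # [5, 4, 3]
```

Against a real server, the LPUSH and LTRIM pairs for all friends would be queued on one pipeline and sent together, which is what keeps a fan-out to many friends cheap.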
For a more detailed explanation of what I am talking about, see Redis' Twitter example: http://redis.io/topics/twitter-clone
Update February 2011: I've got 50 million active activities at the moment and I haven't changed anything. One nice thing about doing something similar to this is that it uses compact, small rows. I am planning on making some changes that would involve many more activities and more queries of those activities and I will definitely be using Redis to keep things speedy. I'm using Redis in other areas and it really works well for certain kinds of problems.
Update July 2014: We're up to about 700K monthly active users. For the last couple of years, I've been using Redis (as described in the bulleted list) for storing the last 1000 activity IDs for each user. There are usually about 100 million activity records in the system and they are still stored in MySQL and are still the same layout. These records let us get away with less Redis memory, they serve as the record of activity data, and we use them if users need to page further back in time to find something.
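The read path described here (cached IDs first, database when paging further back) could look roughly like the sketch below. It is only an illustration under assumptions: sqlite3 stands in for MySQL, a dict stands in for the per-user Redis list, the table is reduced to three columns, and the function names are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE activity (id INTEGER PRIMARY KEY, user_id INTEGER, time INTEGER)"
)
conn.executemany(
    "INSERT INTO activity VALUES (?, ?, ?)",
    [(i, 10, 1000 + i) for i in range(1, 8)],  # 7 activities by user 10
)

# Stand-in for the per-user Redis list: only the newest 3 IDs are cached.
cached_ids = {99: [7, 6, 5]}

def load_by_ids(conn, ids):
    """Fetch full rows for cached IDs, preserving the cached newest-first order."""
    if not ids:
        return []
    marks = ",".join("?" * len(ids))
    rows = conn.execute(
        f"SELECT id, user_id, time FROM activity WHERE id IN ({marks})", ids
    ).fetchall()
    by_id = {row[0]: row for row in rows}
    return [by_id[i] for i in ids if i in by_id]

def load_older(conn, friend_ids, before_time, limit):
    """Fallback: page further back in time straight from the database."""
    marks = ",".join("?" * len(friend_ids))
    return conn.execute(
        "SELECT id, user_id, time FROM activity "
        f"WHERE user_id IN ({marks}) AND time < ? ORDER BY time DESC LIMIT ?",
        [*friend_ids, before_time, limit],
    ).fetchall()

def feed_page(conn, user_id, friend_ids, limit, before_time=None):
    """First page comes from the cached IDs; older pages come from the database."""
    if before_time is None:
        return load_by_ids(conn, cached_ids.get(user_id, [])[:limit])
    return load_older(conn, friend_ids, before_time, limit)

first = feed_page(conn, 99, [10], limit=3)
older = feed_page(conn, 99, [10], limit=3, before_time=first[-1][2])
print([r[0] for r in first], [r[0] for r in older])  # [7, 6, 5] [4, 3, 2]
```

Using the time of the last row shown as the cursor for the next page keeps the fallback query on the (user_id, time) style of lookup rather than on offsets.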
This wasn't a clever or especially interesting solution but it has served me well.
Other recommended answer
This is my implementation of an activity stream, using MySQL. There are three classes: Activity, ActivityFeed, Subscriber.
Activity represents an activity entry, and its table looks like this:
id subject_id object_id type verb data time
subject_id is the id of the object performing the action, and object_id is the id of the object that receives the action. type and verb describe the action itself (for example, if a user adds a comment to an article they would be "comment" and "created" respectively). data contains additional data in order to avoid joins (for example, it can contain the subject's name and surname, the article title and url, the comment body, etc.).
Each Activity belongs to one or more ActivityFeeds, and they are related by a table that looks like this:
feed_name activity_id
In my application I have one feed for each User and one feed for each Item (usually blog articles), but they can be whatever you want.
A Subscriber is usually a user of your site, but it can also be any object in your object model (for example, an article could be subscribed to the feed_action of its creator).
Every Subscriber belongs to one or more ActivityFeeds, and, like above, they are related by a link table of this kind:
feed_name subscriber_id reason
The reason field here explains why the subscriber has subscribed to the feed. For example, if a user bookmarks a blog post, the reason is 'bookmark'. This helps me later in filtering actions for notifications to the users.
To retrieve the activity for a subscriber, I do a simple join of the three tables. The join is fast because I select only a few activities, thanks to a WHERE condition that looks like now - time < some hours. I avoid other joins thanks to the data field in the Activity table.
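Here is one way the three-table join could be written, as a sketch rather than the actual schema. The link tables are not named in the answer, so `feed_activity` and `feed_subscriber`, the feed names like `post:5`, and the sample rows are all assumptions; sqlite3 is used in place of MySQL so the example runs on its own. The time condition keeps only activities whose age is below a threshold.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE activity (
        id INTEGER PRIMARY KEY, subject_id INTEGER, object_id INTEGER,
        type TEXT, verb TEXT, data TEXT, time INTEGER
    );
    CREATE TABLE feed_activity (feed_name TEXT, activity_id INTEGER);
    CREATE TABLE feed_subscriber (feed_name TEXT, subscriber_id INTEGER, reason TEXT);
""")

def recent_activity(conn, subscriber_id, now, max_age_seconds):
    """Activities from every feed the subscriber follows, newest first."""
    cutoff = now - max_age_seconds  # same as: now - time < max_age_seconds
    return conn.execute(
        """
        SELECT a.id, a.type, a.verb, a.data, s.reason
        FROM feed_subscriber AS s
        JOIN feed_activity AS fa ON fa.feed_name = s.feed_name
        JOIN activity AS a ON a.id = fa.activity_id
        WHERE s.subscriber_id = ? AND a.time > ?
        ORDER BY a.time DESC
        """,
        (subscriber_id, cutoff),
    ).fetchall()

conn.executemany(
    "INSERT INTO activity VALUES (?, ?, ?, ?, ?, ?, ?)",
    [
        (1, 20, 5, "comment", "created", '{"author": "Ann"}', 1000),
        (2, 21, 5, "comment", "created", '{"author": "Bob"}', 100),  # too old
        (3, 20, 6, "comment", "created", '{"author": "Ann"}', 990),  # other feed
    ],
)
conn.executemany(
    "INSERT INTO feed_activity VALUES (?, ?)",
    [("post:5", 1), ("post:5", 2), ("post:6", 3)],
)
conn.execute("INSERT INTO feed_subscriber VALUES ('post:5', 1, 'comment')")

rows = recent_activity(conn, subscriber_id=1, now=1010, max_age_seconds=60)
print([(r[0], r[4]) for r in rows])  # [(1, 'comment')]
```

Because the denormalized data column comes back with each row, rendering the entry needs no further lookups. An activity that sits in several feeds the subscriber follows would appear once per feed, so a real query may need to deduplicate.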
A further note on the reason field. Suppose I want to filter actions for email notifications to the user. If the user bookmarked a blog post (and so subscribed to the post feed with the reason 'bookmark'), I don't want him to receive email notifications about actions on that item. If he commented on the post (and so subscribed to the post feed with the reason 'comment'), I do want him to be notified when other users add comments to the same post. The reason field, together with the user's notification preferences, lets me make this distinction (I implemented it through an ActivityFilter class).
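The answer does not show the ActivityFilter class, so the following is only a guess at its shape: a small rule table keyed by subscription reason, combined with a per-user preference flag. The method name `should_email`, the `email_enabled` parameter, and the rules themselves are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class ActivityFilter:
    """Decides whether an activity should trigger an email for a subscriber."""
    # reason -> set of (type, verb) pairs worth an email; illustrative rules only
    rules: dict = field(default_factory=lambda: {
        "comment": {("comment", "created")},
        "bookmark": set(),
    })

    def should_email(self, reason, activity_type, verb, email_enabled=True):
        if not email_enabled:  # the user's own notification preference wins
            return False
        return (activity_type, verb) in self.rules.get(reason, set())

f = ActivityFilter()
print(f.should_email("comment", "comment", "created"))   # True
print(f.should_email("bookmark", "comment", "created"))  # False
```

Keeping the rules as data means a new subscription reason only needs a new entry in the table, not a new branch in the code.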
Other recommended answer
There is a format for activity streams that is currently being developed by a group of well-known people.
Basically, every activity has an actor (who performs the activity), a verb (the action of the activity), an object (what the actor acts on), and a target.
For example: Max has posted a link to Adam's wall.
Their JSON spec has reached version 1.0 at the time of writing, and it shows a pattern for activities that you can apply.
Their format has already been adopted by BBC, Gnip, Google Buzz, Gowalla, IBM, MySpace, Opera, Socialcast, Superfeedr, TypePad, Windows Live, YIID, and many others.
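As a sketch of what the example sentence "Max has posted a link to Adam's wall" might look like in that actor/verb/object/target shape, here is a small Python dictionary serialized to JSON. The property names follow my reading of the Activity Streams 1.0 JSON format, so check them against the spec; every id, URL, and timestamp is made up, and `summarize` is a hypothetical helper.

```python
import json

activity = {
    "published": "2011-02-10T15:04:55Z",
    "actor": {
        "objectType": "person",
        "id": "urn:example:person:max",
        "displayName": "Max",
    },
    "verb": "post",
    "object": {
        "id": "urn:example:link:1",
        "url": "http://example.org/some-page",
        "displayName": "A link",
    },
    "target": {
        "id": "urn:example:wall:adam",
        "displayName": "Adam's wall",
    },
}

def summarize(activity):
    """Return (actor, verb, object, target) display names; target is optional."""
    target = activity.get("target")
    return (
        activity["actor"]["displayName"],
        activity["verb"],
        activity["object"]["displayName"],
        target["displayName"] if target else None,
    )

encoded = json.dumps(activity, indent=2)  # what would be stored or sent
print(summarize(json.loads(encoded)))  # ('Max', 'post', 'A link', "Adam's wall")
```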