Efficient way to implement a LinkedIn-like "How you are connected to" feature?

Source: https://www.itbaoku.cn/post/597480.html

Question

LinkedIn has a cool feature: when you visit a user's profile, it shows how you are connected to that user through the network.

Assume that the visitor and the profile owner are two nodes of a graph in which nodes represent users and edges represent friendships. A simple solution would be to run a BFS from both nodes, up to a certain depth, and check whether the two searches intersect. The nodes in the intersection are the link nodes that connect the two users.
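To make the idea concrete, here is a minimal sketch of that two-sided BFS in Python. It assumes the friendship graph is already available in memory as a dict mapping each user to a list of friends, which sidesteps the database cost discussed in the question. The names `connection_path`, `_expand`, `_build_path`, and `max_depth` are made up for illustration.

```python
def _expand(graph, frontier, parents, other_parents):
    """Advance one BFS level; return (next_frontier, meeting_node_or_None)."""
    next_frontier = []
    for user in frontier:
        for friend in graph.get(user, ()):
            if friend in parents:
                continue  # already reached from this side
            parents[friend] = user
            if friend in other_parents:
                return next_frontier, friend  # the two searches met
            next_frontier.append(friend)
    return next_frontier, None


def _build_path(meet, parents_a, parents_b):
    """Stitch the two half-paths together at the meeting node."""
    path = []
    node = meet
    while node is not None:  # walk back to the start user
        path.append(node)
        node = parents_a[node]
    path.reverse()
    node = parents_b[meet]
    while node is not None:  # walk forward to the goal user
        path.append(node)
        node = parents_b[node]
    return path


def connection_path(graph, start, goal, max_depth=3):
    """Return a path from start to goal with at most max_depth edges, or None."""
    if start == goal:
        return [start]
    parents_a, parents_b = {start: None}, {goal: None}
    frontier_a, frontier_b = [start], [goal]
    for _ in range(max_depth):
        if not frontier_a or not frontier_b:
            return None
        # Expanding the smaller frontier keeps the explored set small.
        if len(frontier_a) <= len(frontier_b):
            frontier_a, meet = _expand(graph, frontier_a, parents_a, parents_b)
        else:
            frontier_b, meet = _expand(graph, frontier_b, parents_b, parents_a)
        if meet is not None:
            return _build_path(meet, parents_a, parents_b)
    return None


# Hypothetical sample data, friendships stored in both directions.
graph = {
    "alice": ["bob", "carol"],
    "bob": ["alice", "dave"],
    "carol": ["alice"],
    "dave": ["bob", "erin"],
    "erin": ["dave"],
    "zed": [],
}
print(connection_path(graph, "alice", "erin"))
```

Each call to `_expand` corresponds to one "level" of friends, so in a database-backed version it is the natural place to fetch the friends of the whole frontier in a single batched query instead of one query per person.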

Although this sounds neat, the problem is that determining each person's friends requires a separate DB query. Once the search goes deeper than 2 levels, the algorithm becomes very time-consuming. Is there a more efficient alternative? If not, how can we use better hardware support (parallel computing, grids, distributed databases, etc.) to bring down the computation time?

Recommended answer

You can see how this can be done in the article "Graphs in the database: SQL meets social networks" by Lorenzo Alberton. The example code is written for PostgreSQL using CTEs. However, I doubt that an RDBMS will perform well for this. I wrote an article on how to do the same things with a native graph database, in this case Neo4j: "Social networks in the database: using a graph database". Apart from the difference in performance, a graph database also simplifies the task by providing a graph API that makes it easy to handle traversals that would be extremely complex to write in SQL (or with stored procedures). I wrote a bit more on graph databases in this thread and see this one too.
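For a rough idea of what the CTE approach looks like, here is a small sketch. It is not the code from the article mentioned above: it uses SQLite through Python's `sqlite3` module instead of PostgreSQL so that it runs without a server, and the `friendship` table, its columns, and the `find_path` helper are assumptions made up for the example. It also assumes user names never contain `/`, since the path is tracked as a delimited string.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE friendship (user_a TEXT, user_b TEXT)")
edges = [("alice", "bob"), ("bob", "dave"), ("dave", "erin"), ("alice", "carol")]
# Store every friendship in both directions so one join covers both sides.
conn.executemany(
    "INSERT INTO friendship VALUES (?, ?)",
    edges + [(b, a) for a, b in edges],
)

PATH_QUERY = """
WITH RECURSIVE reach(person, path, depth) AS (
    SELECT :start, '/' || :start || '/', 0
    UNION ALL
    SELECT f.user_b, r.path || f.user_b || '/', r.depth + 1
    FROM reach AS r
    JOIN friendship AS f ON f.user_a = r.person
    WHERE r.depth < :max_depth
      AND instr(r.path, '/' || f.user_b || '/') = 0  -- do not revisit a user
)
SELECT path FROM reach WHERE person = :goal ORDER BY depth LIMIT 1
"""


def find_path(conn, start, goal, max_depth=3):
    """Return the shortest connection found within max_depth hops, or None."""
    params = {"start": start, "goal": goal, "max_depth": max_depth}
    row = conn.execute(PATH_QUERY, params).fetchone()
    return row[0].strip("/").split("/") if row else None


print(find_path(conn, "alice", "erin"))
```

The whole traversal happens in one round trip, but the query enumerates every simple path up to `max_depth`, which is one reason to be cautious about how this scales on a large, dense network.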

Other recommended answer

Without some kind of recursive stored procedure (or a CTE in SQL Server 2005+), you'll need multiple round trips as the levels get deeper. However, a good caching infrastructure could really help performance, since the connection lists of the most popular and active users would stay cached. A read-through/write-through cache would make things even better (cache updates cascade to DB updates, and cache misses on reads cascade to DB reads).
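As a rough illustration of the read-through/write-through idea, here is a minimal in-process sketch. The `ConnectionCache` class is hypothetical, a plain dict stands in for the database, and there is no eviction; a real deployment would more likely use a shared cache with an eviction policy.

```python
class ConnectionCache:
    """Read-through / write-through cache of per-user connection lists."""

    def __init__(self, db):
        self._db = db        # stand-in for the database: user -> set of friends
        self._cache = {}     # user -> cached set of friends
        self.db_reads = 0    # counts how often a read fell through to the "DB"

    def friends(self, user):
        # Read-through: on a miss, load from the backing store and keep it.
        if user not in self._cache:
            self.db_reads += 1
            self._cache[user] = set(self._db.get(user, ()))
        return frozenset(self._cache[user])

    def add_connection(self, a, b):
        # Write-through: update the backing store and any cached entry together.
        for user, friend in ((a, b), (b, a)):
            self._db.setdefault(user, set()).add(friend)
            if user in self._cache:
                self._cache[user].add(friend)


db = {"alice": {"bob"}, "bob": {"alice"}}
cache = ConnectionCache(db)
cache.friends("alice")               # miss: read from the "DB"
cache.friends("alice")               # hit: served from the cache
cache.add_connection("alice", "carol")
print(sorted(cache.friends("alice")), cache.db_reads)
```

With this in place, each BFS level only hits the database for users whose connection lists are not cached yet, which is where the popular and active users mentioned above pay off.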