Graph databases and RDF triplestores: storage of graph data in Python

Question

I need to develop a graph database in Python (I would be glad if anybody joined me in the development. I already have a bit of code, and I would gladly discuss it).

I did my research on the internet. In Java, neo4j is a candidate, but I was not able to find anything about its actual disk storage. In Python, there are many graph data models (see this pre-PEP proposal), but none of them satisfies my need to store and retrieve from disk.

I do know about triplestores, however. Triplestores are basically RDF databases, so a graph data model could be mapped to RDF and stored, but I am generally uneasy about this solution (mainly due to lack of experience). One example is Sesame. The fact is that you always have to convert from the in-memory graph representation to the RDF representation and vice versa, unless the client code wants to hack on the RDF document directly, which is unlikely. It would be like handling DB tuples directly instead of creating an object.

What is the state of the art for storage and retrieval (a la DBMS) of graph data in Python at the moment? Would it make sense to start developing an implementation, hopefully with the help of someone interested in it, and in collaboration with the proposers of the Graph API PEP? Please note that this is going to be part of my job for the next few months, so my contribution to this eventual project is pretty damn serious ;)

Edit: I also found directededge, but it appears to be a commercial product.

Recommended answer

I have used both Jena, which is a Java framework, and AllegroGraph (Lisp, Java and Python bindings). Jena has sister projects for storing graph data and has been around for a long time. AllegroGraph is quite good and has a free edition; I would suggest it because it is easy to install, free and fast, and you could be up and running in no time. The power you would get from learning a little RDF and SPARQL may well be worth your while. If you already know SQL, you are off to a great start. Being able to query your graph using SPARQL would bring real benefits. Serializing to RDF triples would be easy, and some of the file formats are very simple (N-Triples, or NT, for instance). I'll give an example. Let's say you have the following graph, written as node-edge-node ids:

1 <- 2 -> 3
3 <- 4 -> 5

These are already in subject-predicate-object form, so just slap some URI notation on them, load them into the triple store and query at will via SPARQL. Here they are in NT format:

<http://mycompany.com#1> <http://mycompany.com#2> <http://mycompany.com#3> .
<http://mycompany.com#3> <http://mycompany.com#4> <http://mycompany.com#5> .

Now query for all nodes two hops from node 1:

SELECT ?node
WHERE {
    <http://mycompany.com#1> ?p1 ?o1 .
    ?o1 ?p2 ?node .
}

This would of course yield <http://mycompany.com#5>.
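To make the example above concrete without installing a triple store, here is a minimal pure-Python sketch. It is not how AllegroGraph or any SPARQL engine works internally: `parse_nt` and `two_hops` are hypothetical helpers written for illustration, the parser only understands URI-only N-Triples lines like the two shown above, and the query is a hand-written equivalent of the SPARQL pattern rather than a SPARQL implementation.

```python
import re

# The two N-Triples lines from the example above.
NT_DATA = """\
<http://mycompany.com#1> <http://mycompany.com#2> <http://mycompany.com#3> .
<http://mycompany.com#3> <http://mycompany.com#4> <http://mycompany.com#5> .
"""

# Matches only the simplest N-Triples form: three URIs and a final dot.
# Literals and blank nodes are deliberately not supported in this sketch.
TRIPLE_RE = re.compile(r"^<([^>]*)>\s+<([^>]*)>\s+<([^>]*)>\s*\.$")


def parse_nt(text):
    """Parse URI-only N-Triples lines into a set of (s, p, o) tuples."""
    triples = set()
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        match = TRIPLE_RE.match(line)
        if match is None:
            raise ValueError(f"unsupported N-Triples line: {line!r}")
        triples.add(match.groups())
    return triples


def two_hops(triples, start):
    """Mimic the pattern: start ?p1 ?o1 . ?o1 ?p2 ?node ."""
    first_hop = {o for s, _, o in triples if s == start}
    return {o for s, _, o in triples if s in first_hop}


triples = parse_nt(NT_DATA)
result = two_hops(triples, "http://mycompany.com#1")
print(result)
```

A real triple store adds indexing, persistence and a full query language on top of this idea, which is the main reason to use one instead of scanning a set of tuples.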

Another candidate would be Mulgara, written in pure Java. Since you seem more interested in Python, though, I think you should take a look at AllegroGraph first.

Other answer

I think the solution really depends on exactly what you want to do with the graph once you have managed to store it on disk or in a database, and this is a little unclear in your question. However, a couple of things you might wish to consider are:

  • If you just want to persist the graph without using any of the features or properties you might expect from an RDBMS solution (such as ACID), then how about just pickling the objects into a flat file? Very rudimentary, but like I say, it depends on exactly what you want to achieve.
  • ZODB is an object database for Python (a spin-off from the Zope project, I think). I can't say I've had much experience of it in a high-performance environment, but bar a few restrictions it does allow you to store Python objects natively.
  • If you wish to pursue RDF, there is an RDF Alchemy project which might help to alleviate some of your concerns about converting from your graph to RDF structures, and which I think has Sesame as part of its stack.
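As a rough illustration of the first option, the sketch below pickles a small graph to a flat file and reads it back. The adjacency-dict layout and the `save_graph` / `load_graph` helpers are assumptions made for this example, not part of any library mentioned here; only the standard `pickle` module is used.

```python
import os
import pickle
import tempfile

# Hypothetical in-memory graph: node id -> list of (edge id, target node id).
graph = {
    1: [(2, 3)],
    3: [(4, 5)],
    5: [],
}


def save_graph(graph, path):
    """Write the whole graph to a flat file in one go."""
    with open(path, "wb") as handle:
        pickle.dump(graph, handle, protocol=pickle.HIGHEST_PROTOCOL)


def load_graph(path):
    """Read the whole graph back; only unpickle files you trust."""
    with open(path, "rb") as handle:
        return pickle.load(handle)


# A temporary directory keeps the example from leaving files behind.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "graph.pickle")
    save_graph(graph, path)
    restored = load_graph(path)

print(restored == graph)
```

The trade-off is the one described above: the whole graph is written and read as a single unit, with no transactions, no partial updates and no querying on disk.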

There are some other persistence tools detailed on the Python site which may be of interest. However, I spent quite a while looking into this area last year, and ultimately I found there wasn't a native Python solution that met my requirements.

The most success I had was using MySQL with a custom ORM, and I posted a couple of relevant links in an answer to this question. Additionally, if you want to contribute to an RDBMS project: when I spoke to someone from Open Query about a graph storage engine for MySQL, they seemed interested in getting active participation in their project.
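The relational approach can be sketched as an edge table plus a self-join. This is not the custom ORM mentioned above, whose details are not given here: the `edge` table, its columns and the `two_hops` helper are hypothetical, and SQLite from the standard library stands in for MySQL so that the example runs without a server. It stores the two-edge example graph from the first answer.

```python
import sqlite3

# In-memory SQLite stands in for MySQL so the sketch runs without a server.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE edge ("
    " source INTEGER NOT NULL,"
    " label INTEGER NOT NULL,"
    " target INTEGER NOT NULL,"
    " PRIMARY KEY (source, label, target))"
)
# One row per node-edge-node triple.
conn.executemany(
    "INSERT INTO edge (source, label, target) VALUES (?, ?, ?)",
    [(1, 2, 3), (3, 4, 5)],
)
conn.commit()


def two_hops(conn, start):
    """Return nodes reachable in exactly two hops, via a self-join."""
    rows = conn.execute(
        "SELECT DISTINCT hop2.target"
        " FROM edge AS hop1"
        " JOIN edge AS hop2 ON hop2.source = hop1.target"
        " WHERE hop1.source = ?"
        " ORDER BY hop2.target",
        (start,),
    ).fetchall()
    return [row[0] for row in rows]


reachable = two_hops(conn, 1)
print(reachable)
```

Each extra hop costs another join, which is one reason deep traversals tend to be awkward in a plain relational schema.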

Sorry I can't give a more definitive answer, but I don't think there is one... If you do start developing your own implementation, I'd be interested to keep up-to-date with how you get on.

Other answer

Greetings from your Sirius Cybernetics Intelligent Agent!

Some useful links...