Integrating Spark and OrientDB: Graph Computing Done Right

Copyright notice: This is an original article by the blogger and may not be reproduced without the blogger's permission. https://blog.csdn.net/yitengtongweishi/article/details/81607364

Tip: see the earlier post, Installing and Getting Started with the OrientDB Graph Database.

Additional JAR files

  • OrientDB JDBC driver

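Open https://orientdb.com/download-2/ and download the driver highlighted by the rectangle in the screenshot below.
[Screenshot: OrientDB download page with the JDBC driver highlighted]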

  • spark-orientdb

Open https://dl.bintray.com/sbcd90/org.apache.spark/org/apache/spark/ and download the JAR for the specific version you need (see the reference link).

I won't spend time on setting up a Spark development environment in IntelliJ IDEA. As usual, let's go straight to the code.

Writing Spark DataFrames to OrientDB

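    import org.apache.spark.sql.SparkSession

    // When running directly inside IDEA, add .master("local[*]") to the builder
    val spark = SparkSession.builder().appName("SparkOrientDB").getOrCreate()
    import spark.implicits._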

    // Vertex DataFrame: one row per vertex, written to the OrientDB vertex class Vgraphx
    spark.createDataFrame(List(
      ("a", "Alice", 34),
      ("b", "Bob", 36),
      ("c", "Charlie", 30),
      ("d", "David", 29),
      ("e", "Esther", 32),
      ("f", "Fanny", 36),
      ("g", "Gabby", 60)
    )).toDF("id", "name", "age")
      .write.format("org.apache.spark.orientdb.graphs")
      .option("dburl", "remote:localhost/graphdb")
      .option("user", "root")
      .option("password", "root")
      .option("vertextype", "Vgraphx")
      .mode("overwrite")
      .save()

    // Edge DataFrame: src and dst hold the ids of the vertices each edge connects
    spark.createDataFrame(List(
      ("a", "b", "friend"),
      ("b", "c", "follow"),
      ("c", "b", "follow"),
      ("f", "c", "follow"),
      ("e", "f", "follow"),
      ("e", "d", "friend"),
      ("d", "a", "friend"),
      ("a", "e", "friend")
    )).toDF("src", "dst", "relationship")
      .write.format("org.apache.spark.orientdb.graphs")
      .option("dburl", "remote:localhost/graphdb")
      .option("user", "root")
      .option("password", "root")
      .option("vertextype", "Vgraphx")
      .option("edgetype", "Egraphx")
      .mode("overwrite")
      .save()
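In the sample data, each edge row refers to its endpoints through the `src` and `dst` columns, which hold vertex `id` values written in the first step. Before writing edges, it can be worth checking that no edge points at a missing vertex. The following is a minimal plain-Scala sketch over the same sample data, assuming that check is wanted; the object `EdgeCheck` and the method `danglingEdges` are hypothetical names, not part of spark-orientdb, and the check runs on local collections rather than on DataFrames.

```scala
object EdgeCheck {
  // (src, dst, relationship), same shape as the edge DataFrame above
  type Edge = (String, String, String)

  // Sample data copied from the DataFrames written above
  val vertexIds: Set[String] = Set("a", "b", "c", "d", "e", "f", "g")

  val edges: List[Edge] = List(
    ("a", "b", "friend"),
    ("b", "c", "follow"),
    ("c", "b", "follow"),
    ("f", "c", "follow"),
    ("e", "f", "follow"),
    ("e", "d", "friend"),
    ("d", "a", "friend"),
    ("a", "e", "friend")
  )

  // Hypothetical helper: returns the edges whose src or dst is not a known vertex id
  def danglingEdges(ids: Set[String], es: List[Edge]): List[Edge] =
    es.filterNot { case (src, dst, _) => ids.contains(src) && ids.contains(dst) }

  def main(args: Array[String]): Unit = {
    val bad = danglingEdges(vertexIds, edges)
    if (bad.isEmpty) println("all edges reference existing vertices")
    else bad.foreach(e => println(s"dangling edge: $e"))
  }
}
```

With the sample data, all eight edges reference existing vertices; vertex `g` (Gabby) simply has no edges. On real DataFrames the same check could be expressed as a `left_anti` join of the edges against the vertices on `src` (and again on `dst`).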

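Open OrientDB Studio at http://localhost:2480 and query the vertices and edges that were written, as shown in the screenshot below.
[Screenshot: the written vertices and edges queried in OrientDB Studio]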

Reading OrientDB into Spark DataFrames

    val vertices = spark.read
      .format("org.apache.spark.orientdb.graphs")
      .option("dburl", "remote:localhost/graphdb")
      .option("user", "root")
      .option("password", "root")
      .option("vertextype", "Vgraphx")
      .load()

    val edges = spark.read
      .format("org.apache.spark.orientdb.graphs")
      .option("dburl", "remote:localhost/graphdb")
      .option("user", "root")
      .option("password", "root")
      .option("edgetype", "Egraphx")
      .load()

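    // GraphFrames expects an "id" column on vertices and "src"/"dst" columns on edges
    import org.graphframes.GraphFrame
    val g = GraphFrame(vertices, edges)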
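The connection options `dburl`, `user` and `password` are repeated in every reader and writer in this post. One way to avoid that duplication is to keep them in a single `Map[String, String]` and pass it through `DataFrameReader.options` or `DataFrameWriter.options`, both of which accept such a map in Spark. The sketch below only builds the maps; the object `OrientOptions` and its members are hypothetical, and the Spark call is shown as a comment because it needs a running SparkSession and OrientDB server.

```scala
object OrientOptions {
  // Connection settings shared by every read and write in this post
  val connection: Map[String, String] = Map(
    "dburl"    -> "remote:localhost/graphdb",
    "user"     -> "root",
    "password" -> "root"
  )

  // Hypothetical helper: connection settings plus the vertex/edge type options
  // a particular read or write needs, e.g. "vertextype" -> "Vgraphx"
  def withTypes(types: (String, String)*): Map[String, String] =
    connection ++ types

  // Usage with Spark (not executed here):
  // val edges = spark.read
  //   .format("org.apache.spark.orientdb.graphs")
  //   .options(OrientOptions.withTypes("edgetype" -> "Egraphx"))
  //   .load()
}
```

For the edge write above, the call would be `withTypes("vertextype" -> "Vgraphx", "edgetype" -> "Egraphx")`, which yields the same five options the original code sets one by one.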

The vertices are printed as follows:

    g.vertices.show(false)

    +-------+---+---+
    |name   |id |age|
    +-------+---+---+
    |Bob    |b  |36 |
    |David  |d  |29 |
    |Charlie|c  |30 |
    |Esther |e  |32 |
    |Fanny  |f  |36 |
    |Gabby  |g  |60 |
    |Alice  |a  |34 |
    +-------+---+---+

The edges are printed as follows:

    g.edges.show(false)

    +---+------------+---+
    |dst|relationship|src|
    +---+------------+---+
    |c  |follow      |b  |
    |b  |follow      |c  |
    |f  |follow      |e  |
    |a  |friend      |d  |
    |c  |follow      |f  |
    |d  |friend      |e  |
    |b  |friend      |a  |
    |e  |friend      |a  |
    +---+------------+---+
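Once `g` is built, GraphFrames can compute graph statistics directly, for example `g.inDegrees` and `g.outDegrees`, which return DataFrames with `id`/`inDegree` and `id`/`outDegree` columns and omit vertices whose degree is zero. To sanity-check such results against the edge table printed above, here is a plain-Scala sketch of the same in-degree and out-degree counts over the sample edges; the object `Degrees` is a hypothetical name and does not use Spark.

```scala
object Degrees {
  // (src, dst) pairs taken from the edge table above
  val edges: List[(String, String)] = List(
    ("a", "b"), ("b", "c"), ("c", "b"), ("f", "c"),
    ("e", "f"), ("e", "d"), ("d", "a"), ("a", "e")
  )

  // Count how many edges end at each vertex (vertices with no incoming edge are absent)
  def inDegrees(es: List[(String, String)]): Map[String, Int] =
    es.groupBy(_._2).map { case (id, incoming) => id -> incoming.size }

  // Count how many edges start at each vertex (vertices with no outgoing edge are absent)
  def outDegrees(es: List[(String, String)]): Map[String, Int] =
    es.groupBy(_._1).map { case (id, outgoing) => id -> outgoing.size }

  def main(args: Array[String]): Unit = {
    println(inDegrees(edges).toSeq.sorted.mkString(", "))
    println(outDegrees(edges).toSeq.sorted.mkString(", "))
  }
}
```

For the sample graph, `b` and `c` each have in-degree 2, `a` and `e` each have out-degree 2, every other vertex with edges has degree 1 in each direction, and `g` appears in neither map because it has no edges.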
