SparkSQL Programming: Converting Between RDD and DataFrame


RDD to DataFrame

Note: any conversion between an RDD and a DataFrame or Dataset requires import spark.implicits._. Here spark is not a package name but the name of the SparkSession object.
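In the spark-shell this spark object already exists. In a standalone application it has to be built before the import can be made — a minimal sketch, where the application name and master URL are placeholders:

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("RDDToDFExample")   // placeholder application name
  .master("local[*]")          // placeholder; point this at your cluster master
  .getOrCreate()

import spark.implicits._       // the import refers to this `spark` value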

Prerequisites: import the implicit conversions and create an RDD.

scala> import spark.implicits._

import spark.implicits._

scala> val peopleRDD = sc.textFile("examples/src/main/resources/people.txt")

peopleRDD: org.apache.spark.rdd.RDD[String] = examples/src/main/resources/people.txt MapPartitionsRDD[3] at textFile at <console>:27
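For context, the stock people.txt shipped with the Spark distribution contains comma-separated lines of the form below, which is why the examples that follow split on "," and trim the age field:

Michael, 29
Andy, 30
Justin, 19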

1) Manual conversion (name the columns by hand)

scala> peopleRDD.map{x=>val para = x.split(",");(para(0),para(1).trim.toInt)}.toDF("name","age")

res1: org.apache.spark.sql.DataFrame = [name: string, age: int]
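To inspect the result, call show() on the returned DataFrame. In the sketch below, df1 is just an illustrative name, and the output assumes the stock people.txt shown above:

scala> val df1 = peopleRDD.map{x=>val para = x.split(",");(para(0),para(1).trim.toInt)}.toDF("name","age")
scala> df1.show
+-------+---+
|   name|age|
+-------+---+
|Michael| 29|
|   Andy| 30|
| Justin| 19|
+-------+---+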

2) Conversion via reflection (requires a case class)

(1) Create a case class

scala> case class People(name:String, age:Int)

(2) Use the case class to convert the RDD to a DataFrame

scala> peopleRDD.map{ x => val para = x.split(",");People(para(0),para(1).trim.toInt)}.toDF
res2: org.apache.spark.sql.DataFrame = [name: string, age: int]
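A note on the design: with the reflection approach the column names and types come from the case-class fields, so nothing needs to be passed to toDF. The same case class also supports a strongly typed Dataset via toDS — a minimal sketch (the res number is illustrative):

scala> peopleRDD.map{ x => val para = x.split(",");People(para(0),para(1).trim.toInt)}.toDS
res3: org.apache.spark.sql.Dataset[People] = [name: string, age: int]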


DataFrame to RDD

Simply call rdd on the DataFrame.

1) Create a DataFrame

scala> val df = spark.read.json("/opt/module/spark/examples/src/main/resources/people.json")

df: org.apache.spark.sql.DataFrame = [age: bigint, name: string]

2) Convert the DataFrame to an RDD

scala> val dfToRDD = df.rdd

dfToRDD: org.apache.spark.rdd.RDD[org.apache.spark.sql.Row] = MapPartitionsRDD[19] at rdd at <console>:29

3) Print the contents of the RDD

scala> dfToRDD.collect

res13: Array[org.apache.spark.sql.Row] = Array([Michael, 29], [Andy, 30], [Justin, 19])
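Each element of the resulting RDD is a generic Row. Individual fields can be read back out with getAs, using the types from the schema above (age: bigint maps to Scala Long) — a small sketch, with the res number and output shown for illustration:

scala> dfToRDD.map(row => (row.getAs[String]("name"), row.getAs[Long]("age"))).collect
res14: Array[(String, Long)] = Array((Michael,29), (Andy,30), (Justin,19))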
