DataFrame
scala> teenagersDF
res14: org.apache.spark.sql.DataFrame = [name: string, age: bigint]
scala> teenagersDF.
!= flatMap repartition
## foreach rollup
+ foreachPartition sample
-> formatted schema
== getClass select
agg groupBy selectExpr
alias groupByKey show
apply hashCode sort
as head sortWithinPartitions
asInstanceOf inputFiles sparkSession
cache intersect sqlContext
coalesce isInstanceOf stat
col isLocal synchronized
collect isStreaming take
collectAsList javaRDD takeAsList
columns join toDF
count joinWith toJSON
createOrReplaceTempView limit toJavaRDD
createTempView map toLocalIterator
cube mapPartitions toString
describe na transform
distinct ne union
drop notify unionAll
dropDuplicates notifyAll unpersist
dtypes orderBy wait
ensuring persist where
eq printSchema withColumn
equals queryExecution withColumnRenamed
except randomSplit write
explain randomSplitAsList writeStream
explode rdd →
filter reduce
first registerTempTable
Dataset
In the Scala API, DataFrame is simply a type alias of Dataset[Row]:
scala> val df = spark.read.json("examples/src/main/resources/people.json")
scala> df
res13: org.apache.spark.sql.DataFrame = [age: bigint, name: string]
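The alias can be checked directly. This is a minimal sketch, assuming a running SparkSession named spark and the bundled people.json example file from the snippet above; the Person case class is a hypothetical type matching that file's schema:

```scala
import org.apache.spark.sql.{DataFrame, Dataset, Row}
import spark.implicits._  // encoders for the typed view below

// Hypothetical case class matching the people.json schema (name: string, age: bigint).
case class Person(name: String, age: Long)

val df: DataFrame = spark.read.json("examples/src/main/resources/people.json")

// DataFrame is a type alias of Dataset[Row], so this assignment type-checks as-is:
val rows: Dataset[Row] = df

// The same data viewed as a typed Dataset:
val people: Dataset[Person] = df.as[Person]
```

Because the alias is resolved at compile time, rows and df are the same object with the same runtime type; no conversion happens in the Dataset[Row] assignment.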
scala> df.
agg foreachPartition sample
alias groupBy schema
apply groupByKey select
as head selectExpr
cache inputFiles show
coalesce intersect sort
col isLocal sortWithinPartitions
collect isStreaming sparkSession
collectAsList javaRDD sqlContext
columns join stat
count joinWith take
createOrReplaceTempView limit takeAsList
createTempView map toDF
cube mapPartitions toJSON
describe na toJavaRDD
distinct orderBy toLocalIterator
drop persist toString
dropDuplicates printSchema transform
dtypes queryExecution union
except randomSplit unionAll
explain randomSplitAsList unpersist
explode rdd where
filter reduce withColumn
first registerTempTable withColumnRenamed
flatMap repartition write
foreach rollup writeStream
Both are the same object type (DataFrame is just Dataset[Row]), yet the two tab-completed method lists are not exactly the same: the first listing additionally shows members inherited from Any/AnyRef (e.g. ##, getClass, hashCode, asInstanceOf) and members added by Predef's implicit conversions (e.g. ->, ensuring, formatted), which the second listing omits.
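The extra entries in the first listing are therefore not DataFrame-specific at all; they exist on every Scala value. A minimal sketch in plain Scala (no Spark needed) showing where each one comes from:

```scala
// These members are available on any Scala value, not just on DataFrame:
val s = "anything"

s.##                      // hashing, inherited from Any
s.getClass                // reflection, inherited from AnyRef
s.isInstanceOf[String]    // type test, compiler-provided on Any
s -> 1                    // pair construction via Predef.ArrowAssoc
s.ensuring(_.nonEmpty)    // runtime assertion via Predef.Ensuring
s.formatted("<%s>")       // string formatting via Predef.StringFormat
```

Which of these the REPL shows depends on how completion was triggered, which is why two dumps of the same type can differ.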