Copyright notice: please respect the original work; when reposting, credit "This article is reproduced from: https://blog.csdn.net/high2011" https://blog.csdn.net/high2011/article/details/83870005
The options supported in Spark 2.1.0 and later are as follows:
--------- JDBC’s options ---------
user
password
url
dbtable
driver
partitionColumn
lowerBound
upperBound
numPartitions
fetchsize
truncate
createTableOptions
batchsize
isolationLevel
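To make the JDBC options concrete, here is a minimal PySpark sketch of a partitioned read. The URL, table name, and credentials are hypothetical placeholders; it assumes an active SparkSession named `spark` and the JDBC driver jar on the classpath:

```python
# Hypothetical connection details -- replace with your own.
jdbc_options = {
    "url": "jdbc:mysql://localhost:3306/testdb",
    "dbtable": "orders",
    "user": "spark",
    "password": "secret",
    "driver": "com.mysql.jdbc.Driver",
    # Partitioned read: numPartitions parallel tasks, each scanning a slice
    # of partitionColumn between lowerBound and upperBound.
    "partitionColumn": "id",
    "lowerBound": "1",
    "upperBound": "100000",
    "numPartitions": "10",
    "fetchsize": "1000",  # rows fetched per database round trip
}
# df = spark.read.format("jdbc").options(**jdbc_options).load()
```

Note that `truncate`, `createTableOptions`, `batchsize`, and `isolationLevel` apply on the write side (`df.write.format("jdbc")...`), not the read side.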
--------- CSV’s options ---------
path
sep
delimiter
mode
encoding
charset
quote
escape
comment
header
inferSchema
ignoreLeadingWhiteSpace
ignoreTrailingWhiteSpace
nullValue
nanValue
positiveInf
negativeInf
compression
codec
dateFormat
timestampFormat
maxColumns
maxCharsPerColumn
escapeQuotes
quoteAll
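A hedged sketch of a CSV read using a few of these options (the file path is hypothetical; assumes a SparkSession named `spark`). Note that several names in the list are aliases: `sep`/`delimiter`, `encoding`/`charset`, and `compression`/`codec`:

```python
csv_options = {
    "sep": ",",             # alias: delimiter
    "header": "true",       # treat the first line as column names
    "inferSchema": "true",  # extra pass over the data to infer column types
    "encoding": "UTF-8",    # alias: charset
    "nullValue": "NA",      # string to interpret as null
    "dateFormat": "yyyy-MM-dd",
    "mode": "PERMISSIVE",   # or DROPMALFORMED / FAILFAST
}
# df = spark.read.options(**csv_options).csv("/data/people.csv")  # hypothetical path
```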
--------- JSON’s options ---------
path
samplingRatio
primitivesAsString
prefersDecimal
allowComments
allowUnquotedFieldNames
allowSingleQuotes
allowNumericLeadingZeros
allowNonNumericNumbers
allowBackslashEscapingAnyCharacter
compression
mode
columnNameOfCorruptRecord
dateFormat
timestampFormat
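The JSON options can be sketched the same way (hypothetical path; assumes a SparkSession named `spark`). The `allow*` flags relax strict JSON parsing, and `mode` plus `columnNameOfCorruptRecord` control what happens to records that still fail to parse:

```python
json_options = {
    "primitivesAsString": "false",   # keep inferred numeric/boolean types
    "allowComments": "true",         # tolerate /* */ and // comments
    "allowSingleQuotes": "true",     # tolerate 'single-quoted' strings
    "mode": "PERMISSIVE",            # put bad records in a corrupt-record column
    "columnNameOfCorruptRecord": "_corrupt_record",
    "timestampFormat": "yyyy-MM-dd'T'HH:mm:ss.SSSXXX",
}
# df = spark.read.options(**json_options).json("/data/events.json")  # hypothetical path
```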
--------- Parquet’s options ---------
path
compression
mergeSchema
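For Parquet, `mergeSchema` applies on the read side and `compression` on the write side; a brief sketch with hypothetical paths:

```python
parquet_options = {
    "mergeSchema": "true",    # reconcile schemas across all part-files (read side)
}
# read:  df = spark.read.options(**parquet_options).parquet("/data/tbl")   # hypothetical path
# write: df.write.option("compression", "snappy").parquet("/data/out")     # e.g. snappy/gzip/none
```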
--------- ORC’s options ---------
path
compression
orc.compress
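For ORC, `compression` and the Hive-style `orc.compress` both select the codec; when both are set, `compression` takes precedence. A minimal write-side sketch with a hypothetical path:

```python
orc_options = {
    "compression": "zlib",     # takes precedence over orc.compress
    # "orc.compress": "ZLIB",  # Hive-style equivalent of the same setting
}
# df.write.options(**orc_options).orc("/data/out")  # hypothetical path
```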
--------- FileStream’s options ---------
path
maxFilesPerTrigger
maxFileAge
latestFirst
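The file-stream options apply to Structured Streaming sources read with `spark.readStream`; a hedged sketch with a hypothetical directory:

```python
stream_options = {
    "maxFilesPerTrigger": "100",  # cap the number of new files per micro-batch
    "latestFirst": "true",        # process the newest files first when backlogged
    "maxFileAge": "7d",           # ignore files older than this threshold
}
# df = spark.readStream.options(**stream_options).text("/data/incoming")  # hypothetical dir
```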
--------- Text’s options ---------
path
compression
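The text source reads each line as a single string column named `value`; `compression` selects the write-side codec. A one-line sketch with a hypothetical path:

```python
text_options = {"compression": "gzip"}  # e.g. none / bzip2 / gzip / lz4 / snappy / deflate
# df.write.options(**text_options).text("/data/out")  # hypothetical path
```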
--------- LibSVM’s options ---------
path
vectorType
numFeatures
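The LibSVM source (used with MLlib) is read via `format("libsvm")`; a brief sketch with a hypothetical path and feature count:

```python
libsvm_options = {
    "numFeatures": "784",    # feature-vector dimension (hypothetical value)
    "vectorType": "sparse",  # or "dense"
}
# df = spark.read.format("libsvm").options(**libsvm_options).load("/data/sample.libsvm")
```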
Note: prior to Spark 2.1.0, these option names were all case-sensitive.
Reference: https://spark.apache.org/docs/2.3.0/api/java/org/apache/spark/sql/DataFrameReader.html