1. Code
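import org.apache.spark.sql.{DataFrame, SparkSession}

// spark-shell already provides `spark`; elsewhere, create a session (local mode assumed here)
val spark: SparkSession = SparkSession.builder().appName("csv-infer").master("local[*]").getOrCreate()
// Let Spark infer the column types (this costs an extra pass over the file)
val dfInfer: DataFrame = spark.read.option("inferSchema", "true").csv("data/stu.csv")
// Rename the columns (without the header option they default to _c0 ... _c4)
val df2: DataFrame = dfInfer.toDF("id", "name", "age", "city", "score")
df2.printSchema()
df2.show()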
1.1 Source: the Scaladoc of DataFrameReader.csv(csvDataset: Dataset[String])
/**
 * Loads an `Dataset[String]` storing CSV rows and returns the result as a `DataFrame`.
 *
 * If the schema is not specified using `schema` function and `inferSchema` option is enabled,
 * this function goes through the input once to determine the input schema.
 *
 * If the schema is not specified using `schema` function and `inferSchema` option is disabled,
 * it determines the columns as string types and it reads only the first line to determine the
 * names and the number of fields.
 *
 * @param csvDataset input Dataset with one CSV row per record
 * @since 2.2.0
 */
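To make the two cases in the comment concrete, here is a plain-Scala sketch of what schema inference has to do. It is a simplified model, not Spark's implementation. Assumptions: comma-separated lines without quoting or escaping, no special handling of empty or null values, every row has the same number of fields, and only three candidate types (Int, Double, String, from narrowest to widest). The names CsvInferSketch, inferColumnTypes and stringOnlySchema are hypothetical; the _c0, _c1, ... column names follow Spark's default naming when there is no header.

```scala
import scala.util.Try

// Simplified model of CSV schema inference (NOT Spark's actual code).
// Assumed type order, narrowest to widest: Int < Double < String.
object CsvInferSketch {
  sealed trait ColType
  case object IntT extends ColType
  case object DoubleT extends ColType
  case object StringT extends ColType

  // Narrowest type that can represent a single raw value.
  def typeOf(value: String): ColType =
    if (Try(value.trim.toInt).isSuccess) IntT
    else if (Try(value.trim.toDouble).isSuccess) DoubleT
    else StringT

  // Widen two observed types to one that fits both.
  def widen(a: ColType, b: ColType): ColType = (a, b) match {
    case (x, y) if x == y                  => x
    case (IntT, DoubleT) | (DoubleT, IntT) => DoubleT
    case _                                 => StringT
  }

  // inferSchema enabled: every row is visited once before any data is loaded,
  // and each column ends up with the widest type seen in it.
  def inferColumnTypes(lines: Iterator[String]): Seq[(String, ColType)] = {
    val merged: Seq[ColType] = lines
      .map(_.split(",", -1).toSeq.map(typeOf))
      .reduceOption((acc: Seq[ColType], row: Seq[ColType]) =>
        acc.zip(row).map { case (a, b) => widen(a, b) })
      .getOrElse(Seq.empty)
    merged.zipWithIndex.map { case (t, i) => (s"_c$i", t) }
  }

  // inferSchema disabled: only the first line is read, to count the fields;
  // every column is typed as a string.
  def stringOnlySchema(lines: Iterator[String]): Seq[(String, ColType)] =
    if (!lines.hasNext) Seq.empty
    else lines.next().split(",", -1).indices.map(i => (s"_c$i", StringT: ColType))
}
```

For example, given the rows "1,Tom,18,Beijing,90" and "2,Amy,19,Shanghai,85.5", the last column is typed Double only because of the second row. The final type of a column is known only after its last value has been seen, which is why inference needs the whole input.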
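1.2 Not recommended
Inference has to scan the entire input once just to determine the schema, on top of the pass that actually loads the data, so it is slow on large files. Supplying a user-defined schema avoids that extra pass.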
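The cost argument can be illustrated with a toy model that counts how often the input is scanned from the start. CountingSource, PassCountSketch and their methods are hypothetical, and the inference rule is deliberately reduced to "Int if every value parses as an Int, otherwise String"; only the number of scans matters here.

```scala
import scala.util.Try

// Hypothetical in-memory "file" that counts how many times it is scanned
// from the start. It models I/O cost only; it is not a Spark API.
final class CountingSource(lines: Seq[String]) {
  var scans = 0
  def open(): Iterator[String] = { scans += 1; lines.iterator }
}

object PassCountSketch {
  // Toy inference rule: true = every value in the column parses as an Int.
  def inferIntColumns(it: Iterator[String]): Vector[Boolean] =
    it.map(_.split(",", -1).toVector.map(v => Try(v.trim.toInt).isSuccess))
      .reduceOption((a: Vector[Boolean], b: Vector[Boolean]) =>
        a.zip(b).map { case (x, y) => x && y })
      .getOrElse(Vector.empty)

  // The "load" step: split every line into its fields.
  def splitAll(it: Iterator[String]): Vector[Vector[String]] =
    it.map(_.split(",", -1).toVector).toVector

  // Like inferSchema=true: one scan to infer, a second scan to load.
  def loadInferred(src: CountingSource): (Vector[Boolean], Vector[Vector[String]]) = {
    val intCols = inferIntColumns(src.open())
    (intCols, splitAll(src.open()))
  }

  // Like an explicit schema: types are known up front, one scan suffices.
  def loadWithSchema(src: CountingSource, intCols: Vector[Boolean]): (Vector[Boolean], Vector[Vector[String]]) =
    (intCols, splitAll(src.open()))
}
```

After loadInferred the source reports two scans; after loadWithSchema it reports one.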
1.3 Example: a user-defined schema
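import org.apache.spark.sql.types.{DataTypes, StructField, StructType}

// Declare the column names and types up front; no inference pass is needed
val schema = new StructType(Array(
  StructField("id", DataTypes.IntegerType),
  StructField("name", DataTypes.StringType),
  StructField("age", DataTypes.IntegerType),
  StructField("city", DataTypes.StringType),
  StructField("score", DataTypes.DoubleType)
))
// The declared names are used directly, so no toDF renaming is required
val df3 = spark.read.schema(schema).csv("data/stu.csv")
df3.printSchema()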
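A plain-Scala model of what the schema above buys: the five declared fields are known before any data is read, so each line can be converted while it is read, in a single pass. FixedSchemaSketch and its members are hypothetical and this is not Spark code. A value that does not fit its declared type becomes None here, which is only a rough stand-in for Spark's handling of malformed values (that depends on the mode option).

```scala
import scala.util.Try

// Plain-Scala model of reading rows against a schema declared up front,
// mirroring the five fields of the StructType above. Not Spark code.
object FixedSchemaSketch {
  sealed trait FieldType
  case object IntField extends FieldType
  case object DoubleField extends FieldType
  case object StringField extends FieldType

  // Declared once; no pass over the data is needed to obtain it.
  val schema: Vector[(String, FieldType)] = Vector(
    "id" -> IntField, "name" -> StringField, "age" -> IntField,
    "city" -> StringField, "score" -> DoubleField)

  // Convert one raw value; None when it does not fit the declared type.
  def convert(raw: String, t: FieldType): Option[Any] = t match {
    case IntField    => Try(raw.trim.toInt).toOption
    case DoubleField => Try(raw.trim.toDouble).toOption
    case StringField => Some(raw)
  }

  // Each line is converted as it is read: a single pass over the input.
  def parse(lines: Iterator[String]): Iterator[Map[String, Option[Any]]] =
    lines.map { line =>
      val values = line.split(",", -1)
      schema.zipWithIndex.map { case ((name, t), i) =>
        name -> (if (i < values.length) convert(values(i), t) else None)
      }.toMap
    }
}
```

For the line "2,Amy,abc,Shanghai,85.5", age becomes None while the other four fields convert normally.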