Environment:
- Hive client (version 1.2.1)
- Spark version 2.1.1.2.6.1.0-129
We have a test table, datadev.t_student, with the following columns:
col_name | data_type | comment |
id | string | |
score | int | |
Running the following statement in Spark SQL fails:
create view datadev.t_student_view as select NVL(id, 'xx') as id from datadev.t_student;
The error message is as follows:
21/12/20 16:57:44 ERROR SparkSQLDriver: Failed in [create view datadev.t_student_view as select NVL(id, 'xx') as id from datadev.t_student]
java.lang.RuntimeException: Failed to analyze the canonicalized SQL: SELECT `gen_attr_0` AS `id` FROM (SELECT nvl(t_student.`id`, 'xx') AS `gen_attr_0` FROM (SELECT `id` AS `gen_attr_1`, `score` AS `gen_attr_2` FROM `datadev`.`t_student`) AS gen_subquery_0) AS gen_subquery_1
Caused by: org.apache.spark.sql.AnalysisException: cannot resolve '`t_student.id`' given input columns: [gen_attr_1, gen_attr_2]; line 1 pos 45;
'Project ['gen_attr_0 AS id#31]
+- 'SubqueryAlias gen_subquery_1
   +- 'Project ['nvl('t_student.id, xx) AS gen_attr_0#30]
      +- SubqueryAlias gen_subquery_0
         +- Project [id#32 AS gen_attr_1#28, score#33 AS gen_attr_2#29]
            +- MetastoreRelation datadev, t_student
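The error log shows what went wrong. When creating the view, Spark SQL rewrites the query into a canonicalized form in which the innermost subquery (gen_subquery_0) renames id to gen_attr_1 and score to gen_attr_2. The argument of the NVL function, however, was not rewritten and still refers to t_student.`id`. Only gen_attr_1 and gen_attr_2 are visible at that level, so the reference cannot be resolved and the statement fails.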
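The same SQL runs without problems in Hive SQL, so the statement itself is valid. The problem appears to lie in how Spark SQL parses and rewrites the view query.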
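To make the failure easier to see, here is the canonicalized SQL from the error message, reindented. The statement text is taken verbatim from the log; only the line breaks and the comments are added.

```sql
-- Canonicalized SQL copied from the error log, reindented for readability.
SELECT `gen_attr_0` AS `id`
FROM (
  -- nvl() still references t_student.`id`, but gen_subquery_0 below
  -- exposes only gen_attr_1 and gen_attr_2, so the name cannot be resolved.
  SELECT nvl(t_student.`id`, 'xx') AS `gen_attr_0`
  FROM (
    SELECT `id` AS `gen_attr_1`, `score` AS `gen_attr_2`
    FROM `datadev`.`t_student`
  ) AS gen_subquery_0
) AS gen_subquery_1
```

Had the argument been rewritten to `gen_attr_1`, the middle SELECT would resolve against its input columns.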
Workaround: replace the NVL function with COALESCE.
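Applied to the statement above, the workaround might look like the following sketch. It is the original CREATE VIEW with only the function name changed; with two arguments, COALESCE returns the first non-null value, which matches what NVL does here. The trailing SELECT is an optional sanity check added for illustration, not part of the original note.

```sql
-- Same view definition as before, with NVL replaced by COALESCE.
CREATE VIEW datadev.t_student_view AS
SELECT COALESCE(id, 'xx') AS id
FROM datadev.t_student;

-- Optional check (illustrative): rows with a NULL id should show 'xx'.
SELECT id FROM datadev.t_student_view LIMIT 10;
```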