Spark does not support uppercase letters in Hive table partition columns

CREATE TABLE statement:

DROP TABLE IF EXISTS ldldwd.ec_jd_intransit_detail;
CREATE TABLE IF NOT EXISTS ldldwd.ec_jd_intransit_detail(
  DataLable     string,
  DocumentNumber     string,
  CurrentRecordNumber     int,
  VendorProductID     string,
  BuyerProductID     string,
  ProductCode     string,
  ProductName     string,
  ListPrice     double,
  Quantity     int,
  ReceivingQuantity     int,
  DamagedQuantity     int,
  RefuseQuantity     int,
  PurchasedBy     string,
  CostPrice     double,
  Discount     double,
  PackageNumber     string,
  ErrorReason     string,
  Comments     string,
  InsertDate     timestamp
)
PARTITIONED BY (InsertDate_month string)
STORED AS PARQUET;

INSERT statement:

SET spark.sql.parser.quotedRegexColumnNames = true;

INSERT INTO TABLE ldldwd.ec_jd_intransit_detail PARTITION(InsertDate_month)
  SELECT `(load_ts|load_date|rk)?+.+`, DATE_FORMAT(InsertDate, 'yyyy-MM') AS InsertDate_month
  FROM
  (
    SELECT *, RANK() OVER (ORDER BY load_ts DESC) AS rk
    FROM ldlsrc.ec_jd_intransit_detail
    WHERE load_date = '@src_partition_date@'
  )
  WHERE rk = 1;

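Error:

The error message reports that my partition columns are insertdate_month and InsertDate_month, and that this list contains a non-partition column.

So insertdate_month must be the non-partition column, but where did insertdate_month come from?

I dropped the table and recreated it, and still got the same error.

In that case, why not just change InsertDate_month?

So I changed InsertDate_month to insertDate_month in both the CREATE TABLE statement and the INSERT statement.

Error again.

This proves one thing: insertdate_month is the table's real partition column, and the InsertDate_month that I explicitly wrote in both the CREATE TABLE and INSERT statements is the one being treated as an invalid partition column?!

Fine, have it your way. I changed InsertDate_month to insertdate_month in both the CREATE TABLE statement and the INSERT statement, and the insert succeeded.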
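For reference, here is a sketch of the INSERT statement after the rename, assuming the table has been recreated with the partition clause written in lowercase. Only the partition column name differs from the original statement; everything else is unchanged.

```sql
-- Assumes the table was recreated with: PARTITIONED BY (insertdate_month string)
SET spark.sql.parser.quotedRegexColumnNames = true;

INSERT INTO TABLE ldldwd.ec_jd_intransit_detail PARTITION(insertdate_month)
  -- The regex selects every column except load_ts, load_date and rk
  SELECT `(load_ts|load_date|rk)?+.+`,
         DATE_FORMAT(InsertDate, 'yyyy-MM') AS insertdate_month  -- lowercase alias matches the partition column
  FROM
  (
    SELECT *, RANK() OVER (ORDER BY load_ts DESC) AS rk
    FROM ldlsrc.ec_jd_intransit_detail
    WHERE load_date = '@src_partition_date@'
  )
  WHERE rk = 1;
```

The ordinary columns such as InsertDate still contain uppercase letters. In this case only the partition column had to be written in lowercase.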

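Question: could it be that partition columns do not support uppercase letters?

After testing, it appears that Spark really does not support uppercase partition column names in Hive tables: Spark automatically converts the uppercase letters to lowercase, whereas Hive itself accepts them.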
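One way to see which partition column name is actually recorded for the table is to inspect its metadata. These are standard Spark SQL commands, but this is only a suggested check that was not part of the original troubleshooting, and the output will depend on your Spark and Hive versions.

```sql
-- Show the columns and the partition information recorded for the table
DESCRIBE FORMATTED ldldwd.ec_jd_intransit_detail;

-- Print the DDL as Spark reconstructs it from the catalog
SHOW CREATE TABLE ldldwd.ec_jd_intransit_detail;

-- List existing partitions; each entry is printed as column_name=value
SHOW PARTITIONS ldldwd.ec_jd_intransit_detail;
```

If the partition column appears as insertdate_month in this output even though the DDL said InsertDate_month, that would explain where the lowercase name in the error message comes from.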
