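Today I needed to import data from a text file into a Hive database. Because the table is expected to grow large, I designed it as a partitioned table, partitioned by area and date. I ran into a bug along the way and am recording it here for later reference.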
The original script (spark is an existing SparkSession with Hive support; tablename and addoroverwrite are defined earlier in the script):

spark.sql("use oracledb")
spark.sql(
    "CREATE TABLE IF NOT EXISTS " + tablename + " ("
    "OBUID STRING, BUS_ID STRING, REVTIME STRING, OBUTIME STRING, LONGITUDE STRING, LATITUDE STRING, "
    "GPSKEY STRING, DIRECTION STRING, SPEED STRING, RUNNING_NO STRING, DATA_SERIAL STRING, "
    "GPS_MILEAGE STRING, SATELLITE_COUNT STRING, ROUTE_CODE STRING, SERVICE STRING) "
    "PARTITIONED BY (AREA STRING, OBUDATE STRING) "
    "ROW FORMAT DELIMITED FIELDS TERMINATED BY ','"
)
# Enable dynamic partitioning
spark.sql("set hive.exec.dynamic.partition.mode = nonstrict")
spark.sql("set hive.exec.dynamic.partition = true")
# print("Database created")
if addoroverwrite:
    # Append
    spark.sql(
        "INSERT INTO TABLE " + tablename + " PARTITION (AREA, OBUDATE) "
        "SELECT OBUID, BUS_ID, REVTIME, OBUTIME, LONGITUDE, LATITUDE, GPSKEY, DIRECTION, SPEED, "
        "RUNNING_NO, DATA_SERIAL, GPS_MILEAGE, SATELLITE_COUNT, ROUTE_CODE, SERVICE, "
        "'gz' AS AREA, SUBSTR(OBUTIME, 1, 10) AS OBUDATE "
        "FROM " + tablename + "_tmp"
    )

Running the script produced the following error:
Partition spec {area=, obudate=, AREA=gz, OBUDATE=2017-01-} contains non-partition columns;
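The message lists each partition name twice: once in lowercase with an empty value, and once in uppercase with the value from the INSERT. The toy model below shows how a case-sensitive match between two sets of names can produce a spec of that shape. It is only an illustration, not Spark's or Hive's actual code. The function names are hypothetical, and it assumes the table's partition columns are known under lowercase names.

```python
# Toy model only: this is NOT how Spark or Hive is implemented internally.
# Assumption: the table's partition columns are known as lowercase names,
# while the INSERT statement supplied uppercase names.
table_partition_cols = ["area", "obudate"]
# Values copied as they appear in the error message.
user_spec = {"AREA": "gz", "OBUDATE": "2017-01-"}


def merge_case_sensitive(cols, spec):
    """Start with empty slots for the table's columns, then add the user's keys as-is."""
    merged = {c: "" for c in cols}
    merged.update(spec)  # "AREA" does not match "area", so it becomes a new key
    return merged


def merge_case_insensitive(cols, spec):
    """Same as above, but normalize the user's keys to lowercase first."""
    merged = {c: "" for c in cols}
    for key, value in spec.items():
        merged[key.lower()] = value
    return merged


print(merge_case_sensitive(table_partition_cols, user_spec))
print(merge_case_insensitive(table_partition_cols, user_spec))
```

The first call yields four keys, which resembles the spec in the error. The second yields only the two expected partition columns.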
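After searching on Baidu, I found reports of a case-sensitivity bug with partitioned tables. The error message points the same way: the spec contains the lowercase names area and obudate with empty values alongside the uppercase AREA and OBUDATE that carry the values. I changed the partition columns in the script to lowercase, and it then ran successfully. The modified script: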
spark.sql("use oracledb")
spark.sql(
    "CREATE TABLE IF NOT EXISTS " + tablename + " ("
    "OBUID STRING, BUS_ID STRING, REVTIME STRING, OBUTIME STRING, LONGITUDE STRING, LATITUDE STRING, "
    "GPSKEY STRING, DIRECTION STRING, SPEED STRING, RUNNING_NO STRING, DATA_SERIAL STRING, "
    "GPS_MILEAGE STRING, SATELLITE_COUNT STRING, ROUTE_CODE STRING, SERVICE STRING) "
    "PARTITIONED BY (area STRING, obudate STRING) "
    "ROW FORMAT DELIMITED FIELDS TERMINATED BY ','"
)
# Set parameters (equivalent Hive CLI commands):
# hive > set hive.exec.dynamic.partition.mode = nonstrict;
# hive > set hive.exec.dynamic.partition = true;
spark.sql("set hive.exec.dynamic.partition.mode = nonstrict")
spark.sql("set hive.exec.dynamic.partition = true")
# print("Database created")
if addoroverwrite:
    # Append
    spark.sql(
        "INSERT INTO TABLE " + tablename + " PARTITION (area, obudate) "
        "SELECT OBUID, BUS_ID, REVTIME, OBUTIME, LONGITUDE, LATITUDE, GPSKEY, DIRECTION, SPEED, "
        "RUNNING_NO, DATA_SERIAL, GPS_MILEAGE, SATELLITE_COUNT, ROUTE_CODE, SERVICE, "
        "'gz' AS area, SUBSTR(OBUTIME, 1, 10) AS obudate "
        "FROM " + tablename + "_tmp"
    )
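To avoid reintroducing the case mismatch when the statement is edited later, the INSERT string can be generated by a small helper that always lowercases the partition column names. The following is a minimal sketch. The function build_insert_sql and the table name bus_gps are hypothetical and do not appear in the original script; the column names and expressions are taken from it.

```python
def build_insert_sql(table, data_cols, partition_exprs, source_table):
    """Build a dynamic-partition INSERT with lowercase partition names.

    partition_exprs maps each partition column name to the SQL expression
    that produces its value; names are lowercased before use.
    """
    parts = {name.lower(): expr for name, expr in partition_exprs.items()}
    # Partition expressions go last in the SELECT list, in partition order.
    select_list = list(data_cols) + [f"{expr} AS {name}" for name, expr in parts.items()]
    return (
        f"INSERT INTO TABLE {table} PARTITION ({', '.join(parts)}) "
        f"SELECT {', '.join(select_list)} FROM {source_table}"
    )


# Hypothetical table name; only two data columns are shown for brevity.
sql = build_insert_sql(
    "bus_gps",
    ["OBUID", "OBUTIME"],
    {"AREA": "'gz'", "OBUDATE": "SUBSTR(OBUTIME,1,10)"},
    "bus_gps_tmp",
)
print(sql)
```

The resulting string could then be passed to spark.sql(sql) in place of the hand-written statement. The helper only formats text and does no escaping, so it should only be used with trusted table and column names.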