Exporting Hive Data to a Local CSV File

https://www.iteblog.com/archives/955.html

https://cloud.tencent.com/developer/article/1352376

https://blog.csdn.net/pzw_0612/article/details/48064697

https://blog.csdn.net/gezailushang/article/details/83586042

There are five methods:

Method 1: convert the Hive table into a Spark DataFrame, then export it to HDFS with DataFrame.write.csv() (DataFrameWriter.csv). Note that write.csv() requires a target path (the path below is only an example):

df = spark.sql("select * from test.student3")

df.write.csv("/user/lxb/student3_csv", header=True)

Method 2: in PySpark, run spark.sql(sql_str), where spark is a HiveContext (or a SparkSession with Hive support enabled); the call returns a DataFrame:

df = spark.sql(sql_str)

Method 3: use Hive's INSERT OVERWRITE syntax to export the data. This writes Hive's raw delimited output files, not a complete CSV file (no header row, and possibly split across several part files):

INSERT OVERWRITE LOCAL DIRECTORY '/url/lxb/hive'
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
SELECT * FROM table_name LIMIT 100;
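Because the INSERT OVERWRITE above leaves plain delimited part files (e.g. 000000_0) under the target directory rather than one CSV, a follow-up step is needed to stitch them together. The sample files below stand in for Hive's output; the last line assembles them into a single CSV with a header row:

```shell
# Fabricated part files standing in for Hive's export output
mkdir -p hive_export
printf '1,Alice\n' > hive_export/000000_0
printf '2,Bob\n' > hive_export/000001_0

# Concatenate all part files behind a header row to form one CSV
{ echo 'id,name'; cat hive_export/0*; } > students.csv
```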

Method 4: the hive CLI's -e option, which runs a query and prints the result to stdout:

hive -e 'select * from test.student3' >> /usr/lxb/student.txt

(>> appends to the file; use > to overwrite it.) To get a CSV with a header row, print the column names and translate Hive's \x01 field delimiter into commas:

hive -e 'set hive.execution.engine=tez; set hive.cli.print.header=true; set hive.resultset.use.unique.column.names=false; select * from database.table' | sed 's/\x01/,/g' > /usr/lxb/hive/test.csv
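The sed step above performs a simple delimiter substitution; the same transformation in Python looks like this (the sample row is fabricated, assuming the output uses Hive's default \x01 delimiter):

```python
# A raw output row with \x01 (Hive's default field delimiter) between columns
row = "1\x01Alice\x012000"

# Replace every \x01 with a comma, mirroring sed 's/\x01/,/g'
csv_row = row.replace("\x01", ",")
print(csv_row)  # 1,Alice,2000
```

Note that this naive substitution (like the sed version) does not quote fields, so values that themselves contain commas would produce a malformed CSV.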

Method 5: convert the Hive table into a Spark DataFrame, call DataFrame.toPandas() to turn it into a pandas DataFrame, then export it locally with DataFrame.to_csv(). Note that toPandas() collects the whole table into driver memory, so this only suits small result sets.

The to_csv parameters are documented at the links below:

https://blog.csdn.net/u010801439/article/details/80033341

https://blog.csdn.net/qton_csdn/article/details/70493196
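The pandas leg of method five can be sketched as follows; the sample frame stands in for the output of spark.sql("select * from test.student3").toPandas(), and the output path is an example:

```python
import os
import tempfile

import pandas as pd

# Stand-in for spark_df.toPandas()
pdf = pd.DataFrame({"id": [1, 2], "name": ["Alice", "Bob"]})

# index=False drops the row-index column; encoding="utf-8" keeps
# non-ASCII values intact in the exported file
out_path = os.path.join(tempfile.mkdtemp(), "student.csv")
pdf.to_csv(out_path, index=False, encoding="utf-8")
```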


Reprinted from blog.csdn.net/gezailushang/article/details/83583621