| NO | Step |
| --- | --- |
| 1 | Install and configure JDK 1.8 |
| 2 | Install and configure Scala 2.11.18 |
| 3 | Write a demo that operates on Hive |
| 4 | Add the dependency jars to pom.xml |
| 5 | Download the Hadoop binary package (my version is 2.7.3) |
| 6 | Download winutils.exe and put it under $HADOOP_HOME/bin/ |
| 7 | In the run configuration of the launcher class, set the environment variable HADOOP_HOME=D:\software1\hadoop-2.7.3 (the value after = is the Hadoop installation directory) |
| 8 | Put the hive-site.xml configuration file in the project's resources directory |
| 9 | If you hit permission problems, the NameNode permission check can be disabled (see the end of this article) |
The dependencies in pom.xml are as follows:
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <groupId>com.lwq</groupId>
    <artifactId>sparkcase2</artifactId>
    <version>1.0-SNAPSHOT</version>

    <properties>
        <java.version>1.8</java.version>
        <spark.version>2.1.1</spark.version>
    </properties>

    <dependencies>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-core_2.11</artifactId>
            <version>${spark.version}</version>
        </dependency>
        <dependency>
            <groupId>mysql</groupId>
            <artifactId>mysql-connector-java</artifactId>
            <version>5.1.38</version>
        </dependency>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-hive_2.11</artifactId>
            <version>${spark.version}</version>
        </dependency>
    </dependencies>
</project>

The hive-site.xml configuration is as follows:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
    <!-- Address of the Hive metastore service -->
    <property>
        <name>hive.metastore.uris</name>
        <value>thrift://master:9083</value>
    </property>
    <property>
        <name>hive.server2.thrift.port</name>
        <value>10000</value>
    </property>
    <!-- JDBC URL of the metastore database -->
    <property>
        <name>javax.jdo.option.ConnectionURL</name>
        <value>jdbc:mysql://master/metastore?createDatabaseIfNotExist=true</value>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionDriverName</name>
        <value>com.mysql.jdbc.Driver</value>
    </property>
    <!-- User name for the metastore database -->
    <property>
        <name>javax.jdo.option.ConnectionUserName</name>
        <value>hadoop</value>
    </property>
    <!-- Password for the metastore database -->
    <property>
        <name>javax.jdo.option.ConnectionPassword</name>
        <value>hadoop</value>
    </property>
    <!-- HDFS directory where Hive data is stored -->
    <property>
        <name>hive.metastore.warehouse.dir</name>
        <value>/warehouse</value>
    </property>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://master:9000</value>
    </property>
    <!-- Metastore schema verification -->
    <property>
        <name>hive.metastore.schema.verification</name>
        <value>false</value>
    </property>
    <property>
        <name>datanucleus.autoCreateSchema</name>
        <value>true</value>
    </property>
    <property>
        <name>datanucleus.autoStartMechanism</name>
        <value>SchemaTable</value>
    </property>
    <property>
        <name>datanucleus.schema.autoCreateTables</name>
        <value>true</value>
    </property>
    <property>
        <name>beeline.hs2.connection.user</name>
        <value>bigdata</value>
    </property>
    <property>
        <name>beeline.hs2.connection.password</name>
        <value>root</value>
    </property>
</configuration>
Demo code:
package com.lwq.spark

import org.apache.spark.sql.SparkSession

object HiveCaseJob {
  def main(args: Array[String]): Unit = {
    // Local SparkSession with Hive support; hive-site.xml in the resources directory is picked up from the classpath
    val sparkSession = SparkSession.builder()
      .appName("HiveCaseJob")
      .master("local[*]")
      .enableHiveSupport()
      .getOrCreate()

    // Recreate the users table, load a local text file into it and query it
    sparkSession.sql("drop table if exists users")
    sparkSession.sql("show tables").show()
    sparkSession.sql("create table if not exists users(id int,name string) row format delimited fields terminated by ' ' stored as textfile")
    sparkSession.sql("show tables").show()
    sparkSession.sql("select * from users").show()
    sparkSession.sql("load data local inpath 'src/main/resources/a.txt' overwrite into table users")
    sparkSession.sql("select * from users").show()
  }
}
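For the load data statement to actually put rows into the table, src/main/resources/a.txt has to match the table definition: one record per line, with id and name separated by a single space. A hypothetical example file (not from the original article) could look like this:

1 tom
2 jerry
3 lucy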
Configure the run parameters: in the run configuration of the launcher class, add the environment variable HADOOP_HOME=D:\software1\hadoop-2.7.3 described in step 7.
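If you would rather not rely on the run configuration, a common alternative is to point Hadoop at the local installation from code. The sketch below assumes the same installation directory as in step 7 and sets the hadoop.home.dir system property, which Hadoop's Shell utilities check before falling back to the HADOOP_HOME environment variable; it must run before the SparkSession is created:

// Tell Hadoop where to find bin\winutils.exe (assumed install path from step 7)
System.setProperty("hadoop.home.dir", "D:\\software1\\hadoop-2.7.3")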
Possible errors while running:
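If HADOOP_HOME is not set or winutils.exe is missing, the failure usually looks something like this (exact wording varies by Hadoop version):

ERROR Shell: Failed to locate the winutils binary in the hadoop binary path
java.io.IOException: Could not locate executable null\bin\winutils.exe in the Hadoop binaries.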
This error is caused by the HADOOP_HOME variable and winutils.exe: check whether the variable is set correctly and whether winutils.exe is in the right directory.
Because we are running in a Windows environment, Hadoop emulates Linux behaviour at run time and needs winutils.exe to do so. By default the code looks for winutils.exe under the bin directory of the Hadoop installation path, which is why the HADOOP_HOME variable above is also required.
Another possible error is an "insufficient permission" message: on Windows, HDFS is accessed as the local Windows user, so you can disable the HDFS permission check as follows:
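A minimal sketch of that change, assuming you can edit hdfs-site.xml on the NameNode and restart HDFS (dfs.permissions.enabled is the Hadoop 2.x property name; older releases call it dfs.permissions):

<property>
    <!-- Disable HDFS permission checking; only advisable on a development/test cluster -->
    <name>dfs.permissions.enabled</name>
    <value>false</value>
</property>

Alternatively, keep permission checking on and grant your Windows user access to the warehouse directory with hdfs dfs -chown or hdfs dfs -chmod.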