Data and requirements
Given the following data:
a,100,50,120
b,220,150,20
c,220,450,220
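The fields are, in order: user id, base salary, performance commission, equity income (the first field is the id, followed by the three income types).
The goal is to find, for each person, which of the three income types is the highest.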
Java part
1. Create a new Maven project
2. Add the dependencies to pom.xml
<dependencies>
    <!-- https://mvnrepository.com/artifact/org.apache.hadoop/hadoop-client -->
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-client</artifactId>
        <version>2.8.3</version>
    </dependency>
    <!-- https://mvnrepository.com/artifact/org.apache.hive/hive-common -->
    <dependency>
        <groupId>org.apache.hive</groupId>
        <artifactId>hive-common</artifactId>
        <version>1.2.2</version>
    </dependency>
    <!-- https://mvnrepository.com/artifact/org.apache.hive/hive-service -->
    <dependency>
        <groupId>org.apache.hive</groupId>
        <artifactId>hive-service</artifactId>
        <version>1.2.2</version>
    </dependency>
</dependencies>
3. Write a Java class that extends UDF and implements an evaluate method
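package cn.huihui;

import org.apache.hadoop.hive.ql.exec.UDF;

public class MyUDF extends UDF {
    // Hive calls evaluate() once per row. This returns the largest of the
    // three amounts (the value itself, not which income type it belongs to).
    public int evaluate(int a, int b, int c) {
        return max(max(a, b), c);
    }

    // Returns the larger of x and y.
    public static int max(int x, int y) {
        return (x > y) ? x : y;
    }
}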
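The requirement asks which income type is the highest, while taking max(max(a,b),c) yields the highest amount. As a minimal sketch of how the position could be computed instead, the plain-Java helper below returns a 1-based index (1 = salary, 2 = ticheng, 3 = guquan). The class name MaxIndexSketch, the method name maxIndex, the 1-based numbering and the rule that the earliest column wins a tie are all assumptions made for illustration; none of them come from the original code. The same logic could be placed in the body of evaluate.

```java
// Hypothetical helper, not part of the original post.
class MaxIndexSketch {
    // Returns the 1-based position of the largest argument.
    // Assumption: on a tie, the earliest position wins (strict '>' comparisons).
    static int maxIndex(int a, int b, int c) {
        int idx = 1;
        int best = a;
        if (b > best) {
            best = b;
            idx = 2;
        }
        if (c > best) {
            idx = 3;
        }
        return idx;
    }

    public static void main(String[] args) {
        // The three sample rows from the data section.
        System.out.println(maxIndex(100, 50, 120));  // prints 3
        System.out.println(maxIndex(220, 150, 20));  // prints 1
        System.out.println(maxIndex(220, 450, 220)); // prints 2
    }
}
```

For the sample data this gives equity income for a, base salary for b, and performance commission for c.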
4. Package the project as a jar and upload it to the cluster
Hive part
1. Add the jar to Hive's classpath
hive>add jar /root/udf.jar;
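2. Create a temporary function and bind it to the Java class you wrote
create temporary function get_max_index as 'cn.huihui.MyUDF';
After as, give the fully qualified name of the Java class that implements the function.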
3. Create the table and load the data
create table t_employee(uid string,salary int,ticheng int,guquan int)
row format delimited fields terminated by ',';
load data local inpath '/root/emp.dat' into table t_employee;
4. Query with the custom function:
select uid,salary,ticheng,guquan,get_max_index(salary,ticheng,guquan) as idx
from t_employee;
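To make the values in the idx column concrete, the sketch below simulates the query in plain Java, without Hive, over the three sample rows. It assumes the UDF returns the value computed by max(max(a,b),c), as in the evaluate body. The class name QuerySimulation and the methods maxOfThree and project are hypothetical names introduced here. It prints comma-separated values and does not reproduce the Hive CLI's output layout.

```java
// Hypothetical simulation, not part of the original post.
class QuerySimulation {
    // Mirrors the max(max(a, b), c) computation from the UDF.
    static int maxOfThree(int a, int b, int c) {
        return Math.max(Math.max(a, b), c);
    }

    // Parses one "uid,salary,ticheng,guquan" line and appends the computed column.
    static String project(String line) {
        String[] fields = line.split(",");
        int salary = Integer.parseInt(fields[1].trim());
        int ticheng = Integer.parseInt(fields[2].trim());
        int guquan = Integer.parseInt(fields[3].trim());
        return line + "," + maxOfThree(salary, ticheng, guquan);
    }

    public static void main(String[] args) {
        String[] rows = {"a,100,50,120", "b,220,150,20", "c,220,450,220"};
        for (String row : rows) {
            System.out.println(project(row));
        }
    }
}
```

Under that assumption the last column holds 120, 220 and 450 for users a, b and c, which is the highest amount and not the income type.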