Hive User-Defined Function (UDF): Finding the Maximum of Three Values

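Data and Requirements

Given the following data:
a,100,50,120
b,220,150,20
c,220,450,220
The four fields are: user id, base salary, sales commission, and equity income.
For each user, we want to find out which of the three kinds of income is the highest.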
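
As a quick sanity check of what the query should eventually return, the sample rows can be run through plain Java before any Hive code is written. This is only a sketch: the class name ExpectedMax and the helper maxOfRow are made up for illustration and are not part of the original project.

```java
import java.util.Arrays;

public class ExpectedMax {
    // Hypothetical helper: parse one "uid,salary,commission,equity" line
    // and return the largest of the three numeric fields.
    static int maxOfRow(String line) {
        String[] parts = line.split(",");
        return Arrays.stream(parts, 1, parts.length)
                .mapToInt(Integer::parseInt)
                .max()
                .getAsInt();
    }

    public static void main(String[] args) {
        String[] rows = {"a,100,50,120", "b,220,150,20", "c,220,450,220"};
        for (String row : rows) {
            // Print the user id followed by that user's largest income value.
            System.out.println(row.split(",")[0] + " -> " + maxOfRow(row));
        }
    }
}
```

For the three sample rows this gives 120 for a, 220 for b and 450 for c, which is what the UDF should produce once it is wired into Hive.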

Java Part

1. Create a new Maven project

2. Add the dependencies to pom.xml

<dependencies>
    <!-- https://mvnrepository.com/artifact/org.apache.hadoop/hadoop-client -->
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-client</artifactId>
        <version>2.8.3</version>
    </dependency>

    <!-- https://mvnrepository.com/artifact/org.apache.hive/hive-common -->
    <dependency>
        <groupId>org.apache.hive</groupId>
        <artifactId>hive-common</artifactId>
        <version>1.2.2</version>
    </dependency>

    <!-- https://mvnrepository.com/artifact/org.apache.hive/hive-service -->
    <dependency>
        <groupId>org.apache.hive</groupId>
        <artifactId>hive-service</artifactId>
        <version>1.2.2</version>
    </dependency>
</dependencies>

3. Write a Java class that extends UDF and overloads the evaluate method

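package cn.huihui;

import org.apache.hadoop.hive.ql.exec.UDF;

public class MyUDF extends UDF {
    // Hive calls evaluate() once per row; it must return the result, not print it.
    public int evaluate(int a, int b, int c) {
        return max(max(a, b), c);
    }

    // Returns the larger of two ints.
    public static int max(int x, int y) {
        return (x > y) ? x : y;
    }
}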
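
One thing the primitive int signature cannot express is a missing value, and a Hive column may well be NULL. A common way to cope is to declare the parameters as boxed Integer and decide explicitly what to return. The sketch below shows only that logic as plain Java, so it can be run without a cluster. The class name NullSafeMax and the choice to ignore NULLs (rather than return NULL as soon as one appears) are assumptions, not something the original post specifies; inside a real UDF the same body would go into evaluate.

```java
public class NullSafeMax {
    // Returns the largest non-null argument, or null if all three are null.
    public static Integer evaluate(Integer a, Integer b, Integer c) {
        Integer best = null;
        for (Integer v : new Integer[] {a, b, c}) {
            if (v != null && (best == null || v > best)) {
                best = v;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        System.out.println(evaluate(100, 50, 120));     // all values present
        System.out.println(evaluate(null, 150, 20));    // one missing value
        System.out.println(evaluate(null, null, null)); // nothing to compare
    }
}
```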

4. Package the project as a jar and upload it to the cluster.

Hive Part

1. Add the jar to Hive's classpath

hive> add jar /root/udf.jar;

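2. Create a temporary function and bind it to the Java class you wrote

create temporary function get_max_index as 'cn.huihui.MyUDF';
After as, give the fully qualified name of the Java class that implements the function.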

3. Create the table and load the data

create table t_employee(uid string,salary int,ticheng int,guquan int)
row format delimited fields terminated by ',';

load data local inpath '/root/emp.dat' into table t_employee;

4. Query with the custom function:

select uid,salary,ticheng,guquan,get_max_index(salary,ticheng,guquan) as idx
from t_employee;
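
The function is registered as get_max_index and its result is aliased idx, yet MyUDF works out the largest value rather than telling which income type it came from. If the position is what is needed, the comparison could be written roughly as below. This is plain Java for illustration only; the class name MaxIndex, the 1-based numbering (1 = salary, 2 = ticheng, 3 = guquan) and the rule that ties go to the earliest column are all assumptions.

```java
public class MaxIndex {
    // Returns 1, 2 or 3 for whichever argument is largest; ties pick the earliest.
    public static int evaluate(int salary, int ticheng, int guquan) {
        int idx = 1;
        int best = salary;
        if (ticheng > best) {
            idx = 2;
            best = ticheng;
        }
        if (guquan > best) {
            idx = 3;
        }
        return idx;
    }

    public static void main(String[] args) {
        System.out.println(evaluate(100, 50, 120));  // row a
        System.out.println(evaluate(220, 150, 20));  // row b
        System.out.println(evaluate(220, 450, 220)); // row c
    }
}
```

On the sample data this yields 3 for a (equity income), 1 for b (base salary) and 2 for c (commission).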


Reposted from blog.csdn.net/amin_hui/article/details/82262117