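When the built-in functions do not meet our needs, we can define our own.
Steps for defining a custom function
Method 1
1 Write a class that extends the UDF class (user-defined function).
2 Write a method named evaluate() in that class. The method must have exactly this name, and it can be overloaded.
3 Add the Hive Maven dependencies: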
<dependency>
<groupId>org.apache.hive</groupId>
<artifactId>hive-exec</artifactId>
<version>2.1.1</version>
</dependency>
<dependency>
<groupId>org.apache.hive</groupId>
<artifactId>hive-jdbc</artifactId>
<version>2.1.1</version>
</dependency>
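4 Package the project as a jar and upload it to the Linux machine (packaging was covered earlier).
5 Add the jar in Hive:
add jar /home/hr/soft/firstUDF.jar;
list jars; -- check that the jar was added
6 Create a temporary function so that the class can be called: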
create temporary function testhello as 'com.oracle.hive.firstUDF';
7 Use the function:
select testhello();
PS: a function defined this way is temporary; it is gone once you exit the Hive session.
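The class from steps 1 and 2 is not shown in this post, so here is a minimal sketch of what such a class might look like. The class name HelloUdfSketch and the returned strings are made up for illustration. The `extends UDF` clause is left out so the sketch compiles without hive-exec on the classpath; a real UDF would be a public class that extends org.apache.hadoop.hive.ql.exec.UDF.

```java
/**
 * Plain-Java sketch of a simple UDF body (hypothetical example).
 * In a real project this would be a public class declared as
 * "extends org.apache.hadoop.hive.ql.exec.UDF".
 */
class HelloUdfSketch {

    /** No-argument overload, matching a call such as: select testhello(); */
    public String evaluate() {
        return "hello";
    }

    /** One-argument overload; a null input yields a null result. */
    public String evaluate(String name) {
        return name == null ? null : "hello, " + name;
    }

    public static void main(String[] args) {
        HelloUdfSketch udf = new HelloUdfSketch();
        System.out.println(udf.evaluate());        // prints "hello"
        System.out.println(udf.evaluate("hive"));  // prints "hello, hive"
    }
}
```

Because evaluate() is overloaded, the same function name can be called with different argument lists, and the overload whose parameters match the call is the one that runs.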
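Method 2
The other way is to extend GenericUDF and override three methods: initialize(), evaluate(), and getDisplayString().
① initialize validates the arguments up front and rejects invalid input; its return value also determines the function's return type.
② evaluate holds the core business logic.
③ getDisplayString has no effect on the result (it only supplies the text shown for the function, for example in EXPLAIN output), but it must not return null.
④ Use the XXXObjectInspector classes to interpret the argument types and get at the actual argument values (see Example 2 for usage).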
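The division of work between the three methods can be sketched in plain Java. The TwoPhaseFunction class below is a hypothetical stand-in written for this illustration; it is not Hive's API, and it uses type names as strings where Hive uses ObjectInspector objects. It only shows the shape of the contract: validate once, then compute per row.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical stand-in for the GenericUDF contract; NOT Hive's API. */
abstract class TwoPhaseFunction {
    /** Called once per query with the argument types; returns the result type. */
    abstract String initialize(List<String> argTypes);

    /** Called once per row with the argument values. */
    abstract Object evaluate(List<?> args);

    /** Text describing the call; must never be null. */
    abstract String getDisplayString(List<String> children);
}

class SumIntsSketch extends TwoPhaseFunction {
    @Override
    String initialize(List<String> argTypes) {
        // Validation happens here, before any row is processed.
        if (argTypes.size() < 2) {
            throw new IllegalArgumentException("At least two arguments are required");
        }
        for (String type : argTypes) {
            if (!"int".equals(type)) {
                throw new IllegalArgumentException("Only int arguments are supported");
            }
        }
        return "int"; // the declared return type
    }

    @Override
    Object evaluate(List<?> args) {
        // Core logic: the types were already checked in initialize.
        int sum = 0;
        for (Object arg : args) {
            sum += (Integer) arg;
        }
        return sum;
    }

    @Override
    String getDisplayString(List<String> children) {
        return "sum_ints(" + String.join(", ", children) + ")";
    }

    public static void main(String[] args) {
        SumIntsSketch f = new SumIntsSketch();
        System.out.println(f.initialize(Arrays.asList("int", "int", "int"))); // prints "int"
        System.out.println(f.evaluate(Arrays.asList(1, 2, 3)));               // prints 6
        System.out.println(f.getDisplayString(Arrays.asList("a", "b")));      // prints "sum_ints(a, b)"
    }
}
```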
Example 1: return the sum of the arguments
package com.oracle.hive;

import org.apache.hadoop.hive.ql.exec.UDFArgumentException;
import org.apache.hadoop.hive.ql.metadata.HiveException;
import org.apache.hadoop.hive.ql.udf.generic.GenericUDF;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory;
import org.apache.hadoop.io.IntWritable;

import java.util.Arrays;

public class ForGenericUDF extends GenericUDF {

    @Override
    public ObjectInspector initialize(ObjectInspector[] objectInspectors) throws UDFArgumentException {
        // Reject the query up front if fewer than two arguments are passed.
        if (objectInspectors.length < 2) {
            throw new UDFArgumentException("At least two arguments are required");
        }
        // The function returns an int.
        return PrimitiveObjectInspectorFactory.javaIntObjectInspector;
    }

    @Override
    public Object evaluate(DeferredObject[] deferredObjects) throws HiveException {
        int result = 0;
        for (DeferredObject dobj : deferredObjects) {
            // Assumes every argument arrives as an IntWritable; any other representation makes this cast fail.
            result += ((IntWritable) dobj.get()).get();
        }
        return result;
    }

    @Override
    public String getDisplayString(String[] strings) {
        return Arrays.toString(strings);
    }
}
Example 2: compute the total tax rate and the after-tax salary. The data is as follows:
/**
 * Total tax rate: sums all the values in a map of tax rates.
 */
package com.oracle.hive;

import org.apache.hadoop.hive.ql.exec.UDFArgumentException;
import org.apache.hadoop.hive.ql.metadata.HiveException;
import org.apache.hadoop.hive.ql.udf.generic.GenericUDF;
import org.apache.hadoop.hive.serde2.objectinspector.MapObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory;

import java.util.Arrays;
import java.util.Collection;

public class Tax extends GenericUDF {
    // Inspector for the map argument, saved in initialize and used in evaluate.
    private MapObjectInspector moi;

    @Override
    public ObjectInspector initialize(ObjectInspector[] objectInspectors) throws UDFArgumentException {
        if (objectInspectors.length != 1) {
            throw new UDFArgumentException("Exactly one argument is required");
        }
        if (!(objectInspectors[0] instanceof MapObjectInspector)) {
            throw new UDFArgumentException("The argument must be a map");
        }
        moi = (MapObjectInspector) objectInspectors[0];
        // The function returns a double.
        return PrimitiveObjectInspectorFactory.javaDoubleObjectInspector;
    }

    @Override
    public Object evaluate(DeferredObject[] deferredObjects) throws HiveException {
        double result = 0;
        // The map values are the individual tax rates.
        Collection<?> taxes = moi.getMap(deferredObjects[0].get()).values();
        for (Object obj : taxes) {
            result += Double.parseDouble(obj.toString());
        }
        return result;
    }

    @Override
    public String getDisplayString(String[] strings) {
        return Arrays.toString(strings);
    }
}
/**
 * After-tax salary.
 * The first argument is the salary; the second is the map of tax rates.
 */
package com.oracle.hive;

import org.apache.hadoop.hive.ql.exec.UDFArgumentException;
import org.apache.hadoop.hive.ql.metadata.HiveException;
import org.apache.hadoop.hive.ql.udf.generic.GenericUDF;
import org.apache.hadoop.hive.serde2.objectinspector.MapObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory;

import java.util.Arrays;
import java.util.Collection;

public class AfterTaxSal extends GenericUDF {
    // Inspector for the second (map) argument, saved in initialize and used in evaluate.
    private MapObjectInspector moi;

    @Override
    public ObjectInspector initialize(ObjectInspector[] objectInspectors) throws UDFArgumentException {
        if (objectInspectors.length != 2) {
            throw new UDFArgumentException("Exactly two arguments are required");
        }
        if (!(objectInspectors[1] instanceof MapObjectInspector)) {
            throw new UDFArgumentException("The second argument must be a map");
        }
        moi = (MapObjectInspector) objectInspectors[1];
        // The function returns a double.
        return PrimitiveObjectInspectorFactory.javaDoubleObjectInspector;
    }

    @Override
    public Object evaluate(DeferredObject[] deferredObjects) throws HiveException {
        // Add up all the tax rates.
        double result = 0;
        Collection<?> taxes = moi.getMap(deferredObjects[1].get()).values();
        for (Object obj : taxes) {
            result += Double.parseDouble(obj.toString());
        }
        // After-tax salary = salary * (1 - total tax rate).
        double salary = Double.parseDouble(deferredObjects[0].get().toString());
        return salary * (1 - result);
    }

    @Override
    public String getDisplayString(String[] strings) {
        return Arrays.toString(strings);
    }
}
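The arithmetic inside Tax and AfterTaxSal can be checked without Hive. The sketch below repeats the same two calculations on an ordinary java.util.Map; the class name AfterTaxSketch, the map keys, the rates, and the salary are all made-up values chosen so that the results are exact in floating point.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class AfterTaxSketch {

    /** Sums every rate in the map, like the loop in Tax.evaluate. */
    static double totalRate(Map<String, Double> rates) {
        double total = 0;
        for (double rate : rates.values()) {
            total += rate;
        }
        return total;
    }

    /** salary * (1 - total rate), like the last line of AfterTaxSal.evaluate. */
    static double afterTax(double salary, Map<String, Double> rates) {
        return salary * (1 - totalRate(rates));
    }

    public static void main(String[] args) {
        Map<String, Double> rates = new LinkedHashMap<>();
        rates.put("income", 0.25);   // made-up rate
        rates.put("pension", 0.125); // made-up rate
        System.out.println(totalRate(rates));      // prints 0.375
        System.out.println(afterTax(8000, rates)); // prints 5000.0
    }
}
```

With rates that are not exact binary fractions (0.1, for instance), the result may differ from the decimal answer in the last digits, which is worth keeping in mind when comparing query output.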
Example 3: find the position of an element in an array; return -1 if it is not found
package com.oracle.hive;

import org.apache.hadoop.hive.ql.exec.UDFArgumentException;
import org.apache.hadoop.hive.ql.metadata.HiveException;
import org.apache.hadoop.hive.ql.udf.generic.GenericUDF;
import org.apache.hadoop.hive.serde2.objectinspector.ListObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory;

import java.util.Arrays;
import java.util.List;

// The first argument is the element to look for; the second is the array to search.
public class FindIndex extends GenericUDF {
    // Inspector for the second (array) argument, saved in initialize and used in evaluate.
    private ListObjectInspector loi;

    @Override
    public ObjectInspector initialize(ObjectInspector[] objectInspectors) throws UDFArgumentException {
        if (objectInspectors.length != 2) {
            throw new UDFArgumentException("Exactly two arguments are required");
        }
        if (!(objectInspectors[1] instanceof ListObjectInspector)) {
            throw new UDFArgumentException("The second argument must be an array");
        }
        loi = (ListObjectInspector) objectInspectors[1];
        // The function returns an int.
        return PrimitiveObjectInspectorFactory.javaIntObjectInspector;
    }

    @Override
    public Object evaluate(DeferredObject[] deferredObjects) throws HiveException {
        int index = -1;
        List<?> list = loi.getList(deferredObjects[1].get());
        String target = deferredObjects[0].get().toString();
        // Elements are compared by their string form; the index is zero-based.
        for (int i = 0; i < list.size(); i++) {
            if (list.get(i).toString().equals(target)) {
                index = i;
                break;
            }
        }
        return index;
    }

    @Override
    public String getDisplayString(String[] strings) {
        return Arrays.toString(strings);
    }
}
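The search in FindIndex.evaluate compares elements by their string form and returns a zero-based position. A plain-Java sketch of just that loop, with a made-up class name FindIndexSketch and made-up sample values, shows what this means in practice:

```java
import java.util.Arrays;
import java.util.List;

class FindIndexSketch {

    /** Returns the zero-based position of target in list, or -1 if absent. */
    static int indexOf(Object target, List<?> list) {
        String wanted = target.toString();
        for (int i = 0; i < list.size(); i++) {
            // Same comparison as FindIndex.evaluate: by string form.
            if (list.get(i).toString().equals(wanted)) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        List<String> letters = Arrays.asList("a", "b", "c");
        System.out.println(indexOf("b", letters));               // prints 1
        System.out.println(indexOf("z", letters));               // prints -1
        System.out.println(indexOf(2, Arrays.asList("1", "2"))); // prints 1
    }
}
```

The last call finds the integer 2 in a list of strings because both sides are converted with toString() before comparing. That is convenient for mixed types, but it also means values that print differently, such as 2 and 2.0, do not match.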