Implementing a UDF (User-Defined Function) in Hive

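Earlier posts covered many of Hive's built-in functions; this one covers UDFs, that is, user-defined functions.

There are two ways to write a UDF:
1. Extend UDF and implement an evaluate method (Hive locates it by reflection, so there is no base method to override)
2. Extend GenericUDF and override the initialize, getDisplayString, and evaluate methods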

The first approach:
Add the following dependencies to pom.xml

<dependency>
	<groupId>org.apache.hadoop</groupId>
	<artifactId>hadoop-common</artifactId>
	<version>2.6.0</version>
</dependency>
<dependency>
	<groupId>org.apache.hive</groupId>
	<artifactId>hive-exec</artifactId>
	<version>1.1.0</version>
</dependency>

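Create a class that extends UDF and implements an evaluate method:

package com.demo.hive;

import java.util.ArrayList;
import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

public class MyFunc extends UDF {
    // Custom function that counts the "male" and "female" entries in a list
    public Text evaluate(ArrayList<Text> txt) {
        if (txt == null) {
            return null;
        }
        int male = 0;
        int female = 0;
        for (Text tx : txt) {
            String sex = tx.toString();
            if (sex.equalsIgnoreCase("male")) {
                male++;
            } else {
                // Any value other than "male" is counted as female
                female++;
            }
        }
        return new Text("male" + male + ",female" + female);
    }
}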
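The counting rule can be tried out locally before the jar is packaged. The following is a minimal sketch, assuming a hypothetical helper class GenderCounter (not part of the original project and not a Hive API) that applies the same rule as evaluate but takes plain String values instead of Hadoop Text, so it runs without any Hive or Hadoop dependency.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for local experiments; it is not part of Hive.
public class GenderCounter {

    // Same rule as MyFunc.evaluate: "male" (case-insensitive) increments male,
    // every other value (including null in this sketch) increments female.
    public static String count(List<String> values) {
        int male = 0;
        int female = 0;
        for (String value : values) {
            if ("male".equalsIgnoreCase(value)) {
                male++;
            } else {
                female++;
            }
        }
        return "male" + male + ",female" + female;
    }

    public static void main(String[] args) {
        // Prints: male2,female1
        System.out.println(count(Arrays.asList("male", "Female", "MALE")));
    }
}
```

Because everything that is not "male" falls into the female bucket, misspelled or empty values inflate the female count; checking for "female" explicitly would be a reasonable variation if the data is not clean.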

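Export the project as a thin jar (without bundled dependencies) and copy it to /opt on the Linux host, ready to upload to HDFS.
Create a directory on HDFS (in the Hive CLI, the leading ! runs a shell command):
!hdfs dfs -mkdir /func;
Upload the jar to HDFS:
!hdfs dfs -put /opt/myfun.jar /func;
Load the jar from HDFS:
add jar hdfs://192.168.56.100:9000/func/myfun.jar;
The host and port are the HDFS address configured in Hadoop's core-site.xml (the fs.defaultFS property).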

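To register the jar permanently, there is no need for add jar.
Create the function directly, under a name of your choice, and load the jar in the same statement:
create function mytest as 'com.demo.hive.MyFunc' using jar 'hdfs:/func/myfun.jar';
Then restart Hive.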

Breakdown:
create function mytest: creates a custom function named mytest
com.demo.hive.MyFunc: fully qualified name of the class that extends UDF
hdfs:/func/myfun.jar: path of the jar uploaded to HDFS

Call the custom function:
select mytest(male) from userinfos;
Drop the custom function:
drop function mytest;

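The second approach:
After extending org.apache.hadoop.hive.ql.udf.generic.GenericUDF,

these are the important methods to override:
public void configure(MapredContext context) {}
// Optional. context.getJobConf() returns the Configuration the job runs with,
// so parameter values can be passed in through the Configuration.

public ObjectInspector initialize(ObjectInspector[] arguments) throws UDFArgumentException
// Required. Initializes the function and declares its return type.
// For example, it can create object instances, open database connections, or read files.

public Object evaluate(DeferredObject[] args) throws HiveException
// Required. The core processing method; it serves the same purpose as evaluate in UDF.

public String getDisplayString(String[] children)
// Required. Returns the string that represents the function call, for example in EXPLAIN output.

public void close() throws IOException {}
// Optional. Called after the map task finishes, to release resources.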
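The three required methods can be put together as follows. This is a minimal sketch rather than the original author's code: the class name MyGenericFunc and the display name mygenerictest are hypothetical, and it assumes the function receives a single array<string> argument and repeats the male/female count from the first approach. It needs the hive-exec and hadoop-common dependencies listed earlier on the classpath.

```java
import org.apache.hadoop.hive.ql.exec.UDFArgumentException;
import org.apache.hadoop.hive.ql.exec.UDFArgumentLengthException;
import org.apache.hadoop.hive.ql.exec.UDFArgumentTypeException;
import org.apache.hadoop.hive.ql.metadata.HiveException;
import org.apache.hadoop.hive.ql.udf.generic.GenericUDF;
import org.apache.hadoop.hive.serde2.objectinspector.ListObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.PrimitiveObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory;
import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorUtils;
import org.apache.hadoop.io.Text;

// Hypothetical example class: counts "male" and other entries in an array argument.
public class MyGenericFunc extends GenericUDF {
    private ListObjectInspector listOI;
    private PrimitiveObjectInspector elementOI;

    @Override
    public ObjectInspector initialize(ObjectInspector[] arguments) throws UDFArgumentException {
        // Validate the argument count and types once, before any row is processed.
        if (arguments.length != 1) {
            throw new UDFArgumentLengthException("exactly one argument is expected");
        }
        if (!(arguments[0] instanceof ListObjectInspector)) {
            throw new UDFArgumentTypeException(0, "an array argument is expected");
        }
        listOI = (ListObjectInspector) arguments[0];
        ObjectInspector element = listOI.getListElementObjectInspector();
        if (!(element instanceof PrimitiveObjectInspector)) {
            throw new UDFArgumentTypeException(0, "array elements must be primitive");
        }
        elementOI = (PrimitiveObjectInspector) element;
        // The return type is declared here: a writable string (Text).
        return PrimitiveObjectInspectorFactory.writableStringObjectInspector;
    }

    @Override
    public Object evaluate(DeferredObject[] args) throws HiveException {
        Object list = args[0].get();
        if (list == null) {
            return null;
        }
        int male = 0;
        int female = 0;
        int length = listOI.getListLength(list);
        for (int i = 0; i < length; i++) {
            // Convert each element to a String through its object inspector.
            String sex = PrimitiveObjectInspectorUtils.getString(
                    listOI.getListElement(list, i), elementOI);
            if ("male".equalsIgnoreCase(sex)) {
                male++;
            } else {
                female++;
            }
        }
        return new Text("male" + male + ",female" + female);
    }

    @Override
    public String getDisplayString(String[] children) {
        // Shown when Hive prints the expression, e.g. in EXPLAIN output.
        return "mygenerictest(" + children[0] + ")";
    }
}
```

Compared with the first approach, the argument types are checked once in initialize instead of being fixed by the Java signature of evaluate, which is what allows a GenericUDF to accept varying or complex argument types.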
