Understanding LongWritable in Hive UDFs
Question
I tried googling it, but I don't understand the documentation very well. Can anyone explain what this code does?
It's part of a Hive UDF. I don't fully understand LongWritable or what 1L means.
    public class CustomUDF extends UDF {
        public LongWritable evaluate(Text schema) { // what is Text schema??
            if (schema == null) {
                return null;
            }
            try {
                return new LongWritable(1l); // what does this do??
            } catch (Exception ex) {
                // catch error
            }
        }
    }
I'm new to Hive UDFs and I'm having trouble understanding this method. Thank you!!
Answer 1
Score: 2
- LongWritable class
Hadoop needs to be able to serialize data into and out of Java types via DataInput and DataOutput objects (usually I/O streams). The Writable classes do this by implementing two methods, write(DataOutput) and readFields(DataInput). Specifically, LongWritable is a Writable class that wraps a Java long.
For other classes of the same type, see https://blog.dataiku.com/2013/05/01/a-complete-guide-to-writing-hive-udf
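To make the write/readFields contract concrete, here is a minimal sketch that uses only the JDK. SimpleWritable and SimpleLongWritable are hypothetical stand-ins written for this illustration, not Hadoop's real classes; the real LongWritable follows the same idea of a mutable box around one long that can write itself to a DataOutput and refill itself from a DataInput.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

public class WritableSketch {

    // Hypothetical stand-in for Hadoop's Writable contract (not the real interface).
    interface SimpleWritable {
        void write(DataOutput out) throws IOException;
        void readFields(DataInput in) throws IOException;
    }

    // Hypothetical stand-in for LongWritable: a mutable box around one long.
    static final class SimpleLongWritable implements SimpleWritable {
        private long value;

        SimpleLongWritable() { }

        SimpleLongWritable(long value) {
            this.value = value;
        }

        long get() {
            return value;
        }

        @Override
        public void write(DataOutput out) throws IOException {
            out.writeLong(value); // writes the long as 8 bytes
        }

        @Override
        public void readFields(DataInput in) throws IOException {
            value = in.readLong(); // overwrites the current value with the 8 bytes read
        }
    }

    // Serialize a value to a byte buffer, then read it back into a fresh object.
    static long roundTrip(long value) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            new SimpleLongWritable(value).write(new DataOutputStream(buffer));

            SimpleLongWritable copy = new SimpleLongWritable();
            copy.readFields(new DataInputStream(new ByteArrayInputStream(buffer.toByteArray())));
            return copy.get();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip(1L)); // prints 1
    }
}
```

In a UDF you never call these two methods yourself. Returning a LongWritable simply hands Hive a long in a form it already knows how to serialize.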
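The evaluate method is the entry point of the UDF. If you call the UDF in Hive as select myudf('aa'), the input 'aa' is passed to your evaluate method as its Text argument (named schema in your code). Text is Hadoop's Writable wrapper for a string, in the same way that LongWritable wraps a long. You can also overload evaluate, depending on the use case.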
Now, coming to your code. First of all, the code contains an error: if execution reaches the catch block, the method returns nothing, and Java rejects a non-void method that can finish without returning a value, so the class will not compile as written. But let us assume that it returns a new LongWritable(1L) whenever the input is not null. Then this code will:
- Return null if null is passed to your UDF. Hive command - select myudf(null);
- Fail if nothing is passed to the UDF, with an error stating that no matching method was found in this class, because in that case Hive looks for an evaluate method that takes no arguments. Hive command - select myudf();
- Return 1 (as a long) if you pass anything that can be converted to Text. Hive command - select myudf('aa');
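Also, the difference between 1 and 1L is that 1 is an int literal and 1L is a long literal. The lowercase 1l in your code means exactly the same as 1L; the uppercase form is preferred because a lowercase l is easy to mistake for the digit 1.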
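The int versus long distinction matters most when a value can exceed the int range. The following plain-Java sketch illustrates it; LiteralSketch and its method names are made up for this example. For a small constant such as 1 the suffix makes no practical difference, because Java widens an int argument to long automatically, so new LongWritable(1) would also compile.

```java
public class LiteralSketch {

    // Both operands are int, so the addition is done in int and wraps around.
    static int addAsInt() {
        return Integer.MAX_VALUE + 1;
    }

    // The 1L operand is a long, so the whole expression is evaluated as long.
    static long addAsLong() {
        return Integer.MAX_VALUE + 1L;
    }

    public static void main(String[] args) {
        System.out.println(addAsInt());  // prints -2147483648
        System.out.println(addAsLong()); // prints 2147483648
        System.out.println(1l == 1L);    // prints true: the suffix is case-insensitive
    }
}
```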