Understanding LongWritable in Hive UDFs

Question

I tried googling it, but I don't understand the documentation very well. Can anyone explain what this line of code does?

It's part of a Hive UDF. I don't fully understand LongWritable or what 1L means.

public class CustomUDF extends UDF {
    public LongWritable evaluate(Text schema) { // what is Text schema??
        if (schema == null) {
            return null;
        }
        try {
            return new LongWritable(1l); // what does this do??
        } catch (Exception ex) {
            // catch error
        }
    }
}

I'm new to Hive UDFs and I'm having trouble understanding this method. Thank you!!

Answer 1

Score: 2

  • LongWritable Class

Hadoop needs to be able to serialise data in and out of Java types via DataInput and DataOutput objects (usually IO streams). The Writable classes do this by implementing two methods, `write(DataOutput)` and `readFields(DataInput)`. Specifically, LongWritable is a Writable class that wraps a Java long.
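To make the write/readFields idea concrete, here is a minimal sketch in plain Java. SketchLongWritable and WritableRoundTripDemo are hypothetical names invented for this illustration. They are not Hadoop's classes and only mimic the pattern described above, so the example runs with nothing but the JDK.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical stand-in for Hadoop's LongWritable, written only to show the
// write/readFields pattern. It is NOT the real class.
class SketchLongWritable {
    private long value;

    SketchLongWritable() { }

    SketchLongWritable(long value) {
        this.value = value;
    }

    long get() {
        return value;
    }

    // Serialize the wrapped long to the given output.
    void write(DataOutput out) throws IOException {
        out.writeLong(value);
    }

    // Overwrite the wrapped long with the value read from the given input.
    void readFields(DataInput in) throws IOException {
        value = in.readLong();
    }
}

class WritableRoundTripDemo {
    // Serializes a value to bytes, then reads it back into a fresh object.
    static long roundTrip(long value) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            new SketchLongWritable(value).write(new DataOutputStream(bytes));

            SketchLongWritable copy = new SketchLongWritable();
            copy.readFields(new DataInputStream(
                    new ByteArrayInputStream(bytes.toByteArray())));
            return copy.get();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip(1L));
    }
}
```

The wrapper object is mutable and is refilled by readFields, which is why a framework can reuse one instance for many records instead of allocating a new boxed Long each time.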

Reference - https://www.edureka.co/community/29194/understanding-longwritable#:~:text=Hadoop%20needs%20to%20be%20able,that%20wraps%20a%20java%20long.

For other classes of the same type - https://blog.dataiku.com/2013/05/01/a-complete-guide-to-writing-hive-udf

The 'evaluate' method is the entry point for the UDF. So if you call the UDF in Hive as select myudf('aa'), the input 'aa' is passed to your evaluate method. (You can also overload this method, depending on the use case.)

Now to your code. First of all, it contains an error: if execution reaches the catch block, the method returns nothing, so it will not compile as written (javac reports a missing return statement). But let us assume that whenever the input is not null, it returns a new LongWritable(1L). Then this code will behave as follows:

  • If null is passed to the UDF, it returns null. Hive command - select myudf(null);
  • If nothing is passed to the UDF, it fails with an error stating that no matching method was found in this class, because in this case Hive looks for an evaluate method that takes no arguments. Hive command - select myudf();
  • If you pass anything that can be converted to Text, it returns 1 (as a long). Hive command - select myudf('aa');
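The three cases can be imitated in plain Java. This is only a rough analogy: SketchUdf and EvaluateLookupDemo are hypothetical names, String and Long stand in for Hive's Text and LongWritable so that the example runs without Hive on the classpath, and the reflection lookup is a simplification, not Hive's actual method resolver.

```java
// Hypothetical stand-in for the UDF class. It uses String and Long in place
// of Hive's Text and LongWritable (an assumption made for this sketch).
class SketchUdf {
    // Every path returns a value, unlike the try/catch version in the question.
    public Long evaluate(String schema) {
        if (schema == null) {
            return null;
        }
        return 1L;
    }
}

class EvaluateLookupDemo {
    // Returns true if SketchUdf has a public evaluate method with exactly
    // these parameter types.
    static boolean hasEvaluate(Class<?>... parameterTypes) {
        try {
            SketchUdf.class.getMethod("evaluate", parameterTypes);
            return true;
        } catch (NoSuchMethodException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        SketchUdf udf = new SketchUdf();
        System.out.println(udf.evaluate(null));        // analogous to select myudf(null)
        System.out.println(udf.evaluate("aa"));        // analogous to select myudf('aa')
        System.out.println(hasEvaluate(String.class)); // one-argument evaluate exists
        System.out.println(hasEvaluate());             // no zero-argument evaluate
    }
}
```

Adding a second evaluate overload with no parameters would make the last lookup succeed, which is the plain-Java counterpart of overloading evaluate to support select myudf().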

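Also, the difference between 1 and 1L is that 1 is of int type and 1L is of long type. The 1l in your code (lowercase L) is the same long literal as 1L; the uppercase form is preferred because a lowercase l is easily mistaken for the digit 1.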
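A small sketch of why the literal type matters, using only the JDK. LiteralTypesDemo is a name made up for this example.

```java
class LiteralTypesDemo {
    // Both operands are int, so the sum wraps around past Integer.MAX_VALUE.
    static int addAsInt() {
        return Integer.MAX_VALUE + 1;
    }

    // 1L is a long, so the addition is carried out in 64 bits and does not wrap.
    static long addAsLong() {
        return Integer.MAX_VALUE + 1L;
    }

    public static void main(String[] args) {
        System.out.println(addAsInt());   // -2147483648
        System.out.println(addAsLong());  // 2147483648
        System.out.println(1l == 1L);     // true: both spell the same long literal
    }
}
```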

huangapple
  • Posted on August 26, 2020, 03:08:34
  • When reposting, please keep the link to this article: https://go.coder-hub.com/63585594.html