How can I define a daysBetween function using the DataFrame-EC library in Java?

Question

I have defined a column StartDate as follows for a DataFrame I am loading using the dataframe-ec library.

schema.addColumn("StartDate", ValueType.DATE); 

I would like to add a computed column named DaysToEvent but am unsure how to define a function leveraging the Java time library so the following expression code will work.

dataFrame.attachColumn(
        dataFrame.createComputedColumn(
                "DaysToEvent",
                ValueType.LONG,
                "daysBetween(toDate(2023, 3, 11), StartDate)"));

I saw there was a built-in function named withinDays but am hoping to not have to change the library to add this function. I tried defining the expression using Java Code inline but that didn't work.

dataFrame.attachColumn(
        dataFrame.createComputedColumn(
                "DaysToEvent",
                ValueType.LONG,
                "java.time.temporal.ChronoUnit.DAYS.between(toDate(2023, 3, 11), StartDate)"));

Answer 1

Score: 1

You can add your custom functions to the dataframe-ec expression DSL at runtime. Then these functions can be used just like those that come with the library out of the box in expressions like computed columns, filters, and so on. So you will be able to write code that looks like:

dataFrame.addColumn("DaysToEvent", "daysBetween(toDate(2023, 3, 11), StartDate)");

(this is a bit more streamlined than what you have in your example: you can call addColumn directly on a dataframe object and you don’t need to specify the type of a computed column as it is inferred from the expression you provide).

To add a function (in this case daysBetween) to the expression DSL, call the addFunctionDescriptor method on the BuiltInFunctions class. The existing implementations in BuiltInFunctions show how to deal with parameters, different types, validations, etc., and RuntimeAddedFunctionTest is also worth a look. In your case something like this should work:

BuiltInFunctions.addFunctionDescriptor(new IntrinsicFunctionDescriptor("daysBetween", Lists.immutable.of("date1", "date2"))
    {
        @Override
        public Value evaluate(EvalContext context)
        {
            // The variable names match the parameter names declared in the constructor call above
            LocalDate date1 = ((DateValue) context.getVariable("date1")).dateValue();
            LocalDate date2 = ((DateValue) context.getVariable("date2")).dateValue();

            // Signed number of whole days from date1 to date2 (negative if date2 is earlier)
            return new LongValue(ChronoUnit.DAYS.between(date1, date2));
        }

        @Override
        public ValueType returnType(ListIterable<ValueType> paraValueTypes)
        {
            // Qualified so that no static import of ValueType.LONG is needed
            return ValueType.LONG;
        }
    }
);
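
To see what sign the computed column will have, here is a minimal standalone sketch of the same computation as the evaluate method, using only java.time and none of the dataframe-ec wrappers. The class name DaysBetweenSketch and the sample dates are made up for illustration; only ChronoUnit.DAYS.between comes from the code above.

```java
import java.time.LocalDate;
import java.time.temporal.ChronoUnit;

public class DaysBetweenSketch
{
    // Same arithmetic as the body of evaluate(), minus the DateValue/LongValue wrappers
    public static long daysBetween(LocalDate date1, LocalDate date2)
    {
        return ChronoUnit.DAYS.between(date1, date2);
    }

    public static void main(String[] args)
    {
        // Hypothetical event date, matching toDate(2023, 3, 11) in the expression
        LocalDate eventDate = LocalDate.of(2023, 3, 11);

        // A StartDate after the event date gives a positive count
        System.out.println(daysBetween(eventDate, LocalDate.of(2023, 3, 21))); // 10

        // A StartDate before the event date gives a negative count
        System.out.println(daysBetween(eventDate, LocalDate.of(2023, 3, 1)));  // -10

        // Identical dates give zero
        System.out.println(daysBetween(eventDate, eventDate));                 // 0
    }
}
```

So with the expression daysBetween(toDate(2023, 3, 11), StartDate), rows whose StartDate falls before 11 March 2023 would get a negative DaysToEvent. If the opposite sign is wanted, swapping the two arguments in the expression should be enough.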

Note: I am the original author of the dataframe-ec library

huangapple
  • Posted on March 12, 2023, 11:38:08
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75710932.html