How can I define a daysBetween function using the DataFrame-EC library in Java?
Question
I have defined a column StartDate as follows for a DataFrame I am loading using the dataframe-ec library:
schema.addColumn("StartDate", ValueType.DATE);
I would like to add a computed column named DaysToEvent, but I am unsure how to define a function that uses the Java time library so that the following expression will work:
dataFrame.attachColumn(
    dataFrame.createComputedColumn(
        "DaysToEvent",
        ValueType.LONG,
        "daysBetween(toDate(2023, 3, 11), StartDate)"));
I saw there is a built-in function named withinDays, but I am hoping not to have to change the library to add this function. I tried writing the expression as inline Java code, but that didn't work:
dataFrame.attachColumn(
    dataFrame.createComputedColumn(
        "DaysToEvent",
        ValueType.LONG,
        "java.time.temporal.ChronoUnit.DAYS.between(toDate(2023, 3, 11), StartDate)"));
Answer 1
Score: 1
You can add custom functions to the dataframe-ec expression DSL at runtime. These functions can then be used just like the ones that ship with the library, in computed column expressions, filters, and so on. So you will be able to write code that looks like:
dataFrame.addColumn("DaysToEvent", "daysBetween(toDate(2023, 3, 11), StartDate)");
(This is a bit more streamlined than your example: you can call addColumn directly on a dataframe object, and you don't need to specify the type of a computed column because it is inferred from the expression you provide.)
To add a function (in this case daysBetween) to the expression DSL, call the addFunctionDescriptor method on the BuiltInFunctions class.
You can look at the existing implementations in BuiltInFunctions (which also has examples of dealing with parameters, different types, validations, etc.) and at RuntimeAddedFunctionTest. In your case something like this should work:
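// Needs java.time.LocalDate and java.time.temporal.ChronoUnit,
// plus the dataframe-ec and Eclipse Collections types used below.
// Register the function once, before any expression that uses it is evaluated.
BuiltInFunctions.addFunctionDescriptor(
    new IntrinsicFunctionDescriptor("daysBetween", Lists.immutable.of("date1", "date2"))
    {
        @Override
        public Value evaluate(EvalContext context)
        {
            // Parameters are looked up by the names declared in the constructor
            LocalDate date1 = ((DateValue) context.getVariable("date1")).dateValue();
            LocalDate date2 = ((DateValue) context.getVariable("date2")).dateValue();
            return new LongValue(ChronoUnit.DAYS.between(date1, date2));
        }

        @Override
        public ValueType returnType(ListIterable<ValueType> paraValueTypes)
        {
            // Lets the expression engine infer LONG as the computed column type
            return ValueType.LONG;
        }
    }
);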
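One thing worth checking is the sign of the result, since ChronoUnit.DAYS.between(date1, date2) counts from the first argument to the second. The standalone sketch below uses only java.time (no dataframe-ec) to show what daysBetween(toDate(2023, 3, 11), StartDate) would yield for a few start dates. The class DaysBetweenSketch and its static daysBetween helper are hypothetical names used only for illustration, and the sample dates are made up.

```java
import java.time.LocalDate;
import java.time.temporal.ChronoUnit;

public class DaysBetweenSketch
{
    // Hypothetical helper with the same body as the evaluate() logic above
    static long daysBetween(LocalDate date1, LocalDate date2)
    {
        return ChronoUnit.DAYS.between(date1, date2);
    }

    public static void main(String[] args)
    {
        LocalDate reference = LocalDate.of(2023, 3, 11);

        // Second date after the first: positive result
        System.out.println(daysBetween(reference, LocalDate.of(2023, 3, 25))); // 14

        // Second date before the first: negative result
        System.out.println(daysBetween(reference, LocalDate.of(2023, 3, 1)));  // -10

        // Same date: zero
        System.out.println(daysBetween(reference, reference));                 // 0
    }
}
```

If DaysToEvent should be positive for start dates that come before the reference date, swap the two arguments in the expression.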
Note: I am the original author of the dataframe-ec library.