August 5, 2023, 00:02:01

Impala Query (dbplyr) Error: Encountered: IDENTIFIER, Expected: (

Question

I'm currently working with an Impala database and I'm having some problems with dbplyr's SQL translation.

This is the first iteration of my code, which works in R if I collect the table beforehand (something I don't want to do, because it takes forever):

DF2_V1 <- DF1 %>%
  filter(indicator != "N") %>%
  group_by(id) %>%
  filter(!("Y" %in% indicator) | (indicator == "Y"),
         !("ANALYSIS" %in% indicator) | (indicator != "RECOMMENDED")) %>%
  filter(time1 == min(time1)) %>%
  ungroup() %>%
  mutate(time_diff = time1 - time2) %>%
  select(id, indicator, time1, time2, time_diff) %>%
  show_query() %>%
  collect()

Essentially, the goal of this code is to take DF1:

ID	INDICATOR	TIME1	TIME2
1	Y	...	.....
1	N	...	.....
1	RECOMMEND	...	.....
2	RECOMMEND	...	.....
2	ANALYSIS	...	.....

And perform the following logic: if a given id has any Y indicator, remove the other rows and keep the earliest occurrence of that indicator. If Y is not present, prefer the indicator ANALYSIS and take its first occurrence (earliest time); if ANALYSIS is not present either, take RECOMMENDED. This code works fine in R when DF1 is collected beforehand and does what I want. However, when DF1 is left uncollected and the pipeline runs as a SQL query, I get the following error:

> Error in new_result(connection@ptr, statement, immediate) :
> nanodbc/nanodbc.cpp:1412: 00000: [RStudio][ImpalaODBC] (360)
> Syntax error occurred during query execution: [HY000] :
> AnalysisException: Syntax error in line 32:
>
> WHEN ('Y' IN indicator) THEN 'Y'
>                   ^
> Encountered: IDENTIFIER
> Expected: (
>
> CAUSED BY: Exception: Syntax error

I'm still quite new to database queries and wasn't sure what to make of this, so I tried rewriting the code with dbplyr's sql() helper, making some minor modifications in the hope of clarifying my logic:

DF2_V2 <- DF1 %>%
  filter(indicator != "NULL") %>%
  group_by(id) %>%
  mutate(indicator = case_when(
    sql("'Y' IN indicator") ~ "Y",
    sql("('ANALYSIS' IN indicator) AND (indicator != 'RECOMMENDED')") ~ "ANALYSIS",
    TRUE ~ "RECOMMENDED")) %>%
  filter(time1 == min(time1)) %>%
  mutate(time_diff = time1 - time2) %>%
  select(...) %>%
  collect()

This produced the same error. I also ran the queries directly in the database, using the SQL from show_query(), to check whether the problem was R's connection, but got the same result. I'm not sure whether my code itself is faulty or the translation into SQL is going wrong, but I can't seem to find the problem.

Answer 1

Score: 1

I don't know if it's a bug in dbplyr, but forcing the parens should work:

DF2_V1 <- DF1 %>%
  filter(indicator != "N") %>%
  group_by(id) %>%
  filter(!("Y" %in% (indicator)) | (indicator == "Y"),
         !("ANALYSIS" %in% (indicator)) | (indicator != "RECOMMENDED")) %>%
  filter(time1 == min(time1)) %>%
  ungroup() %>%
  mutate(time_diff = time1 - time2) %>%
  select(id, indicator, time1, time2, time_diff) %>%
  show_query() %>%
  collect()

(This might be specific to the impala backend/driver, not sure.)
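
One caveat worth checking: in SQL, `'Y' IN (indicator)` compares `'Y'` with the current row's own value, so even once the syntax error is gone it may not reproduce the per-id "does this group contain a Y" test that `%in%` performs on a collected data frame. A possible alternative, sketched below, ranks the indicators and keeps the best rank per id using grouped `min()`, which dbplyr generally translates to a window function. This is only a sketch under assumptions: `DF1_local`, `pick_preferred` and the `priority` column are hypothetical names, the toy values are made up, and it assumes that Y, ANALYSIS and RECOMMENDED are the only indicators left after dropping N. It is shown on a local data frame and has not been run against Impala.

```r
library(dplyr)

# Toy stand-in for DF1 (made-up values, for illustration only).
DF1_local <- data.frame(
  id        = c(1, 1, 1, 2, 2, 3),
  indicator = c("Y", "N", "RECOMMENDED", "RECOMMENDED", "ANALYSIS", "RECOMMENDED"),
  time1     = c(5, 1, 2, 3, 4, 7),
  time2     = c(1, 1, 1, 1, 1, 1),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep the preferred indicator per id, then its earliest row.
pick_preferred <- function(tbl) {
  tbl %>%
    filter(indicator != "N") %>%
    # Rank indicators by preference: Y first, then ANALYSIS, then anything else.
    mutate(priority = case_when(
      indicator == "Y"        ~ 1L,
      indicator == "ANALYSIS" ~ 2L,
      TRUE                    ~ 3L
    )) %>%
    group_by(id) %>%
    # Keep only the rows with the best (lowest) rank within each id ...
    filter(priority == min(priority, na.rm = TRUE)) %>%
    # ... and, among those, the earliest time1.
    filter(time1 == min(time1, na.rm = TRUE)) %>%
    ungroup() %>%
    mutate(time_diff = time1 - time2) %>%
    select(id, indicator, time1, time2, time_diff)
}

DF2_local <- pick_preferred(DF1_local)
```

On the remote table the same helper could be tried as `pick_preferred(DF1) %>% show_query()` to inspect the generated SQL before calling `collect()`; whether Impala accepts that SQL would still need to be verified.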
