Convert Spark SQL to Python Spark (PySpark): Databricks pipeline event logs

Question

I have the following SQL statement that queries the Databricks pipeline event logs, and it works.
I tried to rewrite it in Python (PySpark), but failed.

Could somebody give me some advice? Many thanks!

SELECT timestamp, details:user_action:action, details:user_action:user_name
FROM event_log_raw
WHERE event_type = 'user_action'

Please note: the details column here is a string, not a struct or an array.

The following attempts did not work (df is a Spark DataFrame created from the table event_log_raw):

df.filter(df.event_type == 'user_action').select("timestamp", "details:user_action:action", "details:user_action:user_name")
df.filter(df.event_type == 'user_action').select("timestamp", "details.user_action.action", "details.user_action.user_name")
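Dot notation such as details.user_action.action navigates the fields of a struct column; it does not parse a string column as JSON, which is why the dotted attempt fails here. The plain-Python sketch below (no Spark involved; the sample rows and the helper name extract_path are hypothetical) illustrates what the working SQL query does: keep the user_action rows, parse the details string, and walk the keys named in each colon path. One difference: the sketch returns parsed Python values, whereas the Databricks colon operator returns strings.

```python
import json
from typing import Any, Optional


def extract_path(json_text: Optional[str], *keys: str) -> Optional[Any]:
    """Walk nested keys in a JSON string, roughly like details:a:b in Databricks SQL.

    Returns None when the text is null, is not valid JSON, or a key is missing.
    """
    if json_text is None:
        return None
    try:
        value = json.loads(json_text)
    except json.JSONDecodeError:
        return None
    for key in keys:
        if not isinstance(value, dict) or key not in value:
            return None
        value = value[key]
    return value


# Hypothetical rows shaped like event_log_raw: details is a *string*, not a struct.
rows = [
    {"timestamp": "2023-06-01T10:00:00Z", "event_type": "user_action",
     "details": '{"user_action": {"action": "START", "user_name": "alice@example.com"}}'},
    {"timestamp": "2023-06-01T10:05:00Z", "event_type": "flow_progress",
     "details": '{"flow_progress": {"status": "RUNNING"}}'},
]

# Mirrors: SELECT timestamp, details:user_action:action, details:user_action:user_name
#          FROM event_log_raw WHERE event_type = 'user_action'
result = [
    (r["timestamp"],
     extract_path(r["details"], "user_action", "action"),
     extract_path(r["details"], "user_action", "user_name"))
    for r in rows
    if r["event_type"] == "user_action"
]
print(result)  # → [('2023-06-01T10:00:00Z', 'START', 'alice@example.com')]
```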

Answer 1

Score: 1

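Instead of select, you need selectExpr. select treats each string as a column reference, so "details:user_action:action" is looked up as a column that does not exist. selectExpr parses each string as a SQL expression, and details:user_action:action is a JSON path expression that extracts a value from the JSON string in details (see the documentation):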

# Parentheses let the method chain span several lines in Python
(df.filter(df.event_type == 'user_action')
   .selectExpr("timestamp", "details:user_action:action", "details:user_action:user_name"))
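The colon path syntax is a Databricks SQL feature and may not be available in other Spark distributions. There, a roughly equivalent expression uses Spark's built-in get_json_object function with a JSONPath such as $.user_action.action. The sketch below is a hypothetical helper (colon_path_to_get_json_object is not a library function) that rewrites simple colon paths into such expressions; it assumes plain key names and does not handle array indexes, brackets, or keys containing special characters.

```python
def colon_path_to_get_json_object(expr: str) -> str:
    """Rewrite a Databricks-style path like 'details:user_action:action' into a
    Spark SQL get_json_object(...) call, aliased to the last key.

    Hypothetical helper: only plain keys separated by ':' are supported.
    """
    column, *keys = expr.split(":")
    if not column or not keys or not all(keys):
        raise ValueError(f"expected 'column:key[:key...]', got {expr!r}")
    json_path = "$." + ".".join(keys)  # e.g. $.user_action.action
    return f"get_json_object({column}, '{json_path}') AS {keys[-1]}"


exprs = [colon_path_to_get_json_object(e)
         for e in ("details:user_action:action", "details:user_action:user_name")]
# exprs == ["get_json_object(details, '$.user_action.action') AS action",
#           "get_json_object(details, '$.user_action.user_name') AS user_name"]
print(exprs)
```

The resulting strings can be passed to selectExpr just like the colon paths, for example df.filter(df.event_type == 'user_action').selectExpr("timestamp", *exprs). In the Column API, pyspark.sql.functions.get_json_object(df.details, "$.user_action.action") performs the same extraction.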

huangapple
  • Posted on June 2, 2023, 11:39:05
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76386975.html