Convert Spark SQL to Python (PySpark) / Databricks pipeline event logs
Question
I have the following SQL statement that queries the Databricks pipeline event log, and it works.
I tried to rewrite it in Python, but failed.
Could somebody give me some advice? Many thanks!
SELECT timestamp, details:user_action:action, details:user_action:user_name
FROM event_log_raw
WHERE event_type = 'user_action'
Please note that the details column here is a string, not a struct or an array.
Neither of the following attempts worked (df is a Spark DataFrame generated from the table event_log_raw):
df.filter(df.event_type == 'user_action').select("timestamp", "details:user_action:action", "details:user_action:user_name")
df.filter(df.event_type == 'user_action').select("timestamp", "details.user_action.action", "details.user_action.user_name")
Answer 1
Score: 1
Instead of select, you need to use selectExpr. A string such as details:user_action:action is not a column name but a JSON path expression that extracts data from the JSON string (doc). select treats each string as a column name, whereas selectExpr parses it as a SQL expression:
(
    df.filter(df.event_type == 'user_action')
    .selectExpr("timestamp", "details:user_action:action", "details:user_action:user_name")
)
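As far as I know, the colon syntax is specific to Databricks SQL, so the call above may not parse on a plain open-source Spark installation. A minimal sketch of a more portable variant is shown below, assuming the details column holds a JSON object with a user_action key as in the question. It uses the standard pyspark.sql.functions.get_json_object function; the helper names colon_path_to_json_path and select_user_actions are hypothetical and introduced only for illustration.

```python
def colon_path_to_json_path(expr):
    """Split a colon expression such as 'details:user_action:action'
    into a column name and a JSONPath string.

    Hypothetical helper for illustration; it only handles plain
    colon-separated keys, not array indexes or quoted keys.
    """
    column, *keys = expr.split(":")
    if not keys:
        raise ValueError(f"no JSON path found in {expr!r}")
    return column, "$." + ".".join(keys)


def select_user_actions(df):
    """Filter user_action events and extract two fields from the
    JSON string column, without relying on the colon syntax.

    Assumes df has the columns timestamp, event_type and details.
    """
    # Imported lazily so the pure-Python helper above works
    # even where PySpark is not installed.
    from pyspark.sql import functions as F

    fields = ["details:user_action:action", "details:user_action:user_name"]
    extracted = []
    for field in fields:
        column, json_path = colon_path_to_json_path(field)
        # Name each output column after the last key in the path.
        alias = json_path.rsplit(".", 1)[-1]
        extracted.append(
            F.get_json_object(F.col(column), json_path).alias(alias)
        )

    return (
        df.filter(F.col("event_type") == "user_action")
        .select("timestamp", *extracted)
    )
```

With the DataFrame from the question, this would be called as select_user_actions(df). Note that get_json_object returns string values, so any numeric or nested fields would need an explicit cast or from_json afterwards.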