English: What is the purpose of Apache Spark job, task and stage? Question: I am learning Apache Spark and want to understand each task in the Spark history...
How to filter Parquet partitions based on a date range?
English: How to filter parquet partitions based on date range? Question: I have partitioned Parquet data: dir/batch_date=2023-02...
How to iterate through Scala DataFrame rows and store the column names in variables that can be used for some operations inside a for loop?
English: How to Iterate through scala dataframe rows and store the column name in variables which can be u...
Not able to write Spark DataFrame. Error: Found nested NullType in column 'colname' which is of ArrayType
English: Not able to write spark dataframe. Error Found nested NullType in column 'colname' which...
How to convert a dictionary in string format to a tabular DataFrame in Scala?
English: How to convert a dictionary which is in string format to tabular dataframe in scala? Question: I have something that returns a string...
Unable to write to Redshift via PySpark.
English: Unable to write to redshift via PySpark Question: I am trying to write to Redshift using PySpark. My Spark version is 3.2.0, and the Scala version in use is 2.1...
Driver GC problems when reading a JSON file in Spark 3.2, while the same job works in Spark 2.3.
English: Spark 3.2 driver GC while reading JSON file, same works in spark 2.3 Question: I am running a simple fi...
Is there a more efficient way to filter the previous month's (or the previous X months') data using PySpark?
English: Is there a more efficient way to filter previous month's (or X previous months') data us...
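Get the correlation matrix for arrays in a column
English: Get correlation matrix for array in a column Question: I understand that you want to compute a correlation matrix that crosses the id column across different days and fills the matrix according to the number of crossings; if a label crosses with itself...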
How can we read historical data from Kinesis or Kafka using Databricks by specifying starting and ending timestamps?
English: How can we read historical data using databricks from kinesis or kafka by specifying starting an...