How to convert a text file to a Spark DataFrame using Spark Scala UDF functions
Question
I have input data like the one below:
id###name##salary#dept
1##John#10000########IT
2####Mindhack Diva#20000########IT
3####Michel#30000########IT
4###Ryan#40000########IT
5####Sahoo#10000########IT
How can I convert this text file to a DataFrame using Spark Scala?
I need the output as a DataFrame like the one below; can anyone please help me with this?
Answer 1
Score: 0
PySpark
I know how to get the result with PySpark; this may or may not help you.
import re

# Split each line on runs of '#', since fields are separated by
# a variable number of '#' characters
rdd = sc.textFile('test.txt').map(lambda r: re.split('[#]+', r))

# The first row is the header: ['id', 'name', 'salary', 'dept']
cols = rdd.first()

# Drop the header row and use its values as the column names
df = spark.createDataFrame(rdd.filter(lambda r: r != cols)).toDF(*cols)
df.show(truncate=False)
+---+-------------+------+----+
|id |name |salary|dept|
+---+-------------+------+----+
|1 |John |10000 |IT |
|2 |Mindhack Diva|20000 |IT |
|3 |Michel |30000 |IT |
|4 |Ryan |40000 |IT |
|5 |Sahoo |10000 |IT |
+---+-------------+------+----+
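Since the question asks for Spark Scala, here is a minimal, untested sketch of the same approach in Scala. It assumes the same test.txt file and the four-column layout from the sample data; in spark-shell a SparkSession is already available as spark, so the builder line can be skipped there.

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder.appName("TextToDataFrame").getOrCreate()
import spark.implicits._

// Split each line on runs of '#' (String.split takes a regex)
val rdd = spark.sparkContext.textFile("test.txt").map(_.split("#+"))

// The first row is the header: Array(id, name, salary, dept)
val header = rdd.first()

// Drop the header row, turn each Array into a tuple, and apply
// the header values as column names
val df = rdd
  .filter(!_.sameElements(header))
  .map { case Array(id, name, salary, dept) => (id, name, salary, dept) }
  .toDF(header: _*)

df.show(false)

The tuple step is needed because an RDD[Array[String]] has no direct .toDF; converting each row to a Product type (a tuple here) lets the implicits from spark.implicits._ apply.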