Using column after renaming it in Apache Spark
Question
I am trying to understand why Spark behaves differently in two seemingly identical situations. I renamed two columns and tried to use both of them in some calculations, but one statement throws an error saying the renamed column cannot be found. Below is the code:
intermediateDF = intermediateDF.drop("GEO.id")
.withColumnRenamed("GEO.id2", "id")
.withColumnRenamed("GEO.display-label", "label")
.withColumn("stateid", functions.expr("int(id/1000)"))
.withColumn("countyId", functions.expr("id%1000"))
//.withColumn("countyState", functions.split(intermediateDF.col("label"), ","))
.withColumnRenamed("rescen42010", "real2010")
.drop("resbase42010")
.withColumnRenamed("respop72010", "est2010")
.withColumnRenamed("respop72011", "est2011")
.withColumnRenamed("respop72012", "est2012")
.withColumnRenamed("respop72013", "est2013")
.withColumnRenamed("respop72014", "est2014")
.withColumnRenamed("respop72015", "est2015")
.withColumnRenamed("respop72016", "est2016")
.withColumnRenamed("respop72017", "est2017")
The commented-out line is the one that throws the error below when uncommented:
Exception in thread "main" org.apache.spark.sql.AnalysisException: Cannot resolve column name "label" among (GEO.id, GEO.id2, GEO.display-label, rescen42010, resbase42010, respop72010, respop72011, respop72012, respop72013, respop72014, respop72015, respop72016, respop72017);
Can someone please help me understand why Spark can find one renamed column (GEO.id2 renamed to id) and run calculations on it, but fails on the other (GEO.display-label renamed to label)? I am using Apache Spark 3 with Java. Thanks.
Answer 1
Score: 0
Try this syntax instead:
.withColumn("countyState", functions.split(functions.col("label"), ","))
It should work just fine.
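For what it's worth, the reason the two renames behave differently: functions.col and functions.expr build unresolved references that Spark resolves against whatever DataFrame the withColumn is applied to, which in this chain is the result of the earlier withColumnRenamed calls, so "id" is found; intermediateDF.col("label") is resolved immediately against the DataFrame that the variable intermediateDF still points to, which only has "GEO.display-label", hence the AnalysisException. Below is a minimal, self-contained sketch of that difference; the class name, the local master, and the tiny stand-in DataFrame are illustrative and not from the original post.

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.functions;

public class RenamedColumnDemo {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("RenamedColumnDemo")
                .master("local[*]")          // illustrative; use your own master/config
                .getOrCreate();

        // Tiny stand-in for the census DataFrame, with the original column names.
        Dataset<Row> raw = spark.sql(
                "SELECT 1001 AS `GEO.id2`, 'Autauga County, Alabama' AS `GEO.display-label`");

        // Works: functions.col("label") is an unresolved reference, resolved against
        // the DataFrame produced by the preceding withColumnRenamed.
        Dataset<Row> ok = raw
                .withColumnRenamed("GEO.display-label", "label")
                .withColumn("countyState", functions.split(functions.col("label"), ","));
        ok.show(false);

        // Fails: raw.col("label") is resolved right away against `raw`, which only has
        // "GEO.display-label", so it throws the same AnalysisException as in the question.
        // Dataset<Row> broken = raw
        //         .withColumnRenamed("GEO.display-label", "label")
        //         .withColumn("countyState", functions.split(raw.col("label"), ","));

        spark.stop();
    }
}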
Answer 2
Score: 0
Alternatively, do the renaming and the derived columns in a single select. Column names that contain dots or dashes must be quoted with backticks when referenced through col() or expr(), and stateid/countyId are computed from the source column `GEO.id2` because the id alias defined in the same select is not visible to the other expressions:
// One select that projects, renames, and derives everything at once.
intermediateDF.select(
        functions.col("`GEO.id2`").alias("id"),
        functions.expr("int(`GEO.id2` / 1000)").alias("stateid"),
        functions.expr("`GEO.id2` % 1000").alias("countyId"),
        functions.split(functions.col("`GEO.display-label`"), ",").alias("countyState"),
        functions.col("rescen42010").alias("real2010"),
        functions.col("respop72010").alias("est2010"),
        functions.col("respop72011").alias("est2011"),
        functions.col("respop72012").alias("est2012"),
        functions.col("respop72013").alias("est2013"),
        functions.col("respop72014").alias("est2014"),
        functions.col("respop72015").alias("est2015"),
        functions.col("respop72016").alias("est2016"),
        functions.col("respop72017").alias("est2017"));