AttributeError: 'NoneType' object has no attribute 'randomSplit'

Question

I keep receiving an error when trying to randomSplit in pySpark.

I've added these dependencies:

#Step 1: Install Dependencies
!apt-get install openjdk-8-jdk-headless -qq > /dev/null
!wget -q https://archive.apache.org/dist/spark/spark-3.3.0/spark-3.3.0-bin-hadoop3.tgz
!tar xf spark-3.3.0-bin-hadoop3.tgz
!pip install -q findspark

#Step 2: Add environment variables
import os
os.environ["JAVA_HOME"] = "/usr/lib/jvm/java-8-openjdk-amd64"
os.environ["SPARK_HOME"] = "spark-3.3.0-bin-hadoop3"

#Step 3: Initialize Pyspark
import findspark
findspark.init()

Created the pySpark environment:

#creating spark context
from pyspark.sql import SparkSession
spark = SparkSession.builder.appName('lr_example').getOrCreate()

and added these:

# Import VectorAssembler and Vectors
from pyspark.ml.linalg import Vectors
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.regression import LinearRegression

However, every time I run this:

final_df = output.select("features", "medv").show()
train_data, test_data = final_df.randomSplit([0.7, 0.3])

I get this:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-76-e27b8ca71b51> in <cell line: 1>()
----> 1 train_data, test_data = final_df.randomSplit([0.7, 0.3])

AttributeError: 'NoneType' object has no attribute 'randomSplit'

Any ideas? I searched around for what needs to be imported, and it seems I have everything, but it won't load. Link to GitHub doc

Answer 1

Score: 1

You left out the only important line:

final_df = output.select("features", "medv").show()

show() prints the results but returns None, so you are assigning None to final_df. Calling randomSplit on None is what raises the AttributeError.

Instead, keep the DataFrame and call show() on it separately:

final_df = output.select("features", "medv") # create df
final_df.show() # print it
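
The same pitfall can be reproduced without a Spark session. The sketch below is only an illustration: TinyFrame is a hypothetical stand-in class (not part of PySpark) whose show() prints and returns None, which is the behaviour the answer describes for DataFrame.show().

```python
# Hypothetical stand-in for a Spark DataFrame, for illustration only.
# Like DataFrame.show(), its show() prints rows and returns None.
class TinyFrame:
    def __init__(self, rows):
        self.rows = list(rows)

    def select(self, *cols):
        # Return a new frame containing only the requested columns.
        return TinyFrame({c: r[c] for c in cols} for r in self.rows)

    def show(self):
        for row in self.rows:
            print(row)
        # No return statement, so the call evaluates to None.


# Made-up sample row; "extra" exists only to show that select() drops it.
output = TinyFrame([{"features": [1.0], "medv": 24.0, "extra": 0}])

# Chaining .show() binds its return value (None), not the frame.
broken = output.select("features", "medv").show()

# Binding the frame first keeps it usable for later calls.
fixed = output.select("features", "medv")
fixed.show()

print(type(broken).__name__)
print(type(fixed).__name__)
```

With a real DataFrame, the split then works on the kept object. randomSplit also accepts an optional seed argument, for example final_df.randomSplit([0.7, 0.3], seed=42), which makes the split reproducible between runs.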

Posted by huangapple on April 4, 2023, 09:56:24
When reposting, please keep the link to this article: https://go.coder-hub.com/75924947.html