pytest unittest spark java.io.FileNotFoundException: HADOOP_HOME and hadoop.home.dir are unset

Question

I am running unit tests with pytest for PySpark code; a sample of the code under test is given below. The test run seems to expect the Spark or Hadoop runtime libraries, but I thought unit tests would not really need them. The pyspark Python package alone should be enough, because tools like Jenkins won't have a Spark runtime installed. Please advise.

def read_inputfile_from_ADLS(self):
    try:
        if self.segment == "US":
            if self.input_path_2 is None or self.input_path_2 == "":
                df = self.spark.read.format("delta").load(self.input_path)
            else:
                df = self.spark.read.format("delta").load(self.input_path_2)
    except Exception as e:
        resultmsg = "error reading input file"

Pytest code

import pytest
from unittest.mock import patch, MagicMock, Mock

class TestInputPreprocessor:
    inpprcr = None
    dataframe_reader = 'pyspark.sql.readwriter.DataFrameReader'

    def test_read_inputfile_from_ADLS(self, spark, tmp_path):
        self.segment = 'US'
        self.input_path_2 = tmp_path
        with patch(f'{self.dataframe_reader}.format', MagicMock(autospec=True)) as mock_adls_read:
            self.inpprcr.read_inputfile_from_ADLS()
            assert mock_adls_read.call_count == 1

Error:

AssertionError
---------------------------------------------- Captured stderr setup ----------------------------------------------
23/07/12 23:58:42 WARN Shell: Did not find winutils.exe: java.io.FileNotFoundException: java.io.FileNotFoundException: HADOOP_HOME and hadoop.home.dir are unset. -see https://wiki.apache.org/hadoop/WindowsProblems
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
23/07/12 23:58:42 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

Answer 1

Score: 1

Fixed this issue. I had to download winutils.exe and set HADOOP_HOME to the folder whose bin subfolder contains it, and set SPARK_HOME to the pyspark location in the Python lib directory:
'C:\Users\<networkid>\AppData\Local\Programs\Python\Python310\Lib\site-packages\pyspark'
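The setup described above could also be applied from the test code itself rather than through system-wide settings. The following is only a minimal sketch under assumptions: the helper name `configure_spark_env` and the example paths are hypothetical, and adding the `bin` folder to `PATH` is an optional extra that the answer does not mention. The variables have to be set before the first SparkSession is created, since the JVM inherits the environment when it is launched.

```python
import os
from pathlib import Path


def configure_spark_env(hadoop_home, spark_home, env=None):
    """Set HADOOP_HOME and SPARK_HOME before Spark starts.

    hadoop_home: folder whose bin subfolder contains winutils.exe (assumed path)
    spark_home:  the pyspark package folder inside site-packages
    env:         mapping to update; defaults to os.environ
    """
    env = os.environ if env is None else env
    env["HADOOP_HOME"] = str(hadoop_home)
    env["SPARK_HOME"] = str(spark_home)

    # Assumption, not from the answer: also put HADOOP_HOME/bin on PATH,
    # adding it only if it is not already there.
    bin_dir = str(Path(hadoop_home) / "bin")
    parts = [p for p in env.get("PATH", "").split(os.pathsep) if p]
    if bin_dir not in parts:
        env["PATH"] = os.pathsep.join([bin_dir] + parts)
    return env
```

One place to call it would be at the top of a `conftest.py`, before the `spark` fixture builds its session, for example `configure_spark_env(r"C:\hadoop", os.path.dirname(pyspark.__file__))`, where `C:\hadoop` is a placeholder for wherever winutils.exe was unpacked.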

There is no need to install Hadoop or Spark on the local laptop for unit testing.
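If a test only needs to check that the reader is called with the right arguments, another option is to skip starting Spark altogether and inject a mock session. This is a sketch under assumptions, not the asker's actual class: `InputPreprocessor` and its constructor are hypothetical stand-ins that mirror the branching of `read_inputfile_from_ADLS` from the question, and the method here returns the DataFrame so the test can inspect it.

```python
from unittest.mock import MagicMock


class InputPreprocessor:
    """Hypothetical stand-in for the class under test in the question."""

    def __init__(self, spark, segment, input_path, input_path_2=None):
        self.spark = spark
        self.segment = segment
        self.input_path = input_path
        self.input_path_2 = input_path_2

    def read_inputfile_from_ADLS(self):
        # Same branching as the question's method, but returning the result.
        if self.segment != "US":
            return None
        # Fall back to input_path when input_path_2 is None or "".
        path = self.input_path_2 or self.input_path
        return self.spark.read.format("delta").load(path)


def test_read_inputfile_from_ADLS():
    # A MagicMock session: no JVM is started, so winutils.exe is not involved.
    spark = MagicMock(name="spark")
    pre = InputPreprocessor(spark, "US", "/data/in", input_path_2="/data/in2")

    df = pre.read_inputfile_from_ADLS()

    spark.read.format.assert_called_once_with("delta")
    spark.read.format.return_value.load.assert_called_once_with("/data/in2")
    assert df is spark.read.format.return_value.load.return_value
```

The trade-off is that a test like this verifies only the calls made on the session, not that Delta data can actually be read; tests that need real DataFrames still need a working local session.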

huangapple
  • Posted on July 13, 2023, 12:10:27
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76675858.html