AWS --extra-py-files throwing ModuleNotFoundError: No module named 'pg8000'

Question

I am trying to use pg8000 in my Glue script. The Glue job has the following parameter:

--extra-py-files s3://mybucket/pg8000libs.zip  // NOTE: my zip contains __init__.py

The relevant part of the script:

import sys
import os
from awsglue.transforms import *
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsglue.job import Job
import boto3
from pyspark.sql import Row
from datetime import datetime, date

zip_path = os.path.join('/tmp', 'pg8000libs.zip')
sys.path.insert(0, zip_path)


def dump_python_path():
    print("python path:", sys.path)

    for path in sys.path:
        if os.path.isdir(path):
            print(f"dir: {path}")
            print("\t" + str(os.listdir(path)))
        print(path)

print(os.listdir('/tmp'))
dump_python_path()
# Import the library
import pg8000

Dump in CloudWatch:

python path: ['/tmp/pg8000libs.zip', '/opt/amazon/bin', '/tmp/pg8000libs.zip', '/opt/amazon/spark/jars/spark-core_2.12-3.1.1-amzn-0.jar', '/opt/amazon/spark/python/lib/pyspark.zip', '/opt/amazon/spark/python/lib/py4j-0.10.9-src.zip', '/opt/amazon/lib/python3.6/site-packages', '/usr/lib64/python37.zip', '/usr/lib64/python3.7', '/usr/lib64/python3.7/lib-dynload', '/home/spark/.local/lib/python3.7/site-packages', '/usr/lib64/python3.7/site-packages', '/usr/lib/python3.7/site-packages']
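Note that dump_python_path() only lists directories, so the contents of /tmp/pg8000libs.zip never appear in the log. Python can import pg8000 from a zip on sys.path only if the archive has a pg8000/ folder containing __init__.py at its top level; an __init__.py at the root of the zip does not create a package. Whether that is the cause here is not confirmed by the question. The following is a minimal diagnostic sketch; the helper names top_level_entries and dump_zip_entries_on_path are hypothetical and not part of Glue.

```python
import os
import sys
import zipfile


def top_level_entries(zip_path):
    """Return the sorted top-level names (files or folders) inside a zip archive."""
    with zipfile.ZipFile(zip_path) as archive:
        # "pg8000/core.py" -> "pg8000"; "README.txt" -> "README.txt"
        return sorted({name.split("/", 1)[0] for name in archive.namelist() if name})


def dump_zip_entries_on_path(paths=None):
    """Print and return the top-level entries of every zip archive on the import path."""
    report = {}
    for path in (sys.path if paths is None else paths):
        # Entries that do not exist on disk are skipped silently.
        if os.path.isfile(path) and zipfile.is_zipfile(path):
            report[path] = top_level_entries(path)
            print(f"zip: {path}")
            print("\t" + str(report[path]))
    return report
```

Calling dump_zip_entries_on_path() right before import pg8000 would show whether 'pg8000' is among the archive's top-level entries. If the zip path is on sys.path but missing from the output, the file is probably not present at that location.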

Answer 1

Score: 1

After exhausting all the standard approaches, I found a workaround using sys.path. Adding the script's own directory to the Python import search path let the Glue job locate and import the additional .py file. Here is the code I used:

import sys
import os

current_dir = os.path.dirname(os.path.abspath(__file__))
sys.path.append(current_dir)

from utils import *

Important note:

Modify the import search path with care, because it can introduce module name conflicts or unintended imports. For a more robust and maintainable solution, organize the files properly rather than relying on path changes.
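One way to spot such a conflict is to ask Python where a module name would resolve before importing it. The sketch below uses importlib.util.find_spec from the standard library; the helper name resolve_module_origin is hypothetical, and it is meant for top-level module names such as utils.

```python
import importlib.util


def resolve_module_origin(module_name):
    """Return the location a top-level module would be loaded from, or None if not found."""
    spec = importlib.util.find_spec(module_name)
    if spec is None:
        return None
    # For regular modules this is a file path; built-in modules report "built-in".
    return spec.origin
```

For example, printing resolve_module_origin("utils") just before from utils import * shows whether the name resolves to the file next to the script or to some other module earlier on sys.path.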

Posted by huangapple on 2023-02-16 07:14:39
Source: https://go.coder-hub.com/75466297.html