AWS --extra-py-files throwing ModuleNotFoundError: No module named 'pg8000'
Question
I am trying to use pg8000 in my Glue script. The Glue job is configured with the following parameter:
--extra-py-files s3://mybucket/pg8000libs.zip //NOTE: my zip contains __init__.py
The relevant part of the code:
import sys
import os
from awsglue.transforms import *
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsglue.job import Job
import boto3
from pyspark.sql import Row
from datetime import datetime, date
zip_path = os.path.join('/tmp', 'pg8000libs.zip')
sys.path.insert(0, zip_path)
def dump_python_path():
    print("python path:", sys.path)
    for path in sys.path:
        if os.path.isdir(path):
            print(f"dir: {path}")
            print("\t" + str(os.listdir(path)))
        print(path)
    print(os.listdir('/tmp'))
dump_python_path()
# Import the library
import pg8000
Output in CloudWatch:
python path: ['/tmp/pg8000libs.zip', '/opt/amazon/bin', '/tmp/pg8000libs.zip', '/opt/amazon/spark/jars/spark-core_2.12-3.1.1-amzn-0.jar', '/opt/amazon/spark/python/lib/pyspark.zip', '/opt/amazon/spark/python/lib/py4j-0.10.9-src.zip', '/opt/amazon/lib/python3.6/site-packages', '/usr/lib64/python37.zip', '/usr/lib64/python3.7', '/usr/lib64/python3.7/lib-dynload', '/home/spark/.local/lib/python3.7/site-packages', '/usr/lib64/python3.7/site-packages', '/usr/lib/python3.7/site-packages']
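The dump shows /tmp/pg8000libs.zip on sys.path, but os.path.isdir() is false for a zip file, so dump_python_path() never shows what is inside it. A possible extension, sketched here and not part of the original post, is to check each .zip entry on the path and list its top-level names. The helper name describe_zip_entries is hypothetical.

```python
import os
import sys
import zipfile


def describe_zip_entries(paths):
    """Map each .zip entry in paths to its top-level names.

    The value is None when the path does not exist as a file or is not
    a readable zip archive.
    """
    report = {}
    for path in paths:
        if not path.endswith(".zip"):
            continue
        if not os.path.isfile(path) or not zipfile.is_zipfile(path):
            report[path] = None
            continue
        with zipfile.ZipFile(path) as zf:
            # Keep only the first path component of each member name.
            report[path] = sorted({name.split("/")[0] for name in zf.namelist()})
    return report


for zip_path, top_level in describe_zip_entries(sys.path).items():
    if top_level is None:
        print(f"missing or unreadable: {zip_path}")
    else:
        print(f"{zip_path}: {top_level}")
```

If pg8000 does not appear among the top-level names, or the zip is reported as missing, that would narrow down why the import fails.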
Answer 1
Score: 1
After exhausting all the standard approaches, I found a workaround using sys.path. Adding the script's own directory to the Python import search path let the Glue job locate and import the additional .py file. Here is the code I used:
import sys
import os
current_dir = os.path.dirname(os.path.abspath(__file__))
sys.path.append(current_dir)
from utils import *
Important note:
Modify the import search path with care, since it can introduce module name conflicts or unintended imports. Organizing the files properly and adjusting the imports accordingly is the more robust and maintainable solution.
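When a zip is already on sys.path and the import still fails, one thing worth ruling out is the layout inside the archive. Python's zip import resolves `import pg8000` to a pg8000/ folder (or pg8000.py) at the root of the archive, so an `__init__.py` placed directly at the root is not found under that name. Whether this is the cause in the question is an assumption, since the zip contents are not shown. The sketch below compares the two layouts using throwaway archives. The names demo_pkg_ok, demo_pkg_flat, build_zip and can_import_from_zip are hypothetical.

```python
import importlib
import os
import sys
import tempfile
import zipfile


def build_zip(zip_path, entries):
    """Write a zip whose members are given as {archive_name: source_text}."""
    with zipfile.ZipFile(zip_path, "w") as zf:
        for name, text in entries.items():
            zf.writestr(name, text)


def can_import_from_zip(zip_path, module_name):
    """Return True if module_name imports while zip_path is on sys.path."""
    sys.path.insert(0, zip_path)
    importlib.invalidate_caches()
    try:
        importlib.import_module(module_name)
        return True
    except ModuleNotFoundError:
        return False
    finally:
        # Undo the side effects so repeated calls stay independent.
        sys.path.remove(zip_path)
        sys.modules.pop(module_name, None)


tmp_dir = tempfile.mkdtemp()

# Layout 1: the package folder sits at the root of the archive.
good_zip = os.path.join(tmp_dir, "good_libs.zip")
build_zip(good_zip, {"demo_pkg_ok/__init__.py": "VALUE = 1\n"})

# Layout 2: __init__.py sits directly at the archive root, with no folder.
flat_zip = os.path.join(tmp_dir, "flat_libs.zip")
build_zip(flat_zip, {"__init__.py": "VALUE = 1\n"})

good_result = can_import_from_zip(good_zip, "demo_pkg_ok")
flat_result = can_import_from_zip(flat_zip, "demo_pkg_flat")
print("package folder at zip root importable:", good_result)
print("__init__.py at zip root importable:", flat_result)
```

Under this reading, a zip for the question's job would need pg8000/__init__.py, plus any pure-Python dependencies as sibling folders, at the archive root.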