How to define a SageMaker Estimator with entry_point and source_dir when you have your own Python package (with setup.py in the root)
Question
My code is structured like this:

```
|-my_directory
|----- README.md
|----- setup.py
|----- src
|---------- my_train_script.py
|---------- __init__.py
|----- requirements.txt
```
I want to define a SageMaker estimator for a training step. If I pass "my_directory" as source_dir and "src/my_train_script.py" as entry_point, I get an error saying `No module named src/my_train_script`.
The code works fine if I move my_train_script.py to the root and set entry_point="my_train_script.py", or if I remove setup.py from my_directory.
This is not an optimal solution, because I want to keep setup.py for other purposes. Is there a right way to define the estimator?
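Example of the estimator (TensorFlow):

```python
from sagemaker import get_execution_role
from sagemaker.tensorflow import TensorFlow

estimator = TensorFlow(
    entry_point="src/my_train_script.py",  # path relative to source_dir
    source_dir="my_directory",             # contains setup.py at its root
    role=get_execution_role(),
    instance_count=1,
    instance_type="ml.m5.2xlarge",
    framework_version="2.10.1",
    py_version="py39",
    debugger_hook_config=None,
    disable_profiler=True,
    base_job_name="base_job_name",
)
```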
Answer 1
Score: 1
As the official documentation says:
> **entry_point** (str or PipelineVariable) –
>
> The absolute or relative path to the local Python source file that should be executed as the entry point to training. (Default: None). If source_dir is specified, then entry_point must point to a file located at the root of source_dir. If ‘git_config’ is provided, ‘entry_point’ should be a relative location to the Python source file in the Git repo.
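To make the quoted rule concrete, here is a small, hypothetical helper (it is not part of the SageMaker SDK) that checks whether an entry_point value names a file directly at the root of source_dir. It assumes POSIX-style relative paths and is only a sketch of the documented constraint:

```python
from pathlib import PurePosixPath


def is_at_source_dir_root(entry_point: str) -> bool:
    """Hypothetical check for the documented rule: when source_dir is set,
    entry_point should name a file located directly at the root of source_dir."""
    path = PurePosixPath(entry_point)
    # A bare file name such as "my_train_script.py" has "." as its parent;
    # "src/my_train_script.py" has "src" as its parent and fails the check.
    return not path.is_absolute() and path.parent == PurePosixPath(".")
```

With the estimator from the question, `is_at_source_dir_root("src/my_train_script.py")` returns False, while `is_at_source_dir_root("my_train_script.py")` returns True.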
What is the problem with keeping setup.py at the root level, together with the training script?
Alternatively, you can restructure your folder like this:

```
|-my_directory
|----- README.md
|----- my_train_script.py
|----- utils
|---------- setup.py
|---------- __init__.py
|----- requirements.txt
```
Then, inside my_train_script.py, you can import setup with:

```python
from utils import setup
```
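A possible explanation for the behaviour in the question, written as a simplified, hypothetical model. The assumption (based on how the SageMaker training toolkit appears to work, not on its actual code) is that when setup.py sits at the root of source_dir, the entry point is launched as a module with `python -m`, stripping only the `.py` suffix; otherwise it is launched as a plain script. Under that assumption, "src/my_train_script.py" becomes the module name "src/my_train_script", which `python -m` cannot import. The function names below are illustrative only:

```python
from typing import List


def build_launch_command(root_entries: List[str], entry_point: str) -> List[str]:
    """Hypothetical simplification of how the training container picks a launch command.

    root_entries: names found directly at the root of source_dir (assumed input).
    entry_point: the entry_point string passed to the estimator.
    """
    if "setup.py" in root_entries:
        # Assumed package mode: run the entry point as a module. Only the ".py"
        # suffix is stripped; "/" is NOT converted into ".".
        module = entry_point[:-3] if entry_point.endswith(".py") else entry_point
        return ["python", "-m", module]
    # Assumed script mode: run the entry point as a plain file path.
    return ["python", entry_point]


def is_importable_module_name(name: str) -> bool:
    """`python -m` expects a dotted name such as "pkg.module"; a path with "/" fails."""
    return all(part.isidentifier() for part in name.split("."))


if __name__ == "__main__":
    original = ["README.md", "setup.py", "src", "requirements.txt"]
    cmd = build_launch_command(original, "src/my_train_script.py")
    print(cmd, is_importable_module_name(cmd[-1]))
    # ['python', '-m', 'src/my_train_script'] False
```

In this model both workarounds from the question succeed: removing setup.py switches to script mode (`python src/my_train_script.py`), and moving the script to the root yields the valid module name `my_train_script`. The restructured layout above works for the same reason, since setup.py is no longer at the root of source_dir.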