vall-e model throws 'NoneType' object has no attribute 'optimizer_name' error while training, how to fix it?

Question

<details>
<summary>English:</summary>

I'm trying to use the vall-e model available in this GitHub repo: https://github.com/enhuiz/vall-e

I clone the repo with the command
``` git clone --recurse-submodules https://github.com/enhuiz/vall-e.git ```

I create a virtual environment with the command
``` python3 -m venv vall_e_env ```

I activate the environment with the command
``` source vall_e_env/bin/activate ```
and then install the package with
``` pip install . ```

Next, I run the commands provided in the README:
```python -m vall_e.emb.qnt data/test```
```python -m vall_e.emb.g2p data/test```

and finally

`python -m vall_e.train yaml=config/test/ar.yml`

Only this last command produces the following error:

2it [00:00, 6961.50it/s]
2023-05-31 16:58:32 - vall_e.data - INFO - GR=0;LR=0 -
{'</s>': 1, '<s>': 2, 'AH0': 3, 'D': 4, 'ER1': 5, 'HH': 6, 'L': 7, 'OW1': 8, 'W': 9, '_': 10}
2023-05-31 16:58:32 - vall_e.data - INFO - GR=0;LR=0 -
{ 'test': 0}
2023-05-31 16:58:32 - vall_e.data - INFO - GR=0;LR=0 -
#samples (train): 2.
2023-05-31 16:58:32 - vall_e.data - INFO - GR=0;LR=0 -
#samples (val): 0.
Traceback (most recent call last):
  File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/home/leandre/Info/OpenValue/vall-e/vall_e/train.py", line 128, in <module>
    main()
  File "/home/leandre/Info/OpenValue/vall-e/vall_e/train.py", line 119, in main
    trainer.train(
  File "/home/leandre/Info/OpenValue/vall-e/vall_e/utils/trainer.py", line 125, in train
    engines = engines_loader()
  File "/home/leandre/Info/OpenValue/vall-e/vall_e/train.py", line 21, in load_engines
    model=trainer.Engine(
  File "/home/leandre/Info/OpenValue/vall-e/vall_e/utils/engines.py", line 22, in __init__
    super().__init__(None, *args, **kwargs)
  File "/home/leandre/Info/OpenValue/vall-e/vall_e_env/lib/python3.10/site-packages/deepspeed/runtime/engine.py", line 244, in __init__
    self._do_sanity_check()
  File "/home/leandre/Info/OpenValue/vall-e/vall_e_env/lib/python3.10/site-packages/deepspeed/runtime/engine.py", line 974, in _do_sanity_check
    if self.optimizer_name() is not None:
  File "/home/leandre/Info/OpenValue/vall-e/vall_e_env/lib/python3.10/site-packages/deepspeed/runtime/engine.py", line 628, in optimizer_name
    return (self.client_optimizer.__class__.__name__ if self.client_optimizer else self._config.optimizer_name)
AttributeError: 'NoneType' object has no attribute 'optimizer_name'


Newer error from `python -m vall_e.train yaml=config/test/ar.yml` after applying the first answer:

2it [00:00, 6743.25it/s]
2023-05-31 17:25:31 - vall_e.data - INFO - GR=0;LR=0 -
{'</s>': 1, '<s>': 2, 'AH0': 3, 'D': 4, 'ER1': 5, 'HH': 6, 'L': 7, 'OW1': 8, 'W': 9, '_': 10}
2023-05-31 17:25:31 - vall_e.data - INFO - GR=0;LR=0 -
{'test': 0}
2023-05-31 17:25:31 - vall_e.data - INFO - GR=0;LR=0 -
#samples (train): 2.
2023-05-31 17:25:31 - vall_e.data - INFO - GR=0;LR=0 -
#samples (val): 0.
[2023-05-31 17:25:31,146] [INFO] [comm.py:654:init_distributed] Initializing TorchBackend in DeepSpeed with backend nccl
2023-05-31 17:25:31 - torch.distributed.distributed_c10d - INFO - GR=0;LR=0 -
Added key: store_based_barrier_key:1 to store for rank: 0
2023-05-31 17:25:31 - torch.distributed.distributed_c10d - INFO - GR=0;LR=0 -
Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 1 nodes.
2023-05-31 17:25:32 - torch.distributed.distributed_c10d - INFO - GR=0;LR=0 -
Added key: store_based_barrier_key:2 to store for rank: 0
2023-05-31 17:25:32 - torch.distributed.distributed_c10d - INFO - GR=0;LR=0 -
Rank 0: Completed store-based barrier for key:store_based_barrier_key:2 with 1 nodes.
[2023-05-31 17:25:32,243] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed Flops Profiler Enabled: False
Installed CUDA version 11.5 does not match the version torch was compiled with 11.7 but since the APIs are compatible, accepting this combination
Using /home/leandre/.cache/torch_extensions/py310_cu117 as PyTorch extensions root...
Creating extension directory /home/leandre/.cache/torch_extensions/py310_cu117/fused_adam...
Detected CUDA files, patching ldflags
Emitting ninja build file /home/leandre/.cache/torch_extensions/py310_cu117/fused_adam/build.ninja...
Building extension module fused_adam...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
[1/3] /usr/bin/nvcc -DTORCH_EXTENSION_NAME=fused_adam -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE="_gcc" -DPYBIND11_STDLIB="_libstdcpp" -DPYBIND11_BUILD_ABI="_cxxabi1011" -I/home/leandre/Info/OpenValue/vall-e/vall_e_env/lib/python3.10/site-packages/deepspeed/ops/csrc/includes -I/home/leandre/Info/OpenValue/vall-e/vall_e_env/lib/python3.10/site-packages/deepspeed/ops/csrc/adam -isystem /home/leandre/Info/OpenValue/vall-e/vall_e_env/lib/python3.10/site-packages/torch/include -isystem /home/leandre/Info/OpenValue/vall-e/vall_e_env/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -isystem /home/leandre/Info/OpenValue/vall-e/vall_e_env/lib/python3.10/site-packages/torch/include/TH -isystem /home/leandre/Info/OpenValue/vall-e/vall_e_env/lib/python3.10/site-packages/torch/include/THC -isystem /usr/include/python3.10 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_61,code=compute_61 -gencode=arch=compute_61,code=sm_61 --compiler-options '-fPIC' -O3 -DVERSION_GE_1_1 -DVERSION_GE_1_3 -DVERSION_GE_1_5 -lineinfo --use_fast_math -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_61,code=compute_61 -std=c++14 -c /home/leandre/Info/OpenValue/vall-e/vall_e_env/lib/python3.10/site-packages/deepspeed/ops/csrc/adam/multi_tensor_adam.cu -o multi_tensor_adam.cuda.o
FAILED: multi_tensor_adam.cuda.o
/usr/bin/nvcc -DTORCH_EXTENSION_NAME=fused_adam -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE="_gcc" -DPYBIND11_STDLIB="_libstdcpp" -DPYBIND11_BUILD_ABI="_cxxabi1011" -I/home/leandre/Info/OpenValue/vall-e/vall_e_env/lib/python3.10/site-packages/deepspeed/ops/csrc/includes -I/home/leandre/Info/OpenValue/vall-e/vall_e_env/lib/python3.10/site-packages/deepspeed/ops/csrc/adam -isystem /home/leandre/Info/OpenValue/vall-e/vall_e_env/lib/python3.10/site-packages/torch/include -isystem /home/leandre/Info/OpenValue/vall-e/vall_e_env/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -isystem /home/leandre/Info/OpenValue/vall-e/vall_e_env/lib/python3.10/site-packages/torch/include/TH -isystem /home/leandre/Info/OpenValue/vall-e/vall_e_env/lib/python3.10/site-packages/torch/include/THC -isystem /usr/include/python3.10 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_61,code=compute_61 -gencode=arch=compute_61,code=sm_61 --compiler-options '-fPIC' -O3 -DVERSION_GE_1_1 -DVERSION_GE_1_3 -DVERSION_GE_1_5 -lineinfo --use_fast_math -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_61,code=compute_61 -std=c++14 -c /home/leandre/Info/OpenValue/vall-e/vall_e_env/lib/python3.10/site-packages/deepspeed/ops/csrc/adam/multi_tensor_adam.cu -o multi_tensor_adam.cuda.o
/usr/include/c++/11/bits/std_function.h:435:145: error: parameter packs not expanded with ‘...’:
435 | function(_Functor&& __f)
| ^
/usr/include/c++/11/bits/std_function.h:435:145: note: ‘_ArgTypes’
/usr/include/c++/11/bits/std_function.h:530:146: error: parameter packs not expanded with ‘...’:
530 | operator=(_Functor&& __f)
| ^
/usr/include/c++/11/bits/std_function.h:530:146: note: ‘_ArgTypes’
[2/3] c++ -MMD -MF fused_adam_frontend.o.d -DTORCH_EXTENSION_NAME=fused_adam -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE="_gcc" -DPYBIND11_STDLIB="_libstdcpp" -DPYBIND11_BUILD_ABI="_cxxabi1011" -I/home/leandre/Info/OpenValue/vall-e/vall_e_env/lib/python3.10/site-packages/deepspeed/ops/csrc/includes -I/home/leandre/Info/OpenValue/vall-e/vall_e_env/lib/python3.10/site-packages/deepspeed/ops/csrc/adam -isystem /home/leandre/Info/OpenValue/vall-e/vall_e_env/lib/python3.10/site-packages/torch/include -isystem /home/leandre/Info/OpenValue/vall-e/vall_e_env/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -isystem /home/leandre/Info/OpenValue/vall-e/vall_e_env/lib/python3.10/site-packages/torch/include/TH -isystem /home/leandre/Info/OpenValue/vall-e/vall_e_env/lib/python3.10/site-packages/torch/include/THC -isystem /usr/include/python3.10 -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++14 -O3 -std=c++14 -g -Wno-reorder -DVERSION_GE_1_1 -DVERSION_GE_1_3 -DVERSION_GE_1_5 -c /home/leandre/Info/OpenValue/vall-e/vall_e_env/lib/python3.10/site-packages/deepspeed/ops/csrc/adam/fused_adam_frontend.cpp -o fused_adam_frontend.o
ninja: build stopped: subcommand failed.
Traceback (most recent call last):
  File "/home/leandre/Info/OpenValue/vall-e/vall_e_env/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 1900, in _run_ninja_build
    subprocess.run(
  File "/usr/lib/python3.10/subprocess.py", line 524, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/home/leandre/Info/OpenValue/vall-e/vall_e/train.py", line 128, in <module>
    main()
  File "/home/leandre/Info/OpenValue/vall-e/vall_e/train.py", line 119, in main
    trainer.train(
  File "/home/leandre/Info/OpenValue/vall-e/vall_e/utils/trainer.py", line 125, in train
    engines = engines_loader()
  File "/home/leandre/Info/OpenValue/vall-e/vall_e/train.py", line 21, in load_engines
    model=trainer.Engine(
  File "/home/leandre/Info/OpenValue/vall-e/vall_e/utils/engines.py", line 22, in __init__
    super().__init__(None, *args, **kwargs)
  File "/home/leandre/Info/OpenValue/vall-e/vall_e_env/lib/python3.10/site-packages/deepspeed/runtime/engine.py", line 330, in __init__
    self._configure_optimizer(optimizer, model_parameters)
  File "/home/leandre/Info/OpenValue/vall-e/vall_e_env/lib/python3.10/site-packages/deepspeed/runtime/engine.py", line 1195, in _configure_optimizer
    basic_optimizer = self._configure_basic_optimizer(model_parameters)
  File "/home/leandre/Info/OpenValue/vall-e/vall_e_env/lib/python3.10/site-packages/deepspeed/runtime/engine.py", line 1272, in _configure_basic_optimizer
    optimizer = FusedAdam(
  File "/home/leandre/Info/OpenValue/vall-e/vall_e_env/lib/python3.10/site-packages/deepspeed/ops/adam/fused_adam.py", line 72, in __init__
    fused_adam_cuda = FusedAdamBuilder().load()
  File "/home/leandre/Info/OpenValue/vall-e/vall_e_env/lib/python3.10/site-packages/deepspeed/ops/op_builder/builder.py", line 460, in load
    return self.jit_load(verbose)
  File "/home/leandre/Info/OpenValue/vall-e/vall_e_env/lib/python3.10/site-packages/deepspeed/ops/op_builder/builder.py", line 495, in jit_load
    op_module = load(
  File "/home/leandre/Info/OpenValue/vall-e/vall_e_env/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 1284, in load
    return _jit_compile(
  File "/home/leandre/Info/OpenValue/vall-e/vall_e_env/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 1508, in _jit_compile
    _write_ninja_file_and_build_library(
  File "/home/leandre/Info/OpenValue/vall-e/vall_e_env/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 1623, in _write_ninja_file_and_build_library
    _run_ninja_build(
  File "/home/leandre/Info/OpenValue/vall-e/vall_e_env/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 1916, in _run_ninja_build
    raise RuntimeError(message) from e
RuntimeError: Error building extension 'fused_adam'


Does anyone know where the error may have come from?

</details>
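
The second log warns that the installed CUDA toolkit (11.5) differs from the version torch was compiled with (11.7), and the `fused_adam` extension then fails while nvcc is compiling it. As a minimal sketch for checking both versions on your own machine, something like the following could be used. The helper names (`parse_nvcc_release`, `toolkit_version`, `torch_cuda_version`) are hypothetical and not part of vall-e or DeepSpeed. The script only reports the two versions; it does not establish that the mismatch causes the build failure.

```python
import re
import shutil
import subprocess


def parse_nvcc_release(output):
    """Extract 'X.Y' from the 'release X.Y' field printed by `nvcc --version`."""
    match = re.search(r"release (\d+\.\d+)", output)
    return match.group(1) if match else None


def toolkit_version():
    """Return the CUDA toolkit version reported by nvcc, or None if nvcc is not on PATH."""
    nvcc = shutil.which("nvcc")
    if nvcc is None:
        return None
    result = subprocess.run([nvcc, "--version"], capture_output=True, text=True)
    return parse_nvcc_release(result.stdout)


def torch_cuda_version():
    """Return the CUDA version torch was built against, or None if unavailable."""
    try:
        import torch
    except ImportError:
        return None
    return torch.version.cuda  # None for CPU-only builds


if __name__ == "__main__":
    print("nvcc toolkit:", toolkit_version())
    print("torch built with:", torch_cuda_version())
```

If the two values differ, that matches the warning in the log above; whether it explains the compiler error still has to be checked separately.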


# Answer 1
**Score**: 1

<details>
<summary>English:</summary>

You are using a library whose version differs from the one the repo expects. Since the author did not set up the repo with a venv, the libraries are not being installed at the right versions. Try creating a requirements.txt from the repo's setup.py:

        coloredlogs~=15.0.1
        deepspeed~=0.7.7
        diskcache~=5.4.0
        einops~=0.6.0
        encodec~=0.1.1
        g2p_en~=2.1.0
        humanize~=4.4.0
        matplotlib~=3.6.0
        numpy~=1.23.3
        omegaconf~=2.2.3
        openTSNE~=0.6.2
        pandas~=1.5.0
        soundfile~=0.11.0
        torch~=1.13.0
        torchaudio~=0.13.0
        tqdm~=4.64.1

Now delete your venv, create it again, and run `pip install -r requirements.txt`.

</details>
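
After recreating the venv, it can help to confirm that the installed versions actually match the pins in the answer above. The following is a rough sketch using only the standard library. The `PINS` dictionary copies three entries from that list, the function names are hypothetical, and the `~=` check is a simplified approximation of the compatible-release rule, not a full version-specifier parser.

```python
import re
from importlib import metadata

# A few pins copied from the requirements list above (not the full list).
PINS = {"deepspeed": "0.7.7", "torch": "1.13.0", "torchaudio": "0.13.0"}


def parse(version):
    """Turn '1.13.0+cu117' into (1, 13, 0); stops at the first non-numeric piece."""
    parts = []
    for piece in version.split("+", 1)[0].split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def satisfies_compatible_release(installed, pinned):
    """Approximate `~=`: same components except the last one, and not older than the pin."""
    have, want = parse(installed), parse(pinned)
    return have[: len(want) - 1] == want[:-1] and have >= want


def report(pins=PINS):
    """Print one line per pinned package: installed version and whether it fits the pin."""
    for name, pinned in pins.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            print(f"{name}: not installed (want ~={pinned})")
            continue
        status = "OK" if satisfies_compatible_release(installed, pinned) else "MISMATCH"
        print(f"{name}: {installed} {status} (want ~={pinned})")


if __name__ == "__main__":
    report()
```

Run it inside the activated venv; a `MISMATCH` line for `deepspeed` would suggest the pinned install did not take effect.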



huangapple
  • Posted on May 31, 2023, 23:10:10
  • Please keep this link when reposting: https://go.coder-hub.com/76374960.html