Can't install Flash Attention in Azure Databricks GPU (for Hugging Face model)

Question

I can successfully run the following code on a CPU cluster in Databricks.

import torch
import transformers

model = transformers.AutoModelForCausalLM.from_pretrained(
    "mosaicml/mpt-7b",
    trust_remote_code=True,  # MPT-7B ships custom modeling code on the Hub
    torch_dtype=torch.bfloat16,
)

On the CPU Databricks cluster, I first installed PyTorch 2.0.1, Transformers 4.28.1, and einops 0.6.1.
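
For reference, the equivalent notebook-scoped install on Databricks would look something like this (a sketch; %pip is the standard notebook magic for per-notebook installs):

%pip install torch==2.0.1 transformers==4.28.1 einops==0.6.1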

However, the same Python code fails on a GPU cluster with the following error:

ImportError: This modeling file requires the following packages that were not found in your environment: flash_attn. Run pip install flash_attn

I then tried to install the required package with pip install flash-attn on the Databricks GPU cluster (based on instructions HERE).

However, I have been unable to install Flash Attention on the GPU cluster.
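
As an aside, MPT-7B's model card documents an attn_config switch for selecting the attention implementation, so one thing worth trying is loading with attn_impl set to "torch" to sidestep the flash_attn dependency. This is a sketch and not verified on Databricks; whether it bypasses the import check depends on the version of the remote modeling code:

import torch
import transformers

# Load the remote config first so the attention implementation can be chosen.
config = transformers.AutoConfig.from_pretrained(
    "mosaicml/mpt-7b", trust_remote_code=True
)
config.attn_config["attn_impl"] = "torch"  # pure-PyTorch attention, no flash_attn

model = transformers.AutoModelForCausalLM.from_pretrained(
    "mosaicml/mpt-7b",
    config=config,
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
)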

On the GPU cluster, I tried the following:

  1. Attempted to install the flash-attn library on the GPU cluster. pip install flash-attn resulted in the following error:
Collecting flash-attn
Downloading flash_attn-1.0.4.tar.gz (2.0 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 2.0/2.0 MB 28.1 MB/s eta 0:00:0000:010:01
Preparing metadata (setup.py) ... done
Requirement already satisfied: torch in /local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.10/site-packages (from flash-attn) (1.13.1)
Requirement already satisfied: einops in /local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.10/site-packages (from flash-attn) (0.6.1)
Requirement already satisfied: packaging in /databricks/python3/lib/python3.10/site-packages (from flash-attn) (21.3)
Requirement already satisfied: pyparsing!=3.0.5,>=2.0.2 in /databricks/python3/lib/python3.10/site-packages (from packaging->flash-attn) (3.0.9)
Requirement already satisfied: nvidia-cublas-cu11==11.10.3.66 in /local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.10/site-packages (from torch->flash-attn) (11.10.3.66)
Requirement already satisfied: nvidia-cudnn-cu11==8.5.0.96 in /local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.10/site-packages (from torch->flash-attn) (8.5.0.96)
Requirement already satisfied: typing-extensions in /databricks/python3/lib/python3.10/site-packages (from torch->flash-attn) (4.3.0)
Requirement already satisfied: nvidia-cuda-nvrtc-cu11==11.7.99 in /local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.10/site-packages (from torch->flash-attn) (11.7.99)
Requirement already satisfied: nvidia-cuda-runtime-cu11==11.7.99 in /local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.10/site-packages (from torch->flash-attn) (11.7.99)
Requirement already satisfied: setuptools in /databricks/python3/lib/python3.10/site-packages (from nvidia-cublas-cu11==11.10.3.66->torch->flash-attn) (63.4.1)
Requirement already satisfied: wheel in /databricks/python3/lib/python3.10/site-packages (from nvidia-cublas-cu11==11.10.3.66->torch->flash-attn) (0.37.1)
Building wheels for collected packages: flash-attn
Building wheel for flash-attn (setup.py) ... error
error: subprocess-exited-with-error
× python setup.py bdist_wheel did not run successfully.
│ exit code: 1
╰─> [317 lines of output]
torch.__version__  = 1.13.1+cu117
fatal: not a git repository (or any of the parent directories): .git
running bdist_wheel
running build
running build_py
creating build
creating build/lib.linux-x86_64-cpython-310
creating build/lib.linux-x86_64-cpython-310/flash_attn
copying flash_attn/flash_blocksparse_attn_interface.py -> build/lib.linux-x86_64-cpython-310/flash_attn
copying flash_attn/rotary.py -> build/lib.linux-x86_64-cpython-310/flash_attn
copying flash_attn/__init__.py -> build/lib.linux-x86_64-cpython-310/flash_attn
copying flash_attn/flash_attn_triton_tmp_og.py -> build/lib.linux-x86_64-cpython-310/flash_attn
copying flash_attn/flash_blocksparse_attention.py -> build/lib.linux-x86_64-cpython-310/flash_attn
copying flash_attn/flash_attn_triton.py -> build/lib.linux-x86_64-cpython-310/flash_attn
copying flash_attn/bert_padding.py -> build/lib.linux-x86_64-cpython-310/flash_attn
copying flash_attn/flash_attn_triton_og.py -> build/lib.linux-x86_64-cpython-310/flash_attn
copying flash_attn/flash_attention.py -> build/lib.linux-x86_64-cpython-310/flash_attn
copying flash_attn/fused_softmax.py -> build/lib.linux-x86_64-cpython-310/flash_attn
copying flash_attn/flash_attn_triton_varlen.py -> build/lib.linux-x86_64-cpython-310/flash_attn
copying flash_attn/flash_attn_interface.py -> build/lib.linux-x86_64-cpython-310/flash_attn
copying flash_attn/flash_attn_triton_tmp.py -> build/lib.linux-x86_64-cpython-310/flash_attn
copying flash_attn/attention_kernl.py -> build/lib.linux-x86_64-cpython-310/flash_attn
copying flash_attn/flash_attn_triton_single_query.py -> build/lib.linux-x86_64-cpython-310/flash_attn
creating build/lib.linux-x86_64-cpython-310/flash_attn/modules
copying flash_attn/modules/embedding.py -> build/lib.linux-x86_64-cpython-310/flash_attn/modules
copying flash_attn/modules/__init__.py -> build/lib.linux-x86_64-cpython-310/flash_attn/modules
copying flash_attn/modules/mlp.py -> build/lib.linux-x86_64-cpython-310/flash_attn/modules
copying flash_attn/modules/block.py -> build/lib.linux-x86_64-cpython-310/flash_attn/modules
copying flash_attn/modules/mha.py -> build/lib.linux-x86_64-cpython-310/flash_attn/modules
creating build/lib.linux-x86_64-cpython-310/flash_attn/utils
copying flash_attn/utils/__init__.py -> build/lib.linux-x86_64-cpython-310/flash_attn/utils
copying flash_attn/utils/generation.py -> build/lib.linux-x86_64-cpython-310/flash_attn/utils
copying flash_attn/utils/benchmark.py -> build/lib.linux-x86_64-cpython-310/flash_attn/utils
copying flash_attn/utils/distributed.py -> build/lib.linux-x86_64-cpython-310/flash_attn/utils
copying flash_attn/utils/pretrained.py -> build/lib.linux-x86_64-cpython-310/flash_attn/utils
creating build/lib.linux-x86_64-cpython-310/flash_attn/layers
copying flash_attn/layers/rotary.py -> build/lib.linux-x86_64-cpython-310/flash_attn/layers
copying flash_attn/layers/__init__.py -> build/lib.linux-x86_64-cpython-310/flash_attn/layers
copying flash_attn/layers/patch_embed.py -> build/lib.linux-x86_64-cpython-310/flash_attn/layers
creating build/lib.linux-x86_64-cpython-310/flash_attn/triton
copying flash_attn/triton/__init__.py -> build/lib.linux-x86_64-cpython-310/flash_attn/triton
copying flash_attn/triton/fused_attention.py -> build/lib.linux-x86_64-cpython-310/flash_attn/triton
creating build/lib.linux-x86_64-cpython-310/flash_attn/losses
copying flash_attn/losses/cross_entropy_apex.py -> build/lib.linux-x86_64-cpython-310/flash_attn/losses
copying flash_attn/losses/__init__.py -> build/lib.linux-x86_64-cpython-310/flash_attn/losses
copying flash_attn/losses/cross_entropy_parallel.py -> build/lib.linux-x86_64-cpython-310/flash_attn/losses
copying flash_attn/losses/cross_entropy.py -> build/lib.linux-x86_64-cpython-310/flash_attn/losses
creating build/lib.linux-x86_64-cpython-310/flash_attn/ops
copying flash_attn/ops/__init__.py -> build/lib.linux-x86_64-cpython-310/flash_attn/ops
copying flash_attn/ops/layer_norm.py -> build/lib.linux-x86_64-cpython-310/flash_attn/ops
copying flash_attn/ops/gelu_activation.py -> build/lib.linux-x86_64-cpython-310/flash_attn/ops
copying flash_attn/ops/rms_norm.py -> build/lib.linux-x86_64-cpython-310/flash_attn/ops
copying flash_attn/ops/fused_dense.py -> build/lib.linux-x86_64-cpython-310/flash_attn/ops
copying flash_attn/ops/activations.py -> build/lib.linux-x86_64-cpython-310/flash_attn/ops
creating build/lib.linux-x86_64-cpython-310/flash_attn/models
copying flash_attn/models/gptj.py -> build/lib.linux-x86_64-cpython-310/flash_attn/models
copying flash_attn/models/vit.py -> build/lib.linux-x86_64-cpython-310/flash_attn/models
copying flash_attn/models/__init__.py -> build/lib.linux-x86_64-cpython-310/flash_attn/models
copying flash_attn/models/llama.py -> build/lib.linux-x86_64-cpython-310/flash_attn/models
copying flash_attn/models/gpt.py -> build/lib.linux-x86_64-cpython-310/flash_attn/models
copying flash_attn/models/bert.py -> build/lib.linux-x86_64-cpython-310/flash_attn/models
copying flash_attn/models/gpt_j.py -> build/lib.linux-x86_64-cpython-310/flash_attn/models
copying flash_attn/models/opt.py -> build/lib.linux-x86_64-cpython-310/flash_attn/models
copying flash_attn/models/gpt_neox.py -> build/lib.linux-x86_64-cpython-310/flash_attn/models
running build_ext
building 'flash_attn_cuda' extension
creating /tmp/pip-install-nluf5697/flash-attn_79c1dfb03cc4482ba86c435b2db7a8b6/build/temp.linux-x86_64-cpython-310
creating /tmp/pip-install-nluf5697/flash-attn_79c1dfb03cc4482ba86c435b2db7a8b6/build/temp.linux-x86_64-cpython-310/csrc
creating /tmp/pip-install-nluf5697/flash-attn_79c1dfb03cc4482ba86c435b2db7a8b6/build/temp.linux-x86_64-cpython-310/csrc/flash_attn
creating /tmp/pip-install-nluf5697/flash-attn_79c1dfb03cc4482ba86c435b2db7a8b6/build/temp.linux-x86_64-cpython-310/csrc/flash_attn/src
Emitting ninja build file /tmp/pip-install-nluf5697/flash-attn_79c1dfb03cc4482ba86c435b2db7a8b6/build/temp.linux-x86_64-cpython-310/build.ninja...
Compiling objects...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
[1/9] /usr/local/cuda/bin/nvcc  -I/tmp/pip-install-nluf5697/flash-attn_79c1dfb03cc4482ba86c435b2db7a8b6/csrc/flash_attn -I/tmp/pip-install-nluf5697/flash-attn_79c1dfb03cc4482ba86c435b2db7a8b6/csrc/flash_attn/src -I/tmp/pip-install-nluf5697/flash-attn_79c1dfb03cc4482ba86c435b2db7a8b6/csrc/flash_attn/cutlass/include -I/local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.10/site-packages/torch/include -I/local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -I/local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.10/site-packages/torch/include/TH -I/local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.10/site-packages/torch/include/THC -I/usr/local/cuda/include -I/local_disk0/.ephemeral_nfs/envs/pythonEnv-69bd3443-3436-4892-a827-1f1b494c1c35/include -I/usr/include/python3.10 -c -c /tmp/pip-install-nluf5697/flash-attn_79c1dfb03cc4482ba86c435b2db7a8b6/csrc/flash_attn/src/fmha_bwd_hdim32.cu -o /tmp/pip-install-nluf5697/flash-attn_79c1dfb03cc4482ba86c435b2db7a8b6/build/temp.linux-x86_64-cpython-310/csrc/flash_attn/src/fmha_bwd_hdim32.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -std=c++17 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_HALF2_OPERATORS__ -U__CUDA_NO_BFLOAT16_CONVERSIONS__ --expt-relaxed-constexpr --expt-extended-lambda --use_fast_math --ptxas-options=-v -lineinfo -gencode arch=compute_75,code=sm_75 -gencode arch=compute_80,code=sm_80 --threads 4 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=flash_attn_cuda -D_GLIBCXX_USE_CXX11_ABI=0
FAILED: /tmp/pip-install-nluf5697/flash-attn_79c1dfb03cc4482ba86c435b2db7a8b6/build/temp.linux-x86_64-cpython-310/csrc/flash_attn/src/fmha_bwd_hdim32.o
/usr/local/cuda/bin/nvcc  -I/tmp/pip-install-nluf5697/flash-attn_79c1dfb03cc4482ba86c435b2db7a8b6/csrc/flash_attn -I/tmp/pip-install-nluf5697/flash-attn_79c1dfb03cc4482ba86c435b2db7a8b6/csrc/flash_attn/src -I/tmp/pip-install-nluf5697/flash-attn_79c1dfb03cc4482ba86c435b2db7a8b6/csrc/flash_attn/cutlass/include -I/local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.10/site-packages/torch/include -I/local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -I/local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.10/site-packages/torch/include/TH -I/local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.10/site-packages/torch/include/THC -I/usr/local/cuda/include -I/local_disk0/.ephemeral_nfs/envs/pythonEnv-69bd3443-3436-4892-a827-1f1b494c1c35/include -I/usr/include/python3.10 -c -c /tmp/pip-install-nluf5697/flash-attn_79c1dfb03cc4482ba86c435b2db7a8b6/csrc/flash_attn/src/fmha_bwd_hdim32.cu -o /tmp/pip-install-nluf5697/flash-attn_79c1dfb03cc4482ba86c435b2db7a8b6/build/temp.linux-x86_64-cpython-310/csrc/flash_attn/src/fmha_bwd_hdim32.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -std=c++17 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_HALF2_OPERATORS__ -U__CUDA_NO_BFLOAT16_CONVERSIONS__ --expt-relaxed-constexpr --expt-extended-lambda --use_fast_math --ptxas-options=-v -lineinfo -gencode arch=compute_75,code=sm_75 -gencode arch=compute_80,code=sm_80 --threads 4 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=flash_attn_cuda -D_GLIBCXX_USE_CXX11_ABI=0
In file included from /tmp/pip-install-nluf5697/flash-attn_79c1dfb03cc4482ba86c435b2db7a8b6/csrc/flash_attn/src/fmha.h:39,
from /tmp/pip-install-nluf5697/flash-attn_79c1dfb03cc4482ba86c435b2db7a8b6/csrc/flash_attn/src/fmha_bwd_launch_template.h:6,
from /tmp/pip-install-nluf5697/flash-attn_79c1dfb03cc4482ba86c435b2db7a8b6/csrc/flash_attn/src/fmha_bwd_hdim32.cu:5:
/local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.10/site-packages/torch/include/ATen/cuda/CUDAContext.h:6:10: fatal error: cusparse.h: No such file or directory
6 | #include <cusparse.h>
|          ^~~~~~~~~~~~
compilation terminated.
In file included from /tmp/pip-install-nluf5697/flash-attn_79c1dfb03cc4482ba86c435b2db7a8b6/csrc/flash_attn/src/fmha.h:39,
from /tmp/pip-install-nluf5697/flash-attn_79c1dfb03cc4482ba86c435b2db7a8b6/csrc/flash_attn/src/fmha_bwd_launch_template.h:6,
from /tmp/pip-install-nluf5697/flash-attn_79c1dfb03cc4482ba86c435b2db7a8b6/csrc/flash_attn/src/fmha_bwd_hdim32.cu:5:
/local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.10/site-packages/torch/include/ATen/cuda/CUDAContext.h:6:10: fatal error: cusparse.h: No such file or directory
6 | #include <cusparse.h>
|          ^~~~~~~~~~~~
compilation terminated.
......................................... [ removed middle of error message] 
cmd_obj.run()
File "/databricks/python/lib/python3.10/site-packages/setuptools/command/build.py", line 24, in run
super().run()
File "/databricks/python/lib/python3.10/site-packages/setuptools/_distutils/command/build.py", line 132, in run
self.run_command(cmd_name)
File "/databricks/python/lib/python3.10/site-packages/setuptools/_distutils/cmd.py", line 319, in run_command
self.distribution.run_command(command)
File "/databricks/python/lib/python3.10/site-packages/setuptools/dist.py", line 1217, in run_command
super().run_command(command)
File "/databricks/python/lib/python3.10/site-packages/setuptools/_distutils/dist.py", line 992, in run_command
cmd_obj.run()
File "/databricks/python/lib/python3.10/site-packages/setuptools/command/build_ext.py", line 79, in run
_build_ext.run(self)
File "/databricks/python/lib/python3.10/site-packages/Cython/Distutils/old_build_ext.py", line 186, in run
_build_ext.build_ext.run(self)
File "/databricks/python/lib/python3.10/site-packages/setuptools/_distutils/command/build_ext.py", line 346, in run
self.build_extensions()
File "/local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 843, in build_extensions
build_ext.build_extensions(self)
File "/databricks/python/lib/python3.10/site-packages/Cython/Distutils/old_build_ext.py", line 195, in build_extensions
_build_ext.build_ext.build_extensions(self)
File "/databricks/python/lib/python3.10/site-packages/setuptools/_distutils/command/build_ext.py", line 466, in build_extensions
self._build_extensions_serial()
File "/databricks/python/lib/python3.10/site-packages/setuptools/_distutils/command/build_ext.py", line 492, in _build_extensions_serial
self.build_extension(ext)
File "/databricks/python/lib/python3.10/site-packages/setuptools/command/build_ext.py", line 202, in build_extension
_build_ext.build_extension(self, ext)
File "/databricks/python/lib/python3.10/site-packages/setuptools/_distutils/command/build_ext.py", line 547, in build_extension
objects = self.compiler.compile(
File "/local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 658, in unix_wrap_ninja_compile
_write_ninja_file_and_compile_objects(
File "/local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 1573, in _write_ninja_file_and_compile_objects
_run_ninja_build(
File "/local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 1916, in _run_ninja_build
raise RuntimeError(message) from e
RuntimeError: Error compiling objects for extension
[end of output]
note: This error originates from a subprocess, and is likely not a problem with pip.
error: legacy-install-failure
× Encountered error while trying to install package.
╰─> flash-attn
note: This is an issue with the package mentioned above, not pip.
hint: See above for output from the failure.

  2. Updated the torch library on the GPU cluster to version 2.0.1 (to match the successful approach on CPU) and tried again to reinstall flash-attn. This still didn't work.
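
(The torch upgrade would typically be a notebook-scoped install along these lines; a sketch, assuming the default PyPI wheel:)

%pip install torch==2.0.1

After the install, run dbutils.library.restartPython() (or detach and re-attach the notebook) so the running Python process actually picks up the new version.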

My suspicion is that the code works fine on the CPU cluster because it is NOT prepackaged with an earlier version of PyTorch. I installed PyTorch 2.0.1 on the Databricks CPU cluster and it works fine. However, the GPU cluster comes preinstalled with an earlier version of PyTorch (1.13.1+cu117, per the build log above), and Flash Attention will not install on the GPU.
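
A quick way to confirm what the GPU runtime actually ships, before and after any upgrade (a sketch; run in a notebook cell):

import torch

print(torch.__version__)          # the build log above shows 1.13.1+cu117 on the stock GPU runtime
print(torch.cuda.is_available())  # sanity-check that the GPU is visible to torch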

Answer 1

Score: 2

This is likely related to missing CUDA dependencies.

Please try restarting the cluster, running the following in a notebook cell, and then reinstalling flash_attn:

!wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/libcusparse-dev-11-7_11.7.3.50-1_amd64.deb -O /tmp/libcusparse-dev-11-7_11.7.3.50-1_amd64.deb && \
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/libcublas-dev-11-7_11.10.1.25-1_amd64.deb -O /tmp/libcublas-dev-11-7_11.10.1.25-1_amd64.deb && \
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/libcusolver-dev-11-7_11.4.0.1-1_amd64.deb -O /tmp/libcusolver-dev-11-7_11.4.0.1-1_amd64.deb && \
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/libcurand-dev-11-7_10.2.10.91-1_amd64.deb -O /tmp/libcurand-dev-11-7_10.2.10.91-1_amd64.deb && \
dpkg -i /tmp/libcusparse-dev-11-7_11.7.3.50-1_amd64.deb && \
dpkg -i /tmp/libcublas-dev-11-7_11.10.1.25-1_amd64.deb && \
dpkg -i /tmp/libcusolver-dev-11-7_11.4.0.1-1_amd64.deb && \
dpkg -i /tmp/libcurand-dev-11-7_10.2.10.91-1_amd64.deb
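
Before retrying pip install flash-attn, it may be worth confirming that the header the failed build complained about is now on disk (a sketch; the exact install path can vary with the CUDA layout):

%sh
find /usr -name cusparse.h 2>/dev/null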

Update - added both the init script and general instructions as part of this repo: https://github.com/rafaelvp-db/databricks-llm-prompt-engineering

Answer 2

Score: 0

Welcome to SO! I suspect this is CUDA shenanigans. Does !nvcc --version report at least 11.4? According to the docs for flash-attn (https://pypi.org/project/flash-attn/), you need CUDA 11.4 or above.
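
Note that the nvcc toolkit version and the CUDA version the installed torch wheel was built against can differ; comparing the two from a notebook looks something like this (a sketch):

import subprocess
import torch

# CUDA version the installed torch wheel was compiled against
print("torch CUDA:", torch.version.cuda)
# CUDA toolkit version nvcc will use when compiling flash-attn from source
print(subprocess.run(["nvcc", "--version"], capture_output=True, text=True).stdout)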
