June 18, 2023 20:20:29

What is the cause of HFValidationError in this code and how do I resolve this error?

Question

My Python code in a Chaquopy Android Studio project:

import torch as tc
from transformers import GPT2Tokenizer, GPT2Model


def generate_text(txt):
    """
    Generate chat
    https://huggingface.co/gpt2
    """

    # Load model files
    tokenizer = GPT2Tokenizer.from_pretrained('assets/')  # This line causes the error
    model = GPT2Model.from_pretrained('assets/')
    # Move model to GPU if available
    device = tc.device("cuda" if tc.cuda.is_available() else "cpu")
    model.to(device)

    encoded_input = tokenizer(txt, return_tensors='pt')
    output = model(**encoded_input)

    return str(output)

It now shows the following error:

E/AndroidRuntime: FATAL EXCEPTION: main
    Process: com.example.chaquopy_130application, PID: 4867
    com.chaquo.python.PyException: HFValidationError: Repo id must use alphanumeric chars or '-', '_', '.', '--' and '..' are forbidden, '-' and '.' cannot start or end the name, max length is 96: 'assets/'.
        at <python>.huggingface_hub.utils._validators.validate_repo_id(_validators.py:164)
        at <python>.huggingface_hub.utils._validators._inner_fn(_validators.py:110)
        at <python>.huggingface_hub.utils._deprecation.inner_f(_deprecation.py:103)
        at <python>.transformers.file_utils.get_list_of_files(file_utils.py:2103)
        at <python>.transformers.tokenization_utils_base.get_fast_tokenizer_file(tokenization_utils_base.py:3486)
        at <python>.transformers.tokenization_utils_base.from_pretrained(tokenization_utils_base.py:1654)
        at <python>.pythonScript.generate_text(pythonScript.py:30)

I have put all the files of the 124M GPT-2 model checkpoint (encoder.json, hparams.json, model.ckpt.data-00000-of-00001, model.ckpt.index, model.ckpt.meta, vocab.bpe) inside the 'assets' folder.

Answer 1

Score: 1

The from_pretrained documentation is not entirely clear about how it distinguishes Hugging Face repository names from local paths, although all of its local path examples end with a slash. In any case, when loading data files with Chaquopy you must always use absolute paths, as the Chaquopy FAQ says.

So assuming your "assets" directory is at the same level as the Python code, you can do this:

from os.path import dirname
tokenizer = GPT2Tokenizer.from_pretrained(f'{dirname(__file__)}/assets/')
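
If it helps, the same idea can be wrapped in a small helper that also fails early when the folder is missing. This is only a sketch: resolve_model_dir and its parameters are hypothetical names, not part of transformers or Chaquopy. It assumes, consistent with the traceback in the question, that a string which is not an existing local directory ends up being validated as a Hub repository name.

```python
from os.path import abspath, dirname, isdir, join


def resolve_model_dir(anchor_file, folder="assets"):
    """Return the absolute path of `folder` located next to `anchor_file`.

    Hypothetical helper for illustration only; the names are assumptions.
    """
    model_dir = join(dirname(abspath(anchor_file)), folder)
    if not isdir(model_dir):
        # Fail early with a readable message instead of passing on a string
        # that would then be treated as a repository name.
        raise FileNotFoundError(f"Model directory not found: {model_dir}")
    return model_dir


# Usage inside the script from the question (commented out here because it
# needs transformers and the model files to be present):
# tokenizer = GPT2Tokenizer.from_pretrained(resolve_model_dir(__file__))
# model = GPT2Model.from_pretrained(resolve_model_dir(__file__))
```

Checking the directory up front means a wrong path is reported as a missing folder rather than as an HFValidationError.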
