What is the cause of HFValidationError in this code and how do I resolve this error?

Question

My Python code in a Chaquopy Android Studio project:

import torch as tc
from transformers import GPT2Tokenizer, GPT2Model



def generate_text(txt):
    """
    Generate chat
    https://huggingface.co/gpt2
    """

    # Load model files
    tokenizer = GPT2Tokenizer.from_pretrained('assets/')  # This line causes the error
    model = GPT2Model.from_pretrained('assets/')
    # Move model to GPU if available
    device = tc.device("cuda" if tc.cuda.is_available() else "cpu")
    model.to(device)

    encoded_input = tokenizer(txt, return_tensors='pt')
    output = model(**encoded_input)

    return str(output)

It now fails with the following error:

E/AndroidRuntime: FATAL EXCEPTION: main
    Process: com.example.chaquopy_130application, PID: 4867
    com.chaquo.python.PyException: HFValidationError: Repo id must use alphanumeric chars or '-', '_', '.', '--' and '..' are forbidden, '-' and '.' cannot start or end the name, max length is 96: 'assets/'.
        at <python>.huggingface_hub.utils._validators.validate_repo_id(_validators.py:164)
        at <python>.huggingface_hub.utils._validators._inner_fn(_validators.py:110)
        at <python>.huggingface_hub.utils._deprecation.inner_f(_deprecation.py:103)
        at <python>.transformers.file_utils.get_list_of_files(file_utils.py:2103)
        at <python>.transformers.tokenization_utils_base.get_fast_tokenizer_file(tokenization_utils_base.py:3486)
        at <python>.transformers.tokenization_utils_base.from_pretrained(tokenization_utils_base.py:1654)
        at <python>.pythonScript.generate_text(pythonScript.py:30)

I have put all the files of the 124M GPT-2 model inside the 'assets' folder: checkpoint, encoder.json, hparams.json, model.ckpt.data-00000-of-00001, model.ckpt.index, model.ckpt.meta, and vocab.bpe.

What is the cause of HFValidationError in this code and how do I resolve this error?

Answer 1

Score: 1

The from_pretrained documentation is not entirely clear about how it distinguishes Hugging Face repository names from local paths, although all of its local path examples end with a slash. The error message suggests that 'assets/' was not recognized as a local directory and was validated as a repo id instead. In any case, when loading data files with Chaquopy, you must always use absolute paths, as the Chaquopy FAQ says.

So assuming your "assets" directory is at the same level as the Python code, you can do this:

from os.path import dirname
tokenizer = GPT2Tokenizer.from_pretrained(f'{dirname(__file__)}/assets/')
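
As a minimal sketch of the same idea, the helper below builds the absolute path and fails early if the folder is missing, so a wrong path is reported as a missing directory rather than as an invalid repo id. The name resolve_model_dir is hypothetical (it is not part of transformers or Chaquopy), and the sketch assumes the "assets" folder sits next to the Python file, as above.

```python
from pathlib import Path


def resolve_model_dir(script_file, folder_name="assets"):
    """Return the absolute path of a folder located next to script_file.

    Hypothetical helper: raises FileNotFoundError if the folder does not
    exist, so a bad path fails here instead of being passed on.
    """
    model_dir = Path(script_file).resolve().parent / folder_name
    if not model_dir.is_dir():
        raise FileNotFoundError(f"Model directory not found: {model_dir}")
    return str(model_dir)


# Intended use inside the script from the question (not executed here):
#     model_dir = resolve_model_dir(__file__)
#     tokenizer = GPT2Tokenizer.from_pretrained(model_dir)
#     model = GPT2Model.from_pretrained(model_dir)
```

This only addresses how the path is resolved. Whether the files in the folder are in a format that from_pretrained can load is a separate question.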

Posted by huangapple on June 18, 2023 at 20:20:29