What is the cause of HFValidationError in this code and how do I resolve this error?
Question
My Python code in a Chaquopy Android Studio project:
import torch as tc
from transformers import GPT2Tokenizer, GPT2Model

def generate_text(txt):
    """
    Generate chat
    https://huggingface.co/gpt2
    """
    # Load model files
    tokenizer = GPT2Tokenizer.from_pretrained('assets/')  # This line causes the error
    model = GPT2Model.from_pretrained('assets/')
    # Move model to GPU if available
    device = tc.device("cuda" if tc.cuda.is_available() else "cpu")
    model.to(device)
    # Move the input tensors to the same device as the model
    encoded_input = tokenizer(txt, return_tensors='pt').to(device)
    output = model(**encoded_input)
    return str(output)
Now it shows the following error:
E/AndroidRuntime: FATAL EXCEPTION: main
Process: com.example.chaquopy_130application, PID: 4867
com.chaquo.python.PyException: HFValidationError: Repo id must use alphanumeric chars or '-', '_', '.', '--' and '..' are forbidden, '-' and '.' cannot start or end the name, max length is 96: 'assets/'.
at <python>.huggingface_hub.utils._validators.validate_repo_id(_validators.py:164)
at <python>.huggingface_hub.utils._validators._inner_fn(_validators.py:110)
at <python>.huggingface_hub.utils._deprecation.inner_f(_deprecation.py:103)
at <python>.transformers.file_utils.get_list_of_files(file_utils.py:2103)
at <python>.transformers.tokenization_utils_base.get_fast_tokenizer_file(tokenization_utils_base.py:3486)
at <python>.transformers.tokenization_utils_base.from_pretrained(tokenization_utils_base.py:1654)
at <python>.pythonScript.generate_text(pythonScript.py:30)
I have put all the files of the 124M GPT-2 model checkpoint (encoder.json, hparams.json, model.ckpt.data-00000-of-00001, model.ckpt.index, model.ckpt.meta, vocab.bpe) inside the 'assets' folder.
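Note that these appear to be the files from OpenAI's original TensorFlow release of GPT-2. As far as I can tell, GPT2Tokenizer looks for vocab.json and merges.txt, and GPT2Model looks for config.json plus PyTorch weights (pytorch_model.bin, or model.safetensors in newer versions), so even with a correct absolute path the load may still fail for a different reason. The sketch below uses a hypothetical helper, `missing_hf_files`, to check a folder for those filenames; the filename lists are assumptions that may vary between transformers versions. Hugging Face-format files for this model are available at https://huggingface.co/gpt2.

```python
from pathlib import Path
from typing import List

# Assumed filenames for a Hugging Face-format GPT-2 checkpoint (PyTorch).
# These lists are an assumption and may differ between transformers versions.
TOKENIZER_FILES = ("vocab.json", "merges.txt")  # read by GPT2Tokenizer
CONFIG_FILES = ("config.json",)                 # read by GPT2Model
WEIGHT_FILES = ("pytorch_model.bin", "model.safetensors")  # either one suffices


def missing_hf_files(model_dir: str) -> List[str]:
    """Return the expected Hugging Face files that are absent from model_dir."""
    directory = Path(model_dir)
    missing = [
        name
        for name in TOKENIZER_FILES + CONFIG_FILES
        if not (directory / name).is_file()
    ]
    # Only one weights file is needed, so report the alternatives as one entry.
    if not any((directory / name).is_file() for name in WEIGHT_FILES):
        missing.append(" or ".join(WEIGHT_FILES))
    return missing


if __name__ == "__main__":
    import sys

    folder = sys.argv[1] if len(sys.argv) > 1 else "assets"
    print(missing_hf_files(folder) or "all expected files present")
```

Run against the folder described above, it would report all four entries as missing, since none of the TensorFlow-release filenames match.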
Answer 1
Score: 1
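The from_pretrained documentation is not entirely clear about how it distinguishes Hugging Face repository names from local paths, although all the local path examples end with a slash. In practice, the argument is only treated as a local path if a directory actually exists there. A relative path such as 'assets/' is resolved against the current working directory, which in an Android app is not the directory containing your Python code, so transformers falls back to treating it as a Hub repository ID and rejects it. In any case, when loading data files with Chaquopy, you must always use absolute paths, as it says in the FAQ.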
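To make the local-path versus repo-ID decision concrete, here is a simplified sketch. It assumes, based on the traceback (transformers.file_utils.get_list_of_files), that transformers first checks whether the argument is an existing directory and only otherwise passes it to the Hub client, whose validator applies the rules quoted in the error message. The regular expression and the helpers `is_valid_repo_id` and `classify_argument` are hypothetical approximations of those rules, not huggingface_hub's actual code.

```python
import os
import re

# Hypothetical approximation of the rules quoted in the HFValidationError message:
# optional "namespace/" prefix, then a name of letters, digits, '_', '-', '.',
# at most 96 chars, not starting or ending with '-' or '.'.
REPO_ID_PATTERN = re.compile(r"^(\b[\w\-.]+\b/)?\b[\w\-.]{1,96}\b$")


def is_valid_repo_id(repo_id: str) -> bool:
    """Return True if repo_id satisfies the approximated repo-id rules."""
    if repo_id.count("/") > 1:
        return False
    if "--" in repo_id or ".." in repo_id:
        return False
    return REPO_ID_PATTERN.match(repo_id) is not None


def classify_argument(path_or_repo: str) -> str:
    """Mimic the assumed decision: existing directory first, Hub repo id second."""
    if os.path.isdir(path_or_repo):
        return "local directory"
    if is_valid_repo_id(path_or_repo):
        return "Hub repo id"
    return "invalid: would raise HFValidationError"


if __name__ == "__main__":
    for candidate in ["gpt2", "openai-community/gpt2", "assets/"]:
        print(f"{candidate!r}: {classify_argument(candidate)}")
```

The trailing slash makes 'assets/' an invalid repo ID, so whenever no assets directory exists relative to the current working directory, the call ends in the validation error rather than a local load.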
So assuming your "assets" directory is at the same level as the Python code, you can do this:
from os.path import dirname
tokenizer = GPT2Tokenizer.from_pretrained(f'{dirname(__file__)}/assets/')
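A slightly more defensive variant of the same fix is sketched below. `resolve_asset_dir` is a hypothetical helper, not part of transformers or Chaquopy: it builds the absolute path with pathlib and fails early with a clear FileNotFoundError if the folder is missing, instead of letting from_pretrained fall back to the Hub and raise HFValidationError.

```python
from pathlib import Path


def resolve_asset_dir(script_file: str, subdir: str = "assets") -> str:
    """Return the absolute path of `subdir`, located next to `script_file`.

    Hypothetical helper: raising FileNotFoundError here gives a clearer message
    than letting from_pretrained treat a missing path as a Hub repo id.
    """
    directory = Path(script_file).resolve().parent / subdir
    if not directory.is_dir():
        raise FileNotFoundError(f"Model directory not found: {directory}")
    return str(directory)


# Usage inside the question's generate_text (requires transformers, so shown as comments):
# model_dir = resolve_asset_dir(__file__)
# tokenizer = GPT2Tokenizer.from_pretrained(model_dir)
# model = GPT2Model.from_pretrained(model_dir)
```

Both the tokenizer and the model should receive the same absolute path, since GPT2Model.from_pretrained('assets/') in the question has the same relative-path problem as the tokenizer call.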