Avoiding Trimmed Summaries of a PEGASUS-Pubmed huggingface summarization model
Question
I am new to Hugging Face.
I am using the PEGASUS-PubMed Hugging Face model to generate summaries of research papers. The code is below. The model returns a trimmed summary.
Is there a way to avoid the trimmed summaries and get more complete summarization results?
This is the code I tried.
from datasets import load_dataset
from transformers import pipeline, PegasusTokenizer, PegasusForConditionalGeneration
# Load the PubMed subset of the scientific_papers dataset
dataset_pubmed = load_dataset("scientific_papers", "pubmed")
# Take the train split
sample_dataset = dataset_pubmed["train"]
sample_dataset
# Take the first two articles of the train split
sample_dataset = sample_dataset['article'][:2]
sample_dataset
# Load the PEGASUS model and tokenizer, then build the summarization pipeline
model = PegasusForConditionalGeneration.from_pretrained('google/pegasus-pubmed')
tokenizer = PegasusTokenizer.from_pretrained('google/pegasus-pubmed')
summerize_pipe = pipeline("summarization", model=model, tokenizer=tokenizer)
# truncation=True cuts off inputs that exceed the model's maximum input length
pipe_out = summerize_pipe(sample_dataset, truncation=True)
pipe_out
As a result, one of the summaries I get is shown below. The last sentence is incomplete, and it gets trimmed like this for every paper. How can I avoid this?
[{'summary_text': "background : in iran a national free food program ( nffp ) is implemented in elementary schools of deprived areas to cover all poor students . however , this program is not conducted in slums and poor areas of the big cities so many malnourished children with low socio - economic situation are not covered by nffp . therefore , the present study determines the effects of nutrition intervention in an advocacy process model on the prevalence of underweight in school aged children in the poor area of shiraz , iran.materials and methods : this interventional study has been carried out between 2009 and 2010 in shiraz , iran . in those schools all students ( 2897 , 7 - 13 years old ) were screened based on their body mass index ( bmi ) by nutritionists . according to convenience method all students divided to two groups based on their economic situation ; family revenue and head of household 's job and nutrition situation ; the first group were poor and malnourished students and the other group were well nourished or well - off students . for this report , the children 's height and weight were entered into center for disease control and prevention ( cdc ) to calculate bmi and bmi - for -"}
Answer 1
Score: 1
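You should increase max_length to a larger value, such as 1024 or 2048. Generation stops once the output reaches max_length tokens, which is why the last sentence gets cut off: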
summerize_pipe = pipeline("summarization", model=model, tokenizer=tokenizer, max_length=1024)
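If raising the limit is not enough on its own, one possible fallback is to pass max_length at call time and then drop any trailing fragment that does not end in sentence punctuation. The sketch below is only an illustration, not part of the original answer: the helper names summarize_complete and trim_to_last_sentence are hypothetical, the trimming step is plain string post-processing rather than anything the model does, and it assumes the pipeline returns a list of dicts with a 'summary_text' key as in the output shown in the question. Note also that a value above the model's configured position limit (2048 may be one, depending on the checkpoint) might not be supported.

```python
import re


def trim_to_last_sentence(summary: str) -> str:
    """Drop a trailing fragment that does not end in '.', '!' or '?'."""
    text = summary.rstrip()
    if text.endswith((".", "!", "?")):
        return text
    # Sentence-ending marks followed by whitespace; this skips decimals
    # such as "3.5" but can still be fooled by abbreviations like "e.g. ".
    matches = list(re.finditer(r"[.!?](?=\s)", text))
    if not matches:
        return text  # no complete sentence found, return the text unchanged
    return text[: matches[-1].end()]


def summarize_complete(pipe, articles, max_length=1024):
    """Run a summarization pipeline with a larger length cap, then tidy tails.

    `pipe` is assumed to behave like the object returned by
    transformers.pipeline("summarization", ...).
    """
    outputs = pipe(articles, truncation=True, max_length=max_length)
    return [trim_to_last_sentence(out["summary_text"]) for out in outputs]
```

With the pipeline from the question, this would be called as summarize_complete(summerize_pipe, sample_dataset). Trimming loses the unfinished sentence, so it is best treated as a cleanup step after increasing max_length, not a replacement for it.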