Avoiding Trimmed Summaries from the PEGASUS-Pubmed Hugging Face Summarization Model

Question

I am new to Hugging Face.
I am using the PEGASUS-Pubmed Hugging Face model to generate summaries of research papers. The code is below. The model returns a trimmed summary.
Is there any way to avoid the trimmed summaries and get more concrete summarization results?

Here is the code that I tried.

# Load the PubMed dataset of scientific articles
from datasets import load_dataset

dataset_pubmed = load_dataset("scientific_papers", "pubmed")

# Take the train split
sample_dataset = dataset_pubmed["train"]
sample_dataset

# Take the first two articles of the train split
sample_dataset = sample_dataset['article'][:2]
sample_dataset

# Import the Pegasus model and tokenizer
from transformers import pipeline, PegasusTokenizer, PegasusForConditionalGeneration

model = PegasusForConditionalGeneration.from_pretrained('google/pegasus-pubmed')
tokenizer = PegasusTokenizer.from_pretrained('google/pegasus-pubmed')

# Build the summarization pipeline and run it on the two articles
summerize_pipe = pipeline("summarization", model=model, tokenizer=tokenizer)
pipe_out = summerize_pipe(sample_dataset, truncation=True)
pipe_out

As a result, one of the summaries I get is shown below. The last sentence is incomplete; it gets trimmed for every paper. How can I avoid this?

[{'summary_text': "background : in iran a national free food program ( nffp ) is implemented in elementary schools of deprived areas to cover all poor students . however , this program is not conducted in slums and poor areas of the big cities so many malnourished children with low socio - economic situation are not covered by nffp . therefore , the present study determines the effects of nutrition intervention in an advocacy process model on the prevalence of underweight in school aged children in the poor area of shiraz , iran.materials and methods : this interventional study has been carried out between 2009 and 2010 in shiraz , iran . in those schools all students ( 2897 , 7 - 13 years old ) were screened based on their body mass index ( bmi ) by nutritionists . according to convenience method all students divided to two groups based on their economic situation ; family revenue and head of household 's job and nutrition situation ; the first group were poor and malnourished students and the other group were well nourished or well - off students . for this report , the children 's height and weight were entered into center for disease control and prevention ( cdc ) to calculate bmi and bmi - for -"}

Answer 1

Score: 1

You should increase max_length to a larger value, such as 1024 or 2048:

summerize_pipe = pipeline("summarization", model=model, tokenizer=tokenizer, max_length=1024)
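
The same limit can also be passed when calling the pipeline. Below is a minimal sketch of that approach, not a definitive fix. The helper names (generation_settings, summarize, trim_to_last_sentence) are hypothetical, and the values 512 and 64 are only illustrative. As far as I know, the google/pegasus-pubmed config uses a fairly small default max_length and has a fixed position limit, so very large values may not work; checking model.config before choosing a number is a good idea. If the output is still cut off mid-sentence, the last helper simply drops the unfinished tail as a fallback.

```python
import re


def generation_settings(max_length=512, min_length=64):
    """Collect call-time arguments for the summarization pipeline.

    The defaults are illustrative assumptions, not recommended values.
    """
    if min_length >= max_length:
        raise ValueError("min_length must be smaller than max_length")
    return {"max_length": max_length, "min_length": min_length, "truncation": True}


def summarize(articles, **overrides):
    """Summarize a list of article strings with google/pegasus-pubmed."""
    # Imported lazily so the helpers above can be used without transformers.
    from transformers import pipeline

    summarize_pipe = pipeline("summarization", model="google/pegasus-pubmed")
    return summarize_pipe(articles, **generation_settings(**overrides))


def trim_to_last_sentence(text):
    """Drop a trailing unfinished sentence, if any.

    A sentence end is assumed to be '.', '!' or '?' followed by
    whitespace or the end of the text; this is a rough heuristic.
    """
    ends = [m.end() for m in re.finditer(r"[.!?](?=\s|$)", text)]
    return text[:ends[-1]] if ends else text
```

Usage would look like pipe_out = summarize(sample_dataset, max_length=512), followed by trim_to_last_sentence(pipe_out[0]['summary_text']) if a summary still ends mid-sentence.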

huangapple
  • Posted on April 10, 2023, 20:00:02
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75976909.html