Question

I am new to Hugging Face.
I am using the PEGASUS-PubMed model from Hugging Face (google/pegasus-pubmed) to generate summaries of research papers. My code is below. The model produces a truncated summary.
Is there any way to avoid these truncated summaries and get more complete summarization results?

Here is the code I tried:

from datasets import load_dataset

# Load the PubMed subset of the scientific_papers dataset
dataset_pubmed = load_dataset("scientific_papers", "pubmed")

# Take the train split
sample_dataset = dataset_pubmed["train"]
sample_dataset

# Take the first two articles of the train split
sample_dataset = sample_dataset['article'][:2]
sample_dataset

### Import the Pegasus model and tokenizer

from transformers import pipeline, PegasusTokenizer, PegasusForConditionalGeneration

model = PegasusForConditionalGeneration.from_pretrained('google/pegasus-pubmed')
tokenizer = PegasusTokenizer.from_pretrained('google/pegasus-pubmed')

summerize_pipe = pipeline("summarization", model=model, tokenizer=tokenizer)
# truncation=True truncates the input articles to the tokenizer's maximum length;
# it does not control how long the generated summary is
pipe_out = summerize_pipe(sample_dataset, truncation=True)
pipe_out

As a result, one of the summaries I get is shown below. The last sentence is incomplete; it gets cut off for every paper. How can I avoid this?

[{'summary_text': "background : in iran a national free food program ( nffp ) is implemented in elementary schools of deprived areas to cover all poor students . however , this program is not conducted in slums and poor areas of the big cities so many malnourished children with low socio - economic situation are not covered by nffp . therefore , the present study determines the effects of nutrition intervention in an advocacy process model on the prevalence of underweight in school aged children in the poor area of shiraz , iran.materials and methods : this interventional study has been carried out between 2009 and 2010 in shiraz , iran . in those schools all students ( 2897 , 7 - 13 years old ) were screened based on their body mass index ( bmi ) by nutritionists . according to convenience method all students divided to two groups based on their economic situation ; family revenue and head of household 's job and nutrition situation ; the first group were poor and malnourished students and the other group were well nourished or well - off students . for this report , the children 's height and weight were entered into center for disease control and prevention ( cdc ) to calculate bmi and bmi - for -"}
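One way to confirm that the summaries are being cut off by a length limit, rather than ending naturally, is to check whether each summary ends with sentence-final punctuation. The sketch below is only a heuristic; looks_truncated and truncated_indices are hypothetical helpers (not part of transformers), and it assumes the pipeline output has the list-of-dicts shape shown above. The second example summary is made up for contrast.

```python
from typing import Dict, List

# Characters that we treat as ending a complete sentence (a heuristic assumption)
SENTENCE_END = (".", "!", "?")


def looks_truncated(summary: str) -> bool:
    """Heuristic: treat a summary as cut off if it does not end in '.', '!' or '?'."""
    return not summary.rstrip().endswith(SENTENCE_END)


def truncated_indices(pipe_out: List[Dict[str, str]]) -> List[int]:
    """Return the positions of pipeline outputs whose summary_text looks cut off."""
    return [i for i, item in enumerate(pipe_out) if looks_truncated(item["summary_text"])]


# The tail of the summary from the question, plus a hypothetical complete summary.
example_out = [
    {"summary_text": "the children 's height and weight were entered into center for "
                     "disease control and prevention ( cdc ) to calculate bmi and bmi - for -"},
    {"summary_text": "this is a complete summary ."},
]
print(truncated_indices(example_out))  # prints: [0]
```

Because PEGASUS-PubMed emits lowercased text with spaces before punctuation, the check strips trailing whitespace before looking at the last character.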

Answer 1

Score: 1

You should increase max_length to a larger value, such as 1024 or 2048:

summerize_pipe = pipeline("summarization", model=model, tokenizer=tokenizer, max_length=1024)
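To see why the cut-off lands mid-sentence, note that generation stops as soon as the output reaches the length limit, whether or not the model has finished its sentence; it stops earlier only if the model emits its end-of-sequence token. The toy loop below is a simplified illustration of that stopping rule, not how transformers implements generate(). toy_generate and the planned token list are hypothetical.

```python
from typing import List

EOS = "</s>"  # end-of-sequence marker used in this toy example


def toy_generate(planned_tokens: List[str], max_length: int) -> List[str]:
    """Emit tokens one by one, stopping at EOS or once max_length tokens have been emitted."""
    output: List[str] = []
    for token in planned_tokens:
        if token == EOS or len(output) >= max_length:
            break
        output.append(token)
    return output


# A hypothetical summary the model "wants" to produce if nothing cuts it off.
planned = "the study was carried out in shiraz . </s>".split()

print(" ".join(toy_generate(planned, max_length=4)))   # prints: the study was carried
print(" ".join(toy_generate(planned, max_length=64)))  # prints: the study was carried out in shiraz .
```

In this picture, raising max_length as the answer suggests moves the cut-off further out, giving the model room to reach its end-of-sequence token instead of stopping mid-sentence.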
