May 26, 2023 09:06:19


How to generate sentiment scores using predefined aspects with deberta-v3-base-absa-v1.1 Huggingface model?

Question

I have a DataFrame where the first column holds the text and another column holds the predefined aspects. However, some texts have no aspects defined (for example, row 2).

import pandas as pd

data = {
    'text': [
        "The camera quality of this phone is amazing.",
        "The belt is poor quality",
        "The battery life could be improved.",
        "The display is sharp and vibrant.",
        "The customer service was disappointing."
    ],
    'aspects': [
        ["camera", "phone"],
        [],
        ["battery", "life"],
        ["display"],
        ["customer service"]
    ]
}
df = pd.DataFrame(data)

I want to generate two things with the yangheng/deberta-v3-base-absa-v1.1 package:

1) Generate a sentiment score for each text based on its predefined aspects.

2) Generate both the aspects from the text and their respective sentiment scores.

Note: row 2 does not have any predefined aspects.

I tried the following and got an error:

import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import pandas as pd
# Load the ABSA model and tokenizer
model_name = "yangheng/deberta-v3-base-absa-v1.1"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
# Generate aspects and sentiments
aspects = []
sentiments = []
for index, row in df.iterrows():
    text = row['text']
    row_aspects = row['aspects']

    aspect_sentiments = []

    for aspect in row_aspects:
        inputs = tokenizer(text, aspect, return_tensors="pt")

        with torch.inference_mode():
            outputs = model(**inputs)

        predicted_sentiment = torch.argmax(outputs.logits).item()
        sentiment_label = model.config.id2label[predicted_sentiment]

        aspect_sentiments.append(f"{aspect}: {sentiment_label}")

    aspects.append(row_aspects)
    sentiments.append(aspect_sentiments)
# Add the generated aspects and sentiments to the DataFrame
df['generated_aspects'] = aspects
df['generated_sentiments'] = sentiments
# Print the updated DataFrame
print(df)

Generic example of how to use the package:

import torch
import torch.nn.functional as F
from transformers import AutoTokenizer, AutoModelForSequenceClassification
model_name = "yangheng/deberta-v3-base-absa-v1.1"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
aspects = ["food", "service"]
text = "The food was great but the service was terrible."
sentiment_aspect = {}
for aspect in aspects:
    inputs = tokenizer(text, aspect, return_tensors="pt")
    with torch.inference_mode():
        outputs = model(**inputs)
    scores = F.softmax(outputs.logits[0], dim=-1)
    label_id = torch.argmax(scores).item()
    sentiment_aspect[aspect] = (model.config.id2label[label_id], scores[label_id].item())
print(sentiment_aspect)
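The softmax and argmax step in the generic example turns the model's three raw logits into a label and a probability. It can be illustrated without downloading the model. This is a minimal pure-Python sketch: the logits are made-up numbers, and the ID2LABEL mapping (0: Negative, 1: Neutral, 2: Positive) is an assumption, so check model.config.id2label for the real one.

```python
import math

# Assumed label mapping; the real one lives in model.config.id2label
ID2LABEL = {0: "Negative", 1: "Neutral", 2: "Positive"}


def softmax(logits):
    """Turn raw logits into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def label_and_score(logits, id2label=ID2LABEL):
    """Pure-Python mirror of F.softmax + torch.argmax: return (label, probability)."""
    probs = softmax(logits)
    label_id = max(range(len(probs)), key=probs.__getitem__)
    return id2label[label_id], probs[label_id]


# Hypothetical logits for one (text, aspect) pair
print(label_and_score([-2.0, 0.5, 3.0]))
```

The probability returned is the same kind of number as scores[label_id].item() in the example: the softmax probability of the winning class, not a separate sentiment intensity.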

Desired Output

Answer 1

Score: 2

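Specific to the yangheng/deberta-v3-base-absa-v1.1 model, this is the usage, and you have to run the model once per aspect:

from transformers import AutoTokenizer, AutoModelForSequenceClassification
from transformers import pipeline
# Load the ABSA model and tokenizer
model_name = "yangheng/deberta-v3-base-absa-v1.1"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
classifier = pipeline("text-classification", model=model, tokenizer=tokenizer)
# Classify the text once per aspect, passing the aspect as the second sentence of the pair
for aspect in ['camera', 'phone']:
    print(aspect, classifier('The camera quality of this phone is amazing.', text_pair=aspect))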

[out]:

camera [{'label': 'Positive', 'score': 0.9967294931411743}]
phone [{'label': 'Neutral', 'score': 0.9472787380218506}]
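To apply this per-aspect loop to every row of the question's DataFrame, including row 2 whose aspect list is empty, one option is a small helper that takes the scoring step as a callable. This is a sketch under assumptions: score_rows and its score_fn parameter are hypothetical names, and score_fn is assumed to return a (label, score) tuple for one (text, aspect) pair. The stand-in scorer only exists so the sketch runs offline.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical scorer signature: (text, aspect) -> (label, score)
ScoreFn = Callable[[str, str], Tuple[str, float]]


def score_rows(
    texts: Sequence[str],
    aspect_lists: Sequence[Sequence[str]],
    score_fn: ScoreFn,
) -> List[Dict[str, Tuple[str, float]]]:
    """Score every predefined aspect of every text.

    Rows with no predefined aspects (like row 2) get an empty dict
    instead of raising, so the output lines up with the input rows.
    """
    results = []
    for text, aspects in zip(texts, aspect_lists):
        results.append({aspect: score_fn(text, aspect) for aspect in aspects})
    return results


# Stand-in scorer so the sketch runs without downloading a model
def fake_score(text: str, aspect: str) -> Tuple[str, float]:
    return ("Positive", 0.99) if "amazing" in text else ("Neutral", 0.5)


texts = ["The camera quality of this phone is amazing.", "The belt is poor quality"]
aspect_lists = [["camera", "phone"], []]
print(score_rows(texts, aspect_lists, fake_score))
```

With the model loaded, score_fn could wrap the classifier call above (taking the 'label' and 'score' of its first result), and the output could be stored with df['aspect_sentiments'] = score_rows(df['text'], df['aspects'], score_fn).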

To get the zero-shot classification scores in general, try using pipeline:

from transformers import AutoTokenizer, AutoModelForSequenceClassification
from transformers import pipeline
# Load the ABSA model and tokenizer
model_name = "yangheng/deberta-v3-base-absa-v1.1"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
pipe = pipeline("zero-shot-classification", model=model, tokenizer=tokenizer)
pipe("The camera quality of this phone is amazing.", candidate_labels=["camera", "phone"])

[out]:

{'sequence': 'The camera quality of this phone is amazing.',
 'labels': ['camera', 'phone'],
 'scores': [0.9036691784858704, 0.09633082151412964]}
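The two scores above add up to 1 (0.9037 + 0.0963). As I understand the transformers zero-shot pipeline, with more than one candidate label in the default single-label mode it builds one hypothesis per candidate, keeps the logit it treats as "entailment" for each pair, and applies a softmax across the candidates. With a single candidate, or with multi_label=True, it scores each label on its own instead. A minimal pure-Python sketch of the multi-candidate step, with made-up logits, shows why these scores only rank the aspects against each other and carry no positive or negative sentiment.

```python
import math
from typing import Dict, List


def zero_shot_scores(labels: List[str], entail_logits: List[float]) -> Dict[str, float]:
    """Softmax over one entailment-style logit per candidate label.

    Sketch of single-label zero-shot scoring with several candidates:
    the scores are relative to the other candidates and sum to 1.
    """
    m = max(entail_logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in entail_logits]
    total = sum(exps)
    scored = {label: e / total for label, e in zip(labels, exps)}
    # Sort highest first, like the pipeline's 'labels' / 'scores' output
    return dict(sorted(scored.items(), key=lambda kv: kv[1], reverse=True))


# Made-up logits for the hypotheses built from "camera" and "phone"
print(zero_shot_scores(["camera", "phone"], [2.1, -0.14]))
```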

Depending on what "text generated aspect" means, it may be keyword extraction. If so, searching https://huggingface.co/models?search=keyword gives this as the top downloaded model: https://huggingface.co/yanekyuk/bert-uncased-keyword-extractor

from transformers import AutoTokenizer, AutoModelForTokenClassification
from transformers import pipeline
tokenizer2 = AutoTokenizer.from_pretrained("yanekyuk/bert-uncased-keyword-extractor")
model2 = AutoModelForTokenClassification.from_pretrained("yanekyuk/bert-uncased-keyword-extractor")
# Build the token-classification pipeline once instead of on every call
extractor = pipeline("ner", model=model2, tokenizer=tokenizer2)
def extract_aspect(text):
    phrasesids = []
    for tag in extractor(text):
        # "B" tags start a new keyword span; "I" tags extend the current one
        if tag['entity'].startswith('B'):
            phrasesids.append([tag['start'], tag['end']])
        if tag['entity'].startswith('I') and phrasesids:
            phrasesids[-1][-1] = tag['end']
    # Slice each [start, end] character span out of the original text
    phrases = [text[p[0]:p[1]] for p in phrasesids]
    return phrases
text = "The camera quality of this phone is amazing."
extract_aspect(text)

[out]:

['camera']
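The B/I merging inside extract_aspect can be checked without the keyword model. This is a minimal sketch assuming the tags look like the dicts the ner pipeline returns without aggregation (with 'entity', 'start' and 'end' keys); merge_bio_spans is a hypothetical name, and the tag values, including the "B-KEY"/"I-KEY" label names, are hand-written, not real model output.

```python
from typing import Dict, List


def merge_bio_spans(text: str, tags: List[Dict]) -> List[str]:
    """Join B-/I- token tags into phrases, as extract_aspect does."""
    spans: List[List[int]] = []
    for tag in tags:
        if tag["entity"].startswith("B"):
            # A B- tag opens a new phrase at this token
            spans.append([tag["start"], tag["end"]])
        elif tag["entity"].startswith("I") and spans:
            # An I- tag stretches the open phrase to this token's end
            spans[-1][1] = tag["end"]
    # Cut each [start, end) character span out of the original text
    return [text[start:end] for start, end in spans]


# Hand-written tags in the shape of the ner pipeline output
text = "The battery life could be improved."
tags = [
    {"entity": "B-KEY", "start": 4, "end": 11},   # "battery"
    {"entity": "I-KEY", "start": 12, "end": 16},  # "life"
]
print(merge_bio_spans(text, tags))  # ['battery life']
```

Because the slice runs from the B token's start to the last I token's end, multi-word keywords keep the original spacing of the text.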

Putting the extractor and classifier together:

from transformers import AutoTokenizer, AutoModelForSequenceClassification, AutoModelForTokenClassification
from transformers import pipeline
# Load the ABSA model and tokenizer
model_name = "yangheng/deberta-v3-base-absa-v1.1"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
classifier = pipeline("zero-shot-classification", model=model, tokenizer=tokenizer)
# Load the keyword extractor
tokenizer2 = AutoTokenizer.from_pretrained("yanekyuk/bert-uncased-keyword-extractor")
model2 = AutoModelForTokenClassification.from_pretrained("yanekyuk/bert-uncased-keyword-extractor")
extractor = pipeline("ner", model=model2, tokenizer=tokenizer2)
def extract_aspect(text):
    phrasesids = []
    for tag in extractor(text):
        # "B" tags start a new keyword span; "I" tags extend the current one
        if tag['entity'].startswith('B'):
            phrasesids.append([tag['start'], tag['end']])
        if tag['entity'].startswith('I') and phrasesids:
            phrasesids[-1][-1] = tag['end']
    # Slice each [start, end] character span out of the original text
    phrases = [text[p[0]:p[1]] for p in phrasesids]
    return phrases
text = "The camera quality of this phone is amazing."
# Use the extracted keywords as the zero-shot candidate labels
classifier(text, candidate_labels=extract_aspect(text))

[out]:

{'sequence': 'The camera quality of this phone is amazing.',
 'labels': ['camera'],
 'scores': [0.9983300566673279]}

Q: But the extracted keywords are not "right", or don't match the predefined ones?

A: No model is perfect, and the model in the example above is a keyword extractor, not a product-aspect extractor. YMMV.

Q: Why isn't the zero-shot classifier giving me negative / positive labels?

A: The zero-shot classifier labels the text using the extracted keywords as candidate labels; it is not a sentiment classifier.
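To get both the generated aspect and its sentiment, one option is to run the keyword extractor first and then the per-aspect text-classification classifier on each extracted keyword, instead of the zero-shot pipeline. A sketch with the two steps passed in as callables: extract_then_classify, extract_fn and sentiment_fn are hypothetical names, and the stand-in functions only exist so the sketch runs offline.

```python
from typing import Callable, Dict, List, Tuple

ExtractFn = Callable[[str], List[str]]                  # e.g. extract_aspect above
SentimentFn = Callable[[str, str], Tuple[str, float]]   # (text, aspect) -> (label, score)


def extract_then_classify(text: str, extract_fn: ExtractFn,
                          sentiment_fn: SentimentFn) -> Dict[str, Tuple[str, float]]:
    """Generate aspects from the text, then score each one for sentiment."""
    result: Dict[str, Tuple[str, float]] = {}
    for aspect in extract_fn(text):
        if aspect not in result:  # skip duplicate keywords
            result[aspect] = sentiment_fn(text, aspect)
    return result


# Stand-ins so the sketch runs offline; swap in the real models in practice
def fake_extract(text: str) -> List[str]:
    words = [w.strip(".") for w in text.split()]
    return [w for w in words if w in {"camera", "belt", "battery"}]


def fake_sentiment(text: str, aspect: str) -> Tuple[str, float]:
    return ("Negative", 0.9) if "poor" in text else ("Positive", 0.9)


for t in ["The camera quality of this phone is amazing.", "The belt is poor quality"]:
    print(t, extract_then_classify(t, fake_extract, fake_sentiment))
```

With the real models, extract_fn could be extract_aspect from the answer and sentiment_fn a small wrapper around the text-classification classifier (classifier(text, text_pair=aspect)[0], taking its 'label' and 'score'). For row 2, whose predefined list is empty, this yields generated aspects instead of nothing.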


