How to generate sentiment scores using predefined aspects with the deberta-v3-base-absa-v1.1 Hugging Face model?

Question


I have a DataFrame with text in the first column and predefined aspects in another column. However, no aspects are defined for some of the texts, for example row 2.

```python
import pandas as pd

data = {
    'text': [
        "The camera quality of this phone is amazing.",
        "The belt is poor quality",
        "The battery life could be improved.",
        "The display is sharp and vibrant.",
        "The customer service was disappointing."
    ],
    'aspects': [
        ["camera", "phone"],
        [],  # row 2 has no predefined aspects
        ["battery", "life"],
        ["display"],
        ["customer service"]
    ]
}
df = pd.DataFrame(data)
```

I want to generate two things:

  1. using the predefined aspects for each text, a sentiment score for every aspect;
  2. using only the text, both the aspects and their respective sentiment scores.

Note: the model is yangheng/deberta-v3-base-absa-v1.1, and row 2 does not have any predefined aspects.

I tried the following and got an error:

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import pandas as pd

# Load the ABSA model and tokenizer
model_name = "yangheng/deberta-v3-base-absa-v1.1"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

# Generate aspects and sentiments
aspects = []
sentiments = []
for index, row in df.iterrows():
    text = row['text']
    row_aspects = row['aspects']
    aspect_sentiments = []
    for aspect in row_aspects:
        inputs = tokenizer(text, aspect, return_tensors="pt")
        with torch.inference_mode():
            outputs = model(**inputs)
        predicted_sentiment = torch.argmax(outputs.logits).item()
        sentiment_label = model.config.id2label[predicted_sentiment]
        aspect_sentiments.append(f"{aspect}: {sentiment_label}")
    aspects.append(row_aspects)
    sentiments.append(aspect_sentiments)

# Add the generated aspects and sentiments to the DataFrame
df['generated_aspects'] = aspects
df['generated_sentiments'] = sentiments

# Print the updated DataFrame
print(df)
```
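One possible way to handle rows such as row 2 is to skip the model call when the aspect list is empty and keep an empty result for that row. The sketch below uses plain dictionaries instead of a DataFrame, and `fake_classify` is a hypothetical stand-in for the model call (it is not part of any library), so the control flow can be run without downloading anything:

```python
def score_rows(rows, classify):
    """Return one {aspect: (label, score)} dict per row; empty dict if no aspects."""
    results = []
    for row in rows:
        aspects = row.get("aspects") or []  # treats None and [] the same way
        results.append({a: classify(row["text"], a) for a in aspects})
    return results


def fake_classify(text, aspect):
    # Hypothetical stand-in: a real version would tokenize (text, aspect),
    # run the model, and return (label, softmax score).
    return ("Neutral", 1.0)


rows = [
    {"text": "The camera quality of this phone is amazing.", "aspects": ["camera", "phone"]},
    {"text": "The belt is poor quality", "aspects": []},
]
row_scores = score_rows(rows, fake_classify)
print(row_scores)
```

With a DataFrame, the same function could presumably be fed `df.to_dict("records")`.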

A generic example of how to use the model:

```python
import torch
import torch.nn.functional as F
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_name = "yangheng/deberta-v3-base-absa-v1.1"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

aspects = ["food", "service"]
text = "The food was great but the service was terrible."
sentiment_aspect = {}
for aspect in aspects:
    inputs = tokenizer(text, aspect, return_tensors="pt")
    with torch.inference_mode():
        outputs = model(**inputs)
    scores = F.softmax(outputs.logits[0], dim=-1)
    label_id = torch.argmax(scores).item()
    sentiment_aspect[aspect] = (model.config.id2label[label_id], scores[label_id].item())

print(sentiment_aspect)
```
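For readers wondering where the score in the generic example comes from: it applies a softmax to the logits and reports the probability of the arg-max class. Here is a minimal pure-Python sketch of that step. The logits are made up, and the label order is an assumption (check `model.config.id2label` for the real mapping):

```python
import math


def softmax(logits):
    """Convert raw logits into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical values for illustration only; real logits come from model(**inputs).logits[0]
logits = [-2.0, 0.5, 3.0]
id2label = {0: "Negative", 1: "Neutral", 2: "Positive"}  # assumed order

scores = softmax(logits)
label_id = max(range(len(scores)), key=scores.__getitem__)  # index of the largest score
result = (id2label[label_id], scores[label_id])
print(result)
```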

Desired output: (image not preserved)

Answer 1

Score: 2

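Specific to the `yangheng/deberta-v3-base-absa-v1.1` model, this is the usage; you have to run the model once per aspect:

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification, pipeline

# Load the ABSA model and tokenizer
model_name = "yangheng/deberta-v3-base-absa-v1.1"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

classifier = pipeline("text-classification", model=model, tokenizer=tokenizer)

# The sentence is the first segment; the aspect is passed as the text pair
for aspect in ['camera', 'phone']:
    print(aspect, classifier('The camera quality of this phone is amazing.', text_pair=aspect))
```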

[out]:

```
camera [{'label': 'Positive', 'score': 0.9967294931411743}]
phone [{'label': 'Neutral', 'score': 0.9472787380218506}]
```
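To store these results per aspect (for example in a DataFrame column), the one-element lists returned above can be flattened into a plain dictionary. A small sketch, where `flatten` is a hypothetical helper name and the input is copied from the run above:

```python
def flatten(outputs_by_aspect):
    """Map aspect -> (label, rounded score) from text-classification outputs."""
    flat = {}
    for aspect, outputs in outputs_by_aspect.items():
        top = outputs[0]  # each call above returned a one-element list
        flat[aspect] = (top["label"], round(top["score"], 4))
    return flat


# Outputs copied from the run above
raw = {
    "camera": [{"label": "Positive", "score": 0.9967294931411743}],
    "phone": [{"label": "Neutral", "score": 0.9472787380218506}],
}
flat = flatten(raw)
print(flat)
```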

To get the zero-shot classification scores in general, try using pipeline:

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
from transformers import pipeline

# Load the ABSA model and tokenizer
model_name = "yangheng/deberta-v3-base-absa-v1.1"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

pipe = pipeline("zero-shot-classification", model=model, tokenizer=tokenizer)

pipe("The camera quality of this phone is amazing.", candidate_labels=["camera", "phone"])
```

[out]:

```
{'sequence': 'The camera quality of this phone is amazing.',
 'labels': ['camera', 'phone'],
 'scores': [0.9036691784858704, 0.09633082151412964]}
```
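The `labels` and `scores` lists in this output are parallel, and in the pipeline's default single-label mode the scores are normalized across the candidate labels, so they add up to roughly 1. A short sketch of pairing them up, using the output above as input:

```python
result = {  # copied from the output above
    "sequence": "The camera quality of this phone is amazing.",
    "labels": ["camera", "phone"],
    "scores": [0.9036691784858704, 0.09633082151412964],
}

# Pair each candidate label with its score and pick the highest-scoring one
label_scores = dict(zip(result["labels"], result["scores"]))
best_label = max(label_scores, key=label_scores.get)
print(best_label, sum(result["scores"]))
```

These numbers say how strongly the text relates to each candidate label, not whether the sentiment towards it is positive or negative.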

Depending on what "text generated aspect" means, perhaps it's keyword extraction. If so, a search on https://huggingface.co/models?search=keyword gives this as the top downloaded model: https://huggingface.co/yanekyuk/bert-uncased-keyword-extractor

```python
from transformers import AutoTokenizer, AutoModelForTokenClassification, pipeline

tokenizer2 = AutoTokenizer.from_pretrained("yanekyuk/bert-uncased-keyword-extractor")
model2 = AutoModelForTokenClassification.from_pretrained("yanekyuk/bert-uncased-keyword-extractor")

def extract_aspect(text):
    extractor = pipeline("ner", model=model2, tokenizer=tokenizer2)
    phrasesids = []
    for tag in extractor(text):
        # A B- tag opens a new phrase as [start, end] character offsets
        if tag['entity'].startswith('B'):
            phrasesids.append([tag['start'], tag['end']])
        # An I- tag extends the end offset of the current phrase (if there is one)
        elif tag['entity'].startswith('I') and phrasesids:
            phrasesids[-1][-1] = tag['end']
    # Slice each phrase out of the original text
    phrases = [text[p[0]:p[1]] for p in phrasesids]
    return phrases

text = "The camera quality of this phone is amazing."

extract_aspect(text)
```

[out]:

```
camera
```
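The span-merging logic inside `extract_aspect` can be tried without loading the model. In this sketch the tagger output is hypothetical (hand-written offsets and assumed tag names, not taken from the model card), and `merge_bio_spans` is an illustrative helper name:

```python
def merge_bio_spans(text, tags):
    """Merge B-/I- tagged tokens into phrases using character offsets."""
    spans = []
    for tag in tags:
        if tag["entity"].startswith("B"):
            spans.append([tag["start"], tag["end"]])  # open a new phrase
        elif tag["entity"].startswith("I"):
            if spans:
                spans[-1][1] = tag["end"]  # extend the current phrase
            else:
                spans.append([tag["start"], tag["end"]])  # stray I- tag opens a phrase
    return [text[start:end] for start, end in spans]


text = "The customer service was disappointing."
# Hypothetical tagger output; the tag names are assumed for illustration
tags = [
    {"entity": "B-KEY", "start": 4, "end": 12},
    {"entity": "I-KEY", "start": 13, "end": 20},
]
phrases = merge_bio_spans(text, tags)
print(phrases)
```

Treating a stray I- tag as the start of a phrase is a design choice made here to avoid an `IndexError`; dropping such tags would be an equally reasonable option.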

Putting the extractor and classifier together:

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification, AutoModelForTokenClassification
from transformers import pipeline

# Load the ABSA model and tokenizer
model_name = "yangheng/deberta-v3-base-absa-v1.1"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

classifier = pipeline("zero-shot-classification", model=model, tokenizer=tokenizer)

tokenizer2 = AutoTokenizer.from_pretrained("yanekyuk/bert-uncased-keyword-extractor")
model2 = AutoModelForTokenClassification.from_pretrained("yanekyuk/bert-uncased-keyword-extractor")

def extract_aspect(text):
    extractor = pipeline("ner", model=model2, tokenizer=tokenizer2)
    phrasesids = []
    for tag in extractor(text):
        # A B- tag opens a new phrase as [start, end] character offsets
        if tag['entity'].startswith('B'):
            phrasesids.append([tag['start'], tag['end']])
        # An I- tag extends the end offset of the current phrase (if there is one)
        elif tag['entity'].startswith('I') and phrasesids:
            phrasesids[-1][-1] = tag['end']
    # Slice each phrase out of the original text
    phrases = [text[p[0]:p[1]] for p in phrasesids]
    return phrases

text = "The camera quality of this phone is amazing."

# Use the extracted keywords as the candidate labels
classifier(text, candidate_labels=extract_aspect(text))
```

[out]:

```
{'sequence': 'The camera quality of this phone is amazing.',
 'labels': ['camera'],
 'scores': [0.9983300566673279]}
```

Q: But the extracted keywords are not "right" or don't match the predefined ones?

A: No model is perfect, and the example model above is a keyword extractor, not a product aspect extractor. YMMV.

Q: Why isn't the zero-shot classifier giving me negative / positive labels?

A: The zero-shot classifier scores the text against the extracted labels; it is not a sentiment classifier.
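To get sentiment labels for rows with and without predefined aspects, one option is to fall back to extraction only when the aspect list is empty, and to pass every aspect to a sentiment classifier rather than the zero-shot one. A rough sketch of that flow, in which `analyze`, `fake_extract`, and `fake_classify` are hypothetical names and the two stand-ins return fixed values so nothing has to be downloaded:

```python
def analyze(text, predefined, extract, classify):
    """Use predefined aspects when present, otherwise fall back to extraction."""
    aspects = list(predefined) if predefined else extract(text)
    return {aspect: classify(text, aspect) for aspect in aspects}


# Hypothetical stand-ins so the flow can be run without any model download
def fake_extract(text):
    return ["belt"]


def fake_classify(text, aspect):
    return ("Negative", 0.9)


with_aspects = analyze("The display is sharp and vibrant.", ["display"], fake_extract, fake_classify)
without_aspects = analyze("The belt is poor quality", [], fake_extract, fake_classify)
print(with_aspects, without_aspects)
```

In practice, `extract` could be the `extract_aspect` function from this answer, and `classify` a small wrapper around the text-classification pipeline call with `text_pair=aspect`.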


huangapple
  • Posted on May 26, 2023, 09:06:19
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76337058.html