How to count specific keywords in a transcript with a condition

Question

I have a large DataFrame with a "Transcript" column that holds conversations between a bot and a user.
I need to count how many times the user asks for an agent/representative before giving the bot a chance.

The transcripts look like the following, but longer:

"User : Order status.\nBot : Your order status is your orders tab. \nUser : representative."

"User : Agent please.\nBot : Waiting time is longer than usual."

I tried using a regular expression:

df["Transcript"] = df["Transcript"].str.lower()
df.loc[df["Transcript"].str.contains('agent|representative'),:]

But that just returns the rows containing those keywords.
How can I output a number that counts the cases where the user's first input is agent/representative?

Answer 1

Score: 2

I'd do it by splitting each transcript to keep only the first message (before the bot has even responded), searching that message for your terms, and then summing the result to get the number of cases where a user requested an agent in the first message. This assumes the column has already been lowercased, as in your code:

df['Transcript'].str.split('\n').str.get(0).str.contains('agent|representative').sum()

# Output with your examples: 1
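As a self-contained sketch of the same split-then-search idea, the snippet below rebuilds the two sample transcripts from the question and counts the ones whose first message contains a keyword. The names first_message and asked_first are illustrative, and the word boundaries, case=False, and na=False are additions made here (so the result does not depend on lowercasing the column first and missing transcripts count as False), not part of the answer above.

```python
import pandas as pd

# The two sample transcripts from the question.
df = pd.DataFrame({
    "Transcript": [
        "User : Order status.\nBot : Your order status is your orders tab. \nUser : representative.",
        "User : Agent please.\nBot : Waiting time is longer than usual.",
    ]
})

# Keep only the text before the first newline, i.e. the user's opening message.
first_message = df["Transcript"].str.split("\n").str.get(0)

# True where the opening message mentions either keyword as a whole word.
asked_first = first_message.str.contains(
    r"\b(?:agent|representative)\b", case=False, regex=True, na=False
)

count = int(asked_first.sum())
print(count)
```

Keeping asked_first as a boolean Series also makes it possible to inspect the matching rows with df[asked_first] before trusting the total.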

Answer 2

Score: 0

You could use re.findall() and then take the length of the matches. This is a general approach; there are other ways too.

import re

dfTranscript = '''
User : Order status.
Bot : Your order status is your orders tab.
User : representative.

User : Agent please.
Bot : Waiting time is longer than usual.
'''

# Find every user line that mentions one of the keywords (case-insensitive).
res = re.findall(r"(?i)\b(User\s*:.*?\b(?:agent|representative))\b", dfTranscript)

print(len(res))
# Output: 2

The regex, expanded for readability:

 (?i)
 \b
 (                             # (1 start)
    User \s* : .*? \b
    (?: agent | representative )
 )                             # (1 end)
 \b
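Note that the findall pattern above matches any user line that mentions a keyword, which is why the sample yields 2 rather than the 1 the question is after. A minimal sketch of a stricter variant follows, assuming each transcript is available as its own string: re.match anchors at the start of the string and [^\n]* keeps the search inside the first line. The helper name opens_with_agent_request and the pattern FIRST_TURN are hypothetical, introduced only for illustration.

```python
import re

# Matches only when the transcript starts with a user line that
# mentions "agent" or "representative" as a whole word.
FIRST_TURN = re.compile(
    r"\s*user\s*:[^\n]*\b(?:agent|representative)\b", re.IGNORECASE
)

def opens_with_agent_request(transcript: str) -> bool:
    """Return True if the user's very first message asks for an agent."""
    # re.match only tries the pattern at the beginning of the string.
    return FIRST_TURN.match(transcript) is not None

# The two sample transcripts from the question, one string per conversation.
transcripts = [
    "User : Order status.\nBot : Your order status is your orders tab. \nUser : representative.",
    "User : Agent please.\nBot : Waiting time is longer than usual.",
]

count = sum(opens_with_agent_request(t) for t in transcripts)
print(count)
```

The same compiled pattern could be applied to a DataFrame column with df["Transcript"].apply(opens_with_agent_request).sum(), assuming the column contains only strings.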

huangapple
  • Posted on June 5, 2023, 23:36:47
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76407983.html