How to count specific keywords in a transcript with a condition
Question
I need to count how many times in the transcript the user is asking for an agent/representative before giving the bot a chance.
# True for transcripts whose first message is the user asking for an agent/representative
first_is_agent = df["Transcript"].str.lower().str.contains(r'^user\s*:\s*(?:agent|representative)', regex=True, na=False)
count = int(first_is_agent.sum())
count
This code counts the transcripts in which the user asks for an agent or representative before giving the bot a chance. It only matches when the keyword comes right after "user :".
I have a large DataFrame with a "Transcript" column holding conversations between a bot and a user.
I need to count how many times in the transcript the user is asking for an agent/representative before giving the bot a chance.
The transcripts look as follows, but longer:
"User : Order status.\nBot : Your order status is your orders tab. \nUser : representative."
"User : Agent please.\nBot : Waiting time is longer than usual."
I tried using a regular expression:
df["Transcript"] = df["Transcript"].str.lower()
df.loc[df["Transcript"].str.contains('agent|representative'),:]
But that just returns the rows containing those keywords.
How can I get a number that counts the transcripts where the user's first input is agent/representative?
Answer 1
Score: 2
I'd do it by splitting each transcript to keep only the first message (before the bot has even responded), searching that message for your terms, and summing the result to get the number of cases where the user requested an agent in the first message:
df['Transcript'].str.split('\n').str.get(0).str.contains('agent|representative').sum()
# Output with your examples: 1
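The one-liner above matches the keywords anywhere in the first line, including as part of longer words. A slightly stricter variant, sketched here in plain Python, also checks that the first line was spoken by the user and uses word boundaries. The helper name `first_message_requests_agent` is hypothetical, and the "User :" prefix format is an assumption taken from the sample transcripts in the question.

```python
import re

# Hypothetical helper; the "Speaker : message" line format is assumed from
# the sample transcripts in the question.
AGENT_RE = re.compile(r"\b(?:agent|representative)\b", re.IGNORECASE)


def first_message_requests_agent(transcript: str) -> bool:
    """Return True if the first line is a user message mentioning an agent."""
    # Keep only the text before the first newline.
    first_line = transcript.split("\n", 1)[0]
    # Split "User : Agent please." into the speaker and the message.
    speaker, _, message = first_line.partition(":")
    return speaker.strip().lower() == "user" and AGENT_RE.search(message) is not None


transcripts = [
    "User : Order status.\nBot : Your order status is your orders tab. \nUser : representative.",
    "User : Agent please.\nBot : Waiting time is longer than usual.",
]

count = sum(first_message_requests_agent(t) for t in transcripts)
print(count)  # 1
```

Assuming every value in the column is a string, the same function could be used on the DataFrame as `df["Transcript"].apply(first_message_requests_agent).sum()`.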
Answer 2
Score: 0
You could use re.findall() and then take the length of the matches.
This is a general way; there are other ways too.
>>> import re
>>>
>>> dfTranscript = '''
... User : Order status.
... Bot : Your order status is your orders tab.
... User : representative.
...
... User : Agent please.
... Bot : Waiting time is longer than usual.
... '''
>>>
>>> res = re.findall(r"(?i)\b(User\s*:.*?\b(?:agent|representative))\b", dfTranscript)
>>>
>>> print(len(res))
2
The regex presumably needed, expanded for readability:
(?i)
\b
( # (1 start)
User \s* : .*? \b
(?: agent | representative )
) # (1 end)
\b
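The expanded form above can be used directly if the pattern is compiled with `re.VERBOSE`, which makes the regex engine ignore the whitespace and `#` comments. The following is a minimal sketch; the names `USER_AGENT_RE` and `transcript` are illustrative, and `re.IGNORECASE` stands in for the inline `(?i)` flag.

```python
import re

# The expanded pattern, compiled with re.VERBOSE so that whitespace and
# "#" comments inside the pattern are ignored.
USER_AGENT_RE = re.compile(
    r"""
    \b
    (                                # (1 start)
      User \s* : .*? \b
      (?: agent | representative )
    )                                # (1 end)
    \b
    """,
    re.IGNORECASE | re.VERBOSE,
)

# Same sample text as in the session above.
transcript = (
    "User : Order status.\n"
    "Bot : Your order status is your orders tab. \n"
    "User : representative.\n"
    "\n"
    "User : Agent please.\n"
    "Bot : Waiting time is longer than usual.\n"
)

# findall returns the text captured by group 1 for each match.
matches = USER_AGENT_RE.findall(transcript)
print(matches)       # ['User : representative', 'User : Agent']
print(len(matches))  # 2
```

Note that this counts every user line that mentions one of the keywords, not only the first message of a conversation, which is why "User : representative" is included in the result.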