June 5, 2023 23:36:47

How to count specific keywords in a transcript with a condition

Question


I have a big DataFrame with a "Transcript" column containing conversations between a bot and a user.
I need to count how many times the user asks for an agent/representative before giving the bot a chance.

The transcripts look as follows, but longer:

"User : Order status.\nBot : Your order status is your orders tab. \nUser : representative."
"User : Agent please.\nBot : Waiting time is longer than usual."

I tried to use a regular expression:

df["Transcript"] = df["Transcript"].str.lower()
df.loc[df["Transcript"].str.contains('agent|representative'),:]

But that just returns the rows containing those keywords.
How can I output a number that counts the cases where the user's first input is agent/representative?

Answer 1

Score: 2


I'd do it by splitting the input to only the first transcript text (before the bot has even responded), then searching for your terms, and then summing the result to get the number of cases where a user has requested an agent within the first message:

df['Transcript'].str.split('\n').str.get(0).str.contains('agent|representative').sum()
# Output with your examples: 1
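
For anyone who wants to try this end to end, here is a minimal sketch built on the two sample transcripts from the question. The variable names (first_message, opened_with_request) are only illustrative, and case=False, na=False and the word-boundary pattern are my own assumptions, not part of the one-liner above. They stand in for the .str.lower() step and guard against missing transcripts.

```python
import pandas as pd

# The two sample transcripts from the question.
df = pd.DataFrame({
    "Transcript": [
        "User : Order status.\nBot : Your order status is your orders tab. \nUser : representative.",
        "User : Agent please.\nBot : Waiting time is longer than usual.",
    ]
})

# Keep only the first line of each transcript (the user's opening message).
first_message = df["Transcript"].str.split("\n").str.get(0)

# case=False replaces the explicit .str.lower() step (assumption);
# na=False treats missing transcripts as "no request" (assumption).
opened_with_request = first_message.str.contains(
    r"\b(?:agent|representative)\b", case=False, na=False, regex=True
)

count = int(opened_with_request.sum())
print(count)  # 1 for the two sample rows
```

Summing works because the boolean mask is treated as 0/1, so the total is the number of transcripts whose opening line contains one of the keywords.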

Answer 2

Score: 0


You could use re.findall() and then take the length of the matches.
This is a general way. There are other ways too.

>>> import re
>>>
>>> dfTranscript = '''
... User : Order status.
... Bot : Your order status is your orders tab.
... User : representative.
...
... User : Agent please.
... Bot : Waiting time is longer than usual.
... '''
>>>
>>> res = re.findall(r"(?i)\b(User\s*:.*?\b(?:agent|representative))\b", dfTranscript)
>>>
>>> print(len(res))
2
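
If the transcripts live in a DataFrame column rather than one big string, the same pattern can probably be applied per row with Series.str.count. This is only a sketch: the sample frame is rebuilt from the question, and I dropped the capture group because counting does not need it. Note that it counts every user line that mentions the keywords, not only the opening message.

```python
import pandas as pd

# Sample frame rebuilt from the question (two transcripts).
df = pd.DataFrame({
    "Transcript": [
        "User : Order status.\nBot : Your order status is your orders tab. \nUser : representative.",
        "User : Agent please.\nBot : Waiting time is longer than usual.",
    ]
})

# Same idea as the findall pattern, without the capture group.
pattern = r"(?i)\bUser\s*:.*?\b(?:agent|representative)\b"

# Series.str.count gives the number of non-overlapping matches in each row.
requests_per_transcript = df["Transcript"].str.count(pattern)

print(requests_per_transcript.tolist())    # [1, 1]
print(int(requests_per_transcript.sum()))  # 2
```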

The regex presumably needed:

 (?i)
 \b
 (                             # (1 start)
    User \s* : .*? \b
    (?: agent | representative )
 )                             # (1 end)
 \b
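
The expanded layout above can be compiled as-is with re.VERBOSE. Below is a rough sketch of that, plus a way to restrict the check to the user's opening message, which is what the question asks for. The helper first_message_requests_agent is a hypothetical name of mine, and I pass the flags explicitly instead of using the inline (?i).

```python
import re

# The expanded pattern, compiled in verbose mode so whitespace and
# comments inside the pattern are ignored.
USER_REQUEST = re.compile(
    r"""
    \b
    (                               # group 1 start
        User \s* : .*? \b
        (?: agent | representative )
    )                               # group 1 end
    \b
    """,
    re.IGNORECASE | re.VERBOSE,
)


def first_message_requests_agent(transcript: str) -> bool:
    """Hypothetical helper: True if the opening user line asks for an agent."""
    # match() only tries position 0, and '.' does not cross newlines,
    # so only the first line of the transcript can satisfy the pattern.
    return USER_REQUEST.match(transcript.lstrip()) is not None


transcripts = [
    "User : Order status.\nBot : Your order status is your orders tab. \nUser : representative.",
    "User : Agent please.\nBot : Waiting time is longer than usual.",
]

# Number of transcripts where the user asked for an agent straight away.
count = sum(first_message_requests_agent(t) for t in transcripts)
print(count)  # 1
```

Using match() instead of findall() is the design choice here: findall() counts requests anywhere in the transcript, while match() only accepts one at the very start.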
