March 7, 2023, 04:06:02

How can I split this conversation into one row per speaker using Python?

Question

Let's say the string below is the content of one row in a column named "body". I want to create one row per speaker from this string.

Helper:
Hi, I'm Helper, Virtual Assistant, how can I help you today?
Are you inquiring about:
eBooks
Audiobooks
Purchasing
Subscriptions
Movies
etc

Cx said:
Movies

The expected output should be like:

Speaker	Transcript
Helper	Hi, I'm Helper, Virtual Assistant, how can I help you today? Are you inquiring about:eBooksAudiobooks Purchasing Subscriptions Movies etc
Cx said	Movies

I have tried this but the

Testresult = tempchatdf.body.str.split(":\*\*", expand=True)

Answer 1

Score: 1

You can take the string and call split(":") on it.

# Split on every colon, then rejoin everything after the first piece
split = string.split(":")
result = [split[0], ":".join(split[1:])]

This takes the first piece as the speaker (index 0) and joins the remaining pieces back together with ":", so any additional colons in the text are preserved.

If the string contains several such lines, wrap this in a loop.

table = []
for line in string.splitlines():  # iterate over lines, not characters
    split = line.split(":")  # split the current line, not the whole string
    table.append([split[0], ":".join(split[1:])])
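
As a side note, the split-then-join step can probably be collapsed into a single call with str.partition, which splits on the first occurrence of a separator only. This is a minimal sketch, assuming each line has the form "speaker: text"; the helper name split_speaker and the sample lines are made up for illustration.

```python
def split_speaker(line):
    """Split a line on its first colon only; later colons stay in the text."""
    speaker, _, text = line.partition(":")
    return [speaker.strip(), text.strip()]


# Hypothetical sample lines, one speaker per line
lines = ["Helper: Are you inquiring about: eBooks", "Cx said: Movies"]
table = [split_speaker(line) for line in lines]
print(table)
# [['Helper', 'Are you inquiring about: eBooks'], ['Cx said', 'Movies']]
```

line.split(":", 1) would behave similarly, but partition always returns three parts, so a line without a colon does not raise an error when unpacking.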

Answer 2

Score: 0

Here is an approach that uses re.findall() to match the body string and build a new DataFrame:

import re
import pandas as pd

row_str = df["body"].values[0]
# Each match captures the text before a colon and the rest of the line after it
# (without re.DOTALL, "." does not cross newlines)
data = re.findall(r'(.+?):\s*(.+)', row_str)

new_df = pd.DataFrame(data, columns=["Speaker", "Transcript"])
print(new_df)

Alternatively, you can combine re with pandas.Series.explode in a list comprehension:

import re
import pandas as pd

# A speaker line starts with letters or whitespace followed by a colon
pattern = r'^([a-zA-Z\s]+):'
rows = [{"Speaker": re.match(pattern, line).group(1).strip(),
         "Transcript": line.split(":", 1)[1].strip()}
        for line in df["body"].str.split("\n").explode().tolist()
        if re.match(pattern, line)]

new_df = pd.DataFrame(rows)
print(new_df)

Speaker	Transcript
Helper	Hi, I'm Helper, Virtual Assistant, how can I help you today? Are you inquiring about:eBooksAudiobooks Purchasing Subscriptions Movies etc
Cx said	Movies
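
For text laid out exactly like the sample in the question, where each speaker label sits on its own line and turns are separated by a blank line, one possible sketch groups the lines by blank-line-separated block instead of matching colons. The assumptions here are mine, not the answer's: the first line of each block is the speaker label, and every other line in the block belongs to that speaker. The function name split_turns is hypothetical, and the sketch uses only the standard library.

```python
import re


def split_turns(body):
    """Return (speaker, transcript) pairs, one per blank-line-separated block."""
    rows = []
    # Assumption: speaker turns are separated by one or more blank lines
    for block in re.split(r"\n\s*\n", body.strip()):
        lines = [ln.strip() for ln in block.splitlines() if ln.strip()]
        if not lines:
            continue
        # Assumption: the first line of a block is the speaker label, e.g. "Helper:"
        speaker = lines[0].rstrip(":").strip()
        # Join the remaining lines of the turn into a single transcript string
        rows.append((speaker, " ".join(lines[1:])))
    return rows


body = (
    "Helper:\n"
    "Hi, I'm Helper, Virtual Assistant, how can I help you today?\n"
    "Are you inquiring about:\n"
    "eBooks\nAudiobooks\nPurchasing\nSubscriptions\nMovies\netc\n"
    "\n"
    "Cx said:\n"
    "Movies\n"
)
rows = split_turns(body)
for speaker, transcript in rows:
    print(speaker, "|", transcript)
```

Because "Are you inquiring about:" is not the first line of a block, it stays inside the Helper transcript rather than being treated as a speaker. The list of pairs can then be passed to pd.DataFrame(rows, columns=["Speaker", "Transcript"]) to get one row per speaker.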
