Question

I have a CSV file in which the text column itself contains the delimiter. The number of delimiters in the text column varies from row to row.

Example CSV data (the delimiter is '_'):
ID_GROUP_TEXT_DATE_PART
101_group_1_Some text is here_23.06.2023_1
102_group_2_Some text is _ here_23.06.2023_1
103_group_3_Some text _ is _ here_23.06.2023_1
104_group_4_Some text is here_23.06.2023_1

I would like to split each row into the correct columns. The expected result is:

ID	GROUP	TEXT	DATE	PART
101	group_1	Some text is here	23.06.2023	1
102	group_2	Some text is _ here	23.06.2023	1
103	group_3	Some text _ is _ here	23.06.2023	1
104	group_4	Some text is here	23.06.2023	1

Answer 1

Score: 1

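I would suggest writing a regular expression that matches each column.
In your case, the pattern should follow this shape:
number_group_number_text_date_number
So the final code would be:
```python
import re
import pandas as pd

data = """
101_group_1_Some text is here_23.06.2023_1
102_group_2_Some text is _ here_23.06.2023_1
103_group_3_Some text _ is _ here_23.06.2023_1
104_group_4_Some text is here_23.06.2023_1
"""

# ID _ GROUP ("group_<n>") _ TEXT (may contain "_") _ DATE (dd.mm.yyyy) _ PART
# The dots in the date are escaped, and each row is anchored to a whole line.
pattern = r"^(\d+)_(group_\d+)_(.+)_(\d{2}\.\d{2}\.\d{4})_(\d+)$"
matches = re.findall(pattern, data, flags=re.MULTILINE)
df = pd.DataFrame(matches, columns=['ID', 'GROUP', 'TEXT', 'DATE', 'PART'])
print(df)
```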
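
If the fields around TEXT have a fixed shape, the same result can probably be reached without a regular expression by peeling columns off both ends of each row. The sketch below is only an illustration: `split_row` is a hypothetical helper, and it assumes that GROUP always contains exactly one separator (as in `group_1`) and that ID, DATE and PART never contain one.

```python
def split_row(line, sep="_"):
    """Split one data row into (ID, GROUP, TEXT, DATE, PART).

    Assumption: GROUP holds exactly one separator, and ID, DATE and PART
    hold none, so only TEXT can contain extra separators.
    """
    # Take the fixed fields from the left: ID and the two halves of GROUP.
    id_, group_head, group_tail, rest = line.split(sep, 3)
    # Take the fixed fields from the right: DATE and PART.
    text, date, part = rest.rsplit(sep, 2)
    return id_, group_head + sep + group_tail, text, date, part


rows = [
    "101_group_1_Some text is here_23.06.2023_1",
    "102_group_2_Some text is _ here_23.06.2023_1",
    "103_group_3_Some text _ is _ here_23.06.2023_1",
    "104_group_4_Some text is here_23.06.2023_1",
]

parsed = [split_row(row) for row in rows]
for record in parsed:
    print("\t".join(record))
```

The list of tuples in `parsed` can be passed to `pd.DataFrame(parsed, columns=['ID', 'GROUP', 'TEXT', 'DATE', 'PART'])` to get a DataFrame. A row that does not have at least six separator-delimited pieces raises `ValueError`, which may be a useful signal of malformed input.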

