How to replace all characters before and after a specific place in text in Python
Question
I have a CSV file whose delimiter also appears inside the text column. The number of delimiters in the text column varies from row to row.
Example of the CSV data (the delimiter is '_'):
ID_GROUP_TEXT_DATE_PART
101_group_1_Some text is here_23.06.2023_1
102_group_2_Some text is _ here_23.06.2023_1
103_group_3_Some text _ is _ here_23.06.2023_1
104_group_4_Some text is here_23.06.2023_1
I would like to split each row into the correct columns.
The expected result is:
| ID | GROUP | TEXT | DATE | PART |
|---|---|---|---|---|
| 101 | group_1 | Some text is here | 23.06.2023 | 1 |
| 102 | group_2 | Some text is _ here | 23.06.2023 | 1 |
| 103 | group_3 | Some text _ is _ here | 23.06.2023 | 1 |
| 104 | group_4 | Some text is here | 23.06.2023 | 1 |
Answer 1
Score: 1
I would suggest writing a regular expression whose capture groups correspond to the columns.
In your case, the pattern should follow this shape:
Number_group_Number_Text_Date_Number
So the final code could be:
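```python
import re
import pandas as pd

data = """
101_group_1_Some text is here_23.06.2023_1
102_group_2_Some text is _ here_23.06.2023_1
103_group_3_Some text _ is _ here_23.06.2023_1
104_group_4_Some text is here_23.06.2023_1
"""

# One capture group per column. GROUP keeps its own underscore ("group_1"),
# the greedy (.+) absorbs any extra underscores inside TEXT, and the dots
# in the date are escaped so they match literal dots only.
pattern = r"^(\d+)_(group_\d+)_(.+)_(\d{2}\.\d{2}\.\d{4})_(\d+)$"

# re.MULTILINE lets ^ and $ anchor each line of the multi-line string.
matches = re.findall(pattern, data, flags=re.MULTILINE)
df = pd.DataFrame(matches, columns=['ID', 'GROUP', 'TEXT', 'DATE', 'PART'])
print(df)
```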
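If a regular expression feels heavy, the same idea can be sketched with plain string splitting. This is only an illustration: it assumes every row has exactly the layout shown in the question (an ID, a two-part GROUP such as `group_1`, then TEXT, DATE and PART), and `split_row` is a hypothetical helper name, not something from the original post.

```python
def split_row(line: str) -> tuple:
    """Split one row into (ID, GROUP, TEXT, DATE, PART).

    Assumes the fixed layout from the question; rows with a different
    shape will raise ValueError when unpacking.
    """
    # The first three underscores belong to ID and GROUP ("101", "group", "1").
    row_id, group_word, group_num, rest = line.split("_", 3)
    # The last two underscores separate DATE and PART;
    # everything left in between is TEXT, underscores included.
    text, date, part = rest.rsplit("_", 2)
    return row_id, f"{group_word}_{group_num}", text, date, part


rows = [
    "101_group_1_Some text is here_23.06.2023_1",
    "102_group_2_Some text is _ here_23.06.2023_1",
    "103_group_3_Some text _ is _ here_23.06.2023_1",
]
parsed = [split_row(r) for r in rows]
for row in parsed:
    print(row)
```

The resulting list of tuples can be passed to `pd.DataFrame(parsed, columns=['ID', 'GROUP', 'TEXT', 'DATE', 'PART'])` in the same way as the regex matches. Splitting from both ends works here because only the TEXT column may contain a variable number of delimiters.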