How to replace all characters before and after a specific place in text in Python

Question

I have a CSV file in which the delimiter also appears inside the text column. The number of delimiters in the text column varies from row to row.

Example of CSV data (delimiter is '_'):
ID_GROUP_TEXT_DATE_PART
101_group_1_Some text is here_23.06.2023_1
102_group_2_Some text is _ here_23.06.2023_1
103_group_3_Some text _ is _ here_23.06.2023_1
104_group_4_Some text is here_23.06.2023_1

I would like to correctly split the text by the columns.
The expected result is:

ID GROUP TEXT DATE PART
101 group_1 Some text is here 23.06.2023 1
102 group_2 Some text is _ here 23.06.2023 1
103 group_3 Some text _ is _ here 23.06.2023 1
104 group_4 Some text is here 23.06.2023 1

Answer 1

Score: 1

I would suggest writing a regular expression that captures each column.

In your case, the pattern should follow this shape:
Number_group_n_text_date_Number
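
To make that shape concrete, here is a minimal sketch that applies such a pattern to a single row. The group names (`id`, `group`, `text`, `date`, `part`) are illustrative, and the pattern itself is an assumption about the data: a numeric ID, a `group_N` label, a `dd.mm.yyyy` date, and a numeric part.

```python
import re

# Hypothetical named groups. The TEXT group is greedy, so it absorbs any
# extra "_" characters; the date and part at the end of the row stop it.
row_pattern = re.compile(
    r"(?P<id>\d+)_(?P<group>group_\d+)_(?P<text>.+)"
    r"_(?P<date>\d{2}\.\d{2}\.\d{4})_(?P<part>\d+)"
)

row = "103_group_3_Some text _ is _ here_23.06.2023_1"
match = row_pattern.fullmatch(row)

# groupdict() maps each group name to the captured string.
fields = match.groupdict() if match else None
print(fields)
```

Because only the TEXT group is allowed to contain arbitrary characters, the underscores inside it do not shift the other columns.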

So the final code would be:

```python
import re
import pandas as pd

data = """
101_group_1_Some text is here_23.06.2023_1
102_group_2_Some text is _ here_23.06.2023_1
103_group_3_Some text _ is _ here_23.06.2023_1
104_group_4_Some text is here_23.06.2023_1
"""

# ID, GROUP (kept as "group_N"), TEXT (may contain "_"), DATE (dd.mm.yyyy), PART.
# The dots in the date are escaped so they match a literal "." only.
pattern = r"^(\d+)_(group_\d+)_(.+)_(\d{2}\.\d{2}\.\d{4})_(\d+)$"

# re.MULTILINE makes ^ and $ match at the start and end of each line.
matches = re.findall(pattern, data, flags=re.MULTILINE)

df = pd.DataFrame(matches, columns=['ID', 'GROUP', 'TEXT', 'DATE', 'PART'])

print(df)
```
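
If a regular expression feels heavy, the same split could probably be sketched with plain string methods. This is an alternative to the regex above, not part of it, and it assumes (based only on the sample rows) that every row has exactly three underscores before TEXT and exactly two after it. The helper name `split_row` is hypothetical.

```python
def split_row(line):
    # Peel DATE and PART off the right-hand side first.
    rest, date, part = line.rsplit("_", 2)
    # ID is the first field; GROUP ("group_N") spans the next two pieces.
    # maxsplit=3 leaves any remaining "_" inside TEXT untouched.
    row_id, group_word, group_num, text = rest.split("_", 3)
    return [row_id, f"{group_word}_{group_num}", text, date, part]

lines = [
    "101_group_1_Some text is here_23.06.2023_1",
    "102_group_2_Some text is _ here_23.06.2023_1",
    "103_group_3_Some text _ is _ here_23.06.2023_1",
]

rows = [split_row(line) for line in lines]
for r in rows:
    print(r)
```

The resulting `rows` list can be passed to `pd.DataFrame(rows, columns=['ID', 'GROUP', 'TEXT', 'DATE', 'PART'])`. Unlike the regex, this version does not validate the date or the numeric fields, so a malformed row would be split silently or raise a `ValueError` on unpacking.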

huangapple
  • Posted on June 26, 2023, 18:32:17
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76555848.html