How to replace all characters before and after a specific place in a text in Python

Question

I have a CSV file in which the delimiter also appears inside the TEXT column. The number of delimiters in the TEXT column varies from row to row.

Example CSV data (the delimiter is '_'):
ID_GROUP_TEXT_DATE_PART
101_group_1_Some text is here_23.06.2023_1
102_group_2_Some text is _ here_23.06.2023_1
103_group_3_Some text _ is _ here_23.06.2023_1
104_group_4_Some text is here_23.06.2023_1

I would like to split each line into the correct columns.
The expected result is:

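| ID  | GROUP   | TEXT                  | DATE       | PART |
|-----|---------|-----------------------|------------|------|
| 101 | group_1 | Some text is here     | 23.06.2023 | 1    |
| 102 | group_2 | Some text is _ here   | 23.06.2023 | 1    |
| 103 | group_3 | Some text _ is _ here | 23.06.2023 | 1    |
| 104 | group_4 | Some text is here     | 23.06.2023 | 1    |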
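
For illustration (this snippet is not part of the original question), a naive `str.split("_")` over the example lines shows why a plain split fails: the header has 5 fields, but the data rows produce 6 to 8, because GROUP (`group_1`) and, in some rows, TEXT contain the delimiter.

```python
lines = [
    "ID_GROUP_TEXT_DATE_PART",
    "101_group_1_Some text is here_23.06.2023_1",
    "102_group_2_Some text is _ here_23.06.2023_1",
    "103_group_3_Some text _ is _ here_23.06.2023_1",
    "104_group_4_Some text is here_23.06.2023_1",
]

# Count how many fields a plain split on "_" produces for each line.
field_counts = [len(line.split("_")) for line in lines]
print(field_counts)  # [5, 6, 7, 8, 6]
```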

Answer 1

Score: 1

I would suggest writing a regular expression that matches the columns. In your case, the pattern should look like:

Number_group_n_text_date_Number

Because `(.+)` is greedy, the TEXT group absorbs any extra `_` characters, and the regex engine backtracks only until the date and PART at the end of the line can match. So the final code could be:

```python
import re

import pandas as pd

data = """\
101_group_1_Some text is here_23.06.2023_1
102_group_2_Some text is _ here_23.06.2023_1
103_group_3_Some text _ is _ here_23.06.2023_1
104_group_4_Some text is here_23.06.2023_1
"""

# ID _ GROUP ("group_<n>") _ TEXT (may contain "_") _ DATE (dd.mm.yyyy) _ PART.
# re.MULTILINE makes ^ and $ match at every line; "." does not match "\n",
# so the greedy (.+) never runs into the next line.
pattern = r"^(\d+)_(group_\d+)_(.+)_(\d{2}\.\d{2}\.\d{4})_(\d+)$"
matches = re.findall(pattern, data, flags=re.MULTILINE)
df = pd.DataFrame(matches, columns=["ID", "GROUP", "TEXT", "DATE", "PART"])
print(df)
```
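
If you would rather avoid a regular expression, here is a minimal alternative sketch (not part of the original answer). It splits a fixed number of fields from the left with `str.split(sep, maxsplit)` and from the right with `str.rsplit(sep, maxsplit)`, so whatever remains in the middle is TEXT. It assumes GROUP always has the form `group_<n>` with exactly one `_`, and that only TEXT can contain extra `_`. The helper names `split_row` and `read_rows` are hypothetical, and `io.StringIO` stands in for the real file.

```python
import io
from typing import TextIO


def split_row(line: str) -> list[str]:
    """Split one data line into ID, GROUP, TEXT, DATE and PART.

    Assumes GROUP always looks like "group_<n>" (exactly one "_") and that
    only TEXT may contain additional "_" characters.
    """
    # From the left: ID, "group", "<n>", and the rest of the line (maxsplit=3).
    row_id, group_word, group_num, rest = line.split("_", 3)
    # From the right: DATE and PART are the last two fields (maxsplit=2),
    # so everything before them, "_" included, is TEXT.
    text, date, part = rest.rsplit("_", 2)
    return [row_id, f"{group_word}_{group_num}", text, date, part]


def read_rows(f: TextIO) -> tuple[list[str], list[list[str]]]:
    """Read the header and all non-empty data rows from an open text file."""
    # The header has no stray "_", so a plain split gives the column names.
    header = f.readline().rstrip("\r\n").split("_")
    rows = [split_row(line.rstrip("\r\n")) for line in f if line.strip()]
    return header, rows


# io.StringIO stands in for the real CSV file in this sketch.
sample = io.StringIO(
    "ID_GROUP_TEXT_DATE_PART\n"
    "101_group_1_Some text is here_23.06.2023_1\n"
    "102_group_2_Some text is _ here_23.06.2023_1\n"
    "103_group_3_Some text _ is _ here_23.06.2023_1\n"
    "104_group_4_Some text is here_23.06.2023_1\n"
)
header, rows = read_rows(sample)
for row in rows:
    print(row)
```

With a real file, an object returned by `open(path, encoding="utf-8")` can be passed to `read_rows` instead, and the result can be turned into a table with `pd.DataFrame(rows, columns=header)`, as in the answer above. A malformed line with too few `_` raises `ValueError` when unpacking, which at least makes bad rows visible.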

huangapple
  • Published on 26 June 2023 at 18:32:17
  • If you repost this article, please keep its link: https://go.coder-hub.com/76555848.html