Python - extract information from email

Question

I am new to Python. Below are some sample emails I received.

Email sample 1

Dear all,

Please note the Total selling volume and total remaining stock

Total selling volume: 45677
Total remaining stock A:3456

Remain at your disposal in case of any doubt or comments.

Best Regards,

Email sample 2

Dear all,

Please see the data as below:

Tol volume: 1,231,245
No. of remaining stock A: 232
No. of remaining stock B: 1,435

Email sample 3

Dear All,

Please find our volume was 233,435

Total remaining stock A: 2453

Email sample 4

In May we had 90 remaining stock A and 4190 TEUs.

I would like to extract the volume and total remaining stock figures from these emails. Any hints on how to get those figures using Python?

I have prepared the code below to extract the figures from the emails. However, I cannot distinguish which figure is the total selling volume and which is the total remaining stock.

    import re
    import pandas as pd
    import win32com.client
    from datetime import datetime, timedelta

    outlook = win32com.client.Dispatch('outlook.application')
    mapi = outlook.GetNamespace("MAPI")
    # 6 is the Outlook constant for the default Inbox folder
    inbox = mapi.GetDefaultFolder(6).Folders.Item("AI email testing")
    #inbox = mapi.GetDefaultFolder(6)
    messages = inbox.Items
    received_dt = datetime.now() - timedelta(days=1)
    received_dt = received_dt.strftime('%m/%d/%Y %H:%M %p')

    for message in list(messages):
        body_content = message.body
        # Keep only the text from "Subject:" onwards
        body_content = body_content[body_content.find("Subject:"):]
        # Raw string, so \d and \. reach the regex engine unchanged
        figures = re.findall(r"\d+(?:,\d+)*(?:\.\d+)?", body_content)
        print(figures)
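
The reason the figures cannot be told apart is that `re.findall` with this pattern returns only the numbers and drops the text around them. A minimal sketch of one way to keep the label, assuming the figure sits on its own `label: number` line (as in samples 1 and 2) and that commas are thousands separators. The names `LABELED_LINE` and `labeled_figures` are made up for illustration, and free-text sentences like sample 4 are not covered:

```python
import re
from typing import Dict

# One "label: number" pair per line; an optional \r covers Windows line endings.
LABELED_LINE = re.compile(
    r"^[ \t]*(?P<label>[^:\n]+?)[ \t]*:[ \t]*(?P<num>\d[\d,]*)[ \t]*\r?$",
    re.MULTILINE,
)


def labeled_figures(body: str) -> Dict[str, int]:
    """Map each label to its figure, dropping thousands separators."""
    return {
        m["label"]: int(m["num"].replace(",", ""))
        for m in LABELED_LINE.finditer(body)
    }


sample = (
    "Dear all,\n\n"
    "Total selling volume: 45677\n"
    "Total remaining stock A:3456\n\n"
    "Best Regards,"
)
print(labeled_figures(sample))
# {'Total selling volume': 45677, 'Total remaining stock A': 3456}
```

The labels still vary between senders ("Total selling volume", "Tol volume"), so they would need to be normalized afterwards.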

Answer 1

Score: 0

Here's a solution using RegEx:

```python
from __future__ import annotations

import re
from typing import List, Tuple


def get_number(text: str) -> float | int | str:
    """
    Extract the first numeric value from the input string.

    The function uses a regular expression to extract the first numeric
    occurrence from `text`. If no numeric value is found, the original string
    is returned. Commas are removed from the extracted number, if any.
    The function first attempts to convert the number to an integer,
    and if that fails, it converts it to a float.

    Parameters
    ----------
    text : str
        The string from which the numeric value should be extracted.

    Returns
    -------
    float | int | str
        The first numeric value in `text` converted to int or float,
        or original `text` if no numeric value is found.

    Examples
    --------
    Illustration of the function usage and behavior.

    >>> get_number("Hello world 123!")
    123
    >>> get_number("I have 2,200 dollars.")
    2200
    >>> get_number("No numbers here.")
    'No numbers here.'
    >>> get_number("It is over 9000!")
    9000
    >>> get_number("The value of pi is about 3.14159.")
    3.14159
    >>> get_number("Total: 123,456,789.")
    123456789
    """
    # Digits with optional comma groups and an optional decimal part
    match = re.search(r'\d+(?:,\d+)*(?:\.\d+)?', text)
    if match is None:
        return text
    number = match[0].replace(',', '')
    try:
        return int(number)
    except ValueError:
        return float(number)


def extract_stock_volume_from_email(email: str) -> Tuple[int | float | str, int | float | str]:
    """
    Extract the volume and remaining stock A details from an email text.

    This function employs regular expressions to parse the given email text and
    extract details about volume and remaining stock A.
    The values extracted are then cleaned and returned.

    Parameters
    ----------
    email : str
        Text from the email to parse.

    Returns
    -------
    volume : int | float | str
        Volume extracted from the email.
        Returns 'Volume not found' if no volume details are found.
    remaining_stock_a : int | float | str
        Remaining stock A extracted from the email.
        Returns 'Remaining stock A not found' if no stock A details are found.

    Raises
    ------
    re.error
        If a non-valid regular expression is used.

    See Also
    --------
    re.search : The method used for extracting volume and remaining stock details.

    Examples
    --------
    >>> email_text = "The volume was 5000 TEUs. Stock A: 1000 units."
    >>> extract_stock_volume_from_email(email_text)
    (5000, 1000)
    >>> email_text = "No volume and stock data available."
    >>> extract_stock_volume_from_email(email_text)
    ('Volume not found', 'Remaining stock A not found')
    """
    # Extract the volume
    volume = re.search(
        r'(?:volume:|volume was|TEUs\.|TEUs |TEU |$)\s(\d+|,)+.*?|(\d+|,)+.(?:\sTEUs|\sTEU)',
        email, re.I
    )
    if volume:
        volume = get_number(volume[0].strip())
    if not volume:
        volume = 'Volume not found'

    # Extract the remaining stock
    remaining_stock_a = re.search(r'(?:stock A:|stock A: |$)(\d+|,)+.*?', email, re.I)
    if remaining_stock_a:
        remaining_stock_a = remaining_stock_a[0].strip()
    if not remaining_stock_a:
        # Fall back to the "<number> ... stock A" word order
        remaining_stock_a = re.search(r'(\d+)(.+)(stock A)', email, re.I)
        if remaining_stock_a:
            remaining_stock_a = remaining_stock_a[0].strip()
    if remaining_stock_a:
        remaining_stock_a = get_number(remaining_stock_a)
    if not remaining_stock_a:
        remaining_stock_a = 'Remaining stock A not found'
    return volume, remaining_stock_a


def extract_stock_volume_from_emails(
    emails: List[str],
) -> List[Tuple[int | float | str, int | float | str]]:
    """
    Apply the function `extract_stock_volume_from_email` to a list of emails.

    Parameters
    ----------
    emails : List[str]
        A list of email texts to be parsed.

    Returns
    -------
    List[Tuple[int | float | str, int | float | str]]
        A list of tuples. Each tuple contains the volume and remaining stock A
        extracted from each email. If no volume or stock A details could be
        extracted from an email, the corresponding element in the tuple will be
        'Volume not found' or 'Remaining stock A not found', respectively.

    Raises
    ------
    re.error
        If a non-valid regular expression is used in `extract_stock_volume_from_email`.

    See Also
    --------
    extract_stock_volume_from_email : The function used to extract details from each email.

    Examples
    --------
    >>> email_texts = [
    ...     "The volume was 5000 TEUs. Stock A: 1000 units.",
    ...     "No volume and stock data available.",
    ... ]
    >>> extract_stock_volume_from_emails(email_texts)
    [(5000, 1000), ('Volume not found', 'Remaining stock A not found')]
    """
    return list(map(extract_stock_volume_from_email, emails))
```

Using the above code on the e-mails you provided as examples:

```python
emails = [
    r"""Dear all,
Please note the Total selling volume and total remaining stock
Total selling volume: 45677 Total remaining stock A:3456
Remain at your disposal in case of any doubt or comments.
Best Regards,""",
    r"""Dear all,
Please see the data as below:
Tol volume: 1,231,245 No. of remaining stock A: 232 No. of remaining stock B: 1,435""",
    r"""Dear All,
Please find our volume was 233,435
Total remaining stock A: 2453""",
    r"In May we had 90 remaining stock A and 4190 TEUs.",
]

extract_stock_volume_from_emails(emails)
# Returns:
#
# [(45677, 3456), (1231245, 232), (233435, 2453), (4190, 90)]
#    ^---^  ^--^
#      |     |
#      |     +-- Remaining stock A
#      +-- Volume
```

Note

The function `extract_stock_volume_from_email`, which parses each e-mail, is not foolproof. Its RegEx patterns are based solely on the example e-mails you provided. If other e-mails don't follow the same patterns, additional patterns will have to be added to the function.
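
Since new e-mail formats mean new patterns, it may help to keep the patterns in a table rather than inside the function body. The following is only a sketch of that idea, not the code above: `NUM`, `PATTERNS` and `extract_fields` are hypothetical names, the patterns cover just the four sample e-mails, and commas are assumed to be thousands separators.

```python
import re
from typing import Dict, List, Optional

# Hypothetical pattern table: each field maps to regexes tried in order.
# Every regex captures the figure in a group named "num".
NUM = r"(?P<num>\d+(?:,\d{3})*)"
PATTERNS: Dict[str, List[str]] = {
    "volume": [
        rf"volume(?::| was)\s*{NUM}",   # "volume: 45677", "volume was 233,435"
        rf"{NUM}\s*TEUs?\b",            # "4190 TEUs"
    ],
    "remaining_stock_a": [
        rf"stock A:\s*{NUM}",           # "stock A:3456", "stock A: 232"
        rf"{NUM}\s+remaining stock A",  # "90 remaining stock A"
    ],
}


def extract_fields(email: str) -> Dict[str, Optional[int]]:
    """Return the first match per field as an int, or None if nothing matches."""
    result: Dict[str, Optional[int]] = {}
    for field, patterns in PATTERNS.items():
        result[field] = None
        for pattern in patterns:
            match = re.search(pattern, email, re.IGNORECASE)
            if match:
                result[field] = int(match.group("num").replace(",", ""))
                break
    return result


print(extract_fields("In May we had 90 remaining stock A and 4190 TEUs."))
# {'volume': 4190, 'remaining_stock_a': 90}
```

Supporting a new e-mail layout then means appending one regex to the relevant list. Returning `None` instead of a "not found" string also keeps a legitimate value of 0 distinguishable from a missing figure.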

Answer 2

Score: 0

    import re

    email_content = """Dear all,Please note the Total selling volume
    and total remaining stock Total selling volume: 45677 Total
    remaining stock A:3456 Remain at your disposal in case of any
    doubt or comments.
    Best Regards,
    """

    # Regular expression pattern to match the numbers and their context.
    # \s+ between the words tolerates the line breaks inside the e-mail text.
    number_pattern = (
        r'Total selling volume:\s*(\d+(?:[.,]\d+)*)\s+'
        r'Total\s+remaining stock A:\s*(\d+(?:[.,]\d+)*)'
    )

    # Extract the numbers and their context using the regular expression
    matches = re.findall(number_pattern, email_content)
    for match in matches:
        total_selling_volume = match[0]
        total_remaining_stock = match[1]
        print("Total selling volume:", total_selling_volume)
        print("Total remaining stock:", total_remaining_stock)

    # Output:
    # Total selling volume: 45677
    # Total remaining stock: 3456
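
The values captured this way are still strings, so a figure like `1,231,245` cannot be used in arithmetic yet. A small sketch of how named groups and a conversion step could be combined, assuming commas are thousands separators. `to_int` and `FIGURES_PATTERN` are illustrative names, and the looser labels (`volume:`, `stock A:`) are chosen here to cover sample 2 as well:

```python
import re


def to_int(figure: str) -> int:
    """Convert a captured figure such as '1,231,245' to an int."""
    return int(figure.replace(",", ""))


# Named groups make it explicit which figure is which.
# DOTALL lets ".*?" skip over line breaks between the two figures.
FIGURES_PATTERN = re.compile(
    r"volume:\s*(?P<volume>\d+(?:,\d{3})*)"
    r".*?stock A:\s*(?P<stock_a>\d+(?:,\d{3})*)",
    re.IGNORECASE | re.DOTALL,
)

text = "Tol volume: 1,231,245\nNo. of remaining stock A: 232"
match = FIGURES_PATTERN.search(text)
if match:
    figures = {name: to_int(value) for name, value in match.groupdict().items()}
    print(figures["volume"], figures["stock_a"])  # 1231245 232
```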

huangapple
  • Posted on June 13, 2023, 11:32:47
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76461534.html