Save paragraphs containing keywords into txt file

Question

I have an ongoing research project that requires me to keep only the paragraphs of each txt file that contain certain keywords. Is there a way to do that?

```python
keywords = ["cryptocurren", "virtual curren", "digital curren"]
```

Sample txt content:
> The widespread adoption of new technologies, including internet services, cryptocurrencies and payment systems, could require substantial expenditures to modify or adapt our existing products and services as we grow and develop our internet banking and mobile banking channel strategies in addition to remote connectivity solutions.

> A significant natural disaster, such as a tornado, hurricane, earthquake, fire or flood, could have a material adverse impact on our ability to conduct business, and our insurance coverage may be insufficient to compensate for losses that may occur. Acts of terrorism, war, civil unrest, or pandemics could cause disruptions to our business or the economy as a whole. While we have established and regularly test disaster recovery procedures, the occurrence of any such event could have a material adverse effect on our business, operations and financial condition.

As the sample above shows, only the first paragraph contains a keyword from the list ("cryptocurrencies" matches "cryptocurren"), so I want the txt file to keep only the first paragraph.
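Because the keywords are word stems, a plain substring test already matches inflected forms such as "cryptocurrencies" or "digital currency". The helper below is a minimal sketch, assuming (the question does not say so) that matching should also ignore case and tolerate a line break inside a multi-word keyword; `contains_keyword` is a hypothetical name, not part of either answer:

```python
keywords = ["cryptocurren", "virtual curren", "digital curren"]


def contains_keyword(paragraph: str, keywords: list[str]) -> bool:
    """Return True if any keyword stem occurs in the paragraph."""
    # Collapse line breaks and repeated spaces so "virtual\ncurrency" still
    # matches "virtual curren", then casefold so "Cryptocurrency" matches too.
    text = " ".join(paragraph.split()).casefold()
    return any(k.casefold() in text for k in keywords)


print(contains_keyword("including internet services, cryptocurrencies and payment systems", keywords))  # True
print(contains_keyword("Cryptocurrency markets are volatile.", keywords))  # True
print(contains_keyword("virtual\ncurrencies are risky", keywords))  # True
print(contains_keyword("A significant natural disaster, such as a tornado", keywords))  # False
```

Neither answer below folds case, so a sentence starting with "Cryptocurrency" would be missed by their plain `k in p` check.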

Thank you in advance!

In short, I'm looking for a way to keep only the paragraphs of a txt file that contain the keywords.

Answer 1

Score: 0

We can turn the sample input into a list of paragraphs by splitting the text on newlines (this works because each paragraph in the sample sits on a single line). Then we iterate through the list and keep only the paragraphs that contain at least one keyword, discarding the rest.

```python
sample = """The widespread adoption of new technologies, including internet services, cryptocurrencies and payment systems, could require substantial expenditures to modify or adapt our existing products and services as we grow and develop our internet banking and mobile banking channel strategies in addition to remote connectivity solutions.

A significant natural disaster, such as a tornado, hurricane, earthquake, fire or flood, could have a material adverse impact on our ability to conduct business, and our insurance coverage may be insufficient to compensate for losses that may occur. Acts of terrorism, war, civil unrest, or pandemics could cause disruptions to our business or the economy as a whole. While we have established and regularly test disaster recovery procedures, the occurrence of any such event could have a material adverse effect on our business, operations and financial condition."""
keywords = ["cryptocurren", "virtual curren", "digital curren"]

# Each paragraph sits on a single line, so splitting on "\n" gives one item per
# line (the blank separator line becomes an empty string, which never matches).
paragraphs = sample.split("\n")
output = []

for p in paragraphs:
    for k in keywords:
        if k in p:
            output.append(p)
            break  # one matching keyword is enough

print("\n\n".join(output))
```
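Splitting on `"\n"` breaks down when a paragraph is hard-wrapped over several lines, as in the second answer's sample. A minimal sketch, assuming paragraphs are separated by blank (or whitespace-only) lines; `keep_paragraphs` is a hypothetical helper and the sample is a shortened copy of the question's text:

```python
import re


def keep_paragraphs(text: str, keywords: list[str]) -> list[str]:
    """Return the paragraphs of text that contain at least one keyword."""
    # Paragraphs are separated by one or more blank (or whitespace-only) lines,
    # so wrapped lines inside a paragraph stay together.
    paragraphs = re.split(r"\n\s*\n", text.strip())
    kept = []
    for p in paragraphs:
        # Join wrapped lines with single spaces so multi-word keywords such as
        # "digital curren" still match across a line break.
        flat = " ".join(p.split())
        if any(k in flat for k in keywords):
            kept.append(p.strip())
    return kept


sample = (
    "The widespread adoption of new technologies, including internet\n"
    "services, cryptocurrencies and payment systems, could require\n"
    "substantial expenditures.\n"
    "\n"
    "A significant natural disaster, such as a tornado, could have a\n"
    "material adverse impact on our ability to conduct business.\n"
)
keywords = ["cryptocurren", "virtual curren", "digital curren"]

result = keep_paragraphs(sample, keywords)
print(len(result))  # 1
print(result[0])
```

The paragraph is returned with its original line breaks, so writing the kept paragraphs back joined by `"\n\n"` preserves the file's layout.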

Answer 2

Score: 0

You have to identify the paragraphs first and then search each one for the keywords. I used a regex:

```python
import re

data = """The widespread adoption of new technologies, including internet
services, cryptocurrencies and payment systems, could require
substantial expenditures to modify or adapt our existing products
and services as we grow and develop our internet banking and
mobile banking channel strategies in addition to remote 
connectivity solutions.

A significant natural disaster, such as a tornado, hurricane, 
earthquake, fire or flood, could have a material adverse impact on 
our ability to conduct business, and our insurance coverage may
be insufficient to compensate for losses that may occur. Acts of 
terrorism, war, civil unrest, or pandemics could cause disruptions
to our business or the economy as a whole. While we have
established and regularly test disaster recovery procedures, the 
occurrence of any such event could have a material adverse effect 
on our business, operations and financial condition."""

keywords = ["cryptocurren", "virtual curren", "digital curren"]
# keywords = ["insurance"]  # uncomment to match the second paragraph instead

# A paragraph is a run in which every non-newline character may be followed by
# at most one newline, so a blank line ("\n\n") ends the match.
for match in re.finditer(r'(?:[^\n]\n?)+', data):
    print(match.start(), match.end())  # character offsets of the paragraph
    paragraph = match.group()
    if any(word in paragraph for word in keywords):
        print(paragraph)
```

Output:

```
0 334
The widespread adoption of new technologies, including internet
services, cryptocurrencies and payment systems, could require
substantial expenditures to modify or adapt our existing products
and services as we grow and develop our internet banking and
mobile banking channel strategies in addition to remote 
connectivity solutions.

335 905
```
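Neither answer reads or writes files, although the question asks for this per txt file. A minimal sketch, assuming UTF-8 input, paragraphs separated by blank lines, and results written to a separate folder; the functions `filter_file` and `filter_folder` and the folder names `input_txt` and `filtered_txt` are hypothetical:

```python
import re
import tempfile
from pathlib import Path

KEYWORDS = ["cryptocurren", "virtual curren", "digital curren"]


def filter_file(src: Path, dst: Path, keywords: list[str]) -> int:
    """Write the paragraphs of src that contain a keyword to dst; return how many were kept."""
    text = src.read_text(encoding="utf-8")
    # Paragraphs are separated by blank (or whitespace-only) lines.
    paragraphs = re.split(r"\n\s*\n", text.strip())
    kept = [
        p.strip()
        for p in paragraphs
        # Collapse line breaks and extra spaces and ignore case before matching.
        if any(k.casefold() in " ".join(p.split()).casefold() for k in keywords)
    ]
    dst.write_text("\n\n".join(kept) + ("\n" if kept else ""), encoding="utf-8")
    return len(kept)


def filter_folder(src_dir: Path, dst_dir: Path, keywords: list[str]) -> dict[str, int]:
    """Apply filter_file to every *.txt file in src_dir, reusing the names in dst_dir."""
    dst_dir.mkdir(parents=True, exist_ok=True)
    counts = {}
    for src in sorted(src_dir.glob("*.txt")):
        counts[src.name] = filter_file(src, dst_dir / src.name, keywords)
    return counts


if __name__ == "__main__":
    # Demo on a temporary folder; replace these paths with your own directories.
    with tempfile.TemporaryDirectory() as tmp:
        src_dir, dst_dir = Path(tmp, "input_txt"), Path(tmp, "filtered_txt")
        src_dir.mkdir()
        (src_dir / "report.txt").write_text(
            "New technologies, including cryptocurrencies, could require spending.\n\n"
            "A natural disaster could disrupt our business.\n",
            encoding="utf-8",
        )
        print(filter_folder(src_dir, dst_dir, KEYWORDS))  # {'report.txt': 1}
        print((dst_dir / "report.txt").read_text(encoding="utf-8"))
```

Writing to a separate folder rather than overwriting each file keeps the originals intact in case the keyword list changes later.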

huangapple
  • Published on March 7, 2023, 06:04:51
  • Please keep the link to this article when reposting: https://go.coder-hub.com/75656259.html