Save paragraphs containing keywords into txt file
Question
I have an ongoing research project that requires me to keep only the paragraphs of each txt file that contain certain keywords. Is there a way to do that?
keywords = ["cryptocurren", "virtual curren", "digital curren"]
Sample txt:
> The widespread adoption of new technologies, including internet services, cryptocurrencies and payment systems, could require substantial expenditures to modify or adapt our existing products and services as we grow and develop our internet banking and mobile banking channel strategies in addition to remote connectivity solutions.
> A significant natural disaster, such as a tornado, hurricane, earthquake, fire or flood, could have a material adverse impact on our ability to conduct business, and our insurance coverage may be insufficient to compensate for losses that may occur. Acts of terrorism, war, civil unrest, or pandemics could cause disruptions to our business or the economy as a whole. While we have established and regularly test disaster recovery procedures, the occurrence of any such event could have a material adverse effect on our business, operations and financial condition.
As the sample above shows, only the first paragraph contains a keyword from the keyword list. Thus, I want the txt file to contain only the first paragraph.
Thank you in advance!
I hope to find a way to keep only the paragraphs of each txt file that contain the keywords.
Answer 1
Score: 0
We can split the input text on newlines to get a list in which each item is a paragraph. Then we iterate over the list, check whether any keyword occurs in each paragraph, and append the paragraphs that match to the output list.
sample = """The widespread adoption of new technologies, including internet services, cryptocurrencies and payment systems, could require substantial expenditures to modify or adapt our existing products and services as we grow and develop our internet banking and mobile banking channel strategies in addition to remote connectivity solutions.
A significant natural disaster, such as a tornado, hurricane, earthquake, fire or flood, could have a material adverse impact on our ability to conduct business, and our insurance coverage may be insufficient to compensate for losses that may occur. Acts of terrorism, war, civil unrest, or pandemics could cause disruptions to our business or the economy as a whole. While we have established and regularly test disaster recovery procedures, the occurrence of any such event could have a material adverse effect on our business, operations and financial condition."""
keywords = ["cryptocurren", "virtual curren", "digital curren"]
sample = sample.split("\n")
output = []
for p in sample:
    for k in keywords:
        if k in p:
            output.append(p)
            break
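Since the question asks about txt files rather than an in-memory string, the same idea can be sketched end to end as below. This is only a sketch under a few assumptions that are not in the original answer: each paragraph sits on its own line, matching should ignore case, and the helper names keep_matching_paragraphs and filter_file are hypothetical.

```python
from pathlib import Path

KEYWORDS = ["cryptocurren", "virtual curren", "digital curren"]


def keep_matching_paragraphs(text, keywords=KEYWORDS):
    """Return only the paragraphs (one per line) that contain any keyword."""
    kept = []
    for paragraph in text.split("\n"):
        # Case-insensitive comparison is an assumption, not part of the answer above.
        lowered = paragraph.lower()
        if any(k.lower() in lowered for k in keywords):
            kept.append(paragraph)
    return "\n".join(kept)


def filter_file(src, dst, keywords=KEYWORDS):
    """Read src, keep the matching paragraphs, and write them to dst."""
    text = Path(src).read_text(encoding="utf-8")
    Path(dst).write_text(keep_matching_paragraphs(text, keywords), encoding="utf-8")
```

To process many files, one option is to loop over Path(folder).glob("*.txt") and call filter_file on each path, writing to a separate output name so the originals are not overwritten.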
Answer 2
Score: 0
You have to identify the paragraphs and then search each one for the keywords. I used a regex:
import re
data = """The widespread adoption of new technologies, including internet
services, cryptocurrencies and payment systems, could require
substantial expenditures to modify or adapt our existing products
and services as we grow and develop our internet banking and
mobile banking channel strategies in addition to remote
connectivity solutions.

A significant natural disaster, such as a tornado, hurricane,
earthquake, fire or flood, could have a material adverse impact on
our ability to conduct business, and our insurance coverage may
be insufficient to compensate for losses that may occur. Acts of
terrorism, war, civil unrest, or pandemics could cause disruptions
to our business or the economy as a whole. While we have
established and regularly test disaster recovery procedures, the
occurrence of any such event could have a material adverse effect
on our business, operations and financial condition."""
keywords=["cryptocurren","virtual curren","digital curren"]
# keywords = ["insurance"]
# Each match is a run of consecutive non-empty lines, i.e. one paragraph delimited by blank lines
for match in re.finditer(r'(?s)((?:[^\n][\n]?)+)', data):
    print(match.start(), match.end())
    start = match.start()
    end = match.end()
    step = 1
    if [word for word in keywords if word in data[start:end:step]]:
        print(data[start:end:step])
Output:
0 334
The widespread adoption of new technologies, including internet
services, cryptocurrencies and payment systems, could require
substantial expenditures to modify or adapt our existing products
and services as we grow and develop our internet banking and
mobile banking channel strategies in addition to remote
connectivity solutions.
335 905
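For hard-wrapped text like this, where paragraphs are separated by blank lines, a possibly simpler alternative is to split on the blank lines directly instead of matching paragraphs with finditer. The following is a minimal sketch of that swapped-in approach, not the answer's own method; the function names split_paragraphs and filter_paragraphs are hypothetical.

```python
import re


def split_paragraphs(text):
    """Split text on one or more blank lines and drop empty pieces."""
    return [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]


def filter_paragraphs(text, keywords):
    """Return the paragraphs that contain at least one keyword."""
    return [p for p in split_paragraphs(text) if any(k in p for k in keywords)]
```

Joining the result with "\n\n" gives text that can be written back to a file with the blank-line paragraph separators preserved.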