英文:
Frequency distribution in a list of strings based on common beginning substring
问题
以下是翻译好的部分:
"Is there a way to get the common substrings that form the beginning of a list of strings?
The length of Substring must be greater than 1.
For instance, if there is a list L:
[ "hello", "help", "helium", "habit", "habitat", "hazard"]
The output should be a frequency distribution should be:
he - 3
hel - 3
habit - 2
ha - 3
I can later remove the duplicates formed by the same strings by again applying the same logic to look at the common beginning substrings from the distribution, checking their counts, and if the counts are same, consider only the longest string.
i.e., We get 'he' and 'hel' formed by the same set of words, having the same counts. Hence, picking only "hel" since it is the longest string"
英文:
Is there a way to get the common substrings that form the beginning of a list of strings?
The length of Substring must be greater than 1.
For instance, if there is a list L:
[ "hello", "help", "helium", "habit", "habitat", "hazard"]
The output should be a frequency distribution should be:
he - 3
hel - 3
habit - 2
ha - 3
I can later remove the duplicates formed by the same strings by again applying the same logic to look at the common beginning substrings from the distribution, checking their counts, and if the counts are same, consider only the longest string.
i.e., We get 'he' and 'hel' formed by the same set of words, having the same counts. Hence, picking only "hel" since it is the longest string
答案1
得分: 1
你可以使用 collections.Counter
来统计子字符串的重复次数,只需自己创建子字符串。
以下是一行代码,通过使用 itertools.chain.from_iterable
来创建和消耗子字符串,将子字符串的生成器展平成一个供计数器消耗的长生成器。
from collections import Counter
from itertools import chain
input_list = ["hello", "help", "helium", "habit", "habitat", "hazard"]
counter = Counter(
chain.from_iterable(
(entry[:i] for i in range(2,len(entry))) for entry in input_list
)
)
counter_filtered = {x:y for x,y in counter.items() if y != 1} # 移除单个条目
print(counter_filtered)
{'he': 3, 'hel': 3, 'ha': 3, 'hab': 2, 'habi': 2}
英文:
you can use collections.Counter
to count repeatitions of substrings, and you just have to create substrings yourself.
a one liner would be to create an consume substrings by using itertools.chain.from_iterable
, which would flatten the generators of substrings into one long generator for the counter to consume as follows.
from collections import Counter
from itertools import chain
input_list = [ "hello", "help", "helium", "habit", "habitat", "hazard"]
counter = Counter(
chain.from_iterable(
(entry[:i] for i in range(2,len(entry))) for entry in input_list
)
)
counter_filtered = {x:y for x,y in counter.items() if y != 1} # remove single entries
print(counter_filtered)
{'he': 3, 'hel': 3, 'ha': 3, 'hab': 2, 'habi': 2}
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论