Comparing slices of a string against a word list read from a file
Question
I have a single line of text, for example 'АРВТРВТПЛЯЖАОВР'. The word 'ПЛЯЖ' is hidden in it.
There is also a list of all Russian words in all declensions, about 1.5 million words. I want a loop that goes through every possible slice of the initial line and compares it with the values in the list. If a slice matches, it is printed.
To solve the problem, I wrote the following code.
rus_words = open('russian.txt')  # opening a file in read mode
text = 'АРВТРВТПЛЯЖАОВР'  # Initial line
length_of_text = len(text) + 1  # Text length
for line in rus_words:  # Iterating through the values in the file
    for i in range(length_of_text):  # Iterating through the row indexes
        for j in range(1, 11):  # Iterating over the possible length of a word
            # (Here I assume that the word is no more than 10 characters)
            maybe_word = text.lower()[i:i+j]  # Formation of a possible word
            if maybe_word in line:  # Comparison of the received word with the values in the list
                print(maybe_word)  # Output of matches
The result: an endless stream of printed words, none longer than 3 characters.
I assume the problem is either in how the file is read or in the loop. The first seems more likely, but I can't tell what exactly is wrong.
https://github.com/danakt/russian-words
Answer 1
Score: 1
There is a better way to do it: use the in operator.
> The in keyword is used to check if a value is present in a sequence (list, range, string etc.). [1]
>>> 'ПЛЯЖ' in 'АРВТРВТПЛЯЖАОВР'
True
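One detail worth spelling out, since it probably explains the flood of short matches in the question: on a string, in is a substring test rather than a whole-word comparison, and the empty string counts as a substring of every string. A minimal sketch follows; the sample line and the small set are made up for illustration and are not taken from the real word list.

```python
line = 'пляжный\n'  # hypothetical line as read from a file, newline included

# On a string, `in` checks for a substring anywhere inside it.
print('ля' in line)     # True: 'ля' occurs inside 'пляжный'
print('' in line)       # True: the empty string is in every string
print('пляж' == line)   # False: not the same string

# On a set (or list), `in` checks for an element equal to the value.
words = {'пляж', 'море'}  # made-up sample words
print('ля' in words)    # False
print('пляж' in words)  # True
```

So a test like maybe_word in line succeeds for any short fragment that happens to occur inside some word, and for the empty slices produced once the index runs past the end of the text.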
You can simply loop through the Russian words list and apply some text processing such as .strip() or .lower(), depending on your needs.
For example:
# In the Russian words GitHub repo the file uses windows-1251 encoding.
text = 'АРВТРВТПЛЯЖАОВР'.lower()  # compare in the same case on both sides
with open('russian.txt', encoding='windows-1251') as rus_words:
    for line in rus_words:
        word = line.strip().lower()  # drop the trailing newline
        if word and word in text:  # skip blank lines: '' is in every string
            print(word)
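If the goal is to keep the approach from the question (checking every slice of the text against the list), one possible sketch is to load the words into a set once and test each slice for exact membership. The names find_hidden_words and load_words and the sample set are hypothetical, chosen only for this illustration; the windows-1251 encoding is the one mentioned above for the linked repo.

```python
def load_words(path, encoding='windows-1251'):
    """Read one word per line into a set, skipping blank lines."""
    with open(path, encoding=encoding) as f:
        return {line.strip().lower() for line in f if line.strip()}


def find_hidden_words(text, words, max_len=10):
    """Return every slice of `text` (up to `max_len` characters) found in `words`."""
    text = text.lower()
    found = []
    for i in range(len(text)):
        # Cap the end index at len(text) so no slice is empty or repeated.
        for j in range(i + 1, min(i + max_len, len(text)) + 1):
            candidate = text[i:j]
            if candidate in words:  # exact membership test on a set
                found.append(candidate)
    return found


# Tiny stand-in for the real word list (made-up sample).
sample_words = {'пляж', 'море'}
print(find_hidden_words('АРВТРВТПЛЯЖАОВР', sample_words))  # ['пляж']
```

With the real file, words = load_words('russian.txt') would replace the sample set. The file is then read only once, and each lookup is a whole-word comparison rather than a substring search across 1.5 million lines.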