Comparison of the result of iterating a line slice with a list from a file

Question

I have a single line of text, for example 'АРВТРВТПЛЯЖАОВР', with the word 'ПЛЯЖ' hidden in it.
I also have a list of all Russian words in all their declined forms, about 1.5 million words. I want to write a loop that goes through every possible slice of the initial line and compares it with the values in the list, printing any match.
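
A minimal sketch of the slicing approach described above, under two assumptions that go beyond the question: the word list has already been loaded into a Python set of lowercase words, and a match means the slice is exactly equal to a dictionary word rather than merely contained in one. The function name find_hidden_words and the three-word dictionary are hypothetical.

```python
def find_hidden_words(text, dictionary, max_len=10):
    """Return every slice of `text` (up to `max_len` chars) that is a dictionary word.

    `dictionary` is assumed to be a set of lowercase words, so each check
    is an exact-match hash lookup rather than a substring search.
    """
    text = text.lower()
    found = []
    for i in range(len(text)):  # start index of the slice
        # End index is exclusive and never runs past the text, so no empty slices.
        for j in range(i + 1, min(i + max_len, len(text)) + 1):
            candidate = text[i:j]
            if candidate in dictionary:
                found.append(candidate)
    return found


# Hypothetical tiny dictionary standing in for the 1.5-million-word list.
words = {'пляж', 'ар', 'кот'}
print(find_hidden_words('АРВТРВТПЛЯЖАОВР', words))  # ['ар', 'пляж']
```

With the real list, the set would be built once from the stripped lines of the file; each of the 105 slices of this 15-character line is then a single lookup instead of a scan over 1.5 million words. Short real words such as 'ар' are reported too, so a minimum-length filter may be useful.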

To solve the problem, I wrote the following code.

rus_words = open('russian.txt') # opening the file in read mode

text = 'АРВТРВТПЛЯЖАОВР' # initial line

length_of_text = len(text)+1 # text length


for line in rus_words: # iterating through the lines of the file
    for i in range(length_of_text): # iterating through the string indexes
        for j in range(1,11): # iterating over the possible word lengths
                              # (here I assume that a word has no more than 10 characters)
            maybe_word = text.lower()[i:i+j] # building a possible word
            if maybe_word in line: # comparing the candidate with the value from the list
                print(maybe_word) # printing the match
               

As a result, the program starts printing a seemingly endless stream of words, none longer than 3 characters.

I assume the problem lies either in how the file is read or in the loop. The first seems more likely, but I can't tell exactly what is wrong.
https://github.com/danakt/russian-words
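
A small illustration of what the original loop does, offered as a hedged explanation of the symptom. Two things stand out: range(len(text) + 1) lets i reach len(text), where the slice is empty, and maybe_word in line is a substring test against each line, not an equality test. The line 'базар' below is a hypothetical stand-in for one entry of the word file.

```python
text = 'АРВТРВТПЛЯЖАОВР'.lower()
line = 'базар\n'  # hypothetical line read from the file, newline included

# i == len(text) is reachable in the original loop and gives an empty slice.
print(repr(text[len(text):len(text) + 3]))  # ''
# The empty string is a substring of every string, so this is always True.
print('' in line)  # True
# `in` on a str is a substring search: the slice 'ар' matches inside 'базар'.
print(text[0:2], text[0:2] in line)  # ар True
# An exact comparison against the stripped line does not match.
print(text[0:2] == line.strip())  # False
```

Since the loop makes 160 slice checks against each of roughly 1.5 million lines, short slices such as 'а' or 'ар' likely match a large share of the dictionary, and '' matches every line. The output is therefore not infinite, just enormous.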

Answer 1

Score: 1

There is a better way to do it: use the in operator.

The in keyword is used to check if a value is present in a sequence (list, range, string etc.). [1]

>>> 'ПЛЯЖ' in 'АРВТРВТПЛЯЖАОВР'
True
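
A short illustration of how in behaves on different operand types, which is the distinction both the question's loop and this answer rely on: on a string it searches for a substring, while on a list or set it looks for an element equal to the value.

```python
word = 'ПЛЯЖ'
text = 'АРВТРВТПЛЯЖАОВР'

# On a str, `in` is a substring search.
print(word in text)  # True
# On a list, `in` checks whether some element equals the value.
print(word in ['АРВТРВТПЛЯЖАОВР'])  # False: no element equals 'ПЛЯЖ'
# On a set, it is an exact-match hash lookup.
print(word in {'ПЛЯЖ', 'МОРЕ'})  # True
# Case matters: lowercase 'пляж' is not a substring of the uppercase text.
print(word.lower() in text)  # False
```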

You can simply loop through the Russian words list and do some text processing such as .strip() or .lower(), depending on your needs.

For example:

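with open('russian.txt', encoding='windows-1251') as rus_words:  # in the Russian words GitHub repo it uses windows-1251 encoding
    text = 'АРВТРВТПЛЯЖАОВР'.lower()  # assuming the list is lowercase, lowercase the text too
    for line in rus_words:
        word = line.strip()  # drop the trailing newline
        if word and word in text:  # skip empty lines, since '' is in every string
            print(word)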
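
The answer's loop can also be wrapped in a small generator so it can be tried without the real file; io.StringIO stands in for the opened russian.txt here. The function name words_in_text and the sample lines are hypothetical, and the lowercasing assumes the word list is stored in lowercase.

```python
import io


def words_in_text(lines, text):
    """Yield each word from `lines` that occurs as a substring of `text`.

    Lines are stripped of the trailing newline and lowercased; empty lines
    are skipped, because '' would otherwise match every text.
    """
    text = text.lower()
    for line in lines:
        word = line.strip().lower()
        if word and word in text:
            yield word


# io.StringIO stands in for open('russian.txt', encoding='windows-1251').
fake_file = io.StringIO('пляж\nкот\n\nар\n')
print(list(words_in_text(fake_file, 'АРВТРВТПЛЯЖАОВР')))  # ['пляж', 'ар']
```

Because it takes any iterable of lines, the same function accepts the file object returned by open('russian.txt', encoding='windows-1251'). It scans the full list once per text, which is fine for a single line; when many texts have to be checked, loading the words into a set once and looking up each slice is likely faster.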

huangapple
  • Posted on 2023-05-17 16:52:39
  • Source link: https://go.coder-hub.com/76270228.html