Find the longest sentence in a txt file using Python

Question

I'm trying to find the longest sentence in a text file. I use the period (.) to mark where sentences begin and end. The text file doesn't contain any other sentence-ending punctuation (such as ? or !).

My code currently returns only the first letter of the text file, and I'm not sure why.

def recherche(source):
    "find the biggest sentence"
    fs = open(source, "r")
    while 1:
        txt = fs.readline()
        if txt == "":
            break
        else:
            grande_phrase = max(txt, key=len)
            print(grande_phrase)
    fs.close()

recherche("for92.txt")

Answer 1

Score: 1

Your current code reads the file one line at a time and calls max on each line. Since a string is just a sequence of characters, iterating over it yields single characters, so your expression max(txt, key=len) gives you the character in txt that has the maximum length. Every character has a length of 1, and max returns the first item it encounters when several are tied, so you just get the first character of the line.
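To make the difference concrete, here is a minimal sketch using a made-up one-line string (the variable txt stands in for a single line read from the file). It shows that max over a string compares individual characters, while max over the result of split compares whole sentences.

```python
txt = "It was the best of times."

# Iterating over a string yields one-character strings, so every key is 1.
print([len(ch) for ch in txt[:5]])  # [1, 1, 1, 1, 1]

# With all keys tied, max() returns the first item it encountered.
print(max(txt, key=len))  # I

# Splitting first gives max() whole sentences to compare.
print(max(txt.split("."), key=len))  # It was the best of times
```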

You want to create a list of all sentences, and then use max on that list. There seems to be no guarantee that your input file will have one sentence per line. Since you use a period to define where a sentence ends, you're going to have to split the entire file at . to get your list of sentences. Keep in mind that this is not a foolproof strategy to split any text into sentences, since you risk splitting at other occurrences of ., such as a decimal point or an abbreviation.

def recherche(source):
    "find the biggest sentence"
    # Read the whole file at once and split it at every period.
    with open(source, "r") as fs:
        sentences = fs.read().split(".")

    # Compare whole sentences (not characters) by length.
    grande_phrase = max(sentences, key=len)
    print(grande_phrase)

With an input file that looks like so:

It was the best of times. It was the worst of times. It was the age of wisdom. It was the age of foolishness. It was the epoch of belief. It was the epoch of incredulity. It was the season of light. It was the season of darkness. It was the spring of hope. It was the winter of despair.

we get the output:

It was the epoch of incredulity

Try it online (note: I replaced the file with an io.StringIO so that it runs on tio.run)
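One detail worth knowing: the pieces returned by split(".") keep the space or newline that followed each period, and that whitespace counts toward the length. The sketch below is one possible variant, not part of the original answer. The helper name longest_sentence is hypothetical, and it takes the text as a string rather than a file name. It strips each piece, drops empty ones, and also demonstrates the decimal-point caveat mentioned above.

```python
def longest_sentence(text):
    """Return the longest period-delimited sentence in text, or "" if none."""
    # Split at periods and trim the whitespace/newlines left around each piece.
    sentences = [part.strip() for part in text.split(".")]
    # Drop empty pieces, such as the one after the final period.
    sentences = [s for s in sentences if s]
    return max(sentences, key=len, default="")


sample = (
    "It was the epoch of belief. It was the epoch of incredulity.\n"
    "It was the season of light."
)
print(longest_sentence(sample))  # It was the epoch of incredulity

# Caveat: a decimal point is treated as a sentence boundary too.
print(longest_sentence("Pi is roughly 3.14159. Ok."))  # Pi is roughly 3
```

To use it with a file, read the contents first (for example with open(source) as fs: text = fs.read()) and pass the resulting string in.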

huangapple
  • Published on 2023-02-14 05:51:35
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75441541.html