How to separate words from a text file into single letters in Python

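Question

You can split the words in a text file into single letters by looping over the characters of the text in Python. A revised version of the code:

# Import the required library
import string

# Open the text file and read its contents
with open("small_text.txt", "r") as filein:
    smalltxt = filein.read()

# Initialize the letter-frequency dictionary with a count of zero for a-z
letter_frequency = {letter: 0 for letter in string.ascii_lowercase}

# Go through every character in the text
for char in smalltxt:
    # Count the character only if it is one of the 26 lowercase ASCII letters
    # (str.islower() would also accept letters such as "é" and raise KeyError)
    if char in letter_frequency:
        letter_frequency[char] += 1

# Print the letter counts
print(letter_frequency)

This code goes through every character in the text and, if the character is a lowercase letter, increments that letter's count. The result is a dictionary of letter counts, which you can divide by the total number of letters to get the frequencies.

Note that the code counts only the lowercase letters a-z. If the text contains uppercase letters, convert it with str.lower() first so they are counted too; all other characters are skipped.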
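
The splitting step can also be looked at on its own: a Python string is already a sequence of characters, so no special function is needed to break words into letters. A minimal sketch, assuming a short hypothetical sample string in place of the contents of small_text.txt:

```python
# Hypothetical sample text standing in for the contents of small_text.txt.
sample = "alice was beginning"

# A string is iterable character by character, so list() splits it into
# single characters (spaces included).
characters = list(sample)

# Keep only the ASCII letters a-z, dropping spaces and punctuation.
letters = [ch for ch in sample.lower() if "a" <= ch <= "z"]

print(characters[:6])
print(letters[:6])
```

The filtered list is what you would hand to a counting step, for example np.unique(letters, return_counts=True) or collections.Counter(letters).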

English:

How do I separate words from a text file into single letters?

I'm given a text and have to calculate the frequency of each letter in it. However, I can't figure out how to separate the words into single letters so that I can count the unique elements and from there determine their frequency.

I apologize for not attaching the text file, but this is the text I'm given:
> alice was beginning to get very tired of sitting by her sister on the bank, and of having nothing to do: once or twice she had peeped into the book her sister was reading, but it had no pictures or conversations in it, and what is the use of a book,' thought alice without pictures or conversation?'
>
> so she was considering in her own mind (as well as she could, for the hot day made her feel very sleepy and stupid), whether the pleasure of making a daisy- chain would be worth the trouble of getting up and picking the daisies, when suddenly a white rabbit with pink eyes ran close by her.
>
> there was nothing so very remarkable in that; nor did alice think it so very much out of the way to hear the rabbit say to itself, `oh dear! oh dear! i shall be late!' (when she thought it over afterwards, it occurred to her that she ought to have wondered at this, but at the time it all seemed quite natural); but when the rabbit actually took a watch out of its waistcoat- pocket, and looked at it, and then hurried on, alice started to her feet, for it flashed across her mind that she had never before seen a rabbit with either a waistcoat-pocket, or a watch to take out of it, and burning with curiosity, she ran across the field after it, and fortunately was just in time to see it pop down a large rabbit-hole under the hedge.
>
> in another moment down went alice after it, never once considering how in the world she was to get out again.
>
> the rabbit-hole went straight on like a tunnel for some way, and then dipped suddenly down, so suddenly that alice had not a moment to think about stopping herself before she found herself falling down a very deep well.

I'm supposed to separate the text into 26 variables, a-z, and then determine their frequency, which is given as follows:
[image of the expected letter frequencies, not preserved]

So far I have written the following code:

# Check where the current file you are working in, is saved. 
import os
os.getcwd()
#print(os.getcwd())

# 1. Change the current working directory to the place where you have saved the file.
os.chdir('C:/Users/Annik/Desktop/DTU/02633 Introduction to programming/Datafiles')
os.getcwd()
#print(os.chdir('C:/Users/Annik/Desktop/DTU/02633 Introduction to programming/Datafiles'))

# 2. Listing the content of current working directory type
os.listdir(os.getcwd())
#print(os.listdir(os.getcwd()))

#importing the file
filein = open("small_text.txt", "r") #opens the file for reading
lines = filein.readlines() #reads all lines into an array
smalltxt = "".join(lines) #Joins the lines into one big string.

import numpy as np

def letterFrequency(filename):
    #counts the frequency of letters in a text
    
    unique_elems, counts = np.unique(separate_words, return_counts=True)

    return unique_elems

I just don't know how to separate the letters in the text, so I can count the unique elements.

Answer 1

Score: 1

You can use collections.Counter to get your frequencies directly from the text.

Then just select the 26 keys you are interested in, because the counter will also include whitespace and other symbols.

from collections import Counter
[...]
with open("small_text.txt", "r") as file:
    text = file.read()

keys = "abcdefghijklmnopqrstuvwxyz"

c = Counter(text.lower())
# initialize occurrence with zeros to have all keys present.
occurrence = dict.fromkeys(keys, 0)
occurrence.update({k:v for k,v in c.items() if k in keys})
total = sum(occurrence.values())
frequency = {k:v/total for k,v in occurrence.items()}

[...]

str.lower, used above, takes care of uppercase letters by folding them into their lowercase counterparts before counting.
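
The same idea can be wrapped in a function that takes the text and returns the 26 values in a-z order. This is only a sketch: the name letter_frequency, the percentage scaling, and the handling of text without any letters are assumptions for illustration, not part of the answer above.

```python
from collections import Counter
import string


def letter_frequency(text):
    """Return the percentage frequency of each letter a-z in text.

    Hypothetical helper for illustration; the name and the percentage
    scaling are assumptions, not part of the original answer.
    """
    # Count every character once, ignoring case.
    counts = Counter(text.lower())
    # Keep only the 26 ASCII letters; a missing letter counts as zero.
    occurrence = [counts[letter] for letter in string.ascii_lowercase]
    total = sum(occurrence)
    if total == 0:
        # No letters at all: avoid dividing by zero.
        return [0.0] * 26
    return [100 * n / total for n in occurrence]


freq = letter_frequency("Oh dear! Oh dear!")
print(freq[0])  # share of the letter "a" in percent
```

Returning a list in alphabetical order keeps the result easy to compare against a reference table; a dictionary, as in the answer, works just as well if you prefer lookups by letter.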

Answer 2

Score: 0

Regarding "how I separate the words into single letters": since you want to count the characters, you can use the Counter class from Python's collections module.

For example:

import collections
import pprint
...
...
file_input = input('File_Name: ')
with open(file_input, 'r') as info:
  count = collections.Counter(info.read().upper()) # reading file 
  value = pprint.pformat(count)
print(value)
...
...

This reads your file and outputs the count of each character present, including spaces and punctuation.
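
If only the letters are of interest, the counter can be filtered afterwards. A minimal sketch, assuming a hypothetical sample string instead of a file read from disk:

```python
import collections
import string

# Hypothetical sample text standing in for the file contents.
text = "the rabbit-hole went straight on"

# Count every character in upper case, as in the answer above.
count = collections.Counter(text.upper())

# Keep only A-Z; spaces, hyphens and other symbols are dropped.
letters_only = collections.Counter(
    {ch: n for ch, n in count.items() if ch in string.ascii_uppercase}
)

# most_common() lists (letter, count) pairs from most to least frequent.
for letter, n in letters_only.most_common(3):
    print(letter, n)
```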

huangapple
  • Posted on January 9, 2023 at 16:44:16
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75054842.html