Extracting dictionary values into .txt files
Question
I want to create .txt files from a dictionary, writing each text value on its own line in the files. The dictionary structure looks like this:
{'id': 0,
 'text': 'Mtendere Village was inspired by the vision'}
I am using this code:
from tqdm.auto import tqdm  #loading bar
text_data = []
file_count = 0
for sample in tqdm(new_dict):
    # remove newline characters from each sample, as we need to use them exclusively as separators
    sample = sample['text'].replace('\n', '\s')
    text_data.append(sample)
    if len(text_data) == 5_000:
        # once we hit the 5K mark, save to file
        with open('file_path\oscar_data\oscar_%s.txt' %file_count, 'w', encoding='utf-8') as fp:
            fp.write('\n'.join(text_data)) 
        text_data = []
        file_count += 1
However, this gives me an error:
---> 12     sample = sample['text'].replace('\n', '\s') 
TypeError: 'int' object is not subscriptable
Although I understand what the error is telling me, I'm not sure how to correct it...
Answer 1
Score: 0
I think you meant to loop over a list of dictionaries but actually passed a single dictionary. Iterating over a dictionary yields its keys rather than its values, so if the keys are integers, sample['text'] raises this TypeError.
from tqdm.auto import tqdm  # loading bar
new_dict = [
    {
        'id': 0,
        'text': 'Mtendere Village was inspired by the vision'
    }
]
text_data = []
file_count = 0
for sample in tqdm(new_dict):
    # Replace newlines with spaces, since newlines are used exclusively as separators
    sample_text = sample['text'].replace('\n', ' ')
    text_data.append(sample_text)
    if len(text_data) == 5_000:
        # Once we hit the 5K mark, save the batch to a file
        with open(f'file_path/oscar_data/oscar_{file_count}.txt', 'w', encoding='utf-8') as fp:
            fp.write('\n'.join(text_data))
        text_data = []
        file_count += 1
I changed new_dict to a list of dictionaries, which fixed the issue. I also replaced '\s' with ' ', because '\s' is not an escape sequence in a Python string and would insert a literal backslash followed by s.
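If new_dict really is a dictionary keyed by integers (an assumption, since the question does not show how it is built), you may not need to restructure it: looping over its values gives you the records directly. The sketch below also writes whatever is left in the last batch, which the loop above skips when the number of samples is not a multiple of the batch size. The names iter_records and write_batches, the sample records, and the temporary output directory are all made up for illustration, and tqdm is left out to keep the example dependency-free.

```python
import tempfile
from pathlib import Path


def iter_records(data):
    """Return the record dicts from a list of dicts or a dict of dicts."""
    # Iterating over a dict yields its keys, so use .values() to get the records.
    return data.values() if isinstance(data, dict) else data


def write_batches(data, out_dir, batch_size=5_000):
    """Write one text per line, batch_size lines per file; return the file paths."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    batch = []

    def flush():
        # File index is the number of files written so far: oscar_0.txt, oscar_1.txt, ...
        path = out_dir / f"oscar_{len(paths)}.txt"
        path.write_text("\n".join(batch), encoding="utf-8")
        paths.append(path)
        batch.clear()

    for record in iter_records(data):
        # Newlines inside a sample become spaces, so "\n" only separates samples.
        batch.append(record["text"].replace("\n", " "))
        if len(batch) == batch_size:
            flush()
    if batch:
        # Write the final, partially filled batch.
        flush()
    return paths


# Hypothetical sample data: a dict keyed by integers, as the error suggests.
samples = {
    0: {"id": 0, "text": "Mtendere Village was inspired by the vision"},
    1: {"id": 1, "text": "second\nsample"},
    2: {"id": 2, "text": "third sample"},
}
out_dir = tempfile.mkdtemp()
paths = write_batches(samples, out_dir, batch_size=2)
```

With a batch size of 2, the three samples end up in two files: the first holds two lines and the second holds the remaining one.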