Extracting dictionary values into .txt files

Question

I want to create .txt files from a dictionary, writing each text value on its own line of a .txt file. The dictionary structure looks like this:

    {'id': 0,
     'text': 'Mtendere Village was inspired by the vision'}

I am using this code:

    from tqdm.auto import tqdm  # loading bar

    text_data = []
    file_count = 0
    for sample in tqdm(new_dict):
        # remove newline characters from each sample, as we need to use them exclusively as separators
        sample = sample['text'].replace('\n', '\s')
        text_data.append(sample)
        if len(text_data) == 5_000:
            # once we hit the 5K mark, save to file
            with open('file_path\oscar_data\oscar_%s.txt' %file_count, 'w', encoding='utf-8') as fp:
                fp.write('\n'.join(text_data))
            text_data = []
            file_count += 1

However, this gives me an error:

    ---> 12 sample = sample['text'].replace('\n', '\s')
    TypeError: 'int' object is not subscriptable

Although I understand what the error is telling me, I'm not sure how to correct it...

Answer 1

Score: 0

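I think you meant to pass a list of dictionaries to the loop but actually passed a dictionary. Iterating over a dictionary yields its keys, so if the keys are integers, sample is an int and sample['text'] raises this TypeError.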
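
To see why the error mentions an int, here is a minimal sketch. It assumes new_dict maps integer ids to records, which the question does not show, so the keys and the second record are made up for illustration.

```python
# Hypothetical shape of new_dict: integer keys mapping to records.
# (Assumption for illustration; the question does not show the real data.)
new_dict = {
    0: {'id': 0, 'text': 'Mtendere Village was inspired by the vision'},
    1: {'id': 1, 'text': 'A second, made-up record'},
}

# Iterating over a dict yields its keys, so each sample here is an int.
keys = [sample for sample in new_dict]

try:
    keys[0]['text']  # same operation as sample['text'] in the loop
    error_message = None
except TypeError as exc:
    error_message = str(exc)

# Iterating over .values() yields the records themselves.
texts = [record['text'] for record in new_dict.values()]

print(keys)
print(error_message)
print(texts)
```

If the data really is stored this way, looping over new_dict.values() avoids having to rebuild it as a list.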

    from tqdm.auto import tqdm  # loading bar

    new_dict = [
        {
            'id': 0,
            'text': 'Mtendere Village was inspired by the vision'
        }
    ]
    text_data = []
    file_count = 0
    for sample in tqdm(new_dict):
        # remove newline characters from each sample, as we need to use them exclusively as separators
        sample = sample['text'].replace('\n', ' ')
        text_data.append(sample)
        if len(text_data) == 5000:
            # once we hit the 5K mark, save it to file
            with open(f'file_path/oscar_data/oscar_{file_count}.txt', 'w', encoding='utf-8') as fp:
                fp.write('\n'.join(text_data))
            text_data = []
            file_count += 1

I changed new_dict to a list of dictionaries, which fixed the issue.
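
One thing to watch with this loop: a file is written only when exactly 5,000 samples have accumulated, so a final partial batch (including the single sample in this example) is never saved. The sketch below is one possible way to handle that. The function name write_text_chunks, its parameters, and the use of pathlib are illustrative assumptions, not part of the original code.

```python
from pathlib import Path


def write_text_chunks(samples, out_dir, chunk_size=5_000):
    """Write one sample text per line, chunk_size lines per file.

    Returns the list of file paths written. The name and signature are
    illustrative, not taken from the original post.
    """
    out_path = Path(out_dir)
    out_path.mkdir(parents=True, exist_ok=True)

    written = []
    text_data = []

    def flush():
        # Save the current batch, then empty it for the next one.
        target = out_path / f'oscar_{len(written)}.txt'
        target.write_text('\n'.join(text_data), encoding='utf-8')
        written.append(target)
        text_data.clear()

    for sample in samples:
        # Newlines separate samples in the output, so replace them in the text.
        text_data.append(sample['text'].replace('\n', ' '))
        if len(text_data) == chunk_size:
            flush()

    # Write whatever is left over after the loop ends.
    if text_data:
        flush()
    return written
```

For example, write_text_chunks(new_dict, 'file_path/oscar_data') would write oscar_0.txt, oscar_1.txt, and so on, with the last file holding the remainder. Wrapping the samples argument in tqdm(...) keeps the progress bar.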

Posted by huangapple on July 6, 2023, 18:07:54
Link to this article: https://go.coder-hub.com/76627714.html