July 6, 2023, 18:07:54

Extracting dictionary values into .txt files

Question

I want to create .txt files from a dictionary, writing each text value on its own line of a file. The dictionary structure looks like this:

{'id': 0,
 'text': 'Mtendere Village was inspired by the vision'}

I am using this code:

from tqdm.auto import tqdm  # loading bar
text_data = []
file_count = 0
for sample in tqdm(new_dict):
    # remove newline characters from each sample as we need to use them exclusively as separators
    sample = sample['text'].replace('\n', '\s')
    text_data.append(sample)
    if len(text_data) == 5_000:
        # once we hit the 5K mark, save to file
        with open('file_path\oscar_data\oscar_%s.txt' %file_count, 'w', encoding='utf-8') as fp:
            fp.write('\n'.join(text_data))
        text_data = []
        file_count += 1

However, this gives me an error:

---> 12     sample = sample['text'].replace('\n', '\s')
TypeError: 'int' object is not subscriptable

Although I understand what the error is telling me, I'm not sure how to correct it...

Answer 1

Score: 0

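I think the loop expects a list of dictionaries, but you actually passed it a dictionary. Iterating over a dictionary yields its keys, so if the keys are integers, sample is an int and sample['text'] raises the TypeError.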

from tqdm.auto import tqdm  # loading bar

# new_dict is now a list of dictionaries, so each loop item is one record
new_dict = [
    {
        'id': 0,
        'text': 'Mtendere Village was inspired by the vision'
    }
]
text_data = []
file_count = 0
for sample in tqdm(new_dict):
    # replace newlines with spaces, since newlines are used exclusively as separators
    # (' ' rather than '\s', which Python treats as a literal backslash followed by "s")
    sample = sample['text'].replace('\n', ' ')
    text_data.append(sample)
    if len(text_data) == 5000:
        # once we hit the 5K mark, save to file
        # (forward slashes avoid accidental escape sequences in the path)
        with open('file_path/oscar_data/oscar_%s.txt' % file_count, 'w', encoding='utf-8') as fp:
            fp.write('\n'.join(text_data))

        text_data = []
        file_count += 1

I changed new_dict to a list of dictionaries, which fixed the issue.
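If new_dict cannot easily be rebuilt as a list, a possible alternative is to iterate over its values instead. The sketch below assumes new_dict is either a list of records or a dictionary mapping integer keys to records (the question does not show which). The helper names iter_records and write_batches and the batch_size parameter are hypothetical and not part of the original code. It also writes the final batch when fewer than 5,000 samples remain, which the loop above would leave unwritten.

```python
from pathlib import Path


def iter_records(data):
    """Return an iterable of record dicts for a list of dicts or a dict of dicts."""
    # Iterating a dict directly yields its keys (e.g. the ints 0, 1, 2, ...),
    # which is what triggers "'int' object is not subscriptable".
    return data.values() if isinstance(data, dict) else data


def write_batches(data, out_dir, batch_size=5_000):
    """Write one text per line, batch_size lines per file; return the file paths."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    batch = []

    def flush():
        # Files are numbered in the order they are written: oscar_0.txt, oscar_1.txt, ...
        path = out_dir / f"oscar_{len(written)}.txt"
        path.write_text("\n".join(batch), encoding="utf-8")
        written.append(path)
        batch.clear()

    for record in iter_records(data):
        # Newlines are reserved as separators, so replace them with spaces.
        batch.append(record["text"].replace("\n", " "))
        if len(batch) == batch_size:
            flush()
    if batch:  # write the final, partially filled batch
        flush()
    return written
```

Under these assumptions it could be called as write_batches(new_dict, 'file_path/oscar_data'), optionally wrapping the data in tqdm beforehand if a progress bar is wanted.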
