Trouble with writing to a csv using utf-8 encoding

Question

I'm trying to analyse some Facebook Messenger data and I'm having trouble with UTF-8 encoding.

    import os
    import json
    import datetime
    from tqdm import tqdm
    import csv
    from datetime import datetime

    directory = "facebook-100071636101603/messages/inbox"
    folders = os.listdir(directory)
    if ".DS_Store" in folders:
        folders.remove(".DS_Store")
    for folder in tqdm(folders):
        print(folder)
        for filename in os.listdir(os.path.join(directory, folder)):
            if filename.startswith("message"):
                data = json.load(open(os.path.join(directory, folder, filename), "r"))
                for message in data["messages"]:
                    try:
                        date = datetime.fromtimestamp(message["timestamp_ms"] / 1000).strftime("%Y-%m-%d %H:%M:%S")
                        sender = message["sender_name"]
                        content = message["content"]
                        with open('output.csv', 'w', encoding="utf-8") as csv_file:
                            writer = csv.writer(csv_file)
                            writer.writerow([date, sender, content])
                    except KeyError:
                        pass

This script runs, but the accented characters are not displayed correctly in the output CSV.

I'm very new to this, so I haven't tried much. I've read the Python csv documentation and found this passage:
> Since open() is used to open a CSV file for reading, the file will by default be decoded into unicode using the system default encoding (see locale.getencoding()). To decode a file using a different encoding, use the encoding argument of open:
>
>     import csv
>     with open('some.csv', newline='', encoding='utf-8') as f:
>         reader = csv.reader(f)
>         for row in reader:
>             print(row)

But this doesn't seem to work.
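The quoted passage covers reading, but the same `encoding` argument applies when writing. This is a minimal sketch that assumes the goal is a UTF-8 CSV. The sample row is made up for illustration. It passes `newline=""` as the csv docs recommend. If the file is opened in Excel, Excel may not detect UTF-8 unless the file starts with a byte-order mark. In that case `encoding="utf-8-sig"` is a commonly used alternative.

```python
import csv

# Hypothetical sample row; the accented words mirror the ones in the question.
rows = [["2023-06-15 17:24:29", "Jørn", "quête"]]

# newline="" lets the csv module manage line endings itself;
# encoding="utf-8" makes the bytes on disk UTF-8 regardless of the locale.
with open("output.csv", "w", newline="", encoding="utf-8") as csv_file:
    writer = csv.writer(csv_file)
    writer.writerows(rows)

# Reading back with the same encoding returns the original strings unchanged.
with open("output.csv", newline="", encoding="utf-8") as csv_file:
    read_back = list(csv.reader(csv_file))

print(read_back)  # [['2023-06-15 17:24:29', 'Jørn', 'quête']]
```

If this round trip works but the CSV still looks wrong in another program, the viewer's encoding detection is the next thing to check.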

Edit:
In the output I'm getting, the accented letters are garbled: it should be Jørn and quête, but the ø and ê are replaced by other characters.
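The exact garbled output from the question was not preserved. This sketch assumes it is the most common kind of mojibake, where UTF-8 bytes are decoded as Latin-1 (cp1252 gives the same result for these two letters). It shows how Jørn and quête would end up looking:

```python
# Assumed mechanism: text is encoded to UTF-8 bytes, and those bytes are
# then decoded as Latin-1, so each byte becomes a separate character.
for word in ["Jørn", "quête"]:
    utf8_bytes = word.encode("utf-8")       # 'ø' -> b'\xc3\xb8', 'ê' -> b'\xc3\xaa'
    garbled = utf8_bytes.decode("latin-1")  # b'\xc3' -> 'Ã', b'\xb8' -> '¸', b'\xaa' -> 'ª'
    print(word, "->", garbled)

# prints:
# Jørn -> JÃ¸rn
# quête -> quÃªte
```

A pattern like `Ã¸` or `Ãª` in the output is a strong hint that this specific mix-up happened somewhere between the source data and the screen.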

Answer 1

Score: 0

Try adding encoding="utf-8" to this line:

json.load(open(os.path.join(directory, folder, filename), "r", encoding="utf-8"))

This ensures that every file you read is decoded as UTF-8 instead of with the system default encoding.
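Decoding the JSON files as UTF-8 is good practice, but it may not change anything here. A frequently reported quirk of Facebook's JSON export is that non-ASCII text is stored as `\u00XX` escapes of the individual UTF-8 bytes, for example `"J\u00c3\u00b8rn"`. This is an assumption about this export, not something the question confirms. If it holds, the file is pure ASCII, so the `encoding` argument has no effect, and `json.load` returns the garbled string. The sketch below reproduces that and reverses it with the standard library only. `fix_facebook_text` is a hypothetical helper, and the reversal is a heuristic, not a general encoding repair.

```python
import json

# Hypothetical snippet in the escaping style reported for Facebook exports:
# the UTF-8 bytes of "ø" (C3 B8) and "ê" (C3 AA) appear as separate \u escapes.
raw = '{"sender_name": "J\\u00c3\\u00b8rn", "content": "qu\\u00c3\\u00aate"}'


def fix_facebook_text(text: str) -> str:
    """Map each character back to one Latin-1 byte, then decode the bytes as UTF-8.

    Returns the text unchanged if it cannot be a Latin-1 view of valid UTF-8
    (e.g. it already contains correct accented letters, or plain ASCII).
    """
    try:
        return text.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return text


message = json.loads(raw)
print(message["sender_name"])                     # JÃ¸rn  (what json.loads returns)
print(fix_facebook_text(message["sender_name"]))  # Jørn
print(fix_facebook_text(message["content"]))      # quête
```

Correctly decoded text such as `"Jørn"` fails the UTF-8 decode step and is returned unchanged. This makes the helper safe to apply to every field.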

EDIT:

You need to install ftfy using pip install ftfy. This package can repair text that was decoded with the wrong encoding (mojibake).

Then pass sender and content through ftfy.fix_text to repair them:

    import ftfy
    # Your other code
    sender = message["sender_name"]
    content = message["content"]
    sender = ftfy.fix_text(sender)
    content = ftfy.fix_text(content)

You can use ftfy.fix_text(string) for any other broken encoding as well.
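Applied to the whole script, one more bug becomes visible. The question's code reopens output.csv in "w" mode for every message, which truncates the file each time, so only the last row survives.

This is a sketch of the full script under a few assumptions:

* The garbling is UTF-8 bytes read back as Latin-1, which is the case ftfy repairs here. To stay dependency-free, the sketch swaps ftfy for a stdlib re-encoding step that handles only that pattern.
* `fix_text` and `export_messages` are hypothetical names.
* The header row is an addition.
* tqdm is dropped. Wrapping `sorted(os.listdir(directory))` in `tqdm(...)` restores the progress bar.

```python
import csv
import json
import os
from datetime import datetime


def fix_text(text: str) -> str:
    # Stdlib stand-in for ftfy.fix_text, covering only the Latin-1/UTF-8 mix-up.
    try:
        return text.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return text


def export_messages(directory: str, output_path: str) -> int:
    """Write one CSV row per message that has a sender and content; return the row count."""
    rows_written = 0
    # Open the CSV once, outside the loops, so earlier rows are not truncated away.
    with open(output_path, "w", newline="", encoding="utf-8") as csv_file:
        writer = csv.writer(csv_file)
        writer.writerow(["date", "sender", "content"])
        for folder in sorted(os.listdir(directory)):
            folder_path = os.path.join(directory, folder)
            if not os.path.isdir(folder_path):  # skips .DS_Store and other stray files
                continue
            for filename in sorted(os.listdir(folder_path)):
                if not filename.startswith("message"):
                    continue
                with open(os.path.join(folder_path, filename), encoding="utf-8") as f:
                    data = json.load(f)
                for message in data.get("messages", []):
                    try:
                        date = datetime.fromtimestamp(message["timestamp_ms"] / 1000)
                        sender = fix_text(message["sender_name"])
                        content = fix_text(message["content"])
                    except KeyError:
                        continue  # skip messages missing any of the three keys
                    writer.writerow([date.strftime("%Y-%m-%d %H:%M:%S"), sender, content])
                    rows_written += 1
    return rows_written
```

With the directory from the question, it would be called as `export_messages("facebook-100071636101603/messages/inbox", "output.csv")`.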

huangapple
  • This article was published on 2023-06-15 17:24:29
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76481033.html