英文:
Formula and encoding issues when saving df to Excel
问题
Here are the translated parts of your text:
我正在开发一个脚本,用于收集一些YouTube数据。当然,该脚本创建了一个 pandas
数据框,然后将其导出到Excel。我遇到了两个主要问题,这两个问题似乎与彼此相关。
因此,Excel 365 允许用户使用 IMAGE()
公式(https://support.microsoft.com/en-au/office/image-function-7e112975-5e52-4f2a-b9da-1d913d51f5d5)将图像插入到单元格中。脚本提取YouTube视频的缩略图链接并将其保存到 defaultdict(list)
字典中。接下来,在同时进行的操作中,创建了 IMAGE()
公式字符串。在使用专用的 ExcelWriter
将 df
保存为 .xlsx
(建议在此处查看:https://stackoverflow.com/a/58062606/11485896)之后,不管我使用哪个名称和设置 - 英文或本地设置 - 我的公式始终后面都跟着 =@
。这很奇怪,因为 xlsxwriter
要求使用英文名称:https://xlsxwriter.readthedocs.io/working_with_formulas.html)。
我设法将这个 df
保存为 .csv
。公式现在正常工作(以本地语言编写),但我失去了所有隐式格式(Excel会自动将URL转换为超链接等),编码也崩溃了,一些视频ID后面跟着 -
被错误地视为公式(具有讽刺意味)。代码:
df.to_csv("data.csv", encoding="utf-8", sep=";")
我认为我至少可以处理编码问题:
df.to_csv("data.csv", encoding="windows-1250", sep=";")
...但我收到了这个错误:
# 具有讽刺意味的是,这是 “大声哭泣的脸” 表情符号 😭
UnicodeEncodeError:
'charmap' 编解码器无法编码位置 305 的字符 '\U0001f62d':字符映射到 <undefined>
因此,我的问题是:
- 如何使用
xlsxwriter
保存df
并保留并运行公式?(去掉短文中的@
) - 或者,如何将
df
保存为.csv
,以正确的编码方式处理以-
开头的视频ID并将其视为文本?
Please note that some characters, like emojis, might not display correctly in certain encodings, which could be the reason for the error you encountered.
英文:
I'm developing a script which gathers some YouTube data. The script of course creates a pandas
dataframe which is later exported to Excel. I'm experiencing two major issues which somehow seem to be related to each other.
So, Excel 365 allows users to insert an image to a cell using IMAGE()
formula (https://support.microsoft.com/en-au/office/image-function-7e112975-5e52-4f2a-b9da-1d913d51f5d5). Script extracts YouTube thumbnail link to a video and saves it to a defaultdict(list)
dictionary. Next and in parallel, the IMAGE()
formula string is created. After saving the df
to .xlsx
by a dedicated ExcelWriter
(as recommended here: https://stackoverflow.com/a/58062606/11485896) my formulas are always followed by =@
no matter which name and settings - English or local - I use. It's strange because xlsxwriter
requires English names: https://xlsxwriter.readthedocs.io/working_with_formulas.html).
Code (some parts are deleted for better readability):
if export_by_xlsxwriter:
# English formula name - recommended by xlsxwriter guide
channel_videos_data_dict["thumbnailHyperlink_en"].append(
fr'=IMAGE("{thumbnail_url}",,1)')
# local formula name
# note: in my local language formula arguments are splitted by ";" - not ","
# interestingly, using ";" makes workbook corrupted
channel_videos_data_dict["thumbnailHyperlink_locale"].append(
fr'=OBRAZ("{thumbnail_url}",,1)')
writer: pd.ExcelWriter = pd.ExcelWriter("data.xlsx", engine = "xlsxwriter")
df.to_excel(writer)
writer.save()
writer.close()
I managed to save this df
to .csv
. Formulas now work fine (written in local language!) but I lose all the implicit formatting (Excel automatically converts urls to hyperlinks etc.), encoding is crashed and some videos IDs which are followed by -
are mistakenly considered as formulas (ironically). Code:
df.to_csv("data.csv", encoding = "utf-8", sep = ";")
I thought I can at least deal with encoding issues:
df.to_csv("data.csv", encoding = "windows-1250", sep = ";")
...but I get this error:
# ironically again, this is "loudly crying face" emoji 😭
UnicodeEncodeError:
'charmap' codec can't encode character '\U0001f62d' in position 305: character maps to <undefined>
Thus, my questions are:
- How to save the
df
usingxlsxwriter
with formulas preserved and working? (get rid of@
in short) - Alternatively, how to save the
df
to.csv
with proper encoding and videos IDs starting with-
treated as text and text only?
答案1
得分: 5
The Implicit Intersection Operator @
in a formula usually means that an array formula is returning a scalar value (see the XlsxWriter docs) although it can sometimes indicate an unknown formula.
In your case the =IMAGE()
function is relatively new to Excel and as such it is classified as "future function" (see the XlsxWriter docs section on Formulas added in Excel 2010 and later).
As a result you will need to prefix it with _xlfn.
.
This will fix the @
issue but the formula may still not work. I get this the first time I try to load the file.
However, once I allow/trust the image/url it loads as expected.
You may get different results depending on your security setting and/or OS.
英文:
The Implicit Intersection Operator @
in a formula usually means that an array formula is returning a scalar value (see the XlsxWriter docs) although it can sometimes indicate an unknown formula.
In your case the =IMAGE()
function is relatively new to Excel and as such it is classified as "future function" (see the XlsxWriter docs section on Formulas added in Excel 2010 and later).
As a result you will need to prefix it with _xlfn.
.
import xlsxwriter
workbook = xlsxwriter.Workbook("image.xlsx")
worksheet = workbook.add_worksheet()
# Make the cell bigger for clarity.
worksheet.set_row(0, 80)
worksheet.set_column(0, 0, 14)
# Insert an image via a formula.
worksheet.write(0, 0, '=_xlfn.IMAGE("https://support.content.office.net/en-us/media/35aecc53-b3c1-4895-8a7d-554716941806.jpg")')
workbook.close()
This will fix the @
issue but the formula may still not work. I get this the first time I try to load the file:
However, once I allow/trust the image/url it loads as expected:
You may get different results depending on your security setting and/or OS.
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论