Formula and encoding issues when saving df to Excel.

huangapple go评论65阅读模式
英文:

Formula and encoding issues when saving df to Excel

问题

Here are the translated parts of your text:

我正在开发一个脚本,用于收集一些YouTube数据。当然,该脚本创建了一个 pandas 数据框,然后将其导出到Excel。我遇到了两个主要问题,这两个问题似乎与彼此相关。

因此,Excel 365 允许用户使用 IMAGE() 公式(https://support.microsoft.com/en-au/office/image-function-7e112975-5e52-4f2a-b9da-1d913d51f5d5)将图像插入到单元格中。脚本提取YouTube视频的缩略图链接并将其保存到 defaultdict(list) 字典中。接下来,在同时进行的操作中,创建了 IMAGE() 公式字符串。在使用专用的 ExcelWriterdf 保存为 .xlsx(建议在此处查看:https://stackoverflow.com/a/58062606/11485896)之后,不管我使用哪个名称和设置 - 英文或本地设置 - 我的公式始终后面都跟着 =@。这很奇怪,因为 xlsxwriter 要求使用英文名称:https://xlsxwriter.readthedocs.io/working_with_formulas.html)。

我设法将这个 df 保存为 .csv。公式现在正常工作(以本地语言编写),但我失去了所有隐式格式(Excel会自动将URL转换为超链接等),编码也崩溃了,一些视频ID后面跟着 - 被错误地视为公式(具有讽刺意味)。代码:

df.to_csv("data.csv", encoding="utf-8", sep=";")

我认为我至少可以处理编码问题:

df.to_csv("data.csv", encoding="windows-1250", sep=";")

...但我收到了这个错误:

# 具有讽刺意味的是,这是 “大声哭泣的脸” 表情符号 😭
UnicodeEncodeError: 
'charmap' 编解码器无法编码位置 305 的字符 '\U0001f62d'字符映射到 <undefined>

因此,我的问题是:

  1. 如何使用 xlsxwriter 保存 df 并保留并运行公式?(去掉短文中的 @
  2. 或者,如何将 df 保存为 .csv,以正确的编码方式处理以 - 开头的视频ID并将其视为文本?

Please note that some characters, like emojis, might not display correctly in certain encodings, which could be the reason for the error you encountered.

英文:

I'm developing a script which gathers some YouTube data. The script of course creates a pandas dataframe which is later exported to Excel. I'm experiencing two major issues which somehow seem to be related to each other.

So, Excel 365 allows users to insert an image to a cell using IMAGE() formula (https://support.microsoft.com/en-au/office/image-function-7e112975-5e52-4f2a-b9da-1d913d51f5d5). Script extracts YouTube thumbnail link to a video and saves it to a defaultdict(list) dictionary. Next and in parallel, the IMAGE() formula string is created. After saving the df to .xlsx by a dedicated ExcelWriter (as recommended here: https://stackoverflow.com/a/58062606/11485896) my formulas are always followed by =@ no matter which name and settings - English or local - I use. It's strange because xlsxwriter requires English names: https://xlsxwriter.readthedocs.io/working_with_formulas.html).

Code (some parts are deleted for better readability):

if export_by_xlsxwriter:
    # English formula name - recommended by xlsxwriter guide
    channel_videos_data_dict[&quot;thumbnailHyperlink_en&quot;].append(
        fr&#39;=IMAGE(&quot;{thumbnail_url}&quot;,,1)&#39;)
    
    # local formula name
    # note: in my local language formula arguments are splitted by &quot;;&quot; - not &quot;,&quot;
    # interestingly, using &quot;;&quot; makes workbook corrupted
    channel_videos_data_dict[&quot;thumbnailHyperlink_locale&quot;].append(
        fr&#39;=OBRAZ(&quot;{thumbnail_url}&quot;,,1)&#39;)

writer: pd.ExcelWriter = pd.ExcelWriter(&quot;data.xlsx&quot;, engine = &quot;xlsxwriter&quot;)
df.to_excel(writer)
writer.save()
writer.close()

I managed to save this df to .csv. Formulas now work fine (written in local language!) but I lose all the implicit formatting (Excel automatically converts urls to hyperlinks etc.), encoding is crashed and some videos IDs which are followed by - are mistakenly considered as formulas (ironically). Code:

df.to_csv(&quot;data.csv&quot;, encoding = &quot;utf-8&quot;, sep = &quot;;&quot;)

I thought I can at least deal with encoding issues:

df.to_csv(&quot;data.csv&quot;, encoding = &quot;windows-1250&quot;, sep = &quot;;&quot;)

...but I get this error:

# ironically again, this is &quot;loudly crying face&quot; emoji &#128557;
UnicodeEncodeError: 
&#39;charmap&#39; codec can&#39;t encode character &#39;\U0001f62d&#39; in position 305: character maps to &lt;undefined&gt;

Thus, my questions are:

  1. How to save the df using xlsxwriter with formulas preserved and working? (get rid of @ in short)
  2. Alternatively, how to save the df to .csv with proper encoding and videos IDs starting with - treated as text and text only?

答案1

得分: 5

The Implicit Intersection Operator @ in a formula usually means that an array formula is returning a scalar value (see the XlsxWriter docs) although it can sometimes indicate an unknown formula.

In your case the =IMAGE() function is relatively new to Excel and as such it is classified as "future function" (see the XlsxWriter docs section on Formulas added in Excel 2010 and later).

As a result you will need to prefix it with _xlfn..

This will fix the @ issue but the formula may still not work. I get this the first time I try to load the file.

However, once I allow/trust the image/url it loads as expected.

You may get different results depending on your security setting and/or OS.

英文:

The Implicit Intersection Operator @ in a formula usually means that an array formula is returning a scalar value (see the XlsxWriter docs) although it can sometimes indicate an unknown formula.

In your case the =IMAGE() function is relatively new to Excel and as such it is classified as "future function" (see the XlsxWriter docs section on Formulas added in Excel 2010 and later).

As a result you will need to prefix it with _xlfn..

import xlsxwriter

workbook = xlsxwriter.Workbook(&quot;image.xlsx&quot;)
worksheet = workbook.add_worksheet()


# Make the cell bigger for clarity.
worksheet.set_row(0, 80)
worksheet.set_column(0, 0, 14)

# Insert an image via a formula.
worksheet.write(0, 0, &#39;=_xlfn.IMAGE(&quot;https://support.content.office.net/en-us/media/35aecc53-b3c1-4895-8a7d-554716941806.jpg&quot;)&#39;)

workbook.close()

This will fix the @ issue but the formula may still not work. I get this the first time I try to load the file:

Formula and encoding issues when saving df to Excel.

However, once I allow/trust the image/url it loads as expected:

Formula and encoding issues when saving df to Excel.

You may get different results depending on your security setting and/or OS.

huangapple
  • 本文由 发表于 2023年5月26日 01:30:40
  • 转载请务必保留本文链接:https://go.coder-hub.com/76334911.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定