July 10, 2023, 23:41:32

XLSX Writer num_format function is not visually appearing in excel

Question

I have a Jupyter notebook that pulls data from Athena into a DataFrame and then writes that DataFrame to an Excel workbook using XlsxWriter.

The dates in Excel require the custom format [$-en-US,1]dd-mm-yy. The format is being applied, but it does not appear visually in Excel until the cell has been double-clicked. For example, a date appears as 2023-07-10, but as soon as you double-click the cell the formatting takes effect and it shows 10-07-23.

The dates from the SQL query are all selected using DateFormat(exampl_date, '%Y-%m-%d'), and they appear in the DataFrame with dtype object.

I then use this function to create the Excel workbook:

def format_excel(dataframe, worksheet):
    ## Creates new sheet
    dataframe.to_excel(writer, sheet_name=worksheet, startrow=3, header=False, index=False)
    workbook = writer.book
    worksheet = writer.sheets[worksheet]

    ## Various cell formats to be used later
    default_format = workbook.add_format({
        'font_name': 'Arial',
        'font_size': '10'})

    heading_format = workbook.add_format({
        'text_wrap': True,
        'bold': True,
        'valign': 'top',
        'font_color': '#4682B4',
        'font_name': 'Courier New',
        'font_size': '12'})

    date_format = workbook.add_format({
        'num_format': '[$-en-US,1]dd-mm-yy'})

    ## Applies default format to all cells and sizes columns
    worksheet.set_column('A:B', 23, default_format)
    worksheet.set_column('C:C', 24, default_format)
    worksheet.set_column('D:D', 28, default_format)
    worksheet.set_column('E:M', 19, default_format)

    ## Sizes the first three rows
    worksheet.set_row(0, 30)
    worksheet.set_row(1,  1)
    worksheet.set_row(2, 25)

    ## Adds the heading row
    run_day_formatted = run_day.strftime("%d/%m/%Y")
    worksheet.merge_range('A1:F1', 'sometext ' + run_day_formatted + ' and '  + run_day_formatted, heading_format)

    ## Adds the date formatting
    worksheet.set_column('F:M', 19, date_format)

    ## Adds and formats the column names
    header_format = workbook.add_format({
        'bold': True,
        'text_wrap': True,
        'valign': 'top',
        'fg_color': '#C0C0C0'})
    for col_num, value in enumerate(dataframe.columns.values):
        worksheet.write(2, col_num, value, header_format)

# Open XLSX Writer and create workbook
writer = pd.ExcelWriter(file_name(), engine='xlsxwriter')
# Uses above function to format the excel sheet
format_excel(df_results, 'some name')
# Closes writer and generates excel
writer.close()
print('Excel created')

I think the solution is to add a step that converts the date fields in the DataFrame to some kind of date data type, but nothing I have tried seems to work.

Edit:

date_columns = [The Date Columns]
for column in date_columns:
    df_results[column] = pd.to_datetime(df_results[column], format='%Y-%m-%d')

This causes the date value in Excel to be 2023-07-05 00:00:00 instead of the intended 2023-07-05, and the custom format from XlsxWriter is completely overwritten by yyyy-mm-dd hh:mm:ss instead of the intended [$-en-US,1]dd-mm-yy.

Edit 2:
This is what the Excel file is intended to look like:
[Screenshot of the intended Excel output]

Notice that the value in the formula bar and the value in the cell are different, and that the custom format string matches the one from XlsxWriter.

I'm trying to get this file to reconcile exactly with an old one, which is why I'm trying to recreate this formatting instead of just putting the date I want in the field.

Answer 1

Score: 2

The issue here is that Pandas sets a cell format for datetimes, and a cell format overrides the column format. Instead, you can set the default date formats through the datetime_format and date_format parameters of pd.ExcelWriter():

import pandas as pd
from datetime import datetime, date

# Create a Pandas dataframe from some datetime data.
df = pd.DataFrame(
    {
        "Date and time": [
            datetime(2023, 7, 11, 11, 30, 55),
            datetime(2023, 7, 12, 1, 20, 33),
            datetime(2023, 7, 13, 11, 10),
            datetime(2023, 7, 14, 16, 45, 35),
            datetime(2023, 7, 15, 12, 10, 15),
        ],
        "Dates only": [
            date(2023, 7, 11),
            date(2023, 7, 12),
            date(2023, 7, 13),
            date(2023, 7, 14),
            date(2023, 7, 15),
        ],
    }
)

# Create a Pandas Excel writer using XlsxWriter as the engine.
# Also set the default datetime and date formats.
writer = pd.ExcelWriter(
    "pandas_datetime.xlsx",
    engine="xlsxwriter",
    datetime_format="[$-en-US,1]dd-mm-yy",
    date_format="[$-en-US,1]dd-mm-yy",
)

# Convert the dataframe to an XlsxWriter Excel object.
df.to_excel(writer, sheet_name="Sheet1")

# Get the xlsxwriter workbook and worksheet objects in order
# to set the column widths, to make the dates clearer.
workbook = writer.book
worksheet = writer.sheets["Sheet1"]

# Get the dimensions of the dataframe.
(max_row, max_col) = df.shape

# Set the column widths, to make the dates clearer.
worksheet.set_column(1, max_col, 20)

# Close the Pandas Excel writer and output the Excel file.
writer.close()
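
For the situation in the question, where the dates arrive from the query as strings, the two steps can probably be combined: convert the string columns with pd.to_datetime() first (a string written to a cell is stored as text, so a number format has nothing to act on), then let pd.ExcelWriter() apply the custom format. The following is only a minimal sketch under those assumptions. The sample data, the column name example_date, the helper write_with_date_format() and the output path are all hypothetical and not taken from the original notebook.

```python
import os
import tempfile

import pandas as pd

# Hypothetical stand-in for the Athena result: the dates arrive as strings.
df_results = pd.DataFrame(
    {
        "name": ["a", "b"],
        "example_date": ["2023-07-05", "2023-07-10"],
    }
)

# The custom number format required in the question.
DATE_NUM_FORMAT = "[$-en-US,1]dd-mm-yy"


def write_with_date_format(df, path, date_columns, sheet_name="Sheet1"):
    """Hypothetical helper: convert string date columns, then write with a custom format."""
    df = df.copy()
    for column in date_columns:
        # Turn 'YYYY-MM-DD' strings into real datetimes so Excel stores a date, not text.
        df[column] = pd.to_datetime(df[column], format="%Y-%m-%d")

    # Pandas attaches these formats to each datetime/date cell it writes.
    with pd.ExcelWriter(
        path,
        engine="xlsxwriter",
        datetime_format=DATE_NUM_FORMAT,
        date_format=DATE_NUM_FORMAT,
    ) as writer:
        df.to_excel(writer, sheet_name=sheet_name, index=False)
        # Width only, no format argument, so the cell formats are left alone.
        writer.sheets[sheet_name].set_column(0, len(df.columns) - 1, 20)
    return df


# Hypothetical output location in the system temp directory.
output_path = os.path.join(tempfile.gettempdir(), "dates_example.xlsx")
converted = write_with_date_format(df_results, output_path, ["example_date"])
print(converted.dtypes)
```

Column widths and other layout can still be set with set_column() afterwards, as long as the date columns do not rely on a column-level format to carry the number format.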
