2020年1月7日 01:56:04go评论146阅读模式

英文:

Combine index header row and column header row in Pandas

问题

I create a dataframe and export to an html table. However the headers are off as below

How can I combine the index name row, and the column name row?

I want the table header to look like this:

but it currently exports to html like this:

I create the dataframe as below (example):

data = [{'Name': 'A', 'status': 'ok', 'host': '1', 'time1': '2020-01-06 06:31:06', 'time2': '2020-02-06 21:10:00'}, {'Name': 'A', 'status': 'ok', 'host': '2', 'time1': '2020-01-06 06:31:06', 'time2': '-'}, {'Name': 'B', 'status': 'Alert', 'host': '1', 'time1': '2020-01-06 10:31:06', 'time2': '2020-02-06 21:10:00'}, {'Name': 'B', 'status': 'ok', 'host': '2', 'time1': '2020-01-06 10:31:06', 'time2': '2020-02-06 21:10:00'},{'Name': 'B', 'status': 'ok', 'host': '4', 'time1': '2020-01-06 10:31:06', 'time2': '2020-02-06 21:10:00'},{'Name': 'C', 'status': 'Alert', 'host': '2', 'time1': '2020-01-06 10:31:06', 'time2': '2020-02-06 21:10:00'},{'Name': 'C', 'status': 'ok', 'host': '3', 'time1': '2020-01-06 10:31:06', 'time2': '2020-02-06 21:10:00'},{'Name': 'C', 'status': 'ok', 'host': '4', 'time1': '-', 'time2': '-'}]

df = pandas.DataFrame(data)
df.set_index(['Name', 'status', 'host'], inplace=True)
html_body = df.to_html(bold_rows=False)

The index is set to have hierarchical rows, for easier reading in an html table:

print(df)

                           time1                time2
Name status host                                          
A    ok     1     2020-01-06 06:31:06  2020-02-06 21:10:00
        2     2020-01-06 06:31:06                    -
B    Alert  1     2020-01-06 10:31:06  2020-02-06 21:10:00
     ok     2     2020-01-06 10:31:06  2020-02-06 21:10:00
        4     2020-01-06 10:31:06  2020-02-06 21:10:00
C    Alert  2     2020-01-06 10:31:06  2020-02-06 21:10:00
     ok     3     2020-01-06 10:31:06  2020-02-06 21:10:00
        4                       -                    -

The only solution that I've got working is to set every column to index.
This doesn't seem practical tho, and leaves an empty row that must be manually removed:

英文:

I create a dataframe and export to an html table. However the headers are off as below

How can I combine the index name row, and the column name row?

I want the table header to look like this:

but it currently exports to html like this:

I create the dataframe as below (example):

data = [{&#39;Name&#39;: &#39;A&#39;, &#39;status&#39;: &#39;ok&#39;, &#39;host&#39;: &#39;1&#39;, &#39;time1&#39;: &#39;2020-01-06 06:31:06&#39;, &#39;time2&#39;: &#39;2020-02-06 21:10:00&#39;}, {&#39;Name&#39;: &#39;A&#39;, &#39;status&#39;: &#39;ok&#39;, &#39;host&#39;: &#39;2&#39;, &#39;time1&#39;: &#39;2020-01-06 06:31:06&#39;, &#39;time2&#39;: &#39;-&#39;}, {&#39;Name&#39;: &#39;B&#39;, &#39;status&#39;: &#39;Alert&#39;, &#39;host&#39;: &#39;1&#39;, &#39;time1&#39;: &#39;2020-01-06 10:31:06&#39;, &#39;time2&#39;: &#39;2020-02-06 21:10:00&#39;}, {&#39;Name&#39;: &#39;B&#39;, &#39;status&#39;: &#39;ok&#39;, &#39;host&#39;: &#39;2&#39;, &#39;time1&#39;: &#39;2020-01-06 10:31:06&#39;, &#39;time2&#39;: &#39;2020-02-06 21:10:00&#39;},{&#39;Name&#39;: &#39;B&#39;, &#39;status&#39;: &#39;ok&#39;, &#39;host&#39;: &#39;4&#39;, &#39;time1&#39;: &#39;2020-01-06 10:31:06&#39;, &#39;time2&#39;: &#39;2020-02-06 21:10:00&#39;},{&#39;Name&#39;: &#39;C&#39;, &#39;status&#39;: &#39;Alert&#39;, &#39;host&#39;: &#39;2&#39;, &#39;time1&#39;: &#39;2020-01-06 10:31:06&#39;, &#39;time2&#39;: &#39;2020-02-06 21:10:00&#39;},{&#39;Name&#39;: &#39;C&#39;, &#39;status&#39;: &#39;ok&#39;, &#39;host&#39;: &#39;3&#39;, &#39;time1&#39;: &#39;2020-01-06 10:31:06&#39;, &#39;time2&#39;: &#39;2020-02-06 21:10:00&#39;},{&#39;Name&#39;: &#39;C&#39;, &#39;status&#39;: &#39;ok&#39;, &#39;host&#39;: &#39;4&#39;, &#39;time1&#39;: &#39;-&#39;, &#39;time2&#39;: &#39;-&#39;}]

df = pandas.DataFrame(data)
df.set_index([&#39;Name&#39;, &#39;status&#39;, &#39;host&#39;], inplace=True)
html_body = df.to_html(bold_rows=False)

The index is set to have hierarchical rows, for easier reading in an html table:

print(df)

                               time1                time2
Name status host                                          
A    ok     1     2020-01-06 06:31:06  2020-02-06 21:10:00
            2     2020-01-06 06:31:06                    -
B    Alert  1     2020-01-06 10:31:06  2020-02-06 21:10:00
     ok     2     2020-01-06 10:31:06  2020-02-06 21:10:00
            4     2020-01-06 10:31:06  2020-02-06 21:10:00
C    Alert  2     2020-01-06 10:31:06  2020-02-06 21:10:00
     ok     3     2020-01-06 10:31:06  2020-02-06 21:10:00
            4                       -                    -

The only solution that I've got working is to set every column to index.
This doesn't seem practical tho, and leaves an empty row that must be manually removed:

答案1

得分: 4

### 设置

import pandas as pd
from IPython.display import HTML

l0 = ('Foo', 'Bar')
l1 = ('One', 'Two')
ix = pd.MultiIndex.from_product([l0, l1], names=('L0', 'L1'))
df = pd.DataFrame(1, ix, ['W', 'X', 'Y', 'Z'])

HTML(df.to_html())

[![enter image description here][1]][1]
____

### BeautifulSoup
从 `df.to_html(header=False)` 中获取HTML结果。提取表头中的空单元格并插入列名。

from bs4 import BeautifulSoup

html_doc = df.to_html(header=False)
soup = BeautifulSoup(html_doc, 'html.parser')

empty_cols = soup.find('thead').find_all(lambda tag: not tag.contents)

for tag, col in zip(empty_cols, df):
    tag.string = col

HTML(soup.decode_contents())

[![enter image description here][2]][2]


  [1]: https://i.stack.imgur.com/bUDFS.png
  [2]: https://i.stack.imgur.com/62eBa.png

英文:

Setup

import pandas as pd
from IPython.display import HTML

l0 = (&#39;Foo&#39;, &#39;Bar&#39;)
l1 = (&#39;One&#39;, &#39;Two&#39;)
ix = pd.MultiIndex.from_product([l0, l1], names=(&#39;L0&#39;, &#39;L1&#39;))
df = pd.DataFrame(1, ix, [*&#39;WXYZ&#39;])

HTML(df.to_html())

BeautifulSoup

Hack the HTML result from df.to_html(header=False). Pluck out the empty cells in the table head and drop in the column names.

from bs4 import BeautifulSoup

html_doc = df.to_html(header=False)
soup = BeautifulSoup(html_doc, &#39;html.parser&#39;)

empty_cols = soup.find(&#39;thead&#39;).find_all(lambda tag: not tag.contents)

for tag, col in zip(empty_cols, df):
    tag.string = col

HTML(soup.decode_contents())

答案2

得分: 1

以下是您要翻译的内容：

如果您想要使用数据框架Styler对表格、元素和内容执行许多精美的格式设置，那么您可能需要稍微修改piRSquared的答案，就像我做的那样。

转换前

style.to_html()添加了不间断空格，这导致标签.contents始终返回true，因此对表格没有任何更改。我修改了lambda以解决这个问题，这揭示了另一个问题。

lambda标签：（不是标签.contents）或'\xa0'在标签.contents中

单元格复制得很奇怪

Styler.to_html()缺少header kwarg - 我猜这是问题的根源。我采取了稍微不同的方法 - 将第二行标题移到第一行，然后销毁第二个标题行。

它似乎非常通用且可重用于任何多索引数据框架。

df_styler = summary_df.style

使用df_styler更改显示格式、颜色、对齐方式等。

raw_html = df_styler.to_html()
soup = BeautifulSoup(raw_html,'html.parser')
head = soup.find('thead')
trs = head.find_all('tr')
ths0 = trs[0].find_all(lambda tag: (not tag.contents) or '\xa0' in tag.contents)
ths1 = trs1.find_all(lambda tag: (tag.contents) or '\xa0' not in tag.contents)
for blank, filled in zip(ths0, ths1):
blank.replace_with(filled)
trs1.decompose()
final_html_str = soup.decode_contents()

成功 - 两个标题行合并为一个

非常感谢piRSquared提供Beautiful Soup的起点！

英文:

If you want to use a Dataframe Styler to perform a lot of wonderful formatting on your table, the elements, and the contents, then you might need a slight change to piRSquared's answer, as I did.

before transformation

style.to_html() added non-breaking spaces which made tag.contents always return true, and thus yielded no change to the table. I modified the lambda to account for this, which revealed another issue.

lambda tag: (not tag.contents) or &#39;\xa0&#39; in tag.contents

Cells were copied strangely

Styler.to_html() lacks the header kwarg - I am guessing this is the source of the issue. I took a slightly different approach - Move the second row headers into the first row, and then destroy the second header row.

It seems pretty generic and reusable for any multi-indexed dataframe.

df_styler = summary_df.style
# Use the df_styler to change display format, color, alignment, etc.
raw_html = df_styler.to_html()
soup = BeautifulSoup(raw_html,&#39;html.parser&#39;)
head = soup.find(&#39;thead&#39;)
trs = head.find_all(&#39;tr&#39;)
ths0 = trs[0].find_all(lambda tag: (not tag.contents) or &#39;\xa0&#39; in tag.contents)
ths1 = trs[1].find_all(lambda tag: (tag.contents) or &#39;\xa0&#39; not in tag.contents)
for blank, filled in zip(ths0, ths1):
    blank.replace_with(filled)
trs[1].decompose()
final_html_str = soup.decode_contents()

Success - two header rows condensed into one

Big Thanks to piRSquared for the starting point of Beautiful soup!

通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库，让每个人都能够通过互相帮助和分享经验来进步。

将 Pandas 中的索引标题行和列标题行组合起来。

问题

答案1

Setup

BeautifulSoup

答案2

使用df_styler更改显示格式、颜色、对齐方式等。

Why does pandas fail to join on two columns of object dtype (one of them is converted from int to object)?

如何将包含对象的列拆分成多个列？

如何在Python pandas中将变量用作替换函数的值

月度汇率图

What's the correct way to type hint an empty list as a literal in python?

如何在Highcharts Gantt中更改本地化的星期名称

如何在同一个流中使用多个过滤器和映射函数？

如何使用Map/Set来将代码优化到O(n)？

.NET MAUI Android在GitHub Actions上构建失败，错误代码为1。

如何在Playwright视觉比较中屏蔽多个定位器？

在C++中，可以使用可变模板参数来检索类型的内部类型。

selenium.common.exceptions.StaleElementReferenceException: Message: stale element reference: stale element not found

Creating and opening a URL to log in to Website via Basic Auth with Robot Framework/Selenium (Python)

AG Grid 在上下文菜单中以大文本形式打开

发表评论