June 1, 2023 01:01:45

Pandas' pivot_table generating the wrong number of columns

Question

I have a table of patients as shown below:

id	exam	value
1	100	34.2
1	110	23
1	120	4.2
2	90	114
2	100	54.2
2	120	6.7

The id column is the identifier of a patient.
The exam column is the code representing a specific medical examination.
The value column is the numerical result of that examination.

What I want to do is pivot the table so that each row corresponds to exactly one patient and each column to one exam, like this:

id	90	100	110	120
1	NaN	34.2	23	4.2
2	114	54.2	NaN	6.7

I looked up how to do it, and the pandas library has the function pandas.DataFrame.pivot_table, which does exactly what I need.

So, now that I have given the context, here is the question. My original table has 884 unique exam codes, meaning that the pivot should result in 884 columns. But that is not the case: after I pivot the table, it shows the correct number of rows but only 557 columns.

I checked pandas' documentation and searched for someone with the same problem, but I couldn't find anything.

Does anyone know what might be the problem?

->Original table
*The column names were simplified in the example shown above

->Pivot code
table = pd.pivot_table(results, values='valuenum', index='subject_id', columns=['itemid'])

->Table after pivot

P.S. Sorry for the poor English; I'm not a native speaker.

Answer 1

Score: 0

What might be going on:

Just guessing here, but I know that null values can mess with pd.pivot_table and cause values to fall out of the index or the columns. Maybe adding dropna=False to your pd.pivot_table call might fix the problem?

Examples showing how `dropna=True` might be the problem:

import numpy as np
import pandas as pd

# Full data
data_full = {
    "id": [1, 1, 1, 2, 2, 2],
    "exam": [100, 110, 120, 90, 100, 120],
    "value": [34.2, 23, 4.2, 114, 54.2, 6.7],
}
# I am going to pick on the example of id: 2, exam: 90, value: 114.
# np.nan stands in for the null here so that the columns stay numeric.
data_null_id = {
    "id": [1, 1, 1, np.nan, 2, 2],
    "exam": [100, 110, 120, 90, 100, 120],
    "value": [34.2, 23, 4.2, 114, 54.2, 6.7],
}
data_null_exam = {
    "id": [1, 1, 1, 2, 2, 2],
    "exam": [100, 110, 120, np.nan, 100, 120],
    "value": [34.2, 23, 4.2, 114, 54.2, 6.7],
}
data_null_value = {
    "id": [1, 1, 1, 2, 2, 2],
    "exam": [100, 110, 120, 90, 100, 120],
    "value": [34.2, 23, 4.2, np.nan, 54.2, 6.7],
}
datas = [data_full, data_null_id, data_null_exam, data_null_value]
# For each of the contrived scenarios I made above
for data in datas:
    # Build a DataFrame of that scenario
    df = pd.DataFrame(data)
    # Pivot as you do in your example (dropna defaults to True)
    pivot = pd.pivot_table(df, values="value", index=["id"], columns=["exam"])
    # print works outside a notebook; display(pivot) is the Jupyter equivalent
    print(pivot)

You will see that the exam 90 column is omitted even if only the value or the id of that row is null. In those cases, a unique count on exam calculated beforehand would not agree with the number of columns in the pivot table.
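If you want to check whether this is what happens in your data, one option is to count the non-null values per exam code. This is only a minimal sketch on toy data, assuming the simplified column names from the question (id, exam, value); find_dropped_exams is a hypothetical helper name, not a pandas function:

```python
import numpy as np
import pandas as pd


def find_dropped_exams(df, exam_col="exam", value_col="value"):
    """Return the exam codes that have no non-null value at all.

    Hypothetical helper for illustration; not part of pandas.
    """
    # count() ignores NaN, so a code whose values are all null gets 0.
    non_null_per_exam = df.groupby(exam_col)[value_col].count()
    return sorted(non_null_per_exam[non_null_per_exam == 0].index.tolist())


# Toy data: exam 90 appears once, and its only value is null.
df = pd.DataFrame({
    "id": [1, 1, 1, 2, 2, 2],
    "exam": [100, 110, 120, 90, 100, 120],
    "value": [34.2, 23, 4.2, np.nan, 54.2, 6.7],
})

dropped = find_dropped_exams(df)
pivot = pd.pivot_table(df, values="value", index="id", columns="exam")

print("unique exams:", df["exam"].nunique())
print("pivot columns:", pivot.shape[1])
print("exams with only null values:", dropped)
```

On your real table you would pass exam_col="itemid" and value_col="valuenum". If the list that comes back has about 884 - 557 entries, null values are very likely the explanation.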

Possible Fix

If you change the pivot call to pd.pivot_table(df, values='value', index=['id'], columns=['exam'], dropna=False), you will see that the column is not dropped.
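A rough sketch of the difference on toy data, covering only the case where the value is null (the column names are the simplified ones from the question, and the variable names are just for illustration):

```python
import numpy as np
import pandas as pd

# Exam 90 appears once, and its only value is null.
df = pd.DataFrame({
    "id": [1, 1, 1, 2, 2, 2],
    "exam": [100, 110, 120, 90, 100, 120],
    "value": [34.2, 23, 4.2, np.nan, 54.2, 6.7],
})

# Default behaviour (dropna=True): columns whose entries are all NaN are removed.
default_pivot = pd.pivot_table(df, values="value", index="id", columns="exam")

# dropna=False keeps the all-NaN column for exam 90.
full_pivot = pd.pivot_table(
    df, values="value", index="id", columns="exam", dropna=False
)

print(default_pivot.columns.tolist())
print(full_pivot.columns.tolist())
print(full_pivot.shape[1] == df["exam"].nunique())
```

With dropna=False the column for exam 90 is present but filled with NaN, so the column count should line up with the unique count of exam again.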

I hope this helps! Sorry that I can't know exactly what's going on without the data itself.
