Pandas' pivot_table generating the wrong number of columns

Question

I have a table of patients as shown below:

id exam value
1 100 34.2
1 110 23
1 120 4.2
2 90 114
2 100 54.2
2 120 6.7

The id column is the identifier of a patient.
The exam column is the code representing a specific medical examination.
The value column is the numerical result of that examination.

What I want to do is pivot the table so that each row corresponds to exactly one patient and each column to one exam, like this:

id 90 100 110 120
1 NaN 34.2 23 4.2
2 114 54.2 NaN 6.7
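For reference, the reshaping described above can be sketched on the sample data like this. The frame name df is only for illustration, and the column names are the simplified ones from the example rather than the real ones.

```python
import pandas as pd

# Sample data from the question (simplified column names).
df = pd.DataFrame({
    "id": [1, 1, 1, 2, 2, 2],
    "exam": [100, 110, 120, 90, 100, 120],
    "value": [34.2, 23, 4.2, 114, 54.2, 6.7],
})

# One row per patient, one column per exam code.
# pivot_table aggregates with the mean by default, so repeated
# (id, exam) pairs would be averaged rather than raise an error.
pivot = df.pivot_table(index="id", columns="exam", values="value")
print(pivot)
```

Exams a patient never took show up as NaN in that patient's row.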

I looked up how to do this, and pandas has the function pandas.DataFrame.pivot_table, which does exactly what I need.

Now that the context is clear, here is the question. My original table has 884 unique exam codes, so the pivot should produce 884 columns. That is not the case: after pivoting, the table has the correct number of rows but only 557 columns.

I checked the pandas documentation and searched for someone with the same problem, but I couldn't find anything.

Does anyone know what the problem might be?

-> Original table
*The column names were simplified in the example shown above.

-> Pivot code
table = pd.pivot_table(results, values='valuenum', index='subject_id', columns=['itemid'])

-> Table after pivot

P.S. Sorry for the poor English; I'm not a native speaker.

Answer 1

Score: 0

What might be going on:

This is just a guess, but null values can interfere with pd.pivot_table and cause values to drop out of the index or the columns. Adding dropna=False to your pd.pivot_table call might fix the problem.

Examples showing how dropna=True (the default) might be the problem:

import numpy as np
import pandas as pd

# Full data
data_full = {
    "id": [1, 1, 1, 2, 2, 2],
    "exam": [100, 110, 120, 90, 100, 120],
    "value": [34.2, 23, 4.2, 114, 54.2, 6.7],
}

# I am going to pick on the example of id: 2, exam: 90, value: 114.
# np.nan marks the missing entry so each column keeps a numeric dtype.
data_null_id = {
    "id": [1, 1, 1, np.nan, 2, 2],
    "exam": [100, 110, 120, 90, 100, 120],
    "value": [34.2, 23, 4.2, 114, 54.2, 6.7],
}

data_null_exam = {
    "id": [1, 1, 1, 2, 2, 2],
    "exam": [100, 110, 120, np.nan, 100, 120],
    "value": [34.2, 23, 4.2, 114, 54.2, 6.7],
}

data_null_value = {
    "id": [1, 1, 1, 2, 2, 2],
    "exam": [100, 110, 120, 90, 100, 120],
    "value": [34.2, 23, 4.2, np.nan, 54.2, 6.7],
}

# One entry per scenario
datas = [data_full, data_null_id, data_null_exam, data_null_value]

# For each of the contrived scenarios I made above
for data in datas:
    # Build a DataFrame of that scenario
    df = pd.DataFrame(data)
    # Pivot as you do in your example
    pivot = pd.pivot_table(df, values='value', index=['id'], columns=['exam'])

    # print works outside a notebook; display(pivot) also works in Jupyter
    print(pivot, end="\n\n")

You will see that the column for exam 90 is omitted when either the value or the id is null. A unique count of exam computed beforehand would still include 90, so it would not agree with the number of columns in the pivot table.
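One way to check this guess on the real data is to list the exam codes that never make it into the pivot and look at the rows behind them. This is a rough sketch, assuming the column names from the question's pivot call (subject_id, itemid, valuenum). The small results frame is made up to stand in for the real table, and the names expected, missing and suspects are only illustrative.

```python
import numpy as np
import pandas as pd

# Made-up stand-in for the real table: exam code 90 has only a null result.
results = pd.DataFrame({
    "subject_id": [1, 1, 1, 2, 2, 2],
    "itemid": [100, 110, 120, 90, 100, 120],
    "valuenum": [34.2, 23, 4.2, np.nan, 54.2, 6.7],
})

table = pd.pivot_table(results, values="valuenum",
                       index="subject_id", columns=["itemid"])

# Exam codes present in the raw data but absent from the pivot columns.
expected = set(results["itemid"].dropna().unique())
missing = sorted(expected - set(table.columns))
print("codes in data:", len(expected), "| pivot columns:", table.shape[1])
print("missing codes:", missing)

# Inspect the rows behind the missing codes and count nulls per column.
suspects = results[results["itemid"].isin(missing)]
null_counts = suspects[["subject_id", "valuenum"]].isna().sum()
print(null_counts)
```

If every missing code turns out to have only null values (or null ids), that would support the dropna explanation. If not, something else is going on.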

Possible Fix

If you change the pivot call to pd.pivot_table(df, values='value', index=['id'], columns=['exam'], dropna=False), you will see that the column is not dropped.
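To make the difference concrete, here is a small sketch of the null-value scenario only. The frame repeats the contrived data with exam 90's single result missing. Marking the gap with np.nan is my own choice, and the names default_pivot and kept_pivot are just for illustration.

```python
import numpy as np
import pandas as pd

# Exam 90 appears once, and its only value is missing.
df = pd.DataFrame({
    "id": [1, 1, 1, 2, 2, 2],
    "exam": [100, 110, 120, 90, 100, 120],
    "value": [34.2, 23, 4.2, np.nan, 54.2, 6.7],
})

# Default behaviour (dropna=True): all-NaN columns are left out.
default_pivot = pd.pivot_table(df, values="value", index=["id"], columns=["exam"])

# dropna=False keeps columns even when every entry is NaN.
kept_pivot = pd.pivot_table(df, values="value", index=["id"], columns=["exam"],
                            dropna=False)

print(len(default_pivot.columns), "columns with the default dropna=True")
print(len(kept_pivot.columns), "columns with dropna=False")
```

The extra column is entirely NaN, which would be consistent with exam codes that exist in the table but never have a numeric result.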

I hope this helps! Sorry that I can't know exactly what's going on without the data itself.

huangapple
  • Posted on June 1, 2023 at 01:01:45
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76375819.html