pandas pivot_table generating the wrong number of columns
Question
I have a table of patients, as shown below:
| id | exam | value |
|---|---|---|
| 1 | 100 | 34.2 |
| 1 | 110 | 23 |
| 1 | 120 | 4.2 |
| 2 | 90 | 114 |
| 2 | 100 | 54.2 |
| 2 | 120 | 6.7 |
The `id` column is the identifier of a patient.
The `exam` column is the code representing a specific medical examination.
The `value` column is the numerical result of that examination.
What I want to do is pivot the table so that each row corresponds to exactly one patient and each column to one exam, like this:
| id | 90 | 100 | 110 | 120 |
|---|---|---|---|---|
| 1 | NaN | 34.2 | 23 | 4.2 |
| 2 | 114 | 54.2 | NaN | 6.7 |
I looked up how to do this, and the pandas library has the function `pandas.DataFrame.pivot_table`, which does exactly what I need.
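For reference, here is a minimal sketch of that pivot applied to the six sample rows above. The variable names `df` and `pivot` are illustrative only; the real table uses different column names.

```python
import pandas as pd

# Sample rows copied from the example table above.
df = pd.DataFrame({
    "id": [1, 1, 1, 2, 2, 2],
    "exam": [100, 110, 120, 90, 100, 120],
    "value": [34.2, 23, 4.2, 114, 54.2, 6.7],
})

# One row per patient, one column per exam code.
# pivot_table aggregates with the mean by default; that changes nothing here
# because each (id, exam) pair appears only once.
pivot = pd.pivot_table(df, values="value", index="id", columns="exam")
print(pivot)
```

On this small sample every exam code has at least one numeric value, so all four codes appear as columns.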
So, with that context, here is the question. My original table has 884 unique exam codes, meaning that the pivot should result in 884 columns. But that is not the case: after I pivot the table, it shows the correct number of rows but only 557 columns.
I checked the pandas documentation and searched for someone with the same problem, but I couldn't find anything.
Does anyone know what the problem might be?
->Original table
*The column names were simplified in the example shown above.
-> Pivot code
```python
table = pd.pivot_table(results, values='valuenum', index='subject_id', columns=['itemid'])
```
P.S. Sorry for the poor English; I'm a non-native speaker.
Answer 1
Score: 0
What might be going on:
Just guessing here, but I know that null values can mess with `pd.pivot_table` and cause values to fall out of the index or the columns. Maybe adding `dropna=False` to your `pd.pivot_table` call would fix the problem?
Examples showing how `dropna=True` might be the problem:
```python
import pandas as pd

# Full data
data_full = {
    "id": [1, 1, 1, 2, 2, 2],
    "exam": [100, 110, 120, 90, 100, 120],
    "value": [34.2, 23, 4.2, 114, 54.2, 6.7],
}
# I am going to pick on the example of id: 2, exam: 90, value: 114
# Scenario: the patient id of that row is null
data_null_id = {
    "id": [1, 1, 1, pd.NA, 2, 2],
    "exam": [100, 110, 120, 90, 100, 120],
    "value": [34.2, 23, 4.2, 114, 54.2, 6.7],
}
# Scenario: the exam code of that row is null
data_null_exam = {
    "id": [1, 1, 1, 2, 2, 2],
    "exam": [100, 110, 120, pd.NA, 100, 120],
    "value": [34.2, 23, 4.2, 114, 54.2, 6.7],
}
# Scenario: the result of that row is null
# (a float NaN is used here so the column stays numeric for the default mean)
data_null_value = {
    "id": [1, 1, 1, 2, 2, 2],
    "exam": [100, 110, 120, 90, 100, 120],
    "value": [34.2, 23, 4.2, float("nan"), 54.2, 6.7],
}
datas = [data_full, data_null_id, data_null_exam, data_null_value]

# For each of the contrived scenarios I made above
for data in datas:
    # Build a DataFrame of that scenario
    df = pd.DataFrame(data)
    # Pivot as you do in your example
    pivot = pd.pivot_table(df, values='value', index=['id'], columns=['exam'])
    # In a Jupyter notebook, display(pivot) renders this as a table instead
    print(pivot)
```
You will see that exam 90 is omitted from the pivot table even when only the `value` or the `id` of its row is null, not just when `exam` itself is null. In those cases, if you had counted the unique values of `exam` beforehand, that count would not agree with the number of columns in the pivot table.
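One way to check whether this is what happens in your data is to compare the exam codes in the source table with the columns of the pivot table, and then count the non-null values per code. The sketch below does this for the contrived null-value scenario; the names `df`, `missing`, and `all_null_codes` are illustrative and not from the original post.

```python
import pandas as pd

# Contrived data: exam 90 exists, but its only value is missing.
df = pd.DataFrame({
    "id": [1, 1, 1, 2, 2, 2],
    "exam": [100, 110, 120, 90, 100, 120],
    "value": [34.2, 23, 4.2, float("nan"), 54.2, 6.7],
})

pivot = pd.pivot_table(df, values="value", index="id", columns="exam")

# Exam codes present in the source table but absent from the pivot columns.
expected = set(df["exam"].dropna().unique())
missing = sorted(expected - set(pivot.columns))

# count() ignores nulls, so a count of 0 means every value for that code is null.
non_null_counts = df.groupby("exam")["value"].count()
all_null_codes = non_null_counts[non_null_counts == 0].index.tolist()

print("unique exam codes:", df["exam"].nunique())
print("pivot columns:", pivot.shape[1])
print("missing from pivot:", missing)
print("codes with only null values:", all_null_codes)
```

If the two lists match on the real table, the missing columns are most likely exam codes that never have a numeric result.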
Possible Fix
If you change the `pivot` declaration to `pd.pivot_table(df, values='value', index=['id'], columns=['exam'], dropna=False)`, you will see that the column is no longer dropped.
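To make the difference concrete, here is a small sketch that pivots the null-value scenario twice, once with the default `dropna=True` and once with `dropna=False`. It assumes the missing result is stored as a float NaN; the names `pivot_default` and `pivot_keep` are made up for this example.

```python
import pandas as pd

# Exam 90 has a single row, and its value is missing.
df = pd.DataFrame({
    "id": [1, 1, 1, 2, 2, 2],
    "exam": [100, 110, 120, 90, 100, 120],
    "value": [34.2, 23, 4.2, float("nan"), 54.2, 6.7],
})

# Default behaviour: columns whose entries are all NaN are left out.
pivot_default = pd.pivot_table(df, values="value", index="id", columns="exam")

# dropna=False keeps the all-NaN column for exam 90.
pivot_keep = pd.pivot_table(
    df, values="value", index="id", columns="exam", dropna=False
)

print(list(pivot_default.columns))
print(list(pivot_keep.columns))
```

If the extra columns appear on the real table only with `dropna=False`, that suggests the missing exam codes have no numeric values at all in the source data.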
I hope this helps! Sorry that I can't know exactly what's going on without the data itself.