Convert wide format data (separate dfs) to long format using Python
Question
I have wide-format data spread across separate DataFrames and want to convert it to long format in a single DataFrame. Some of the values are NaN.
Minimal example:
    import numpy as np
    import pandas as pd

    df1 = pd.DataFrame({
        "id": ["Mark", "Dave", "Ron"],
        "c2_A": [2, 3, np.nan],
        "c3_A": [1, np.nan, np.nan]})
    df2 = pd.DataFrame({
        "id": ["Mark", "Dave", "Ron"],
        "c2_B": [1, 0, np.nan],
        "c3_B": [1, np.nan, 4]})
Required df:
    dffinal = pd.DataFrame({
        "id": ["Mark", "Mark", "Dave", "Dave", "Ron", "Ron"],
        "cValue": ["A", "B", "A", "B", "A", "B"],
        "c2Value": [2, 1, 3, 0, np.nan, np.nan],
        "c3Value": [1, 1, np.nan, np.nan, np.nan, 4]})
Answer 1
Score: 2
You can try one of these two options:
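    dffinal = (
        pd.concat([df1, df2])
        # Append "id" to the row index, then split "c2_A" into ("c2", "A") column levels
        .set_index("id", append=True).pipe(
            lambda x: x.set_axis(x.columns.str.split("_", expand=True), axis=1))
        # Move the A/B level into the rows (keeping NaNs), then collapse duplicate (id, A/B) rows
        .stack(1, dropna=False).groupby(level=[1, 2], sort=False).first()
        .add_suffix("Value").reset_index().rename(columns={"level_1": "cValue"})
    )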
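The chain above leans on two steps that are easy to miss: splitting the column labels into a MultiIndex, and using `first()` to collapse the duplicate rows that `concat` leaves behind. A minimal sketch of both in isolation, where the toy frame `dup` is a hypothetical illustration and not part of the question's data:

```python
import numpy as np
import pandas as pd

# Step 1: splitting on "_" with expand=True turns flat labels into a two-level MultiIndex.
cols = pd.Index(["c2_A", "c3_A", "c2_B", "c3_B"])
multi = cols.str.split("_", expand=True)
print(multi.tolist())
# [('c2', 'A'), ('c3', 'A'), ('c2', 'B'), ('c3', 'B')]

# Step 2: GroupBy.first() takes the first non-null value per column in each group,
# so an all-NaN duplicate row is absorbed by the row that holds the real values.
dup = pd.DataFrame(
    {"id": ["Mark", "Mark"], "c2": [np.nan, 2.0], "c3": [np.nan, 1.0]}
)
collapsed = dup.groupby("id", sort=False).first()
print(collapsed)
```

Here `collapsed` holds a single row for Mark with `c2 = 2.0` and `c3 = 1.0`.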
With wide_to_long:
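    dffinal = (
        # keys= tags each source frame, so ("level_0", "id") uniquely identifies a row
        pd.concat([df1, df2], keys=["1", "2"])
        .reset_index(level=0).pipe(
            pd.wide_to_long, stubnames=["c2", "c3"],
            i=["level_0", "id"], j="cValue", sep="_", suffix=r"\w+")
        # Drop the source-frame level by grouping on (id, cValue) only
        .groupby(level=[1, 2], sort=False).first().add_suffix("Value").reset_index()
    )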
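If every id appears exactly once in each frame, one possible simplification is to merge the frames on `id` first, so each id already has a single row and no `groupby` is needed. This is a sketch under that assumption, not the method shown above; the names `merged` and `long_df` are illustrative.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({
    "id": ["Mark", "Dave", "Ron"],
    "c2_A": [2, 3, np.nan],
    "c3_A": [1, np.nan, np.nan]})
df2 = pd.DataFrame({
    "id": ["Mark", "Dave", "Ron"],
    "c2_B": [1, 0, np.nan],
    "c3_B": [1, np.nan, 4]})

# Assumption: each id occurs exactly once per frame, so the merge yields one row per id.
merged = df1.merge(df2, on="id")

long_df = (
    pd.wide_to_long(
        merged, stubnames=["c2", "c3"], i="id", j="cValue", sep="_", suffix=r"\w+")
    .add_suffix("Value")   # c2 -> c2Value, c3 -> c3Value
    .reset_index()         # bring id and cValue back as columns
)
print(long_df)
```

The row order is not guaranteed to match the required frame, so sort on `id` and `cValue` afterwards if the order matters.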
Output:
    print(dffinal)
         id cValue  c2Value  c3Value
    0  Mark      A     2.00     1.00
    1  Mark      B     1.00     1.00
    2  Dave      A     3.00      NaN
    3  Dave      B     0.00      NaN
    4   Ron      A      NaN      NaN
    5   Ron      B      NaN     4.00
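To check a result against the required frame from the question, `pd.testing.assert_frame_equal` can be used. The sketch below builds the result with a plainer rename-and-concat approach, which is a different technique from the two options above. It assumes each frame carries a single suffix (A or B) and that both frames list the ids in the same row order; the helper `tidy` is hypothetical.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({
    "id": ["Mark", "Dave", "Ron"],
    "c2_A": [2, 3, np.nan],
    "c3_A": [1, np.nan, np.nan]})
df2 = pd.DataFrame({
    "id": ["Mark", "Dave", "Ron"],
    "c2_B": [1, 0, np.nan],
    "c3_B": [1, np.nan, 4]})

def tidy(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: rename c2_X -> c2Value and record X in a cValue column."""
    value_cols = df.columns.drop("id")
    suffix = value_cols[0].split("_")[1]  # assumes one shared suffix per frame
    out = df.rename(columns={c: c.split("_")[0] + "Value" for c in value_cols})
    out.insert(1, "cValue", suffix)
    return out

result = (
    pd.concat([tidy(df1), tidy(df2)])
    .sort_index(kind="mergesort")  # stable sort interleaves A and B rows per id
    .reset_index(drop=True)
)

expected = pd.DataFrame({
    "id": ["Mark", "Mark", "Dave", "Dave", "Ron", "Ron"],
    "cValue": ["A", "B", "A", "B", "A", "B"],
    "c2Value": [2, 1, 3, 0, np.nan, np.nan],
    "c3Value": [1, 1, np.nan, np.nan, np.nan, 4]})

# Raises AssertionError on any mismatch; NaNs in matching positions compare equal.
pd.testing.assert_frame_equal(result, expected, check_dtype=False)
```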