Convert wide format data (separate dfs) to long format using Python

Question

I need to convert wide-format data held in separate DataFrames into long format in a single DataFrame, using Python. Some of the values are NaN.

Minimal example:

import numpy as np
import pandas as pd

df1 = pd.DataFrame({
                     "id": ["Mark", "Dave", "Ron" ], 
                     "c2_A": [2, 3, np.nan ], 
                     "c3_A": [1, np.nan, np.nan ] })

df2 = pd.DataFrame({
                     "id": ["Mark", "Dave", "Ron" ], 
                     "c2_B": [1, 0, np.nan ], 
                     "c3_B": [1, np.nan, 4 ] })

Required df:

dffinal = pd.DataFrame({
                     "id": ["Mark", "Mark", "Dave", "Dave", "Ron", "Ron"],
                     "cValue": ["A", "B", "A", "B", "A", "B"],
                     "c2Value": [2, 1, 3, 0, np.nan, np.nan],
                     "c3Value": [1, 1, np.nan, np.nan, np.nan, 4] })

Answer 1

Score: 2

You can try one of these two options:

With split/stack:

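dffinal = (
    pd.concat([df1, df2])
        # keep the row position as index level 0 and add "id" as level 1
        .set_index("id", append=True).pipe(
            # split "c2_A" into ("c2", "A") so the columns become two-level
            lambda x: x.set_axis(x.columns.str.split("_", expand=True), axis=1))
        # move the A/B level into the rows (keeping all-NaN rows), then collapse
        # the duplicated (id, A/B) pairs; first() takes the first non-null value
        # note: pandas 2.1+ deprecates this stack form; future_stack=True also keeps NaN rows
        .stack(1, dropna=False).groupby(level=[1, 2],sort=False).first()
        .add_suffix("Value").reset_index().rename(columns={"level_1": "cValue"})
)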

With wide_to_long:

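dffinal = (
    # tag each source frame with a key so that (level_0, id) is unique per row
    pd.concat([df1, df2], keys=["1", "2"])
        .reset_index(level=0).pipe(
            # "c2_A" -> stub "c2", suffix "A"; the suffix goes into cValue
            pd.wide_to_long, stubnames=["c2", "c3"],
            i=["level_0", "id"], j="cValue", sep="_", suffix=r"\w+")
        # drop the source key by grouping on (id, cValue); first() skips NaN
        .groupby(level=[1, 2], sort=False).first().add_suffix("Value").reset_index()
)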

Output:

print(dffinal)

     id cValue  c2Value  c3Value
0  Mark      A     2.00     1.00
1  Mark      B     1.00     1.00
2  Dave      A     3.00      NaN
3  Dave      B     0.00      NaN
4   Ron      A      NaN      NaN
5   Ron      B      NaN     4.00
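
A third variant, not part of the original answer and offered only as a sketch: if every id appears at most once in each frame (an assumption that holds for the example data), the frames can be merged on id first, so wide_to_long runs on a single wide frame and no groupby is needed. The names wide and long_df are made up for illustration, and the final sort is only there to make the row order deterministic; it orders ids alphabetically rather than in input order.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({"id": ["Mark", "Dave", "Ron"],
                    "c2_A": [2, 3, np.nan],
                    "c3_A": [1, np.nan, np.nan]})
df2 = pd.DataFrame({"id": ["Mark", "Dave", "Ron"],
                    "c2_B": [1, 0, np.nan],
                    "c3_B": [1, np.nan, 4]})

# One wide row per id; how="outer" keeps ids that occur in only one frame.
# Assumption: "id" is unique within each frame.
wide = df1.merge(df2, on="id", how="outer")

long_df = (
    # "c2_A" -> stub "c2" plus suffix "A"; the suffix becomes the cValue level
    pd.wide_to_long(wide, stubnames=["c2", "c3"], i="id",
                    j="cValue", sep="_", suffix=r"\w+")
      .add_suffix("Value")
      .reset_index()
      # sort only to get a reproducible row order (alphabetical by id)
      .sort_values(["id", "cValue"], ignore_index=True)
)

print(long_df)
```

Rows where both values are missing, such as Ron/A, are kept, because wide_to_long melts the columns without dropping NaN.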

Posted by huangapple on June 6, 2023 at 07:14:58
When reposting, please keep the link to this article: https://go.coder-hub.com/76410537.html