How to fill values from multiple DataFrames into a new empty DataFrame

Question

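You can use the pandas library to merge the two DataFrames into a new one. First make sure the column names in your DataFrames match your expected output, then combine the two frames with merge using an outer join on Time. Rows that exist in only one of the frames are filled with NaN automatically.

Here is code you can use:

import pandas as pd

# Create the two DataFrames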
data1 = {'Player A': [1, 1, 1, 1, 1, 1],
         'Time': [1, 2, 3, 4, 5, 6],
         'a': [5.83, 5.64, 5.60, 5.55, 5.52, 5.48],
         'b': [10.21, 10.16, 10.11, 10.08, 10.02, 9.97],
         'c': [3.12, 3.15, 3.19, 3.22, 3.27, 3.33],
         'd': [2.33, 2.37, 2.42, 2.46, 2.50, 2.55]}
data2 = {'Player B': [1, 1, 1, 1, 1, 1],
         'Time': [1, 2, 5, 6, 7, 9],
         'e': [7.77, 7.79, 7.83, 7.96, 9.22, 9.41],
         'f': [9.63, 9.66, 9.72, 9.77, 9.83, 9.97],
         'g': [5.61, 5.66, 5.72, 5.77, 6.23, 6.54],
         'h': [4.33, 4.37, 4.42, 4.56, 4.67, 5.55]}

df1 = pd.DataFrame(data1)
df2 = pd.DataFrame(data2)

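# Outer-merge on Time so that rows from either frame are kept
result = df1.merge(df2, on='Time', how='outer')

# Add the Time values missing from both frames (here 8) as all-NaN rows
result = result.set_index('Time').reindex(range(1, 10)).reset_index()

# Reorder the columns
result = result[['Player A', 'Player B', 'Time', 'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h']]

print(result)

This produces the layout of your expected output, with NaN wherever a frame has no row for a given Time. Replace the values in data1 and data2 with your actual data.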
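To check which frame each row of an outer merge came from, merge accepts indicator=True, which adds a `_merge` column. The following is a minimal sketch on shortened sample data; the frames and the `origin` dictionary are illustrative assumptions, not part of the original post.

```python
import pandas as pd

# Shortened, hypothetical versions of the two frames from the question
df1 = pd.DataFrame({"Player A": [1, 1, 1], "Time": [1, 2, 3], "a": [5.83, 5.64, 5.60]})
df2 = pd.DataFrame({"Player B": [1, 1, 1], "Time": [1, 3, 5], "e": [7.77, 7.79, 7.83]})

# indicator=True adds a `_merge` column: 'both', 'left_only' or 'right_only'
merged = df1.merge(df2, on="Time", how="outer", indicator=True)

# Map each Time value to the side(s) it was found in
origin = dict(zip(merged["Time"], merged["_merge"].astype(str)))
print(origin)
```

Rows marked 'left_only' or 'right_only' are the ones that end up with NaN in the other frame's columns.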

Original question:

I have two DataFrames with different values.

#1 DataFrame

Player A  Time     a      b     c     d
       1     1  5.83  10.21  3.12  2.33
       1     2  5.64  10.16  3.15  2.37
       1     3  5.60  10.11  3.19  2.42
       1     4  5.55  10.08  3.22  2.46
       1     5  5.52  10.02  3.27  2.50
       1     6  5.48   9.97  3.33  2.55

#2 DataFrame

Player B  Time     e     f     g     h
       1     1  7.77  9.63  5.61  4.33
       1     2  7.79  9.66  5.66  4.37
       1     5  7.83  9.72  5.72  4.42
       1     6  7.96  9.77  5.77  4.56
       1     7  9.22  9.83  6.23  4.67
       1     9  9.41  9.97  6.54  5.55

I am trying to create a new DataFrame and fill it with the values from the two DataFrames above.

This is my expected output:

Index  Player A  Player B  Time     a      b     c     d     e     f     g     h
    0         1         1     1  5.83  10.21  3.12  2.33  7.77  9.63  5.61  4.33
    1         1         1     2  5.64  10.16  3.15  2.37  7.79  9.66  5.66  4.37
    2         1       nan     3  5.60  10.11  3.19  2.42   nan   nan   nan   nan
    3         1       nan     4  5.55  10.08  3.22  2.46   nan   nan   nan   nan
    4         1         1     5  5.52  10.02  3.27  2.50  7.83  9.72  5.72  4.42
    5         1         1     6  5.48   9.97  3.33  2.55  7.96  9.77  5.77  4.56
    6       nan         1     7   nan    nan   nan   nan  7.96  9.77  5.77  4.56
    7       nan       nan     8   nan    nan   nan   nan   nan   nan   nan   nan
    8       nan         1     9   nan    nan   nan   nan  9.41  9.97  6.54  5.55

I have created a DataFrame:

df = pd.DataFrame(columns=['Player A', 'Player B', 'Time','C','D','E','F','G','H'], index=range(10))

My current output is:

Index  Player A  Player B  Time    a    b    c    d    e    f    g    h
    0       nan       nan   nan  nan  nan  nan  nan  nan  nan  nan  nan
    1       nan       nan   nan  nan  nan  nan  nan  nan  nan  nan  nan
    2       nan       nan   nan  nan  nan  nan  nan  nan  nan  nan  nan
    3       nan       nan   nan  nan  nan  nan  nan  nan  nan  nan  nan
    4       nan       nan   nan  nan  nan  nan  nan  nan  nan  nan  nan
    5       nan       nan   nan  nan  nan  nan  nan  nan  nan  nan  nan
    6       nan       nan   nan  nan  nan  nan  nan  nan  nan  nan  nan
    7       nan       nan   nan  nan  nan  nan  nan  nan  nan  nan  nan
    8       nan       nan   nan  nan  nan  nan  nan  nan  nan  nan  nan

I'm stuck after this step. How do I proceed from here?

Note: I'm using Python 3.9.

Answer 1

Score: 1

One approach could be as follows:

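  • First, use df.merge to combine the two DataFrames. Set the parameter on to the column Time and how to outer.
  • Next, use df.set_index to make the column Time the index, and pass range(1,10) to df.reindex to add any missing Time values (8 in this case).
  • Finally, reset the index again (df.reset_index) and use df.loc to get the columns in the desired order.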
res = (df
       .merge(df2, on='Time', how='outer')
       .set_index('Time').reindex(range(1,10))
       .reset_index(drop=False)
       .loc[:,['Player A','Player B', 'Time'] + ['a','b','c','d','e','f','g','h']])

res

   Player A  Player B  Time     a      b     c     d     e     f     g     h
0       1.0       1.0     1  5.83  10.21  3.12  2.33  7.77  9.63  5.61  4.33
1       1.0       1.0     2  5.64  10.16  3.15  2.37  7.79  9.66  5.66  4.37
2       1.0       NaN     3  5.60  10.11  3.19  2.42   NaN   NaN   NaN   NaN
3       1.0       NaN     4  5.55  10.08  3.22  2.46   NaN   NaN   NaN   NaN
4       1.0       1.0     5  5.52  10.02  3.27  2.50  7.83  9.72  5.72  4.42
5       1.0       1.0     6  5.48   9.97  3.33  2.55  7.96  9.77  5.77  4.56
6       NaN       1.0     7   NaN    NaN   NaN   NaN  9.22  9.83  6.23  4.67
7       NaN       NaN     8   NaN    NaN   NaN   NaN   NaN   NaN   NaN   NaN
8       NaN       1.0     9   NaN    NaN   NaN   NaN  9.41  9.97  6.54  5.55
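As an alternative sketch to the merge chain above, the same alignment can be done with pd.concat on frames indexed by Time, deriving the Time range from the data instead of hard-coding range(1,10). The helper name `combine_on_time` and the shortened sample data are assumptions made for illustration, not part of the original answer.

```python
import pandas as pd

def combine_on_time(left, right, key="Time"):
    """Align two frames on `key` and add all-NaN rows for gaps in the key range."""
    # Index both frames by the key; concat along columns aligns rows on it (outer join)
    wide = pd.concat([left.set_index(key), right.set_index(key)], axis=1)
    # Build the full integer range so missing key values get their own row
    full = range(int(wide.index.min()), int(wide.index.max()) + 1)
    return wide.reindex(full).rename_axis(key).reset_index()

# Shortened, hypothetical versions of the two frames
df1 = pd.DataFrame({"Player A": [1, 1, 1], "Time": [1, 2, 3], "a": [5.83, 5.64, 5.60]})
df2 = pd.DataFrame({"Player B": [1, 1, 1], "Time": [1, 3, 5], "e": [7.77, 7.79, 7.83]})

combined = combine_on_time(df1, df2)
print(combined)
```

This assumes Time is unique within each frame. With duplicate Time values the reindex step would fail, and merge would be the safer choice.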

Data used

import pandas as pd

data = {'Player A': {0: 1, 1: 1, 2: 1, 3: 1, 4: 1, 5: 1}, 
        'Time': {0: 1, 1: 2, 2: 3, 3: 4, 4: 5, 5: 6}, 
        'a': {0: 5.83, 1: 5.64, 2: 5.6, 3: 5.55, 4: 5.52, 5: 5.48}, 
        'b': {0: 10.21, 1: 10.16, 2: 10.11, 3: 10.08, 4: 10.02, 5: 9.97}, 
        'c': {0: 3.12, 1: 3.15, 2: 3.19, 3: 3.22, 4: 3.27, 5: 3.33}, 
        'd': {0: 2.33, 1: 2.37, 2: 2.42, 3: 2.46, 4: 2.5, 5: 2.55}}

df = pd.DataFrame(data)

data2 = {'Player B': {0: 1, 1: 1, 2: 1, 3: 1, 4: 1, 5: 1}, 
         'Time': {0: 1, 1: 2, 2: 5, 3: 6, 4: 7, 5: 9}, 
         'e': {0: 7.77, 1: 7.79, 2: 7.83, 3: 7.96, 4: 9.22, 5: 9.41}, 
         'f': {0: 9.63, 1: 9.66, 2: 9.72, 3: 9.77, 4: 9.83, 5: 9.97}, 
         'g': {0: 5.61, 1: 5.66, 2: 5.72, 3: 5.77, 4: 6.23, 5: 6.54}, 
         'h': {0: 4.33, 1: 4.37, 2: 4.42, 3: 4.56, 4: 4.67, 5: 5.55}}

df2 = pd.DataFrame(data2)

Note: Incidentally, I believe your expected output has erroneous values for the row at Index 6. I mean the values in the second row here:

output.loc[5:6, ['e','f','g','h']]

      e     f     g     h
5  7.96  9.77  5.77  4.56
6  7.96  9.77  5.77  4.56

This should be:

res.loc[5:6, ['e','f','g','h']]

      e     f     g     h
5  7.96  9.77  5.77  4.56
6  9.22  9.83  6.23  4.67
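One way to confirm which values belong at Time 7 is to look the row up directly in the second DataFrame. This is a small sketch using the data from the question; the variable name `time7` is an assumption for illustration.

```python
import pandas as pd

# Same values as the second DataFrame in the question
df2 = pd.DataFrame({
    "Player B": [1, 1, 1, 1, 1, 1],
    "Time": [1, 2, 5, 6, 7, 9],
    "e": [7.77, 7.79, 7.83, 7.96, 9.22, 9.41],
    "f": [9.63, 9.66, 9.72, 9.77, 9.83, 9.97],
    "g": [5.61, 5.66, 5.72, 5.77, 6.23, 6.54],
    "h": [4.33, 4.37, 4.42, 4.56, 4.67, 5.55],
})

# Select the source row where Time == 7 and keep only columns e..h
time7 = df2.loc[df2["Time"] == 7, ["e", "f", "g", "h"]].iloc[0].tolist()
print(time7)
```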



huangapple
  • Published on February 26, 2023, 19:51:22
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75571775.html