Creating new dataframe conditioned on dataframe columns and series values

Question

Suppose I have a DataFrame df with index t1, t2, t3 and columns A, B, C, D, E, and a Series with the same index t1, t2, t3 whose values are lists of column names:

t1 [A, B, C]
t2 [D, E]
t3 [B, C, D]

How do I create a new DataFrame with index t1, t2, t3 and columns A, B, C, D, E in which each value indicates whether that column's label appears in the Series list for that row? In this example, I want the DataFrame

    A  B  C  D  E
t1  T  T  T  F  F
t2  F  F  F  T  T
t3  F  T  T  T  F

I know that DataFrames have functions like apply and transform, but these usually operate on the values of the DataFrame rather than on the column labels themselves.
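For reference, the input described in the question can be reconstructed as follows. This is only a sketch of the setup: the names s and df follow the question and Answer 1, and the empty df is a placeholder, since only its index and column labels matter here.

```python
import pandas as pd

# Series of lists, indexed by t1..t3 (named `s` to match Answer 1)
s = pd.Series(
    [["A", "B", "C"], ["D", "E"], ["B", "C", "D"]],
    index=["t1", "t2", "t3"],
)

# Placeholder DataFrame: only its index and column labels are relevant
df = pd.DataFrame(index=s.index, columns=list("ABCDE"))

print(s)
```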

Answer 1

Score: 0

You can explode the Series so that each list element gets its own row, assign a constant column val, then use set_index and unstack to turn the A..E values into column headers:

out = (s.explode().to_frame('col').assign(val='T')
       .set_index('col', append=True).unstack('col', fill_value='F')
       .droplevel(level=0, axis=1).rename_axis('', axis=1))
print(out)

     A  B  C  D  E
t1   T  T  T  F  F
t2   F  F  F  T  T
t3   F  T  T  T  F

Or you can cross-tabulate the exploded Series with pd.crosstab:

# Store the exploded values under a new name so the original s stays intact
exploded = s.explode()
out = pd.crosstab(exploded.index, exploded).replace({1: 'T', 0: 'F'})
print(out)

col_0  A  B  C  D  E
row_0
t1     T  T  T  F  F
t2     F  F  F  T  T
t3     F  T  T  T  F
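The row_0 and col_0 labels in the output above are axis names that pd.crosstab generates when its inputs are unnamed. A minimal sketch of clearing them with rename_axis, assuming the same Series s as in the question (the variable name exploded is chosen here for illustration):

```python
import pandas as pd

s = pd.Series(
    [["A", "B", "C"], ["D", "E"], ["B", "C", "D"]],
    index=["t1", "t2", "t3"],
)

exploded = s.explode()
out = pd.crosstab(exploded.index, exploded).replace({1: "T", 0: "F"})

# Clear the auto-generated axis names (row_0 / col_0)
out = out.rename_axis(index=None, columns=None)
print(out)
```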

Or with Series.str.get_dummies, which returns 1/0 indicators instead of T/F:

out = s.str.join(',').str.get_dummies(sep=',')
print(out)

    A  B  C  D  E
t1  1  1  1  0  0
t2  0  0  0  1  1
t3  0  1  1  1  0
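If the 1/0 indicators need to become T/F, or if some expected column might not appear in any list, the get_dummies result can be post-processed. The sketch below assumes the full set of labels is known up front (the columns list is an assumption, not part of the original answer) and that no label contains a comma, since the comma is used as the separator.

```python
import pandas as pd

s = pd.Series(
    [["A", "B", "C"], ["D", "E"], ["B", "C", "D"]],
    index=["t1", "t2", "t3"],
)
columns = list("ABCDE")  # assumed: the full set of expected column labels

dummies = s.str.join(",").str.get_dummies(sep=",")

# Make sure every expected column exists, even if no list mentions it
dummies = dummies.reindex(columns=columns, fill_value=0)

bools = dummies.astype(bool)              # True/False flags
out = dummies.replace({1: "T", 0: "F"})   # 'T'/'F' labels as in the question
print(out)
```

The boolean version is usually more convenient for further filtering, while the T/F version matches the output requested in the question.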

Answer 2

Score: 0

Solution

A more explicit approach that may be easier to follow:

  1. Get the index of the original Series (index = series_t.index)
  2. Define the columns of the new DataFrame (columns = ['A', 'B', 'C', 'D', 'E'])
  3. Create an empty DataFrame (df = pd.DataFrame(index=index, columns=columns))
  4. Use a for loop and a membership test to set each value of the DataFrame (df)

Code

import pandas as pd

# create original series
series_t = pd.Series(index=['t1', 't2', 't3'], data=[['A', 'B', 'C'], ['D', 'E'], ['B', 'C', 'D']])
print(series_t)

print('----------------------------------------------')

# create the new dataFrame from the original series
index = series_t.index
columns = ['A', 'B', 'C', 'D', 'E']
df = pd.DataFrame(index=index, columns=columns)
for i in index:
    for j in columns:
        # Assign through .loc; chained indexing (df[j][i] = ...) may not update df
        df.loc[i, j] = 'T' if j in series_t[i] else 'F'
print(df)

Output

t1    [A, B, C]
t2       [D, E]
t3    [B, C, D]
dtype: object
----------------------------------------------
    A  B  C  D  E
t1  T  T  T  F  F
t2  F  F  F  T  T
t3  F  T  T  T  F
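The same membership test can also be written without filling the DataFrame cell by cell. The following is a sketch of an alternative, not the answer's own method: it builds a boolean table with a nested comprehension and then maps it to T/F with numpy.where. The names flags and out are chosen here for illustration.

```python
import numpy as np
import pandas as pd

series_t = pd.Series(
    index=["t1", "t2", "t3"],
    data=[["A", "B", "C"], ["D", "E"], ["B", "C", "D"]],
)
columns = ["A", "B", "C", "D", "E"]

# One row per Series entry: True where the column label is in that row's list
flags = pd.DataFrame(
    [[col in members for col in columns] for members in series_t],
    index=series_t.index,
    columns=columns,
)

# Map the booleans to the 'T'/'F' labels used in the question
out = pd.DataFrame(np.where(flags, "T", "F"), index=flags.index, columns=flags.columns)
print(out)
```

Building all rows first and constructing the DataFrame once avoids repeated single-cell assignment, which tends to be slow on larger inputs.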

huangapple
  • Published on March 7, 2023, 08:52:53
  • If you repost this article, please keep this link: https://go.coder-hub.com/75657140.html