Creating new dataframe conditioned on dataframe columns and series values
Question
Suppose I have a dataframe df with index t1, t2, t3 and columns A, B, C, D, E, and a Series with the same index t1, t2, t3 whose values are lists of column labels:
t1 [A, B, C]
t2 [D, E]
t3 [B, C, D]
How do I create a new dataframe with index t1, t2, t3 and columns A, B, C, D, E in which each value indicates whether the column label appears in that row's list in the Series? In this example, I want the dataframe:
    A  B  C  D  E
t1  T  T  T  F  F
t2  F  F  F  T  T
t3  F  T  T  T  F
I know that there are functions like apply and transform for dataframes, but these usually operate on the values of the dataframe rather than on the column labels themselves.
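For reproducibility, the inputs described above could be built roughly as follows. This is only a sketch: the name s for the Series matches what the answers use, while the contents of df are not given in the question, so an empty placeholder frame is assumed here.

```python
import pandas as pd

# The Series of lists, indexed by t1..t3 (named `s` to match the answers).
s = pd.Series(
    [['A', 'B', 'C'], ['D', 'E'], ['B', 'C', 'D']],
    index=['t1', 't2', 't3'],
)

# Placeholder for the existing dataframe; its actual values are not
# specified in the question, so it is left empty (all NaN) here.
df = pd.DataFrame(index=s.index, columns=['A', 'B', 'C', 'D', 'E'])

print(s)
print(df.shape)
```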
Answer 1
Score: 0
You can explode the Series, assign a new column val, then set_index and unstack the column holding the A..E values so that those values become the column headers:
out = (s.explode().to_frame('col').assign(val='T')
        .set_index('col', append=True).unstack('col', fill_value='F')
        .droplevel(level=0, axis=1).rename_axis('', axis=1))
print(out)
    A  B  C  D  E
t1  T  T  T  F  F
t2  F  F  F  T  T
t3  F  T  T  T  F
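Or you can do it with pd.crosstab (the exploded Series is kept in a separate variable so that s still holds the original lists):
exploded = s.explode()
out = pd.crosstab(exploded.index, exploded).replace({1: 'T', 0: 'F'})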
print(out)
col_0  A  B  C  D  E
row_0
t1     T  T  T  F  F
t2     F  F  F  T  T
t3     F  T  T  T  F
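The crosstab result carries the axis labels row_0 and col_0. If those are unwanted, one possible way to drop them is rename_axis. The snippet below is a self-contained sketch that rebuilds s from the question's example data; it is not part of the original answer.

```python
import pandas as pd

# Example Series of lists, rebuilt from the question.
s = pd.Series(
    [['A', 'B', 'C'], ['D', 'E'], ['B', 'C', 'D']],
    index=['t1', 't2', 't3'],
)

exploded = s.explode()  # one row per (index, label) pair

# Count how often each label occurs for each index value.
counts = pd.crosstab(exploded.index, exploded.to_numpy())

# Map counts to 'T'/'F' and clear the automatic row_0/col_0 axis names.
out = counts.replace({1: 'T', 0: 'F'}).rename_axis(index=None, columns=None)
print(out)
```

Note that replace({1: 'T', 0: 'F'}) assumes each label appears at most once per list; a count of 2 or more would be left as a number.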
Or with Series.str.get_dummies, applied to the original list-valued Series (this yields 1/0 rather than T/F):
out = s.str.join(',').str.get_dummies(sep=',')
print(out)
    A  B  C  D  E
t1  1  1  1  0  0
t2  0  0  0  1  1
t3  0  1  1  1  0
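Since get_dummies returns indicator values rather than T/F, a conversion step is needed to match the output requested in the question. The sketch below shows one way to do that with numpy.where. The columns list is an assumption (the full set of labels from the question), used so that a label missing from every list still gets a column; the approach also assumes no label contains a comma.

```python
import numpy as np
import pandas as pd

# Example Series of lists, rebuilt from the question.
s = pd.Series(
    [['A', 'B', 'C'], ['D', 'E'], ['B', 'C', 'D']],
    index=['t1', 't2', 't3'],
)
columns = ['A', 'B', 'C', 'D', 'E']  # assumed full set of expected labels

# Join each list into a comma-separated string, then one-hot encode it.
dummies = s.str.join(',').str.get_dummies(sep=',')

# Make sure every expected label has a column, even if it never occurs.
dummies = dummies.reindex(columns=columns, fill_value=0)

# Turn the indicators into 'T'/'F' strings.
out = pd.DataFrame(
    np.where(dummies.astype(bool), 'T', 'F'),
    index=dummies.index,
    columns=dummies.columns,
)
print(out)
```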
Answer 2
Score: 0
Solution
A more understandable approach.
- Get the index of the original Series (index = series_t.index)
- Set the columns of the new DataFrame (columns = ['A', 'B', 'C', 'D', 'E'])
- Create an empty DataFrame (df = pd.DataFrame(index=index, columns=columns))
- Use a for loop and an if statement to set each value of the DataFrame (df)
Code
import pandas as pd
# create the original Series of lists
series_t = pd.Series(index=['t1', 't2', 't3'], data=[['A', 'B', 'C'], ['D', 'E'], ['B', 'C', 'D']])
print(series_t)
print('----------------------------------------------')
# create the new DataFrame from the original Series
index = series_t.index
columns = ['A', 'B', 'C', 'D', 'E']
df = pd.DataFrame(index=index, columns=columns)
for i in index:
    for j in columns:
        # assign with .loc; chained indexing such as df[j][i] may not write back to df
        if j in series_t[i]:
            df.loc[i, j] = 'T'
        else:
            df.loc[i, j] = 'F'
print(df)
Output
t1 [A, B, C]
t2 [D, E]
t3 [B, C, D]
dtype: object
----------------------------------------------
    A  B  C  D  E
t1  T  T  T  F  F
t2  F  F  F  T  T
t3  F  T  T  T  F
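The same membership test can also be written without cell-by-cell assignment, by building all rows first and constructing the DataFrame once. This is a sketch of an alternative, not the answer's own method; the variable names follow the code above.

```python
import pandas as pd

# Original Series of lists, as in the answer above.
series_t = pd.Series(
    index=['t1', 't2', 't3'],
    data=[['A', 'B', 'C'], ['D', 'E'], ['B', 'C', 'D']],
)
columns = ['A', 'B', 'C', 'D', 'E']

# For each list in the Series, build one row of 'T'/'F' flags,
# one flag per column label.
rows = [
    ['T' if col in members else 'F' for col in columns]
    for members in series_t
]

df = pd.DataFrame(rows, index=series_t.index, columns=columns)
print(df)
```

Building the rows up front avoids repeated writes into an initially empty object-dtype frame, which tends to matter more as the number of rows grows.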