2023-03-07 08:52:53

Creating new dataframe conditioned on dataframe columns and series values

Question

Suppose I have a dataframe df with index t1, t2, t3 and columns A, B, C, D, E, and a series with index t1, t2, t3 whose values are lists:

t1 [A, B, C]
t2 [D, E]
t3 [B, C, D]

How do I create a new dataframe with index t1, t2, t3 and columns A, B, C, D, E in which each value indicates whether the column label is in the series' list for that row? In this example, I want the dataframe

    A  B  C  D  E
t1  T  T  T  F  F
t2  F  F  F  T  T
t3  F  T  T  T  F

I know that there are functions like apply and transform for dataframes, but these usually operate on each value of the dataframe rather than on the column labels themselves.
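
For anyone who wants to reproduce the setup, here is a minimal sketch of the inputs described above. The question does not show the cell values of df, so the zeros below are an assumed placeholder, and the name s is chosen only to match the answers.

```python
import pandas as pd

# Series of lists indexed by t1..t3, as described in the question.
s = pd.Series(
    [["A", "B", "C"], ["D", "E"], ["B", "C", "D"]],
    index=["t1", "t2", "t3"],
)

# The question gives no cell values for df, so zeros are an assumed placeholder.
df = pd.DataFrame(0, index=["t1", "t2", "t3"], columns=list("ABCDE"))

print(s)
print(df)
```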

Answer 1

Score: 0

You can explode the Series and assign a new column val, then use set_index and unstack to turn the A..E values into column headers:

out = (s.explode().to_frame('col').assign(val='T')
       .set_index('col', append=True).unstack('col', fill_value='F')
       .droplevel(level=0, axis=1).rename_axis('', axis=1))

print(out)

     A  B  C  D  E
t1   T  T  T  F  F
t2   F  F  F  T  T
t3   F  T  T  T  F

Or you can do it with pd.crosstab:

# keep the original s intact so the other snippets still work on it
exploded = s.explode()
out = pd.crosstab(exploded.index, exploded).replace({1: 'T', 0: 'F'})

print(out)

col_0  A  B  C  D  E
row_0
t1     T  T  T  F  F
t2     F  F  F  T  T
t3     F  T  T  T  F
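
The row_0 and col_0 labels in the output above are axis names that pd.crosstab assigns automatically when its inputs are unnamed. As a small sketch (not part of the original answer), they can be cleared with rename_axis, and the counts can be converted to real booleans if a True/False mask is more convenient than 'T'/'F' strings. The names exploded, counts and mask are illustrative.

```python
import pandas as pd

s = pd.Series(
    [["A", "B", "C"], ["D", "E"], ["B", "C", "D"]],
    index=["t1", "t2", "t3"],
)

# One row per (index label, list element) pair.
exploded = s.explode()

# Count how often each element occurs for each index label.
counts = pd.crosstab(exploded.index, exploded)

# Drop the automatic row_0 / col_0 axis names and turn counts into booleans.
mask = counts.rename_axis(index=None, columns=None).astype(bool)
print(mask)
```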

Or with Series.str.get_dummies, which returns 1/0 indicators instead of T/F:

out = s.str.join(',').str.get_dummies(sep=',')

print(out)

    A  B  C  D  E
t1  1  1  1  0  0
t2  0  0  0  1  1
t3  0  1  1  1  0
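
The 1/0 result above can be mapped to the T/F layout requested in the question. The sketch below also reindexes against the expected column list, since get_dummies only creates columns for labels that occur in at least one list. The variable columns is an assumption standing in for df.columns from the question.

```python
import pandas as pd

s = pd.Series(
    [["A", "B", "C"], ["D", "E"], ["B", "C", "D"]],
    index=["t1", "t2", "t3"],
)
columns = list("ABCDE")  # assumed to match df.columns from the question

# Join each list into one delimited string, then expand into indicator columns.
dummies = s.str.join(",").str.get_dummies(sep=",")

# Guarantee every expected column exists, even one that no list mentions.
dummies = dummies.reindex(columns=columns, fill_value=0)

# Map the integer indicators to the T/F strings from the question.
out = dummies.replace({1: "T", 0: "F"})
print(out)
```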

Answer 2

Score: 0

Solution

A more readable, step-by-step approach:

Get the index of the original series (index = series_t.index)
Set the columns of the new DataFrame (columns = ['A', 'B', 'C', 'D', 'E'])
Create an empty DataFrame (df = pd.DataFrame(index=index, columns=columns))
Use a for loop and an if statement to set each value of the DataFrame (df)

Code

import pandas as pd

# create the original series
series_t = pd.Series(index=['t1', 't2', 't3'], data=[['A', 'B', 'C'], ['D', 'E'], ['B', 'C', 'D']])
print(series_t)

print('----------------------------------------------')

# create the new DataFrame from the original series
index = series_t.index
columns = ['A', 'B', 'C', 'D', 'E']
df = pd.DataFrame(index=index, columns=columns)
for i in index:
    for j in columns:
        # assign through .loc; chained indexing (df[j][i] = ...) may not update df
        df.loc[i, j] = 'T' if j in series_t[i] else 'F'
print(df)

Output

t1    [A, B, C]
t2       [D, E]
t3    [B, C, D]
dtype: object
----------------------------------------------
    A  B  C  D  E
t1  T  T  T  F  F
t2  F  F  F  T  T
t3  F  T  T  T  F
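
The same membership check can also be written without filling an empty DataFrame cell by cell. This is only a compact sketch of the loop above, not a different algorithm: it builds all rows with a nested list comprehension and passes them to the DataFrame constructor once. The name rows is illustrative.

```python
import pandas as pd

series_t = pd.Series(
    index=["t1", "t2", "t3"],
    data=[["A", "B", "C"], ["D", "E"], ["B", "C", "D"]],
)
columns = ["A", "B", "C", "D", "E"]

# One list of 'T'/'F' flags per series entry: 'T' if the column label is in that entry's list.
rows = [["T" if col in items else "F" for col in columns] for items in series_t]

df = pd.DataFrame(rows, index=series_t.index, columns=columns)
print(df)
```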
