If 'W' is in 'X' append 'Y' to 'Z'

Question

I have a dataframe which is similar to this:

Column A  Column B  Column C
P1        A         P3, P4
P2        B         P1, P3
P3        C         P2, P3
P4        D         P1, P4

I'm trying to make it so that if the value in 'Col A' appears in 'Col C' of any row, it takes that row's 'Col B' and builds a list in a new column, like below:

Column A  Column B  Column C  Column D
P1        A         P3, P4    B, D
P2        B         P1, P3    C
P3        C         P2, P3    A, B, C
P4        D         P1, P4    A, D

The values in 'Col C' are currently of object dtype (I can change that, but this is what the import from a text file and the TextFSM formatting set it as).

The values in the new column should be strings. (I should probably also convert 'Col C' to strings to make this easier on myself.)

Nothing I've tried seems to work the way I'm hoping.
I'm still pretty new to Pandas and I've been struggling to find a way to get this to work.
I feel the answer is super simple and right in front of me, but I can't seem to get it.

With the attempt below I get "unhashable type: 'Series'"; however, the data isn't created as a Series within my code, it's loaded from a text file and formatted with TextFSM.

for i in df.iterrows():
    if df['Column C'].str.contains(df['Column A']).any():
        results.append(df['Column B'][i])
df['VLANS on Port'] = results

Answer 1

Score: 2

Another possible solution:

# Map each value found in "Column C" to the joined "Column B" values of the rows that contain it
conn = (
    df.assign(tmp=df["Column C"].str.split(r",\s*"))
      .explode("tmp")
      .groupby("tmp")["Column B"]
      .agg(", ".join)
)

df["Column E"] = df["Column A"].map(conn)

Output:

print(df)

  Column A Column B Column C Column E
0       P1        A   P3, P4     B, D
1       P2        B   P1, P3        C
2       P3        C   P2, P3  A, B, C
3       P4        D   P1, P4     A, D
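
For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same explode/groupby/map idea. The sample frame is rebuilt by hand from the question's table, and the names `exploded` and `lookup` as well as the `fillna("")` step are my own additions for illustration, not part of the answer above.

```python
import pandas as pd

df = pd.DataFrame({
    "Column A": ["P1", "P2", "P3", "P4"],
    "Column B": ["A", "B", "C", "D"],
    "Column C": ["P3, P4", "P1, P3", "P2, P3", "P1, P4"],
})

# One row per (Column B value, single entry of Column C).
exploded = df.assign(tmp=df["Column C"].str.split(r",\s*")).explode("tmp")

# For every entry, join the Column B values of the rows that mention it.
lookup = exploded.groupby("tmp")["Column B"].agg(", ".join)

# Values of Column A that never occur in Column C are absent from lookup,
# so map() yields NaN for them; fillna("") turns that into an empty string.
df["Column D"] = df["Column A"].map(lookup).fillna("")
print(df["Column D"].tolist())
```

Building the lookup once and mapping it keeps the work to a single pass over the exploded rows, instead of scanning the whole of `Column C` for every value of `Column A`.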

Answer 2

Score: 1

You can define a look-up function that finds the matching rows and builds the string from them. In the code below, .map sends each value in col_A to the mapper function, which selects the rows in which col_C contains that value.
The col_B values of those rows are then joined into a single string with a comma separator. I have assumed string types throughout, which seems to be your intention.

import pandas as pd

df = pd.DataFrame({ 'col_A': ['P1', 'P2', 'P3', 'P4'],
                    'col_B': ['A', 'B', 'C', 'D'],
                    'col_C': [ 'P3, P4', 'P1, P3', 'P2, P3', 'P1, P4']
                    })

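def mapper(val):
    # Boolean mask of the rows whose col_C contains val (substring match)
    mask = df['col_C'].str.contains(val)
    # Join the col_B values of those rows with a comma
    return ','.join(df.loc[mask, 'col_B'])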

df['col_D'] = df['col_A'].map(mapper)

print(df)

gives:

  col_A col_B   col_C  col_D
0    P1     A  P3, P4    B,D
1    P2     B  P1, P3      C
2    P3     C  P2, P3  A,B,C
3    P4     D  P1, P4    A,D
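
One caveat, as far as I can tell: `str.contains` matches substrings, so a value such as `P1` would also match an entry like `P10`. Below is a sketch of an exact-match variant, assuming the entries in `col_C` are always separated by a comma. The `P10` row and the helper name `exact_mapper` are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "col_A": ["P1", "P2", "P10"],
    "col_B": ["A", "B", "C"],
    "col_C": ["P10, P2", "P1", "P2"],
})

# Split each cell into a list of whole entries once, up front.
tokens = df["col_C"].str.split(r",\s*")

def exact_mapper(val):
    # True only for rows whose list holds val as a complete entry.
    mask = tokens.apply(lambda items: val in items)
    return ",".join(df.loc[mask, "col_B"])

df["col_D"] = df["col_A"].map(exact_mapper)
print(df["col_D"].tolist())
```

With plain substring matching, `P1` would also pick up the first row because of `P10`; comparing whole entries avoids that.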

Answer 3

Score: 1

# input
import pandas as pd
df = pd.read_clipboard()
df["Column C"] = df["Column C"].str.split(", ")  # assuming this is a column of lists

One option to recreate your output is then:

# find which Px are in each row, then transpose (so which rows contain each Px)
# reindex so that a Px that never occurs does not produce wrong values
# fillna with 0 (as reindexing will return NaNs)
# rename columns with Column B values
# note: sum(level=0) was removed in pandas 2.0, hence groupby(level=0).sum()
dummies = df["Column C"].apply(pd.Series).stack().str\
    .get_dummies().groupby(level=0).sum().T\
        .reindex(df["Column A"]).fillna(0)\
            .rename(columns=df["Column B"].to_dict())

#           A  B  C  D
# Column A            
# P1        0  1  0  1
# P2        0  0  1  0
# P3        1  1  1  0
# P4        1  0  0  1

# for all non-zero values in each row, convert values to column names
# the output for each row will be a list
# reset the index so that it matches df (assuming that index is standard 0,...)
df["Column D"] = dummies.apply(
    lambda row: list(dummies.columns[row.values.nonzero()]), axis=1)\
    .reset_index(drop=True)

#   Column A Column B  Column C   Column D
# 0       P1        A  [P3, P4]     [B, D]
# 1       P2        B  [P1, P3]        [C]
# 2       P3        C  [P2, P3]  [A, B, C]
# 3       P4        D  [P1, P4]     [A, D]
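
Since the question asks for strings rather than lists, the list column can be flattened afterwards. A minimal sketch, assuming `Column D` holds lists of strings as in the output above (the frame here is typed in by hand to keep the example self-contained):

```python
import pandas as pd

df = pd.DataFrame({
    "Column A": ["P1", "P2", "P3", "P4"],
    "Column D": [["B", "D"], ["C"], ["A", "B", "C"], ["A", "D"]],
})

# Series.str.join concatenates the elements of each list with the separator.
df["Column D"] = df["Column D"].str.join(", ")
print(df["Column D"].tolist())
```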

huangapple
  • Posted on June 19, 2023 at 23:27:39
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76508065.html