If 'W' is in 'X' append 'Y' to 'Z'
Question
I have a dataframe which is similar to this:
Column A | Column B | Column C |
---|---|---|
P1 | A | P3, P4 |
P2 | B | P1, P3 |
P3 | C | P2, P3 |
P4 | D | P1, P4 |
I'm trying to make it so that if the value in 'Col A' appears in 'Col C', it takes 'Col B' and creates a list in a new column, like below:
Column A | Column B | Column C | Column D |
---|---|---|---|
P1 | A | P3, P4 | B, D |
P2 | B | P1, P3 | C |
P3 | C | P2, P3 | A, B, C |
P4 | D | P1, P4 | A, D |
The values in 'Col C' are currently of object dtype (I can change that, but this is what the import from a text file and the TextFSM formatting set it as).
The values in the new column should be strings. (I should probably also set 'Col C' to be a string to make this easier on myself.)
Nothing I've tried seems to work the way I'm hoping.
I'm still pretty new to Pandas and I've been struggling to find a way to get this to work.
I feel the answer is super simple and right in front of me, but I can't seem to get it.
With the attempt below I get "unhashable type: 'Series'"; however, the data isn't created as a Series within my code, it's loaded from a text file and formatted with TextFSM.
    for i in df.iterrows():
        if df['Column C'].str.contains(df['Column A']).any():
            results.append(df['Column B'][i])
    df['VLANS on Port'] = results
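For reference, the intent of the loop above can be sketched in plain Python. Two things in the attempt probably get in the way: `str.contains` expects a single pattern string rather than a whole Series, and `iterrows()` yields `(index, row)` tuples, so `i` is not a usable row label. The sketch below rebuilds the sample frame by hand (the real data comes from TextFSM, so the construction here is an assumption) and assumes 'Column C' holds comma-separated strings.

```python
import pandas as pd

# Sample data rebuilt by hand; the real frame is loaded via TextFSM (assumption).
df = pd.DataFrame({
    "Column A": ["P1", "P2", "P3", "P4"],
    "Column B": ["A", "B", "C", "D"],
    "Column C": ["P3, P4", "P1, P3", "P2, P3", "P1, P4"],
})

# Split each "Column C" string into a list of exact port names once, up front.
ports_per_row = [[p.strip() for p in cell.split(",")] for cell in df["Column C"]]

results = []
for port in df["Column A"]:
    # Collect "Column B" from every row whose "Column C" list contains this port.
    matches = [b for b, ports in zip(df["Column B"], ports_per_row) if port in ports]
    results.append(", ".join(matches))

df["Column D"] = results
print(df)
```

This is quadratic in the number of rows, which is fine for small tables; the answers below show vectorised alternatives.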
Answer 1
Score: 2
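Another possible solution:

    # map each port named in "Column C" to the joined "Column B" values of the rows naming it
    conn = (
        df.assign(tmp=df["Column C"].str.split(r",\s*"))  # raw string avoids an invalid-escape warning
        .explode("tmp").groupby("tmp")["Column B"].agg(", ".join)
    )
    df["Column E"] = df["Column A"].map(conn)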
Output:

    print(df)
      Column A Column B Column C Column E
    0       P1        A   P3, P4     B, D
    1       P2        B   P1, P3        C
    2       P3        C   P2, P3  A, B, C
    3       P4        D   P1, P4     A, D
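To make the chain in this answer easier to follow, here is the same idea split into named steps on a hand-built copy of the sample frame. The intermediate name `exploded` and the final `fillna("")` are additions for illustration, not part of the answer; the `fillna` only matters if some 'Column A' value is never referenced in 'Column C'.

```python
import pandas as pd

# Hand-built copy of the sample data from the question.
df = pd.DataFrame({
    "Column A": ["P1", "P2", "P3", "P4"],
    "Column B": ["A", "B", "C", "D"],
    "Column C": ["P3, P4", "P1, P3", "P2, P3", "P1, P4"],
})

# Step 1: split "Column C" into lists, then produce one row per referenced port.
exploded = df.assign(tmp=df["Column C"].str.split(r",\s*")).explode("tmp")

# Step 2: for each referenced port, join the "Column B" values of the rows that name it.
conn = exploded.groupby("tmp")["Column B"].agg(", ".join)

# Step 3: look up each "Column A" value in that mapping.
# Ports that are never referenced would map to NaN, so fill them with "".
df["Column E"] = df["Column A"].map(conn).fillna("")
print(df)
```

Because the grouping is done once, this scales better than scanning 'Column C' separately for every value in 'Column A'.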
Answer 2
Score: 1
You can define a look-up function that finds the matching rows and builds the string. In the code below, .map sends each value in col_A to the mapper function, which selects the rows in which col_C contains that value.
The col_B values from those rows are then joined into a single string with a comma separator. I have assumed string types throughout, which seems to be your intention.
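    import pandas as pd

    df = pd.DataFrame({
        'col_A': ['P1', 'P2', 'P3', 'P4'],
        'col_B': ['A', 'B', 'C', 'D'],
        'col_C': ['P3, P4', 'P1, P3', 'P2, P3', 'P1, P4'],
    })

    def mapper(val):
        # rows whose col_C contains val (substring match, so 'P1' would also match 'P10')
        mask = df['col_C'].str.contains(val, regex=False)
        # join the col_B values from those rows into one comma-separated string
        return ','.join(df.loc[mask, 'col_B'])

    df['col_D'] = df['col_A'].map(mapper)
    print(df)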
gives:
      col_A col_B   col_C  col_D
    0    P1     A  P3, P4    B,D
    1    P2     B  P1, P3      C
    2    P3     C  P2, P3  A,B,C
    3    P4     D  P1, P4    A,D
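One caveat with `str.contains` is that it matches substrings, so a port named `P1` would also match a cell containing `P10`. The sketch below, on made-up data with such a clash (the values are hypothetical, not from the question), compares the substring result with an exact-token variant that splits col_C first.

```python
import pandas as pd

# Hypothetical data where one port name is a prefix of another.
df = pd.DataFrame({
    "col_A": ["P1", "P10"],
    "col_B": ["A", "B"],
    "col_C": ["P10", "P1, P10"],
})

# Substring matching: "P1" is found inside "P10", so row 0 matches as well.
substring_result = ",".join(df.loc[df["col_C"].str.contains("P1", regex=False), "col_B"])

# Exact matching: pre-split col_C into sets of whole port names.
tokens = df["col_C"].str.split(",").apply(lambda parts: {p.strip() for p in parts})

def mapper(val):
    # True only for rows whose token set contains val as a whole name.
    mask = tokens.apply(lambda names: val in names)
    return ",".join(df.loc[mask, "col_B"])

df["col_D"] = df["col_A"].map(mapper)
print(substring_result)
print(df)
```

If port names in the real data can never be prefixes of one another, the simpler substring version is sufficient.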
Answer 3
Score: 1
    # input
    import pandas as pd

    df = pd.read_clipboard()
    df["Column C"] = df["Column C"].str.split(", ")  # turn each string into a list of port names

One option to recreate your output is then:
    # find which Px are in each row, then transpose (so which rows contain each Px)
    # reindex on Column A so that a Px that never occurs still gets its own row
    # fillna with 0 (as reindexing introduces NaNs for such rows)
    # rename columns with Column B values
    dummies = (
        df["Column C"].apply(pd.Series).stack().str.get_dummies()
        .groupby(level=0).sum().T  # .sum(level=0) was removed in pandas 2.0
        .reindex(df["Column A"]).fillna(0)
        .rename(columns=df["Column B"].to_dict())
    )
    #           A  B  C  D
    # Column A
    # P1        0  1  0  1
    # P2        0  0  1  0
    # P3        1  1  1  0
    # P4        1  0  0  1

    # for all non-zero values in each row, convert values to column names
    # the output for each row will be a list
    # reset the index so that it matches df (assuming that index is the standard 0, 1, ...)
    df["Column D"] = (
        dummies.apply(lambda row: list(dummies.columns[row.values.nonzero()[0]]), axis=1)
        .reset_index(drop=True)
    )
    #   Column A Column B  Column C   Column D
    # 0       P1        A  [P3, P4]     [B, D]
    # 1       P2        B  [P1, P3]        [C]
    # 2       P3        C  [P2, P3]  [A, B, C]
    # 3       P4        D  [P1, P4]     [A, D]
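This answer leaves 'Column D' as a column of lists, whereas the question asks for strings. A minimal sketch of the conversion, assuming every element is a list of strings, uses `Series.str.join`; the frame below is typed in by hand to mirror the output above.

```python
import pandas as pd

# Hand-typed copy of the list-valued result shown above.
df = pd.DataFrame({
    "Column A": ["P1", "P2", "P3", "P4"],
    "Column D": [["B", "D"], ["C"], ["A", "B", "C"], ["A", "D"]],
})

# Join each list into a single comma-separated string.
df["Column D"] = df["Column D"].str.join(", ")
print(df)
```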