如何在满足条件的情况下填充一列,该条件要跨越其他2个或更多列。

huangapple go评论75阅读模式
英文:

How to populate a column with a value if condition is met across 2+ other colums

问题

我的数据框类似于下面的表格。我有6列,每一列都有'Yes'或'No',表示是否使用了特定的抗生素。

要求在' TREATED '列中填入'True',如果特定的抗生素列的组合包含'Yes'。如果条件不满足,那么我想在'TREATED'列中填入'False'值。

如果 ['AZITH'] 和 ['CLIN'] == 'Yes' 或
['AZITH'] 和 ['CFTX'] 和 ['CLIN'] == 'Yes' 或
['AZITH'] 和 ['CFTX'] 和 ['METRO'] == 'Yes' 或
['AZITH'] 和 ['CFTN'] == 'Yes' 或
['CFTX'] 和 ['DOXY'] 和 ['METRO'] == 'Yes' 或
['CFTN'] 和 ['DOXY'] == 'Yes' 或
['DOXY'] 和 ['METRO'] == 'Yes',
然后在 'TREATED' 列中返回 'True',
否则返回 'False'。

此外,还包括所有6种药物都被使用的情况。如果是这样,那么应该返回'True',因为已经满足了至少使用2种治疗药物的条件。

期望的输出如下:

AZITH CLIN CFTX METRO CFTN DOXY TREATED
Yes Yes No No No No Yes
No Yes No Yes No No No
Yes Yes No No No No Yes
No No No No No No No
Yes Yes Yes Yes Yes Yes Yes
No Yes Yes Yes No Yes Yes
No No No No No Yes No
No No No No No No No
Yes Yes Yes No No No Yes
Yes No Yes Yes No No Yes
Yes No No No Yes No Yes
No No Yes Yes No Yes Yes
No No No No Yes Yes Yes
No No No Yes No Yes Yes
英文:

My dataframe is similar to the table below. I have 6 columns, each with 'Yes' or 'No' if a specific antibiotic was given.

AZITH CLIN CFTX METRO CFTN DOXY TREATED
Yes Yes No No No No
No Yes No Yes No No
Yes Yes No No No No
No No No No No No
Yes Yes Yes Yes Yes Yes
No Yes Yes Yes No Yes
No No No No No Yes
No No No No No No
Yes Yes Yes No No No
Yes No Yes Yes No No
Yes No No No Yes No
No No Yes Yes No Yes
No No No No Yes Yes
No No No Yes No Yes

I want to fill the column 'TREATED' with 'True' if specific combinations of antibiotic columns contain 'Yes.' If the conditions aren't met, then I would like to fill the 'TREATED' column with a 'False' value.

> If ['AZITH'] & ['CLIN'] == 'Yes' |
>
> ['AZITH'] & ['CFTX'] & ['CLIN'] == 'Yes' |
>
> ['AZITH'] & ['CFTX'] & ['METRO']== 'Yes' |
>
> ['AZITH'] & ['CFTN'] == 'Yes' |
>
> ['CFTX'] & ['DOXY'] & ['METRO']== 'Yes' |
>
>['CFTN'] & ['DOXY'] == 'Yes' |
>
> ['DOXY'] & ['METRO']== 'Yes' ,
>
> Then return 'True' in column 'TREATED'
>
> Else 'False'

What I had in mind was some sort of if statement or use of lambda function, however, I am having trouble.

This must not be exclusive to the above combinations but also include for example if all 6 medications were given. If that's the case, then 'True' should be returned because the condition has been met to give at least 2 of the treatment medications.

The desired output is below:

AZITH CLIN CFTX METRO CFTN DOXY TREATED
Yes Yes No No No No Yes
No Yes No Yes No No No
Yes Yes No No No No Yes
No No No No No No No
Yes Yes Yes Yes Yes Yes Yes
No Yes Yes Yes No Yes Yes
No No No No No Yes No
No No No No No No No
Yes Yes Yes No No No Yes
Yes No Yes Yes No No Yes
Yes No No No Yes No Yes
No No Yes Yes No Yes Yes
No No No No Yes Yes Yes
No No No Yes No Yes Yes

答案1

得分: 1

使用您提供的数据帧,以下是一个代码示例,用于创建一个名为"TREATED"的新列,根据一组条件将其设置为"Yes"或"No":

mask = (
    ((df["AZITH"] == "Yes") & (df["CLIN"] == "Yes"))
    | ((df["AZITH"] == "Yes") & (df["CLIN"] == "Yes") & (df["CFTX"] == "Yes"))
    | ((df["AZITH"] == "Yes") & (df["CFTX"] == "Yes") & (df["METRO"] == "Yes"))
    | ((df["AZITH"] == "Yes") & (df["CFTN"] == "Yes"))
    | ((df["CFTX"] == "Yes") & (df["DOXY"] == "Yes") & (df["METRO"] == "Yes"))
    | ((df["CFTN"] == "Yes") & (df["DOXY"] == "Yes"))
    | ((df["DOXY"] == "Yes") & (df["METRO"] == "Yes"))
)
df.loc[mask, "TREATED"] = "Yes"
df = df.fillna("No")

然后,使用print(df)可以打印出结果数据帧,其中包含了新的"TREATED"列。

希望这能帮助您。如果您需要进一步的翻译或有其他问题,请随时提出。

英文:

With the dataframe you provided:

import pandas as pd

df = pd.DataFrame(
    {
        "AZITH": [
            "Yes",
            "No",
            "Yes",
            "No",
            "Yes",
            "No",
            "No",
            "No",
            "Yes",
            "Yes",
            "Yes",
            "No",
            "No",
            "No",
        ],
        "CLIN": [
            "Yes",
            "Yes",
            "Yes",
            "No",
            "Yes",
            "Yes",
            "No",
            "No",
            "Yes",
            "No",
            "No",
            "No",
            "No",
            "No",
        ],
        "CFTX": [
            "No",
            "No",
            "No",
            "No",
            "Yes",
            "Yes",
            "No",
            "No",
            "Yes",
            "Yes",
            "No",
            "Yes",
            "No",
            "No",
        ],
        "METRO": [
            "No",
            "Yes",
            "No",
            "No",
            "Yes",
            "Yes",
            "No",
            "No",
            "No",
            "Yes",
            "No",
            "Yes",
            "No",
            "Yes",
        ],
        "CFTN": [
            "No",
            "No",
            "No",
            "No",
            "Yes",
            "No",
            "No",
            "No",
            "No",
            "No",
            "Yes",
            "No",
            "Yes",
            "No",
        ],
        "DOXY": [
            "No",
            "No",
            "No",
            "No",
            "Yes",
            "Yes",
            "Yes",
            "No",
            "No",
            "No",
            "No",
            "Yes",
            "Yes",
            "Yes",
        ],
    }
)

Here is one way to do it:

mask = (
    ((df["AZITH"] == "Yes") & (df["CLIN"] == "Yes"))
    | ((df["AZITH"] == "Yes") & (df["CLIN"] == "Yes") & (df["CFTX"] == "Yes"))
    | ((df["AZITH"] == "Yes") & (df["CFTX"] == "Yes") & (df["METRO"] == "Yes"))
    | ((df["AZITH"] == "Yes") & (df["CFTN"] == "Yes"))
    | ((df["CFTX"] == "Yes") & (df["DOXY"] == "Yes") & (df["METRO"] == "Yes"))
    | ((df["CFTN"] == "Yes") & (df["DOXY"] == "Yes"))
    | ((df["DOXY"] == "Yes") & (df["METRO"] == "Yes"))
)
df.loc[mask, "TREATED"] = "Yes"
df = df.fillna("No")

Then:

print(df)
# Output
   AZITH CLIN CFTX METRO CFTN DOXY TREATED
0    Yes  Yes   No    No   No   No     Yes
1     No  Yes   No   Yes   No   No      No
2    Yes  Yes   No    No   No   No     Yes
3     No   No   No    No   No   No      No
4    Yes  Yes  Yes   Yes  Yes  Yes     Yes
5     No  Yes  Yes   Yes   No  Yes     Yes
6     No   No   No    No   No  Yes      No
7     No   No   No    No   No   No      No
8    Yes  Yes  Yes    No   No   No     Yes
9    Yes   No  Yes   Yes   No   No     Yes
10   Yes   No   No    No  Yes   No     Yes
11    No   No  Yes   Yes   No  Yes     Yes
12    No   No   No    No  Yes  Yes     Yes
13    No   No   No   Yes   No  Yes     Yes

答案2

得分: 0

这有点抽象,但你可以使用位标志来表示每个“是”(True),为它分配一个二进制值,然后根据 if 语句在线程中进行数学运算。

https://dietertack.medium.com/using-bit-flags-in-c-d39ec6e30f08

英文:

This is a little abstract but you can use bit flag to represent each Yes (True) assign it a binary value and then do the math in threated based on an if statement.

https://dietertack.medium.com/using-bit-flags-in-c-d39ec6e30f08

答案3

得分: 0

你可以使用集合操作,首先将给定的药物作为集合聚合,然后检查所有可能的组合是否有超集:

valid_treatments = [{'AZITH', 'CLIN'}, {'AZITH', 'CFTX', 'CLIN'},
                    {'AZITH', 'CFTX', 'METRO'}, {'AZITH', 'CFTN'},
                    {'CFTX', 'DOXY', 'METRO'}, {'CFTN', 'DOXY'},
                    {'DOXY', 'METRO'},
                   ]

def is_valid(row):
    combination = set(df.columns[row])
    return 'Yes' if any(
               combination.issuperset(v)
               for v in valid_treatments
           ) else 'No'

out = df.assign(TREATED=df.eq('Yes').apply(is_valid, axis=1))

输出:

   AZITH CLIN CFTX METRO CFTN DOXY TREATED
0    Yes  Yes   No    No   No   No     Yes
1     No  Yes   No   Yes   No   No      No
2    Yes  Yes   No    No   No   No     Yes
3     No   No   No    No   No   No      No
4    Yes  Yes  Yes   Yes  Yes  Yes     Yes
5     No  Yes  Yes   Yes   No  Yes     Yes
6     No   No   No    No   No  Yes      No
7     No   No   No    No   No   No      No
8    Yes  Yes  Yes    No   No   No     Yes
9    Yes   No  Yes   Yes   No   No     Yes
10   Yes   No   No    No  Yes   No     Yes
11    No   No  Yes   Yes   No  Yes     Yes
12    No   No   No    No  Yes  Yes     Yes
13    No   No   No   Yes   No  Yes     Yes
英文:

You can use set operations, first aggregate as the sets of given medications, then check over all possible combinations if you have a superset:

valid_treatments = [{'AZITH', 'CLIN'}, {'AZITH', 'CFTX', 'CLIN'},
{'AZITH', 'CFTX', 'METRO'}, {'AZITH', 'CFTN'},
{'CFTX', 'DOXY', 'METRO'}, {'CFTN', 'DOXY'},
{'DOXY', 'METRO'},
]
def is_valid(row):
combination = set(df.columns[row])
return 'Yes' if any(
combination.issuperset(v)
for v in valid_treatments
) else 'No'
out = df.assign(TREATED=df.eq('Yes').apply(is_valid, axis=1))

Output:

   AZITH CLIN CFTX METRO CFTN DOXY TREATED
0    Yes  Yes   No    No   No   No     Yes
1     No  Yes   No   Yes   No   No      No
2    Yes  Yes   No    No   No   No     Yes
3     No   No   No    No   No   No      No
4    Yes  Yes  Yes   Yes  Yes  Yes     Yes
5     No  Yes  Yes   Yes   No  Yes     Yes
6     No   No   No    No   No  Yes      No
7     No   No   No    No   No   No      No
8    Yes  Yes  Yes    No   No   No     Yes
9    Yes   No  Yes   Yes   No   No     Yes
10   Yes   No   No    No  Yes   No     Yes
11    No   No  Yes   Yes   No  Yes     Yes
12    No   No   No    No  Yes  Yes     Yes
13    No   No   No   Yes   No  Yes     Yes

答案4

得分: 0

我的答案将尝试将此解决方案向量化。但我从Mozway那里得到了关于超集的想法。在那之前,我无法弄清如何处理所有的组合。

import numpy as np
import pandas as pd
import itertools

df = pd.DataFrame({"AZITH": ["Yes", "No", "Yes", "No", "Yes", "No", "No", "No", "Yes", "Yes", "Yes", "No", "No", "No", ],
                   "CLIN": ["Yes", "Yes", "Yes", "No", "Yes", "Yes", "No", "No", "Yes", "No", "No", "No", "No", "No", ],
                   "CFTX": ["No", "No", "No", "No", "Yes", "Yes", "No", "No", "Yes", "Yes", "No", "Yes", "No", "No", ],
                   "METRO": ["No", "Yes", "No", "No", "Yes", "Yes", "No", "No", "No", "Yes", "No", "Yes", "No", "Yes", ],
                   "CFTN": ["No", "No", "No", "No", "Yes", "No", "No", "No", "No", "No", "Yes", "No", "Yes", "No", ],
                   "DOXY": ["No", "No", "No", "No", "Yes", "Yes", "Yes", "No", "No", "No", "No", "No", "Yes", "Yes", "Yes", ]})

combos = np.array([[1, 1, 0, 0, 0, 0], [1, 1, 1, 0, 0, 0], [1, 0, 1, 1, 0, 0], [1, 0, 0, 0, 1, 0], [0, 0, 1, 1, 0, 1], [0, 0, 0, 0, 1, 1], [0, 0, 0, 1, 0, 1]])

df = df.replace("Yes", 1)
df = df.replace("No", 0)
c = []
for l in range(len(combos)):
    c.extend(itertools.combinations(range(len(combos)), l))

all_combos = combos
for combo in c[1:]:
    combined = np.sum(combos[combo, :], axis=0)
    all_combos = np.vstack([all_combos, combined])

all_combos[all_combos != 0] = 1
all_combos = np.unique(all_combos, axis=0)
combo_sum = all_combos.sum(axis=1)
all_combos[all_combos == 0] = -1
new_df = df.dot(all_combos.transpose())
for i, x in enumerate(combo_sum):
    new_df.loc[new_df[i] < x, i] = 0

new_df[new_df > 0] = 1
new_df["res"] = new_df.sum(axis=1)
new_df.loc[new_df.res > 0, "res"] = True
new_df.loc[new_df.res == 0, "res"] = False
df["res"] = new_df["res"]
    AZITH  CLIN  CFTX  METRO  CFTN  DOXY    res
0       1     1     0      0     0     0   True
1       0     1     0      1     0     0  False
2       1     1     0      0     0     0   True
3       0     0     0      0     0     0  False
4       1     1     1      1     1     1   True
5       0     1     1      1     0     1  False
6       0     0     0      0     0     1  False
7       0     0     0      0     0     0  False
8       1     1     1      0     0     0   True
9       1     0     1      1     0     0   True
10      1     0     0      0     1     0   True
11      0     0     1      1     0     1   True
12      0     0     0      0     1     1   True
13      0     0     0      1     0     1   True

这段代码的一般解释是,我创建了一个包括所有组合的numpy数组,包括可接受的组合的组合(两个或更多组合的总和)。我删除了重复的组合,剩下的是这些:

np.unique(all_combos, axis=0)
Out[38]: 
array([[0, 0, 0, 0, 1, 1],
       [0, 0, 0, 1, 0, 1],
       [0, 0, 0, 1, 1, 1],
       [0, 0, 1, 1, 0, 1],
       [0, 0, 1, 1, 1, 1],
       [1, 0, 0, 0, 1, 0],
       [1, 0, 0, 0, 1, 1],
       [1, 0, 0, 1, 1, 1],
       [1, 0, 1, 1, 0, 0],
       [1, 0, 1, 1, 0, 1],
       [1, 0, 1, 1, 1, 0],
       [1, 0, 1, 1, 1, 1],
       [1, 1, 0, 0, 0, 0],
       [1, 1, 0, 0, 1, 0],
       [1, 1, 0, 0, 1, 1],
       [1, 1, 0, 1, 0, 1],
       [1, 1, 0, 1, 1, 1],
       [1, 1, 1, 0, 0, 0],
       [1, 1, 1, 0, 1, 0],
       [

<details>
<summary>英文:</summary>

My answer is going to attempt to vectorize this solution. But I got the idea for the superset from Mozway. Before that I couldn&#39;t figure out how to handle all combos

import numpy as np
import pandas as pd
import itertools

df = pd.DataFrame({ "AZITH": ["Yes","No","Yes","No","Yes","No","No","No","Yes","Yes","Yes","No","No","No",],
"CLIN": ["Yes","Yes","Yes","No","Yes","Yes","No","No","Yes","No","No","No","No","No",],
"CFTX": ["No","No","No","No","Yes","Yes","No","No","Yes","Yes","No","Yes","No","No",],
"METRO": ["No","Yes","No","No","Yes","Yes","No","No","No","Yes","No","Yes","No","Yes",],
"CFTN": ["No","No","No","No","Yes","No","No","No","No","No","Yes","No","Yes","No",],
"DOXY": ["No","No","No","No","Yes","Yes","Yes","No","No","No","No","Yes","Yes","Yes",]})
combos = np.array([[1,1,0,0,0,0],[1,1,1,0,0,0],[1,0,1,1,0,0],[1,0,0,0,1,0],[0,0,1,1,0,1],[0,0,0,0,1,1],[0,0,0,1,0,1]])
df = df.replace("Yes",1)
df = df.replace("No",0)
c = []
for l in range(len(combos)):
c.extend(itertools.combinations(range(len(combos)),l))
all_combos = combos
for combo in c[1:]:
combined = np.sum(combos[combo,:],axis=0)
all_combos = np.vstack([all_combos,combined])

all_combos[all_combos!=0]=1
all_combos = np.unique(all_combos,axis=0)
combo_sum = all_combos.sum(axis=1)
all_combos[all_combos==0]=-1
new_df = df.dot(all_combos.transpose())
for i,x in enumerate(combo_sum):
new_df.loc[new_df[i]<x,i] = 0

new_df[new_df>0]=1
new_df["res"] = new_df.sum(axis=1)
new_df.loc[new_df.res>0,"res"] = True
new_df.loc[new_df.res==0,"res"] = False
df["res"] = new_df["res"]

AZITH  CLIN  CFTX  METRO  CFTN  DOXY    res

0 1 1 0 0 0 0 True
1 0 1 0 1 0 0 False
2 1 1 0 0 0 0 True
3 0 0 0 0 0 0 False
4 1 1 1 1 1 1 True
5 0 1 1 1 0 1 False
6 0 0 0 0 0 1 False
7 0 0 0 0 0 0 False
8 1 1 1 0 0 0 True
9 1 0 1 1 0 0 True
10 1 0 0 0 1 0 True
11 0 0 1 1 0 1 True
12 0 0 0 0 1 1 True
13 0 0 0 1 0 1 True

The general explanation of the code is that I create a numpy array of all combos including combinations of combos (sum of two or more combos) that are acceptable. I delete duplicate combinations and this is what is left

np.unique(all_combos,axis=0)
Out[38]:
array([[0, 0, 0, 0, 1, 1],
[0, 0, 0, 1, 0, 1],
[0, 0, 0, 1, 1, 1],
[0, 0, 1, 1, 0, 1],
[0, 0, 1, 1, 1, 1],
[1, 0, 0, 0, 1, 0],
[1, 0, 0, 0, 1, 1],
[1, 0, 0, 1, 1, 1],
[1, 0, 1, 1, 0, 0],
[1, 0, 1, 1, 0, 1],
[1, 0, 1, 1, 1, 0],
[1, 0, 1, 1, 1, 1],
[1, 1, 0, 0, 0, 0],
[1, 1, 0, 0, 1, 0],
[1, 1, 0, 0, 1, 1],
[1, 1, 0, 1, 0, 1],
[1, 1, 0, 1, 1, 1],
[1, 1, 1, 0, 0, 0],
[1, 1, 1, 0, 1, 0],
[1, 1, 1, 0, 1, 1],
[1, 1, 1, 1, 0, 0],
[1, 1, 1, 1, 0, 1],
[1, 1, 1, 1, 1, 0],
[1, 1, 1, 1, 1, 1]])


Any extra medications that are not part of the combo are penalized by setting the value to -1 in the combo list. (If extra medications are not to be penalized then the superset is not needed, you can just compare to a sum of the original combos variable.)
A dot product between the dataset and the set of all combos is then done and the value is compared against the sum of the combo (prior to replacing the 0s with -1s). This means if the value is 3 and the expected outcome of the combo is 3, then it is a valid combo. Below is the sum of the combos as an array

ipdb> combo_sum
array([2, 2, 3, 3, 4, 2, 3, 4, 3, 4, 4, 5, 2, 3, 4, 4, 5, 3, 4, 5, 4, 5,
5, 6])


ipdb> new_df
0 1 2 3 4 5 6 7 8 9 ... 14 15 16 17 18 19 20 21 22 23
0 -2 -2 -2 -2 -2 0 0 0 0 0 ... 2 2 2 2 2 2 2 2 2 2
1 -2 0 0 0 0 -2 -2 0 0 0 ... 0 2 2 0 0 0 2 2 2 2
2 -2 -2 -2 -2 -2 0 0 0 0 0 ... 2 2 2 2 2 2 2 2 2 2
3 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0
4 -2 -2 0 0 2 -2 0 2 0 2 ... 2 2 4 0 2 4 2 4 4 6
5 -2 0 0 2 2 -4 -2 0 0 2 ... 0 2 2 0 0 2 2 4 2 4
6 1 1 1 1 1 -1 1 1 -1 1 ... 1 1 1 -1 -1 1 -1 1 -1 1
7 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0
8 -3 -3 -3 -1 -1 -1 -1 -1 1 1 ... 1 1 1 3 3 3 3 3 3 3
9 -3 -1 -1 1 1 -1 -1 1 3 3 ... -1 1 1 1 1 1 3 3 3 3
10 0 -2 0 -2 0 2 2 2 0 0 ... 2 0 2 0 2 2 0 0 2 2
11 -1 1 1 3 3 -3 -1 1 1 3 ... -1 1 1 -1 -1 1 1 3 1 3
12 2 0 2 0 2 0 2 2 -2 0 ... 2 0 2 -2 0 2 -2 0 0 2
13 0 2 2 2 2 -2 0 2 0 2 ... 0 2 2 -2 -2 0 0 2 0 2

After the dot product, we replace the valid values with 1 and invalid (less than the expected sum) with 0. We sum on all the combinations to see if any are valid. If the sum of combinations &gt;= 1, then at least one combo was valid. Else, all were invalid.

ipdb> new_df
0 1 2 3 4 5 6 7 8 9 ... 15 16 17 18 19 20 21 22 23 res
0 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 1
1 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0
2 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 1
3 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0
4 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 1 1
5 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0
6 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0
7 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0
8 0 0 0 0 0 0 0 0 0 0 ... 0 0 1 0 0 0 0 0 0 1
9 0 0 0 0 0 0 0 0 1 0 ... 0 0 0 0 0 0 0 0 0 1
10 0 0 0 0 0 1 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 1
11 0 0 0 1 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 1
12 1 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 1
13 0 1 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 1

Replace the final summed column with True or false and apply to original dataframe.
</details>

huangapple
  • 本文由 发表于 2023年2月18日 14:50:15
  • 转载请务必保留本文链接:https://go.coder-hub.com/75491685.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定