Selecting Item from One table and Iterate in another table to see if It exists and Add a column Label

huangapple go评论60阅读模式
英文:

Selecting Item from One table and Iterate in another table to see if It exists and Add a column Label

问题

以下是翻译好的部分:

import pandas as pd

数据帧A

data = {
    "Identity": ["A", "B", "C", "D", "E", "F", "X", "Y", "Z"],
    "Last Purchasing Date": ["20201224", "20220418", "20230312", "20230414", "20230618", "20230417", "20230417", "20230417", "20230416"],
    "Old Package": ["Platinum", "Gold", "Bronze", "Red", "Green", "Bronze", "Bronze", "Bronze", "Bronze"],
    "Country": ["Ghana", "Ghana", "Kenya", "Mozambique", "Astria", "Australia", "Egypt", "South Africa", "Uganda"],
    "Price_USD": [50, 30, 20, 15, 10, 20, 20, 20, 20],
    "Label": ["No", "No", "Yes", "No", "No", "No", "Yes", "yes", "Yes"],
    "New Package": ["Platinum", "Gold", "Gold", "Red", "None", "None", "None", "None", "None"]
}

df = pd.DataFrame(data, columns=["Identity", "Last Purchasing Date", "Old Package", "Country", "Price_USD", "Label", "New Package"])

print(df.to_string())

数据帧B

data = {
    "Identity": ["X", "Y", "Z", "C", "oi", "po", "as", "vvc", "mn", "kml", "oiu"],
    "Last Purchasing Date": ["20230510", "20230630", "20230701", "20230524", "20230618", "20230103", "20230709", "20230323", "20230222", "20230613", "20230629"],
    "Package Name": ["Platinum", "Gold", "Gold", "Red", "Green", "Platinum", "Gold", "Platinum", "Gold", "Red", "Red"],
    "Country": ["Egypt", "South Africa", "Uganda", "Kenya", "Astria", "Australia", "Egypt", "South Africa", "Uganda", "Tanzania", "Zimbabwe"],
    "Price_USD": [50, 30, 30, 20, 10, 50, 30, 50, 30, 15, 15],
    "TransactionID": ["xxcxcjjjkhsdgkkits", "uyerygbfjhyutrev", "hjvfbjhsbdfqwoierb", "ureybjsdfskmncxy", "qwqtvjdbcjapiev", "ttccljqoeuhadl", "lkjkfnksfuhiyewl", "yeuwtevjfdsfawqwutvssl", "qwiqeubkdqweoipmn", "ieyrjbsdfkbkqwpeoi", "poierbsdjfbdflioewww"]
}

df = pd.DataFrame(data, columns=["Identity", "Last Purchasing Date", "Package Name", "Country", "Price_USD", "TransactionID"])

print(df.to_string())

请注意,由于代码部分不需要翻译,因此我将保留原始的代码不作更改。

英文:

I am having two Data frames, Let's say Data Frame A and Data Frame B:
Data Frame A has a list of Customers who used to Purchase Product X call it Bronze before it has been stopped let us say on 10th April 2023, and Data Frame B has a list of Customers who continue to purchase the product after 10th April 2023, now, what I want is to see customers from Table A, who continue to engage (purchase) the product, and the new product they are associated with.

I know there are methods like 'np.where' but I would like to use Iteration, as in taking Identity from table A, and Iterate over table B to see if it is available and label table A.

DataFrame A:

data = {
"Identity": ["A", "B", "C", "D", "E", "F", "X", "Y", "Z"],
"Last Purchasing Date": ["20201224", "20220418", "20230312", "20230414", "20230618", "20230417",       "20230417", "20230417", "20230416"],
"Package Name": ["Platinum", "Gold", "Bronze", "Red", "Green", "Bronze", "Bronze", "Bronze", "Bronze"],
"Country": ["Ghana", "Ghana", "Kenya", "Mozambique", "Astria", "Australia", "Egypt", "South Africa", "Uganda"],
"Price_USD": [50, 30, 20, 15, 10, 20, 20, 20, 20],
"TransactionID": ["xxcxcjjjkhsdg", "uyerygbfjh", "hjvfbjhsbdf", "ureybjsdfsk", "qwqtvjdbcj", "pioerybhjb", "lkjkfnksfuh", "yeuwtevjfdsf", "qwiqeubkd"]
}
df = pd.DataFrame(data)
print(df)

DataFrame B:

import pandas as pd
data = {
"Identity": ["X", "Y", "Z", "C", "oi", "po", "as", "vvc", "mn", "kml", "oiu"],
"Last Purchasing Date": ["20230510", "20230630", "20230701", "20230524", "20230618", "20230103", "20230709", "20230323", "20230222", "20230613", "20230629"],
"Package Name": ["Platinum", "Gold", "Gold", "Red", "Green", "Platinum", "Gold", "Platinum", "Gold", "Red", "Red"],
"Country": ["Egypt", "South Africa", "Uganda", "Kenya", "Astria", "Australia", "Egypt", "South Africa", "Uganda", "Tanzania", "Zimbabwe"],
"Price_USD": [50, 30, 30, 20, 10, 50, 30, 50, 30, 15, 15],
"TransactionID": ["xxcxcjjjkhsdgkkits", "uyerygbfjhyutrev", "hjvfbjhsbdfqwoierb", "ureybjsdfskmncxy", "qwqtvjdbcjapiev", "ttccljqoeuhadl", "lkjkfnksfuhiyewl", "yeuwtevjfdsfawqwutvssl", "qwiqeubkdqweoipmn", "ieyrjbsdfkbkqwpeoi", "poierbsdjfbdflioewww"]
}
df = pd.DataFrame(data, columns=["Identity", "Last Purchasing Date", "Package Name", "Country", "Price_USD", "TransactionID"])
print(df.to_string())

Desired Output:

import pandas as pd
data = {
"Identity": ["A", "B", "C", "D", "E", "F", "X", "Y", "Z"],
"Last Purchasing Date": ["20201224", "20220418", "20230312", "20230414", "20230618", "20230417", "20230417", "20230417", "20230416"],
"Old Package": ["Platinum", "Gold", "Bronze", "Red", "Green", "Bronze", "Bronze", "Bronze", "Bronze"],
"Country": ["Ghana", "Ghana", "Kenya", "Mozambique", "Astria", "Australia", "Egypt", "South Africa", "Uganda"],
"Price_USD": [50, 30, 20, 15, 10, 20, 20, 20, 20],
"Label": ["No", "No", "Yes", "No", "No", "No", "Yes", "yes", "Yes"],
"New Package": ["Platinum", "Gold", "Gold", "Red", "None", "None", "None", "None", "None"]
}
df = pd.DataFrame(data, columns=["Identity", "Last Purchasing Date", "Old Package", "Country", "Price_USD", "Label", "New Package"])
print(df.to_string())

答案1

得分: 1

提供的输出不够清晰,但根据您的逻辑,您需要使用 merge 函数:

out = (dfA
    .merge(dfB.loc[dfB['Last Purchasing Date'].ge('20230410'),
                   ['Identity', 'Package Name']]
              .rename(columns={'Package Name': 'New Package'}),
           on='Identity', how='left')
    .assign(Label=lambda d: np.where(d['New Package'].notna(), 'Yes', 'No'))
)

输出结果:

  Identity Last Purchasing Date Package Name       Country  Price_USD  TransactionID New Package Label
0        A             20201224     Platinum         Ghana         50  xxcxcjjjkhsdg         NaN    No
1        B             20220418         Gold         Ghana         30     uyerygbfjh         NaN    No
2        C             20230312       Bronze         Kenya         20    hjvfbjhsbdf         Red   Yes
3        D             20230414          Red    Mozambique         15    ureybjsdfsk         NaN    No
4        E             20230618        Green        Astria         10     qwqtvjdbcj         NaN    No
5        F             20230417       Bronze     Australia         20     pioerybhjb         NaN    No
6        X             20230417       Bronze         Egypt         20    lkjkfnksfuh    Platinum   Yes
7        Y             20230417       Bronze  South Africa         20   yeuwtevjfdsf        Gold   Yes
8        Z             20230416       Bronze        Uganda         20      qwiqeubkd        Gold   Yes
英文:

The provided output is unclear, but given the logic you need a merge:

out = (dfA
.merge(dfB.loc[dfB['Last Purchasing Date'].ge('20230410'),
['Identity', 'Package Name']]
.rename(columns={'Package Name': 'New Package'}),
on='Identity', how='left')
.assign(Label=lambda d: np.where(d['New Package'].notna(), 'Yes', 'No'))
)

Output:

  Identity Last Purchasing Date Package Name       Country  Price_USD  TransactionID New Package Label
0        A             20201224     Platinum         Ghana         50  xxcxcjjjkhsdg         NaN    No
1        B             20220418         Gold         Ghana         30     uyerygbfjh         NaN    No
2        C             20230312       Bronze         Kenya         20    hjvfbjhsbdf         Red   Yes
3        D             20230414          Red    Mozambique         15    ureybjsdfsk         NaN    No
4        E             20230618        Green        Astria         10     qwqtvjdbcj         NaN    No
5        F             20230417       Bronze     Australia         20     pioerybhjb         NaN    No
6        X             20230417       Bronze         Egypt         20    lkjkfnksfuh    Platinum   Yes
7        Y             20230417       Bronze  South Africa         20   yeuwtevjfdsf        Gold   Yes
8        Z             20230416       Bronze        Uganda         20      qwiqeubkd        Gold   Yes

huangapple
  • 本文由 发表于 2023年7月10日 19:16:11
  • 转载请务必保留本文链接:https://go.coder-hub.com/76653178.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定