July 10, 2023 19:16:11

Selecting Item from One table and Iterate in another table to see if It exists and Add a column Label

Question

I have two DataFrames, call them DataFrame A and DataFrame B.
DataFrame A lists customers who used to purchase Product X (call it Bronze) before it was discontinued, say on 10 April 2023. DataFrame B lists customers who continued to purchase after 10 April 2023. I want to find the customers from table A who are still purchasing, together with the new product they are now associated with.

I know there are methods like 'np.where', but I would like to use iteration: take each Identity from table A, iterate over table B to check whether it appears there, and label table A accordingly.
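For reference, the iteration described above could look roughly like the sketch below. It uses reduced copies of the sample tables (only the columns needed), and the names `CUTOFF`, `labels`, `new_packages`, and `result` are illustrative assumptions, as is the choice to stop at the first match in table B.

```python
import pandas as pd

# Reduced copies of the sample tables (only the columns needed here).
dfA = pd.DataFrame({
    "Identity": ["A", "B", "C", "X", "Y", "Z"],
    "Package Name": ["Platinum", "Gold", "Bronze", "Bronze", "Bronze", "Bronze"],
})
dfB = pd.DataFrame({
    "Identity": ["X", "Y", "Z", "C", "oi", "po"],
    "Last Purchasing Date": ["20230510", "20230630", "20230701",
                             "20230524", "20230618", "20230103"],
    "Package Name": ["Platinum", "Gold", "Gold", "Red", "Green", "Platinum"],
})

# Assumed cutoff; YYYYMMDD strings compare correctly as plain text.
CUTOFF = "20230410"

labels, new_packages = [], []
for identity in dfA["Identity"]:
    label, new_package = "No", None
    # Scan table B for the first purchase by this customer on or after the cutoff.
    for _, row_b in dfB.iterrows():
        if row_b["Identity"] == identity and row_b["Last Purchasing Date"] >= CUTOFF:
            label, new_package = "Yes", row_b["Package Name"]
            break
    labels.append(label)
    new_packages.append(new_package)

# Attach the collected values as new columns on a copy of table A.
result = dfA.rename(columns={"Package Name": "Old Package"}).assign(
    Label=labels, **{"New Package": new_packages}
)
print(result.to_string())
```

This does the job on small tables, but the nested loop compares every row of A with every row of B, so it slows down quickly as the tables grow.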

DataFrame A:

import pandas as pd

data = {
    "Identity": ["A", "B", "C", "D", "E", "F", "X", "Y", "Z"],
    "Last Purchasing Date": ["20201224", "20220418", "20230312", "20230414", "20230618", "20230417", "20230417", "20230417", "20230416"],
    "Package Name": ["Platinum", "Gold", "Bronze", "Red", "Green", "Bronze", "Bronze", "Bronze", "Bronze"],
    "Country": ["Ghana", "Ghana", "Kenya", "Mozambique", "Astria", "Australia", "Egypt", "South Africa", "Uganda"],
    "Price_USD": [50, 30, 20, 15, 10, 20, 20, 20, 20],
    "TransactionID": ["xxcxcjjjkhsdg", "uyerygbfjh", "hjvfbjhsbdf", "ureybjsdfsk", "qwqtvjdbcj", "pioerybhjb", "lkjkfnksfuh", "yeuwtevjfdsf", "qwiqeubkd"]
}
# Named dfA so it can be used together with DataFrame B
dfA = pd.DataFrame(data)
print(dfA)

DataFrame B:

import pandas as pd

data = {
    "Identity": ["X", "Y", "Z", "C", "oi", "po", "as", "vvc", "mn", "kml", "oiu"],
    "Last Purchasing Date": ["20230510", "20230630", "20230701", "20230524", "20230618", "20230103", "20230709", "20230323", "20230222", "20230613", "20230629"],
    "Package Name": ["Platinum", "Gold", "Gold", "Red", "Green", "Platinum", "Gold", "Platinum", "Gold", "Red", "Red"],
    "Country": ["Egypt", "South Africa", "Uganda", "Kenya", "Astria", "Australia", "Egypt", "South Africa", "Uganda", "Tanzania", "Zimbabwe"],
    "Price_USD": [50, 30, 30, 20, 10, 50, 30, 50, 30, 15, 15],
    "TransactionID": ["xxcxcjjjkhsdgkkits", "uyerygbfjhyutrev", "hjvfbjhsbdfqwoierb", "ureybjsdfskmncxy", "qwqtvjdbcjapiev", "ttccljqoeuhadl", "lkjkfnksfuhiyewl", "yeuwtevjfdsfawqwutvssl", "qwiqeubkdqweoipmn", "ieyrjbsdfkbkqwpeoi", "poierbsdjfbdflioewww"]
}
# Named dfB so it can be used together with DataFrame A
dfB = pd.DataFrame(data, columns=["Identity", "Last Purchasing Date", "Package Name", "Country", "Price_USD", "TransactionID"])
print(dfB.to_string())

Desired Output:

import pandas as pd

data = {
    "Identity": ["A", "B", "C", "D", "E", "F", "X", "Y", "Z"],
    "Last Purchasing Date": ["20201224", "20220418", "20230312", "20230414", "20230618", "20230417", "20230417", "20230417", "20230416"],
    "Old Package": ["Platinum", "Gold", "Bronze", "Red", "Green", "Bronze", "Bronze", "Bronze", "Bronze"],
    "Country": ["Ghana", "Ghana", "Kenya", "Mozambique", "Astria", "Australia", "Egypt", "South Africa", "Uganda"],
    "Price_USD": [50, 30, 20, 15, 10, 20, 20, 20, 20],
    "Label": ["No", "No", "Yes", "No", "No", "No", "Yes", "yes", "Yes"],
    "New Package": ["Platinum", "Gold", "Gold", "Red", "None", "None", "None", "None", "None"]
}
df = pd.DataFrame(data, columns=["Identity", "Last Purchasing Date", "Old Package", "Country", "Price_USD", "Label", "New Package"])
print(df.to_string())

Answer 1

Score: 1

The provided output is unclear, but given the logic you need a merge:

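import numpy as np

# Keep only dfB purchases on or after the cutoff, then left-join them onto dfA by Identity
out = (dfA
    .merge(dfB.loc[dfB['Last Purchasing Date'].ge('20230410'),
                   ['Identity', 'Package Name']]
              .rename(columns={'Package Name': 'New Package'}),
           on='Identity', how='left')
    # Label is 'Yes' where a matching row was found in dfB, 'No' otherwise
    .assign(Label=lambda d: np.where(d['New Package'].notna(), 'Yes', 'No'))
)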

Output:

  Identity Last Purchasing Date Package Name       Country  Price_USD  TransactionID New Package Label
0        A             20201224     Platinum         Ghana         50  xxcxcjjjkhsdg         NaN    No
1        B             20220418         Gold         Ghana         30     uyerygbfjh         NaN    No
2        C             20230312       Bronze         Kenya         20    hjvfbjhsbdf         Red   Yes
3        D             20230414          Red    Mozambique         15    ureybjsdfsk         NaN    No
4        E             20230618        Green        Astria         10     qwqtvjdbcj         NaN    No
5        F             20230417       Bronze     Australia         20     pioerybhjb         NaN    No
6        X             20230417       Bronze         Egypt         20    lkjkfnksfuh    Platinum   Yes
7        Y             20230417       Bronze  South Africa         20   yeuwtevjfdsf        Gold   Yes
8        Z             20230416       Bronze        Uganda         20      qwiqeubkd        Gold   Yes
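One thing to be aware of with a merge: if an Identity appears more than once in dfB after the cutoff, the left join returns one row per match, so the customer is repeated in the result. The sample data has no such duplicates, so the sketch below uses a small hypothetical dfB in which customer X purchases twice. Keeping the most recent purchase is an assumption about what is wanted, and the name `latest` is illustrative.

```python
import pandas as pd

dfA = pd.DataFrame({"Identity": ["A", "C", "X"]})
# Hypothetical dfB in which customer X appears twice after the cutoff.
dfB = pd.DataFrame({
    "Identity": ["X", "X", "C", "po"],
    "Last Purchasing Date": ["20230510", "20230702", "20230524", "20230103"],
    "Package Name": ["Platinum", "Gold", "Red", "Platinum"],
})

# Keep post-cutoff rows, then only the latest purchase per Identity.
latest = (
    dfB.loc[dfB["Last Purchasing Date"].ge("20230410")]
    .sort_values("Last Purchasing Date")
    .drop_duplicates("Identity", keep="last")
    .set_index("Identity")["Package Name"]
)

# Look up each dfA customer in the deduplicated Series; misses become NaN.
out = dfA.assign(**{"New Package": dfA["Identity"].map(latest)})
out["Label"] = out["New Package"].notna().map({True: "Yes", False: "No"})
print(out)
```

Because `latest` has a unique index, the result keeps exactly one row per customer in dfA.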
