英文:
Selecting Item from One table and Iterate in another table to see if It exists and Add a column Label
问题
以下是翻译好的部分:
import pandas as pd
数据帧A:
data = {
"Identity": ["A", "B", "C", "D", "E", "F", "X", "Y", "Z"],
"Last Purchasing Date": ["20201224", "20220418", "20230312", "20230414", "20230618", "20230417", "20230417", "20230417", "20230416"],
"Old Package": ["Platinum", "Gold", "Bronze", "Red", "Green", "Bronze", "Bronze", "Bronze", "Bronze"],
"Country": ["Ghana", "Ghana", "Kenya", "Mozambique", "Astria", "Australia", "Egypt", "South Africa", "Uganda"],
"Price_USD": [50, 30, 20, 15, 10, 20, 20, 20, 20],
"Label": ["No", "No", "Yes", "No", "No", "No", "Yes", "yes", "Yes"],
"New Package": ["Platinum", "Gold", "Gold", "Red", "None", "None", "None", "None", "None"]
}
df = pd.DataFrame(data, columns=["Identity", "Last Purchasing Date", "Old Package", "Country", "Price_USD", "Label", "New Package"])
print(df.to_string())
数据帧B:
data = {
"Identity": ["X", "Y", "Z", "C", "oi", "po", "as", "vvc", "mn", "kml", "oiu"],
"Last Purchasing Date": ["20230510", "20230630", "20230701", "20230524", "20230618", "20230103", "20230709", "20230323", "20230222", "20230613", "20230629"],
"Package Name": ["Platinum", "Gold", "Gold", "Red", "Green", "Platinum", "Gold", "Platinum", "Gold", "Red", "Red"],
"Country": ["Egypt", "South Africa", "Uganda", "Kenya", "Astria", "Australia", "Egypt", "South Africa", "Uganda", "Tanzania", "Zimbabwe"],
"Price_USD": [50, 30, 30, 20, 10, 50, 30, 50, 30, 15, 15],
"TransactionID": ["xxcxcjjjkhsdgkkits", "uyerygbfjhyutrev", "hjvfbjhsbdfqwoierb", "ureybjsdfskmncxy", "qwqtvjdbcjapiev", "ttccljqoeuhadl", "lkjkfnksfuhiyewl", "yeuwtevjfdsfawqwutvssl", "qwiqeubkdqweoipmn", "ieyrjbsdfkbkqwpeoi", "poierbsdjfbdflioewww"]
}
df = pd.DataFrame(data, columns=["Identity", "Last Purchasing Date", "Package Name", "Country", "Price_USD", "TransactionID"])
print(df.to_string())
请注意,由于代码部分不需要翻译,因此我将保留原始的代码不作更改。
英文:
I am having two Data frames, Let's say Data Frame A and Data Frame B:
Data Frame A has a list of Customers who used to Purchase Product X call it Bronze before it has been stopped let us say on 10th April 2023, and Data Frame B has a list of Customers who continue to purchase the product after 10th April 2023, now, what I want is to see customers from Table A, who continue to engage (purchase) the product, and the new product they are associated with.
I know there are methods like 'np.where' but I would like to use Iteration, as in taking Identity from table A, and Iterate over table B to see if it is available and label table A.
DataFrame A:
data = {
"Identity": ["A", "B", "C", "D", "E", "F", "X", "Y", "Z"],
"Last Purchasing Date": ["20201224", "20220418", "20230312", "20230414", "20230618", "20230417", "20230417", "20230417", "20230416"],
"Package Name": ["Platinum", "Gold", "Bronze", "Red", "Green", "Bronze", "Bronze", "Bronze", "Bronze"],
"Country": ["Ghana", "Ghana", "Kenya", "Mozambique", "Astria", "Australia", "Egypt", "South Africa", "Uganda"],
"Price_USD": [50, 30, 20, 15, 10, 20, 20, 20, 20],
"TransactionID": ["xxcxcjjjkhsdg", "uyerygbfjh", "hjvfbjhsbdf", "ureybjsdfsk", "qwqtvjdbcj", "pioerybhjb", "lkjkfnksfuh", "yeuwtevjfdsf", "qwiqeubkd"]
}
df = pd.DataFrame(data)
print(df)
DataFrame B:
import pandas as pd
data = {
"Identity": ["X", "Y", "Z", "C", "oi", "po", "as", "vvc", "mn", "kml", "oiu"],
"Last Purchasing Date": ["20230510", "20230630", "20230701", "20230524", "20230618", "20230103", "20230709", "20230323", "20230222", "20230613", "20230629"],
"Package Name": ["Platinum", "Gold", "Gold", "Red", "Green", "Platinum", "Gold", "Platinum", "Gold", "Red", "Red"],
"Country": ["Egypt", "South Africa", "Uganda", "Kenya", "Astria", "Australia", "Egypt", "South Africa", "Uganda", "Tanzania", "Zimbabwe"],
"Price_USD": [50, 30, 30, 20, 10, 50, 30, 50, 30, 15, 15],
"TransactionID": ["xxcxcjjjkhsdgkkits", "uyerygbfjhyutrev", "hjvfbjhsbdfqwoierb", "ureybjsdfskmncxy", "qwqtvjdbcjapiev", "ttccljqoeuhadl", "lkjkfnksfuhiyewl", "yeuwtevjfdsfawqwutvssl", "qwiqeubkdqweoipmn", "ieyrjbsdfkbkqwpeoi", "poierbsdjfbdflioewww"]
}
df = pd.DataFrame(data, columns=["Identity", "Last Purchasing Date", "Package Name", "Country", "Price_USD", "TransactionID"])
print(df.to_string())
Desired Output:
import pandas as pd
data = {
"Identity": ["A", "B", "C", "D", "E", "F", "X", "Y", "Z"],
"Last Purchasing Date": ["20201224", "20220418", "20230312", "20230414", "20230618", "20230417", "20230417", "20230417", "20230416"],
"Old Package": ["Platinum", "Gold", "Bronze", "Red", "Green", "Bronze", "Bronze", "Bronze", "Bronze"],
"Country": ["Ghana", "Ghana", "Kenya", "Mozambique", "Astria", "Australia", "Egypt", "South Africa", "Uganda"],
"Price_USD": [50, 30, 20, 15, 10, 20, 20, 20, 20],
"Label": ["No", "No", "Yes", "No", "No", "No", "Yes", "yes", "Yes"],
"New Package": ["Platinum", "Gold", "Gold", "Red", "None", "None", "None", "None", "None"]
}
df = pd.DataFrame(data, columns=["Identity", "Last Purchasing Date", "Old Package", "Country", "Price_USD", "Label", "New Package"])
print(df.to_string())
答案1
得分: 1
提供的输出不够清晰,但根据您的逻辑,您需要使用 merge
函数:
out = (dfA
.merge(dfB.loc[dfB['Last Purchasing Date'].ge('20230410'),
['Identity', 'Package Name']]
.rename(columns={'Package Name': 'New Package'}),
on='Identity', how='left')
.assign(Label=lambda d: np.where(d['New Package'].notna(), 'Yes', 'No'))
)
输出结果:
Identity Last Purchasing Date Package Name Country Price_USD TransactionID New Package Label
0 A 20201224 Platinum Ghana 50 xxcxcjjjkhsdg NaN No
1 B 20220418 Gold Ghana 30 uyerygbfjh NaN No
2 C 20230312 Bronze Kenya 20 hjvfbjhsbdf Red Yes
3 D 20230414 Red Mozambique 15 ureybjsdfsk NaN No
4 E 20230618 Green Astria 10 qwqtvjdbcj NaN No
5 F 20230417 Bronze Australia 20 pioerybhjb NaN No
6 X 20230417 Bronze Egypt 20 lkjkfnksfuh Platinum Yes
7 Y 20230417 Bronze South Africa 20 yeuwtevjfdsf Gold Yes
8 Z 20230416 Bronze Uganda 20 qwiqeubkd Gold Yes
英文:
The provided output is unclear, but given the logic you need a merge
:
out = (dfA
.merge(dfB.loc[dfB['Last Purchasing Date'].ge('20230410'),
['Identity', 'Package Name']]
.rename(columns={'Package Name': 'New Package'}),
on='Identity', how='left')
.assign(Label=lambda d: np.where(d['New Package'].notna(), 'Yes', 'No'))
)
Output:
Identity Last Purchasing Date Package Name Country Price_USD TransactionID New Package Label
0 A 20201224 Platinum Ghana 50 xxcxcjjjkhsdg NaN No
1 B 20220418 Gold Ghana 30 uyerygbfjh NaN No
2 C 20230312 Bronze Kenya 20 hjvfbjhsbdf Red Yes
3 D 20230414 Red Mozambique 15 ureybjsdfsk NaN No
4 E 20230618 Green Astria 10 qwqtvjdbcj NaN No
5 F 20230417 Bronze Australia 20 pioerybhjb NaN No
6 X 20230417 Bronze Egypt 20 lkjkfnksfuh Platinum Yes
7 Y 20230417 Bronze South Africa 20 yeuwtevjfdsf Gold Yes
8 Z 20230416 Bronze Uganda 20 qwiqeubkd Gold Yes
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论