Selecting Item from One table and Iterate in another table to see if It exists and Add a column Label

huangapple go评论92阅读模式
英文:

Selecting Item from One table and Iterate in another table to see if It exists and Add a column Label

问题

以下是翻译好的部分:

  1. import pandas as pd
  2. 数据帧A
  3. data = {
  4. "Identity": ["A", "B", "C", "D", "E", "F", "X", "Y", "Z"],
  5. "Last Purchasing Date": ["20201224", "20220418", "20230312", "20230414", "20230618", "20230417", "20230417", "20230417", "20230416"],
  6. "Old Package": ["Platinum", "Gold", "Bronze", "Red", "Green", "Bronze", "Bronze", "Bronze", "Bronze"],
  7. "Country": ["Ghana", "Ghana", "Kenya", "Mozambique", "Astria", "Australia", "Egypt", "South Africa", "Uganda"],
  8. "Price_USD": [50, 30, 20, 15, 10, 20, 20, 20, 20],
  9. "Label": ["No", "No", "Yes", "No", "No", "No", "Yes", "yes", "Yes"],
  10. "New Package": ["Platinum", "Gold", "Gold", "Red", "None", "None", "None", "None", "None"]
  11. }
  12. df = pd.DataFrame(data, columns=["Identity", "Last Purchasing Date", "Old Package", "Country", "Price_USD", "Label", "New Package"])
  13. print(df.to_string())
  14. 数据帧B
  15. data = {
  16. "Identity": ["X", "Y", "Z", "C", "oi", "po", "as", "vvc", "mn", "kml", "oiu"],
  17. "Last Purchasing Date": ["20230510", "20230630", "20230701", "20230524", "20230618", "20230103", "20230709", "20230323", "20230222", "20230613", "20230629"],
  18. "Package Name": ["Platinum", "Gold", "Gold", "Red", "Green", "Platinum", "Gold", "Platinum", "Gold", "Red", "Red"],
  19. "Country": ["Egypt", "South Africa", "Uganda", "Kenya", "Astria", "Australia", "Egypt", "South Africa", "Uganda", "Tanzania", "Zimbabwe"],
  20. "Price_USD": [50, 30, 30, 20, 10, 50, 30, 50, 30, 15, 15],
  21. "TransactionID": ["xxcxcjjjkhsdgkkits", "uyerygbfjhyutrev", "hjvfbjhsbdfqwoierb", "ureybjsdfskmncxy", "qwqtvjdbcjapiev", "ttccljqoeuhadl", "lkjkfnksfuhiyewl", "yeuwtevjfdsfawqwutvssl", "qwiqeubkdqweoipmn", "ieyrjbsdfkbkqwpeoi", "poierbsdjfbdflioewww"]
  22. }
  23. df = pd.DataFrame(data, columns=["Identity", "Last Purchasing Date", "Package Name", "Country", "Price_USD", "TransactionID"])
  24. print(df.to_string())

请注意,由于代码部分不需要翻译,因此我将保留原始的代码不作更改。

英文:

I am having two Data frames, Let's say Data Frame A and Data Frame B:
Data Frame A has a list of Customers who used to Purchase Product X call it Bronze before it has been stopped let us say on 10th April 2023, and Data Frame B has a list of Customers who continue to purchase the product after 10th April 2023, now, what I want is to see customers from Table A, who continue to engage (purchase) the product, and the new product they are associated with.

I know there are methods like 'np.where' but I would like to use Iteration, as in taking Identity from table A, and Iterate over table B to see if it is available and label table A.

DataFrame A:

  1. data = {
  2. "Identity": ["A", "B", "C", "D", "E", "F", "X", "Y", "Z"],
  3. "Last Purchasing Date": ["20201224", "20220418", "20230312", "20230414", "20230618", "20230417", "20230417", "20230417", "20230416"],
  4. "Package Name": ["Platinum", "Gold", "Bronze", "Red", "Green", "Bronze", "Bronze", "Bronze", "Bronze"],
  5. "Country": ["Ghana", "Ghana", "Kenya", "Mozambique", "Astria", "Australia", "Egypt", "South Africa", "Uganda"],
  6. "Price_USD": [50, 30, 20, 15, 10, 20, 20, 20, 20],
  7. "TransactionID": ["xxcxcjjjkhsdg", "uyerygbfjh", "hjvfbjhsbdf", "ureybjsdfsk", "qwqtvjdbcj", "pioerybhjb", "lkjkfnksfuh", "yeuwtevjfdsf", "qwiqeubkd"]
  8. }
  9. df = pd.DataFrame(data)
  10. print(df)

DataFrame B:

  1. import pandas as pd
  2. data = {
  3. "Identity": ["X", "Y", "Z", "C", "oi", "po", "as", "vvc", "mn", "kml", "oiu"],
  4. "Last Purchasing Date": ["20230510", "20230630", "20230701", "20230524", "20230618", "20230103", "20230709", "20230323", "20230222", "20230613", "20230629"],
  5. "Package Name": ["Platinum", "Gold", "Gold", "Red", "Green", "Platinum", "Gold", "Platinum", "Gold", "Red", "Red"],
  6. "Country": ["Egypt", "South Africa", "Uganda", "Kenya", "Astria", "Australia", "Egypt", "South Africa", "Uganda", "Tanzania", "Zimbabwe"],
  7. "Price_USD": [50, 30, 30, 20, 10, 50, 30, 50, 30, 15, 15],
  8. "TransactionID": ["xxcxcjjjkhsdgkkits", "uyerygbfjhyutrev", "hjvfbjhsbdfqwoierb", "ureybjsdfskmncxy", "qwqtvjdbcjapiev", "ttccljqoeuhadl", "lkjkfnksfuhiyewl", "yeuwtevjfdsfawqwutvssl", "qwiqeubkdqweoipmn", "ieyrjbsdfkbkqwpeoi", "poierbsdjfbdflioewww"]
  9. }
  10. df = pd.DataFrame(data, columns=["Identity", "Last Purchasing Date", "Package Name", "Country", "Price_USD", "TransactionID"])
  11. print(df.to_string())

Desired Output:

  1. import pandas as pd
  2. data = {
  3. "Identity": ["A", "B", "C", "D", "E", "F", "X", "Y", "Z"],
  4. "Last Purchasing Date": ["20201224", "20220418", "20230312", "20230414", "20230618", "20230417", "20230417", "20230417", "20230416"],
  5. "Old Package": ["Platinum", "Gold", "Bronze", "Red", "Green", "Bronze", "Bronze", "Bronze", "Bronze"],
  6. "Country": ["Ghana", "Ghana", "Kenya", "Mozambique", "Astria", "Australia", "Egypt", "South Africa", "Uganda"],
  7. "Price_USD": [50, 30, 20, 15, 10, 20, 20, 20, 20],
  8. "Label": ["No", "No", "Yes", "No", "No", "No", "Yes", "yes", "Yes"],
  9. "New Package": ["Platinum", "Gold", "Gold", "Red", "None", "None", "None", "None", "None"]
  10. }
  11. df = pd.DataFrame(data, columns=["Identity", "Last Purchasing Date", "Old Package", "Country", "Price_USD", "Label", "New Package"])
  12. print(df.to_string())

答案1

得分: 1

提供的输出不够清晰,但根据您的逻辑,您需要使用 merge 函数:

  1. out = (dfA
  2. .merge(dfB.loc[dfB['Last Purchasing Date'].ge('20230410'),
  3. ['Identity', 'Package Name']]
  4. .rename(columns={'Package Name': 'New Package'}),
  5. on='Identity', how='left')
  6. .assign(Label=lambda d: np.where(d['New Package'].notna(), 'Yes', 'No'))
  7. )

输出结果:

  1. Identity Last Purchasing Date Package Name Country Price_USD TransactionID New Package Label
  2. 0 A 20201224 Platinum Ghana 50 xxcxcjjjkhsdg NaN No
  3. 1 B 20220418 Gold Ghana 30 uyerygbfjh NaN No
  4. 2 C 20230312 Bronze Kenya 20 hjvfbjhsbdf Red Yes
  5. 3 D 20230414 Red Mozambique 15 ureybjsdfsk NaN No
  6. 4 E 20230618 Green Astria 10 qwqtvjdbcj NaN No
  7. 5 F 20230417 Bronze Australia 20 pioerybhjb NaN No
  8. 6 X 20230417 Bronze Egypt 20 lkjkfnksfuh Platinum Yes
  9. 7 Y 20230417 Bronze South Africa 20 yeuwtevjfdsf Gold Yes
  10. 8 Z 20230416 Bronze Uganda 20 qwiqeubkd Gold Yes
英文:

The provided output is unclear, but given the logic you need a merge:

  1. out = (dfA
  2. .merge(dfB.loc[dfB['Last Purchasing Date'].ge('20230410'),
  3. ['Identity', 'Package Name']]
  4. .rename(columns={'Package Name': 'New Package'}),
  5. on='Identity', how='left')
  6. .assign(Label=lambda d: np.where(d['New Package'].notna(), 'Yes', 'No'))
  7. )

Output:

  1. Identity Last Purchasing Date Package Name Country Price_USD TransactionID New Package Label
  2. 0 A 20201224 Platinum Ghana 50 xxcxcjjjkhsdg NaN No
  3. 1 B 20220418 Gold Ghana 30 uyerygbfjh NaN No
  4. 2 C 20230312 Bronze Kenya 20 hjvfbjhsbdf Red Yes
  5. 3 D 20230414 Red Mozambique 15 ureybjsdfsk NaN No
  6. 4 E 20230618 Green Astria 10 qwqtvjdbcj NaN No
  7. 5 F 20230417 Bronze Australia 20 pioerybhjb NaN No
  8. 6 X 20230417 Bronze Egypt 20 lkjkfnksfuh Platinum Yes
  9. 7 Y 20230417 Bronze South Africa 20 yeuwtevjfdsf Gold Yes
  10. 8 Z 20230416 Bronze Uganda 20 qwiqeubkd Gold Yes

huangapple
  • 本文由 发表于 2023年7月10日 19:16:11
  • 转载请务必保留本文链接:https://go.coder-hub.com/76653178.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定