Fill values based on reference table
Question
import pandas as pd
# Your reference dataframe
reference_df = pd.DataFrame({
'Product': ['Shirt', 'Sneakers', 'Pants', 'Tennis ball', 'Football ball', 'Football boots'],
'Brand': ['Zara', 'Nike', 'Zara', 'Wilson', 'Adidas', 'Adidas']
})
# Your dataframe to be filled
df = pd.DataFrame({
'Product': ['Shirt', 'Shirt', 'Pants', 'Tennis ball', 'Shirt', 'Football boots', 'Football boots', 'Football boots', 'Pants', 'Sneakers', 'Football ball', 'Football boots'],
'Brand': [None, None, None, None, None, None, None, None, None, None, None, None]
})
# Use map to fill the 'Brand' column based on the 'Product' from the reference dataframe
df['Brand'] = df['Product'].map(reference_df.set_index('Product')['Brand'])
# Print the resulting dataframe (a bare `df` only displays in a notebook or REPL)
print(df)
This code fills the 'Brand' column of your dataframe by looking up each row's 'Product' in the reference dataframe, which gives the desired output.
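The map-based fill above relies on two things the question does not state: that every product appears in the reference table, and that each product appears there only once. Below is a minimal sketch of one way to guard against both cases. The extra 'Socks' row, the duplicated 'Shirt' row, and the 'Unknown' placeholder are made up for illustration and are not part of the original data.

```python
import pandas as pd

# Hypothetical reference table with a duplicated product ('Shirt')
reference_df = pd.DataFrame({
    'Product': ['Shirt', 'Sneakers', 'Shirt'],
    'Brand': ['Zara', 'Nike', 'Zara'],
})

# Hypothetical dataframe containing a product missing from the reference ('Socks')
df = pd.DataFrame({'Product': ['Shirt', 'Sneakers', 'Socks']})

# Keep one row per product so the lookup Series has a unique index
lookup = reference_df.drop_duplicates('Product').set_index('Product')['Brand']

# Products without a match come back as NaN; replace them with a placeholder
df['Brand'] = df['Product'].map(lookup).fillna('Unknown')
print(df)
```

Whether to fill unmatched rows with a placeholder or leave them as NaN depends on how the result is used downstream.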
I have two DataFrames of different lengths: a reference (dictionary) table listing various products and the brand that makes each one, and another df with just the products. I have to fill in the brand based on the reference table.
The reference df looks like this:
Product | Brand |
---|---|
Shirt | Zara |
Sneakers | Nike |
Pants | Zara |
Tennis ball | Wilson |
Football ball | Adidas |
Football boots | Adidas |
The df to be filled looks something like this:
Product | Brand |
---|---|
Shirt | NaN |
Shirt | NaN |
Pants | NaN |
Tennis ball | NaN |
Shirt | NaN |
Football boots | NaN |
Football boots | NaN |
Football boots | NaN |
Pants | NaN |
Sneakers | NaN |
Football ball | NaN |
Football boots | NaN |
(+100k rows) | NaN |
I've tried the following code:
df['Brand'] = df['Brand'].map(referencedf.set_index('Product')['Brand'])
df
Unfortunately, the output is not what I want: it always returns the same brand for all products.
Product | Brand |
---|---|
Shirt | Zara |
Shirt | Zara |
Pants | Zara |
Tennis ball | Zara |
Shirt | Zara |
Football boots | Zara |
Football boots | Zara |
Football boots | Zara |
Pants | Zara |
Sneakers | Zara |
Football ball | Zara |
Football boots | Zara |
(+100k rows) | Zara |
Any ideas on how I can get the right output?
Answer 1
Score: 0
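Map from the 'Product' column rather than from 'Brand', since 'Product' is the key shared with the reference table. Try: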
df['Brand'] = df['Product'].map(reference_df.set_index('Product')['Brand'])
print(df)
Prints:
Product Brand
0 Shirt Zara
1 Shirt Zara
2 Pants Zara
3 Tennis ball Wilson
4 Shirt Zara
5 Football boots Adidas
6 Football boots Adidas
7 Football boots Adidas
8 Pants Zara
9 Sneakers Nike
10 Football ball Adidas
11 Football boots Adidas
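As an alternative to `map`, the same lookup can be written as a left merge on 'Product'. This is only a sketch on a shortened version of the question's data, not the answer's own method. It assumes the empty 'Brand' column can be dropped before merging, and it uses `validate='many_to_one'` so that pandas raises an error if a product appears more than once in the reference table.

```python
import pandas as pd

reference_df = pd.DataFrame({
    'Product': ['Shirt', 'Sneakers', 'Pants', 'Tennis ball', 'Football ball', 'Football boots'],
    'Brand': ['Zara', 'Nike', 'Zara', 'Wilson', 'Adidas', 'Adidas'],
})

# Shortened version of the dataframe to be filled
df = pd.DataFrame({
    'Product': ['Shirt', 'Shirt', 'Pants', 'Tennis ball', 'Football boots', 'Sneakers', 'Football ball'],
    'Brand': [None] * 7,
})

# Drop the empty 'Brand' column, then pull the brand in from the reference table.
# how='left' keeps every row of df in its original order.
filled = df.drop(columns='Brand').merge(
    reference_df, on='Product', how='left', validate='many_to_one'
)
print(filled)
```

For a single column, `map` is usually the shorter option. A merge becomes more convenient when several columns need to be brought over from the reference table at once.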