Make a new column in a pandas dataset based on another column's string
Question
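I understand the problem. You want to add a new column based on specific substrings in the "Product Type" column, but if the first row contains "HW-VG54-NAH" the whole column is set to "Gateway", even when other rows have different product types. Likewise, if the first row is "HW-CM31", the whole column is set to "Camera".
You can change the logic so that the new column's value is set per row, only where that row's "Product Type" matches the condition. This can be done as follows:
# Add the new column
df['Device Type'] = np.where(df['Product Type'].str.contains('VG'), 'Gateway',
    np.where(df['Product Type'].str.contains('AG'), 'Asset',
    np.where(df['Product Type'].str.contains('CM3'), 'Camera', '')))
This sets the "Device Type" value from each row's own "Product Type" value instead of relying only on the first row's value, which should resolve the problem.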
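The same row-wise idea can also be written with np.select, which is the function the question's code uses. The sketch below is only an illustration on made-up data, not the asker's real files: it keeps each condition as a boolean Series (no .any(), which would collapse the mask to a single True/False for the whole file), and it assumes that unmatched or missing product types should get an empty string.

```python
import numpy as np
import pandas as pd

# Toy data for illustration only (not the asker's real files)
df = pd.DataFrame(
    {"Product Type": ["HW-VG54-NAH", "HW-CM31", "HW-TU35", "HW-AG46", None]}
)

product = df["Product Type"]

# One boolean Series per rule; na=False treats missing values as "no match"
conditions = [
    product.str.contains("VG", na=False),
    product.str.contains("AG", na=False),
    product.str.contains("CM3", na=False),
]
values = ["Gateway", "Asset", "Camera"]

# default="" is an assumption: rows matching no rule get an empty string
df["Device Type"] = np.select(conditions, values, default="")

print(df)
```

Because every condition has one entry per row, np.select picks a value row by row; the first matching condition wins when several match.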
I am trying to add a new column to a pandas DataFrame depending on whether another column contains a specific string. I am doing this while looping through a list of CSV files.
Here is what I have:
csv_files = glob.glob(os.path.join(path, "*.csv"))
print('Found', len(csv_files), 'files')
#Headers to be checked
header_list = ['Created Date', 'Order Number', 'Shipping Address', 'Shipping Contact', 'Shipping Email', 'Product Type', 'Quantity', 'Serial', 'Activation Status']
list_of_df = []
for f in csv_files:
    print('File Name:', f.split("\\")[-1])
    #read the file
    df = pd.read_csv(f, index_col=None, header=0)
    #check type
    conditions = [
        (df['Product Type'].str.contains('VG').any()),
        (df['Product Type'].str.contains('AG').any()),
        (df['Product Type'].str.contains('CM3').any())
    ]
    values = ['Gateway', 'Asset', 'Camera']
    # check the headers
    import_headers = df.axes[1]
    a = [i for i in import_headers if i not in header_list]
    if not a:
        print('Headers are OK, file is imported')
        #Prepare the data
        #Drop columns
        df.drop(df.columns[[2,3,4]], axis=1, inplace=True)
        #Fill Activation Status with 0
        df["Activation Status"] = df["Activation Status"].fillna("0")
        df['Activation Status'] = df['Activation Status'].replace('Activated', '1')
        #remove unwanted rows
        df = df.loc[df['Product Type'].str.contains('HW-', regex=True, na=True)]
        #add new column
        df['Device Type'] = np.select(conditions, values)
        list_of_df.append(df)
    else:
        print('Headers are not OK, file is not imported')
        print('Headers not found:', a)
        print('Headers found:', import_headers)
df = pd.concat(list_of_df, axis=0, ignore_index=True)
The Product Type column contains data like this: HW-VG54-NAH, HW-CM31, HW-AG46
The new column is added, but if the first row contains HW-VG54-NAH, the whole column is set to Gateway, even though there are rows with different product types.
If the first row is HW-CM31, the whole column is set to Camera.
Answer 1
Score: 0
With the following toy dataframe:
import pandas as pd

df = pd.DataFrame(
    {
        "Order Number": [9, 8, 0, 5],
        "Product Type": ["HW-VG54-NAH", "HW-CM31", "HW-TU35", "HW-AG46"],
    }
)
print(df)
# Output
   Order Number Product Type
0             9  HW-VG54-NAH
1             8      HW-CM31
2             0      HW-TU35
3             5      HW-AG46
Here is one way to do it, where a dictionary replaces conditions and values in your code:
device_types = {"VG": "Gateway", "AG": "Asset", "CM3": "Camera"}
You then add the new column by using str.extract with a regular expression:
df["Device Type"] = (
    df["Product Type"]
    .str.extract(f"({'|'.join(device_types.keys())})")  # will return 'VG' for first row
    .applymap(lambda x: device_types.get(x, pd.NA))  # will return device_types['VG']
)
Then:
   Order Number Product Type Device Type
0             9  HW-VG54-NAH     Gateway
1             8      HW-CM31      Camera
2             0      HW-TU35        <NA>
3             5      HW-AG46       Asset
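As a possible variation on the approach above, str.extract can be asked for a Series (expand=False) and the result passed to Series.map, which avoids DataFrame.applymap (deprecated in recent pandas releases in favor of DataFrame.map). This is a sketch on the same toy data, not a drop-in replacement for the answer; the names pattern and matched_key are illustrative, and re.escape is added only as a precaution in case a key ever contains regex metacharacters.

```python
import re

import pandas as pd

df = pd.DataFrame(
    {
        "Order Number": [9, 8, 0, 5],
        "Product Type": ["HW-VG54-NAH", "HW-CM31", "HW-TU35", "HW-AG46"],
    }
)

device_types = {"VG": "Gateway", "AG": "Asset", "CM3": "Camera"}

# Build one alternation pattern with a single capture group: "(VG|AG|CM3)"
pattern = "(" + "|".join(re.escape(key) for key in device_types) + ")"

# expand=False returns a Series holding the first matched key (missing if none)
matched_key = df["Product Type"].str.extract(pattern, expand=False)

# Series.map looks each key up in the dict; unmatched rows stay missing
df["Device Type"] = matched_key.map(device_types)

print(df)
```

Rows whose product type matches none of the keys (HW-TU35 here) end up with a missing value in "Device Type", which can be filled afterwards with fillna if a placeholder string is preferred.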