Question

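You want to add a new column based on specific string values in the "Product Type" column, but the whole column ends up with a single value: if the first row contains "HW-VG54-NAH", every row is set to "Gateway", even when other rows have different product types, and if the first row is "HW-CM31", every row is set to "Camera".

The cause is the .any() call on each condition: it reduces the row-wise boolean Series to a single True/False for the whole file, so np.select applies one value to every row. Instead, set the new column's value only where each row's "Product Type" matches, for example:

# add new column (row-wise conditions, no .any())
df['Device Type'] = np.where(df['Product Type'].str.contains('VG'), 'Gateway',
                              np.where(df['Product Type'].str.contains('AG'), 'Asset',
                                       np.where(df['Product Type'].str.contains('CM3'), 'Camera', '')))

This sets "Device Type" from each row's own "Product Type" value instead of from a single file-wide result.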
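The same row-wise idea also works with the np.select structure from the question, as long as each condition stays a boolean Series (no .any()). The following is a minimal sketch, not the original poster's code: the toy DataFrame, the "Unknown" label for unmatched rows, and the na=False argument (so missing values count as non-matches) are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Toy data (assumption): one row per product type mentioned in the question,
# plus one type that matches none of the patterns.
df = pd.DataFrame(
    {"Product Type": ["HW-VG54-NAH", "HW-CM31", "HW-TU35", "HW-AG46"]}
)

product = df["Product Type"]

# One boolean value per row: .any() is deliberately NOT called here.
conditions = [
    product.str.contains("VG", na=False),
    product.str.contains("AG", na=False),
    product.str.contains("CM3", na=False),
]
values = ["Gateway", "Asset", "Camera"]

# A string default keeps every choice the same type; "Unknown" is a
# hypothetical placeholder for rows that match no condition.
df["Device Type"] = np.select(conditions, values, default="Unknown")

device_labels = df["Device Type"].tolist()
print(df)
```

np.select picks, for each row, the value of the first condition that is True, so the order of the conditions matters when patterns can overlap.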

Original question:

I am trying to add a new column to a pandas DataFrame depending on whether another column contains a specific string. I am doing this while looping over a list of CSV files.
This is what I have:

import glob
import os

import numpy as np
import pandas as pd

# `path` is the folder that holds the CSV files (defined elsewhere)
csv_files = glob.glob(os.path.join(path, "*.csv"))
print('Found', len(csv_files), 'files')

# Headers to be checked
header_list = ['Created Date', 'Order Number', 'Shipping Address', 'Shipping Contact', 'Shipping Email', 'Product Type', 'Quantity', 'Serial', 'Activation Status']

list_of_df = []

for f in csv_files:

    print('File Name:', f.split("\\")[-1])

    # read the file
    df = pd.read_csv(f, index_col=None, header=0)

    # check type
    conditions = [
        (df['Product Type'].str.contains('VG').any()),
        (df['Product Type'].str.contains('AG').any()),
        (df['Product Type'].str.contains('CM3').any())
        ]
    values = ['Gateway', 'Asset', 'Camera']

    # check the headers
    import_headers = df.axes[1]
    a = [i for i in import_headers if i not in header_list]

    if not a:
        print('Headers are OK, file is imported')
        # Prepare the data
        # Drop columns
        df.drop(df.columns[[2,3,4]], axis=1, inplace=True)

        # Fill Activation Status with 0
        df["Activation Status"] = df["Activation Status"].fillna("0")
        df['Activation Status'] = df['Activation Status'].replace('Activated', '1')

        # remove unwanted rows
        df = df.loc[df['Product Type'].str.contains('HW-', regex=True, na=True)]

        # add new column
        df['Device Type'] = np.select(conditions, values)

        list_of_df.append(df)
    else:
        print('Headers are not OK, file is not imported')
        print('Headers not found:', a)
        print('Headers found:', import_headers)

df = pd.concat(list_of_df, axis=0, ignore_index=True)

The Product Type column contains values such as HW-VG54-NAH, HW-CM31, and HW-AG46.

The new column is added, but if the first row contains HW-VG54-NAH, the whole column is set to Gateway, even though there are rows with other product types.
If the first row is HW-CM31, the whole column is set to Camera.

Answer 1

Score: 0

With the following toy dataframe:

import pandas as pd

df = pd.DataFrame(
    {
        "Order Number": [9, 8, 0, 5],
        "Product Type": ["HW-VG54-NAH", "HW-CM31", "HW-TU35", "HW-AG46"],
    }
)

print(df)
# Output
   Order Number Product Type
0             9  HW-VG54-NAH
1             8      HW-CM31
2             0      HW-TU35
3             5      HW-AG46

Here is one way to do it, where a dictionary replaces conditions and values in your code:

device_types = {"VG": "Gateway", "AG": "Asset", "CM3": "Camera"}

Then add the new column by using str.extract with a regular expression:

df["Device Type"] = (
    df["Product Type"]
    .str.extract(f"({'|'.join(device_types.keys())})")  # will return 'VG' for first row
    .applymap(lambda x: device_types.get(x, pd.NA))  # will return device_types['VG']
)

Then:

   Order Number Product Type Device Type
0             9  HW-VG54-NAH     Gateway
1             8      HW-CM31      Camera
2             0      HW-TU35        <NA>
3             5      HW-AG46       Asset
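
Recent pandas releases (2.1 and later) deprecate DataFrame.applymap in favor of DataFrame.map, so the snippet above may emit a FutureWarning. A possible variant, offered as a sketch rather than as the answer author's method, extracts the code as a Series with expand=False and looks it up with Series.map. The re.escape call is an added precaution (an assumption, not part of the original answer) in case a key ever contains regex metacharacters.

```python
import re

import pandas as pd

df = pd.DataFrame(
    {
        "Order Number": [9, 8, 0, 5],
        "Product Type": ["HW-VG54-NAH", "HW-CM31", "HW-TU35", "HW-AG46"],
    }
)

device_types = {"VG": "Gateway", "AG": "Asset", "CM3": "Camera"}

# Build one capture group that matches any of the dictionary keys.
pattern = "(" + "|".join(re.escape(key) for key in device_types) + ")"

# expand=False returns a Series (one extracted code per row) instead of
# a one-column DataFrame; rows with no match hold a missing value.
codes = df["Product Type"].str.extract(pattern, expand=False)

# Series.map with a dict leaves unmatched (missing) codes as missing.
df["Device Type"] = codes.map(device_types)

labels = df["Device Type"].tolist()
print(df)
```

Rows whose product type matches none of the keys (HW-TU35 here) end up with a missing value in Device Type, which can be filled afterwards with fillna if a placeholder label is preferred.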
