Question

I have a data frame of cars with these columns: Car_code, car_model, update_Date and sensor_codes. Each car has multiple sensors listed in sensor_codes, separated by /, as below:

import pandas as pd

df = pd.DataFrame([['x','iii-2019-10-16','18/04/2019','115/556/879/115'],
                   ['x','iii-2019-10-16','21/07/2019','87/998/115'],
                   ['x','iii-2019-10-16','','115/556/879'],
                   ['x','zzz-2020-10-25','12/04/2022',''],
                   ['y','qqq-2018-05-28','10/12/2017','789/554/745'],
                   ['y','qqq-2018-05-28','15/02/2018','789/554/75'],
                   ['y','ooo-2019-11-22','30/05/2019','55'],
                   ['y','rrr-16-12-2020','16/12/2020',''],
                   ['z','ppt-2019-12-03','07/02/2018','889/654/750'],
                   ['z','ttt-2019-12-03','28/05/2019','119/55/75'],
                   # the last two rows have no sensor_codes value; pandas pads them with a missing value
                   ['z','ttt-2019-12-03','09/09/2019'],
                   ['z','ttt-2019-12-03','30/09/2019']
                  ],
                  columns=['Car_code','car_model','update_Date','sensor_codes'])
df

I need to create a new data frame with just two columns, Car_code and the sensor code, where each sensor appears only once per car. So for each car_code there will be multiple rows, each holding one sensor, like below:

Answer 1

Score: 0

You could try:

df_result = (
    df[['Car_code']]                                          # one-column DataFrame
    .assign(sensor_codes=df['sensor_codes'].str.split('/'))  # '115/556' -> ['115', '556']
    .explode('sensor_codes')                                  # one row per list item
    .loc[lambda d:                                            # drop missing and blank codes
         d['sensor_codes'].notna() & d['sensor_codes'].str.strip().ne('')
    ].drop_duplicates(keep='first')                           # unique (Car_code, sensor) pairs
    .assign(sensor_codes=lambda d: d['sensor_codes'].astype('int'))
    .reset_index(drop=True)
)

Select the column Car_code as a dataframe (hence the double brackets) and add sensor_codes as a new column, using .str.split('/') to split each string on '/' into a list.
Then .explode the sensor_codes column to flatten it; each list item gets its own row, and the matching Car_code value is repeated for it.
Afterwards use .loc to filter out the rows that have either a NaN (here None) or an empty string in sensor_codes (the .str.strip is just in case an item contains only whitespace).
Then drop duplicate rows, keeping only the first occurrence of each.
Finally cast sensor_codes to integers and reset the index (both steps are optional; you may well not need them).
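To see what each step of the chain does, here is a minimal sketch that runs the same operations one at a time on a small subset of the question's rows. The names `sample` and `step1` to `step4` are only for illustration, and it assumes a reasonably recent pandas; the short last row is padded with a missing value by the DataFrame constructor, which is why the notna() check is needed.

```python
import pandas as pd

# A small subset of the question's data: a repeated sensor, an empty string,
# and a short row whose missing sensor_codes is padded by pandas.
sample = pd.DataFrame(
    [
        ['x', 'iii-2019-10-16', '18/04/2019', '115/556/879/115'],
        ['x', 'zzz-2020-10-25', '12/04/2022', ''],
        ['z', 'ttt-2019-12-03', '28/05/2019', '119/55/75'],
        ['z', 'ttt-2019-12-03', '09/09/2019'],
    ],
    columns=['Car_code', 'car_model', 'update_Date', 'sensor_codes'],
)

# Step 1: keep Car_code as a DataFrame and add sensor_codes as lists of strings.
step1 = sample[['Car_code']].assign(sensor_codes=sample['sensor_codes'].str.split('/'))

# Step 2: one row per list item; [''] becomes a row holding '', a missing value stays missing.
step2 = step1.explode('sensor_codes')

# Step 3: drop missing values and blank (or whitespace-only) strings.
mask = step2['sensor_codes'].notna() & step2['sensor_codes'].str.strip().ne('')
step3 = step2.loc[mask]

# Step 4: remove repeated (Car_code, sensor_codes) rows, keeping the first one.
step4 = step3.drop_duplicates(keep='first')

# Optional: integer codes and a fresh 0..n-1 index.
result = step4.assign(sensor_codes=step4['sensor_codes'].astype('int')).reset_index(drop=True)

print(len(step2), len(step3), len(step4))  # 9 7 6
print(result)
```

The row counts make the effect of each step visible: explode turns 4 rows into 9, the filter removes the blank and the missing entry, and drop_duplicates removes the second 115 for car x.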

Result for the sample:

   Car_code sensor_codes
0         x          115
1         x          556
2         x          879
3         x           87
4         x          998
5         y          789
6         y          554
7         y          745
8         y           75
9         y           55
10        z          889
11        z          654
12        z          750
13        z          119
14        z           55
15        z           75
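The same rules can also be written without pandas, which makes it easy to check the table above by hand. This is a plain-Python sketch for illustration, not the answer's method; `unique_sensor_rows` is a hypothetical helper, and the two rows without any sensor_codes value are represented as None.

```python
# The question's rows reduced to (Car_code, sensor_codes); None stands in
# for the two rows that have no sensor_codes value at all.
rows = [
    ('x', '115/556/879/115'),
    ('x', '87/998/115'),
    ('x', '115/556/879'),
    ('x', ''),
    ('y', '789/554/745'),
    ('y', '789/554/75'),
    ('y', '55'),
    ('y', ''),
    ('z', '889/654/750'),
    ('z', '119/55/75'),
    ('z', None),
    ('z', None),
]


def unique_sensor_rows(rows):
    """Return (car, sensor) pairs in first-seen order, without duplicates."""
    seen = set()
    result = []
    for car, codes in rows:
        if codes is None:            # missing value: nothing to split
            continue
        for code in codes.split('/'):
            if not code.strip():     # '' from an empty sensor_codes cell
                continue
            key = (car, code)        # deduplicate on the string, like drop_duplicates
            if key not in seen:
                seen.add(key)
                result.append((car, int(code)))
    return result


pairs = unique_sensor_rows(rows)
for i, (car, sensor) in enumerate(pairs):
    print(i, car, sensor)
```

A set of seen keys plus a list keeps the first-seen order, which matches the row order produced by explode followed by drop_duplicates(keep='first').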
