Question

I have a column called 'primary_role' that contains the user's role/account type in the organization. For each user there is another column whose name matches the value in that user's 'primary_role' column. I need to extract the data from that column.

For example, a row with the value 'Read-Only Account' in the primary_role column will have data in another column called 'Read-Only Account'. A 'Manager' will have data in the 'Manager' column, while the 'Read-Only Account' column will be blank for that person. Note that the values stored in these columns are string representations of dictionary-like data. I'm using ast.literal_eval to convert each string to a dictionary.

What I need to do is extract the data under the matching user role specific column and store that information in a separate column called prop_id or something similar. There is a redundant string value stored in this data that I'm removing using isdigit().

Here's sample code for what I'm doing with just one row:

import ast

role_col = users_df.iloc[0]['primary_role']  # primary role column value
ls = ast.literal_eval(users_df[role_col].iloc[0])['organization']  # use the role value as the column name, then parse the string into a dictionary
cleaned = [int(x) for x in ls if x.strip().isdigit()]  # keep only the numbers (some users manage multiple properties)

My first thought was to write a for loop over users_df.iterrows() and work from there, but that seems excessive considering there are over 5,000 rows.

Answer 1

Score: 1

One possible way is to use the DataFrame's apply method with axis=1 to create a new column based on the value of the primary_role column and the corresponding role-specific column. You can write a custom function that takes a row as input and returns the cleaned data from the matching column. For example:

import ast

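def get_prop_id(row):
    # The value in 'primary_role' is also the name of the column holding this user's data
    role_col = row['primary_role']
    # Parse the dict-like string and take the list stored under 'organization'
    ls = ast.literal_eval(row[role_col])['organization']
    # Keep only the numeric entries, converted to int
    cleaned = [int(x) for x in ls if x.strip().isdigit()]
    return cleaned

# axis=1 passes each row to get_prop_id as a Series
users_df['prop_id'] = users_df.apply(get_prop_id, axis=1)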

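With axis=1, apply calls your custom function once per row and assigns the results to the new column. This is still a row-by-row Python call rather than a vectorized operation, but it is tidier than an explicit iterrows() loop, and for roughly 5,000 rows it should normally be fast enough.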
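To try the approach without the real data, here is a minimal, self-contained sketch. The sample DataFrame is made up for illustration (only the column names 'primary_role', 'Read-Only Account', 'Manager' and the 'organization' key come from the question), and the guard for blank cells is an added assumption about how empty values might show up, not something the question confirms.

```python
import ast

import pandas as pd

# Hypothetical sample data: each role column holds a dict-like string,
# and only the column matching 'primary_role' is filled in for a user.
users_df = pd.DataFrame({
    "primary_role": ["Read-Only Account", "Manager"],
    "Read-Only Account": ["{'organization': ['12', 'abc']}", None],
    "Manager": [None, "{'organization': ['7', ' 8 ', 'xyz']}"],
})


def get_prop_id(row):
    """Return the numeric IDs stored in the column named by 'primary_role'."""
    raw = row.get(row["primary_role"])
    if not isinstance(raw, str):
        # Blank cell (None/NaN) or a role with no matching column: nothing to parse.
        return []
    items = ast.literal_eval(raw).get("organization", [])
    # Keep only entries that are digits once surrounding whitespace is stripped.
    return [int(x) for x in items if x.strip().isdigit()]


users_df["prop_id"] = users_df.apply(get_prop_id, axis=1)
print(users_df[["primary_role", "prop_id"]])
```

With this sample data, the first user ends up with [12] and the second with [7, 8] in prop_id. Returning an empty list for blank cells is one possible choice; raising an error instead may be preferable if a missing value should never happen.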
