Best way to iterate through every row in the following case?

Question

I have a column called 'primary_role' that contains the user's role/account type in the organization. For each user, there is another column whose name matches the value in their 'primary_role' column. I need to extract the data from that column.

For example, a row with the value 'Read-Only Account' in the primary_role column will have data in another column called 'Read-Only Account'. A 'Manager' will have data in the 'Manager' column, while the 'Read-Only Account' column will be blank for that person. Note that the values stored in these columns are string representations of dictionary-like data. I'm using ast.literal_eval to convert each string to a dictionary.

What I need to do is extract the data from the column that matches the user's role and store it in a separate column called prop_id or something similar. The data also contains a redundant string value, which I'm filtering out with isdigit().

Here's sample code for what I'm doing with just one row:

import ast

role_col = users_df.iloc[0]['primary_role']  # value of the primary_role column for the first row
ls = ast.literal_eval(users_df[role_col].iloc[0])['organization']  # use that value as a column name, parse the string into a dict
cleaned = [int(x) for x in ls if x.strip().isdigit()]  # keep only the numeric entries (some users manage multiple properties)

My first thought was to write a for loop over users_df.iterrows() and work from there, but that seems excessive given that there are over 5,000 rows.

Answer 1

Score: 1

One option is to use DataFrame.apply with axis=1 to build the new column from the primary_role column and the matching role-specific column. Write a function that takes a row as input and returns the cleaned data from the matching column. For example:

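import ast

def get_prop_id(row):
    # The value of primary_role is also the name of the column holding this user's data
    role_col = row['primary_role']
    # Parse the string into a dict and take the list under 'organization'
    ls = ast.literal_eval(row[role_col])['organization']
    # Keep only the numeric entries and convert them to int
    cleaned = [int(x) for x in ls if x.strip().isdigit()]
    return cleaned

# axis=1 passes each row (rather than each column) to get_prop_id
users_df['prop_id'] = users_df.apply(get_prop_id, axis=1)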

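With axis=1, apply calls get_prop_id once per row and assigns the returned lists to the new prop_id column. This is still a row-by-row Python loop under the hood, but it should not be a problem for a few thousand rows.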
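To try this end to end, here is a minimal sketch on made-up data. The sample rows, the column contents, and the guard for missing values are all assumptions for illustration; the question does not show what the real data looks like, and the original function above has no such guard.

```python
import ast

import pandas as pd

# Hypothetical sample data: each role column holds the string form of a dict,
# and is empty (None) wherever there is nothing stored for that user.
users_df = pd.DataFrame(
    {
        "primary_role": ["Read-Only Account", "Manager", "Manager"],
        "Read-Only Account": ["{'organization': ['101', 'abc']}", None, None],
        "Manager": [None, "{'organization': ['202', ' 303 ', 'xyz']}", None],
    }
)


def get_prop_id(row):
    """Return the numeric IDs stored in the column named by primary_role."""
    raw = row[row["primary_role"]]
    if not isinstance(raw, str):
        # Assumed handling: a missing value (None/NaN) yields an empty list.
        return []
    values = ast.literal_eval(raw).get("organization", [])
    # Keep only entries made of digits; int() tolerates surrounding whitespace.
    return [int(x) for x in values if str(x).strip().isdigit()]


users_df["prop_id"] = users_df.apply(get_prop_id, axis=1)
print(users_df[["primary_role", "prop_id"]])
```

Returning an empty list for a missing value is only one possible choice; returning None instead may be preferable if you need to tell "no data" apart from "no numeric IDs".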


huangapple
  • Posted on May 11, 2023, 04:05:18
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76222206.html