Is this the most efficient way to get the contents of a single column in a row in pandas?

Question

我有一个熊猫的数据框我无法保证列的顺序确切的列名甚至是否所有列都存在

我需要能够处理数据框中的每一行并提取特定列的值

目前我是通过搜索熊猫系列的 axes 列表来进行的以匹配类似下面的正则表达式在这个例子中我简化了错误处理):

```python
for idx, row in df.iterrows():     
    
    # 从轴列表中获取与正则表达式匹配的列的名称           
    matching_string = [string for string in row.axes[0].tolist() if re.match('^myregexp *', string)]
           
           # 在继续打印值之前,检查我们只有一个匹配
           if len(matching_string) == 1:
               print(row[matching_string[0]])
           else:
              print('出现了问题')

这会是最有效的方法吗?我无法弄清楚如何直接将正则表达式放入 row['column_name'] 中。

我知道我可以在数据框级别筛选列名,但如果一些列丢失,这不一定会保证顺序,因此我不能通过索引引用它们,例如:

row[3]
英文:

I have a pandas DataFrame where I cannot guarantee the order of the columns, their exact names, or even whether all of the columns will exist.

I need to process each row in the DataFrame and extract the values of certain columns.

At the moment I do this by searching each row's `Series.axes` list for a label that matches a regex, as below (error handling is simplified in this example):

```python
import re

for idx, row in df.iterrows():
    # get the name of the column that matches the regex from the axes list
    matching_string = [string for string in row.axes[0].tolist() if re.match('^myregexp *', string)]

    # check we only have one match before proceeding to print the value
    if len(matching_string) == 1:
        print(row[matching_string[0]])
    else:
        print('something went wrong')
```

Would this be the most efficient way? I couldn't work out how to put the regex directly into row['column_name'].
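On the second point: `row[...]` itself does not accept a regex, but `Series.filter(regex=...)` applies a regex to a Series' index labels, which for a row from `iterrows()` are the column names. A minimal sketch, with a made-up two-column DataFrame standing in for the real data; note it still scans the labels on every row, so it is shorter rather than faster.

```python
import pandas as pd

# Hypothetical sample data; the real column names and order are not known in advance
df = pd.DataFrame({"other": [1, 2], "myregexp_value": [10, 20]})

values = []
for idx, row in df.iterrows():
    # Series.filter matches the regex against the row's index labels (the column names)
    matches = row.filter(regex='^myregexp *')
    if len(matches) == 1:
        values.append(matches.iloc[0])
    else:
        print('something went wrong')

print(values)
```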

I know I can filter the column names at the DataFrame level, but if some of the columns are missing that wouldn't guarantee their order, so I couldn't reference them by position, e.g.:

```python
row[3]
```
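Since the missing-or-reordered-columns problem only bites when columns are referenced by position, one way around it is to avoid positions entirely: resolve each regex to an actual column name once, before the loop, and then index by label. A minimal sketch, assuming all column names are strings; `PATTERNS`, `resolve_columns` and the sample DataFrame are hypothetical.

```python
import pandas as pd

# Hypothetical logical names mapped to regexes (assumption: adapt to the real names)
PATTERNS = {"amount": r"^myregexp *", "label": r"^name"}


def resolve_columns(columns: pd.Index, patterns: dict) -> dict:
    """Map each logical name to its single matching column name, or None.

    Assumes every column name is a string (required by the .str accessor).
    """
    resolved = {}
    for key, pattern in patterns.items():
        # str.contains uses re.search, the same matching rule as DataFrame.filter
        hits = columns[columns.str.contains(pattern, regex=True)]
        resolved[key] = hits[0] if len(hits) == 1 else None
    return resolved


# Sample data: column order is arbitrary and no column matches "label"
df = pd.DataFrame({"other": ["a", "b"], "myregexp_x": [1.5, 2.5]})
cols = resolve_columns(df.columns, PATTERNS)  # resolved once, outside the loop

amounts = []
for idx, row in df.iterrows():
    # Look up by label, so the physical position of the column never matters
    amount = row[cols["amount"]] if cols["amount"] is not None else None
    amounts.append(amount)

print(cols)
print(amounts)
```

If the per-row loop isn't actually needed, `df[cols["amount"]]` returns the whole column at once, which is typically much faster than `iterrows()`.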

Answer 1

Score: 1

IIUC, you should test your condition once, outside of the loop. You can also use `DataFrame.filter` to select the matching column(s) by regex:

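```python
import pandas as pd

# .filter(regex=...) keeps only the columns whose names match the regex;
# .squeeze(axis=1) converts a one-column DataFrame to a Series
sr = df.filter(regex='^myregexp *').squeeze(axis=1)
if isinstance(sr, pd.DataFrame):
    # zero or several columns matched
    print('something went wrong')
else:
    # do stuff here
    for value in sr:
        print(value)
```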
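To see how this check behaves, here is the same `filter` + `squeeze(axis=1)` logic wrapped in a hypothetical helper, `single_match`, and run against three small made-up DataFrames with one, zero and two matching columns.

```python
import pandas as pd


def single_match(df: pd.DataFrame, regex: str):
    """Hypothetical helper: the one matching column as a Series, or None if 0 or 2+ match."""
    sr = df.filter(regex=regex).squeeze(axis=1)
    # squeeze only collapses an axis of length 1, so 0 or 2+ columns stay a DataFrame
    return None if isinstance(sr, pd.DataFrame) else sr


one = pd.DataFrame({"myregexp_a": [1, 2], "b": [3, 4]})
none_ = pd.DataFrame({"b": [3, 4]})
two = pd.DataFrame({"myregexp_a": [1, 2], "myregexp_b": [5, 6]})

print(single_match(one, '^myregexp *').tolist())  # [1, 2]
print(single_match(none_, '^myregexp *'))         # None
print(single_match(two, '^myregexp *'))           # None
```

Passing `axis=1` matters: a plain `.squeeze()` on a one-row, one-column result would return a scalar rather than a Series, and the `for` loop over it would then fail.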

huangapple
  • Posted on 2023-07-13 14:46:25
  • Please keep the link to this article when reposting: https://go.coder-hub.com/76676594.html