Is this the most efficient way to get the contents of a single column in a row in pandas?
Question
I have a pandas DataFrame where I cannot guarantee the order of the columns, the exact names of the columns, or even whether all the columns will exist.
I need to be able to process each row in the DataFrame and extract the values for certain columns.
At the moment I am doing it by searching the pandas Series `axes` list for a string that matches a regex, like below (I've simplified error handling in this example):
```python
import re

for idx, row in df.iterrows():
    # get the name of the column that matches the regex from the axes list
    matching_string = [string for string in row.axes[0].tolist() if re.match('^myregexp *', string)]
    # check we only have one match before proceeding to print the value
    if len(matching_string) == 1:
        print(row[matching_string[0]])
    else:
        print('something went wrong')
```
Would this be the most efficient way? I couldn't work out how to put the regex directly into `row['column_name']`.
I know I can filter the column names at the DataFrame level, but that wouldn't necessarily guarantee the order if some of the columns are missing, and therefore I couldn't reference them by position, i.e. `row[3]`.
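A minimal sketch of one way to take the regex out of the row loop: resolve each pattern to a column name once per DataFrame, then look values up by that label, which does not depend on column position. The helper `resolve_column`, the sample frames and the column names are made up for illustration; the pattern is the one from the question.

```python
import re

import pandas as pd

PATTERN = r"^myregexp *"  # same pattern as in the question


def resolve_column(columns, pattern):
    """Hypothetical helper: return the single column name matching `pattern`.

    Raises ValueError if zero or several columns match.
    """
    # str() guards against non-string column labels such as integers
    matches = [name for name in columns if re.match(pattern, str(name))]
    if len(matches) != 1:
        raise ValueError(f"expected one column matching {pattern!r}, got {matches}")
    return matches[0]


# Made-up frames: same data, different column order, and an unrelated
# column ("other") missing from the second one.
df1 = pd.DataFrame({"id": [1, 2], "other": ["x", "y"], "myregexp_value": [10, 20]})
df2 = pd.DataFrame({"myregexp_value": [10, 20], "id": [1, 2]})

for df in (df1, df2):
    # Resolve the column name once per DataFrame, outside the row loop.
    col = resolve_column(df.columns, PATTERN)
    for idx, row in df.iterrows():
        # Label-based lookup: the column's position in the frame is irrelevant.
        print(idx, row[col])
```

Raising in the helper keeps the "exactly one match" check in one place, so the row loop itself only contains the processing.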
Answer 1
Score: 1
IIUC, I think you should test your condition outside of the loop. You can also use `filter`:
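```python
import pandas as pd

# .squeeze() converts a one-column DataFrame to a Series
sr = df.filter(regex='^myregexp *').squeeze(axis=1)
if isinstance(sr, pd.DataFrame):
    # zero or several columns matched, so squeeze() left a DataFrame
    print('something went wrong')
else:
    # do stuff here; iterating a Series yields its values
    for value in sr:
        print(value)
```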
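A self-contained run of the `filter`/`squeeze` approach above on made-up data, wrapped in a hypothetical `extract_column` helper that returns `None` instead of printing, so the three cases (exactly one, no, or several matching columns) can be checked. Note that `DataFrame.filter(regex=...)` tests labels with `re.search`, not `re.match` as in the question, so the leading `^` is what anchors the pattern to the start of the column name.

```python
import pandas as pd


def extract_column(df, pattern):
    """Hypothetical helper around DataFrame.filter + squeeze.

    Returns the matching column as a Series, or None if zero or
    several columns match `pattern`.
    """
    # filter(regex=...) keeps the columns whose label matches the regex
    sr = df.filter(regex=pattern).squeeze(axis=1)
    # squeeze(axis=1) only collapses to a Series when exactly one column
    # is left; with zero or several columns it is still a DataFrame.
    if isinstance(sr, pd.DataFrame):
        return None
    return sr


one_match = pd.DataFrame({"id": [1, 2], "myregexp_a": [10, 20]})
no_match = pd.DataFrame({"id": [1, 2]})
two_matches = pd.DataFrame({"myregexp_a": [1, 2], "myregexp_b": [3, 4]})

sr = extract_column(one_match, r"^myregexp")
if sr is None:
    print("something went wrong")
else:
    for value in sr:
        print(value)

print(extract_column(no_match, r"^myregexp"))     # None
print(extract_column(two_matches, r"^myregexp"))  # None
```

Passing `axis=1` to `squeeze` matters: without it, a frame with a single row and a single matching column would be squeezed all the way down to a scalar.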