July 13, 2023, 14:46:25

Is this the most efficient way to get the contents of a single column in a row in pandas?

Question

I have a pandas DataFrame where I cannot guarantee the order of the columns, the exact names of the columns, or even that all the columns will exist.

I need to be able to process each row in the DataFrame and extract the values for certain columns.

At the moment I am doing it by searching the row Series' axes list for a string that matches a regex, as below (I've simplified the error handling in this example):

```python
import re

for idx, row in df.iterrows():
    # Get the name of the column that matches the regex from the axes list
    matching_string = [string for string in row.axes[0].tolist() if re.match('^myregexp *', string)]

    # Check we only have one match before proceeding to print the value
    if len(matching_string) == 1:
        print(row[matching_string[0]])
    else:
        print('something went wrong')
```

Would this be the most efficient way? I couldn't work out how to put the regex directly into row['column_name'].

I know I can filter the column names at the DataFrame level, but that wouldn't necessarily guarantee the order if some of the columns are missing, so I couldn't reference them by position, e.g.

```python
row[3]
```

Answer 1

Score: 1

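If I understand correctly, you should test your condition once, outside of the loop, since the column names are the same for every row. You can also use filter to select the column:

```python
import pandas as pd

# .squeeze(axis=1) converts a one-column DataFrame to a Series;
# with zero or several matching columns it stays a DataFrame
sr = df.filter(regex='^myregexp *').squeeze(axis=1)
if isinstance(sr, pd.DataFrame):
    print('something went wrong')
else:
    # do stuff here
    for value in sr:
        print(value)
```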
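
As a rough illustration of testing the condition outside the loop, the sketch below resolves the column name once from `df.columns` and then reads the whole column by name. The helper `find_single_column`, the sample data, and the column names are hypothetical and not part of the original question or answer. Note that `re.match` only matches at the start of the name, whereas `DataFrame.filter(regex=...)` uses `re.search`; with a leading `^` the two behave the same.

```python
import re

import pandas as pd


def find_single_column(df: pd.DataFrame, pattern: str) -> str:
    """Return the one column name matching `pattern`, or raise ValueError.

    Hypothetical helper, written for illustration only.
    """
    regex = re.compile(pattern)
    # Match against the column labels once, instead of once per row.
    matches = [name for name in df.columns if regex.match(str(name))]
    if len(matches) != 1:
        raise ValueError(
            f"expected exactly one column matching {pattern!r}, found {matches}"
        )
    return matches[0]


# Assumed sample data: column order and exact names are not guaranteed.
df = pd.DataFrame({"other": [1, 2], "myregexp_v2": ["a", "b"]})

column_name = find_single_column(df, r"^myregexp")  # resolved once
values = df[column_name].tolist()
print(values)  # prints ['a', 'b']
```

Because the lookup is by name, the position of the column in the DataFrame does not matter, and a missing or ambiguous column is reported before any row is processed.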
