May 22, 2023, 06:01:20

Grouping indexes in a Pandas crosstable

Question

I have a dataframe in Pandas that looks like this:

import pandas

df = pandas.DataFrame({
    'Age': [21, 22, 21, 23, 23, 21, 21],
    'Region': ['North America', 'Europe East', 'Europe West', 'South America',
               'North America', 'North America', 'Europe West'],
    'Answer': ['yes', 'yes', 'no', 'yes', 'no', 'no', 'yes']})

   Age         Region Answer
0   21  North America    yes
1   22    Europe East    yes
2   21    Europe West     no
3   23  South America    yes
4   23  North America     no
5   21  North America     no
6   21    Europe West    yes

And I need a way to produce a cross or pivot table like this:

Answer         no  yes
Continent                
Europe          1    2    
America         2    2

Using the Pandas crosstab function I managed to produce this table:

ct = pandas.crosstab(index=df['Region'], columns=df['Answer'])

Answer         no  yes
Region                
Europe East     0    1
Europe West     1    1
North America   2    1
South America   0    1

But then I don't know how to group the indexes that have some part of the string in common.

Is there any way to do it?

Answer 1

Score: 1

You can use a regex to extract the continent name from the region.

ct.groupby(
    ct.index.str.extract(r'(Europe|America)', expand=False).rename('Continent'),
    sort=False,
).sum()

Answer     no  yes
Continent         
Europe      1    2
America     2    2
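
For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same idea. It rebuilds the DataFrame from the question and adds one extra step that is my own assumption rather than part of the answer: a check that every region matched the pattern, since regions that do not match come back as NaN and would be left out of the grouped result.

```python
import pandas as pd

# Data copied from the question
df = pd.DataFrame({
    'Age': [21, 22, 21, 23, 23, 21, 21],
    'Region': ['North America', 'Europe East', 'Europe West', 'South America',
               'North America', 'North America', 'Europe West'],
    'Answer': ['yes', 'yes', 'no', 'yes', 'no', 'no', 'yes'],
})

ct = pd.crosstab(index=df['Region'], columns=df['Answer'])

# With expand=False, str.extract on an Index returns an Index holding the
# captured group; regions that do not match the pattern become NaN.
continent = ct.index.str.extract(r'(Europe|America)', expand=False).rename('Continent')

# Extra guard (not in the original answer): fail early if a region is unmatched
if continent.isna().any():
    raise ValueError('Some regions did not match the continent pattern')

# sort=False keeps the groups in order of first appearance (Europe, then America)
grouped = ct.groupby(continent, sort=False).sum()
print(grouped)
```

If the pattern grows beyond two names, it may be easier to maintain a lookup table than a long alternation in the regex.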

Answer 2

Score: 0

You could create a function to get the continent name from a region, and use the resulting series as the index for the data. It would look something like this:

import pandas as pd

def get_continent(region):
    # Map a region label to its continent by substring match
    if 'America' in region:
        return 'America'
    if 'Europe' in region:
        return 'Europe'
    return 'Unknown Continent'

continent = df['Region'].apply(get_continent)
ct = pd.crosstab(index=continent, columns=df['Answer'], rownames=['Continent'])
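
One thing to be aware of: crosstab returns the rows in sorted label order, so America comes before Europe, while the table in the question lists Europe first. If the row order matters, a possible follow-up is to reindex the result. The sketch below assumes the DataFrame from the question and that only these two continents occur.

```python
import pandas as pd

# Data copied from the question
df = pd.DataFrame({
    'Age': [21, 22, 21, 23, 23, 21, 21],
    'Region': ['North America', 'Europe East', 'Europe West', 'South America',
               'North America', 'North America', 'Europe West'],
    'Answer': ['yes', 'yes', 'no', 'yes', 'no', 'no', 'yes'],
})

def get_continent(region):
    # Map a region label to its continent by substring match
    if 'America' in region:
        return 'America'
    if 'Europe' in region:
        return 'Europe'
    return 'Unknown Continent'

continent = df['Region'].apply(get_continent)
ct = pd.crosstab(index=continent, columns=df['Answer'], rownames=['Continent'])

# Put the rows in the order shown in the question.
# Assumption: only these two continents are present in the data.
ct = ct.reindex(['Europe', 'America'])
print(ct)
```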

Answer 3

Score: 0

You can use .map() before building the crosstab. I imagine this is more flexible: if you have a region like "China", you can map it to "Asia" without having to write a special string-matching rule.

import pandas as pd

# Explicit lookup table from region to continent
region_to_continent = {
    'Europe East': 'Europe',
    'Europe West': 'Europe',
    'North America': 'America',
    'South America': 'America',
}
pd.crosstab(
    index=df['Region'].map(region_to_continent).rename('Continent'),
    columns=df['Answer'],
)

Answer     no  yes
Continent         
America     2    2
Europe      1    2
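
A caveat with the dictionary approach: a region that is missing from the mapping becomes NaN after .map(), and those rows do not show up in the crosstab. Below is a small sketch of one way to keep them visible by filling in a placeholder label. The 'China' and 'Oceania' rows and the 'Unknown' label are made up for illustration and are not part of the original question.

```python
import pandas as pd

# Hypothetical data: 'China' and 'Oceania' are not in the original question
df = pd.DataFrame({
    'Region': ['North America', 'Europe East', 'China', 'Oceania'],
    'Answer': ['yes', 'no', 'yes', 'no'],
})

# 'Oceania' is deliberately left out of the mapping
region_to_continent = {
    'North America': 'America',
    'Europe East': 'Europe',
    'China': 'Asia',
}

# Unmapped regions become NaN after map(); fillna gives them a visible label
continent = (
    df['Region']
    .map(region_to_continent)
    .fillna('Unknown')
    .rename('Continent')
)

ct = pd.crosstab(index=continent, columns=df['Answer'])
print(ct)
```

Whether to fill in a placeholder or raise an error on unmapped regions depends on how much you trust the input data.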

Answer 4

Score: -1

I tried to use groupby and a pivot table, but it didn't work because of duplicates.
However, you can try this code: first extract the common part of the Region column, then build the crosstab.

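import pandas

# A pattern like r'(\w+)' would capture only the first word ('North', 'South',
# 'Europe'), so match the continent names explicitly instead.
df['Continent'] = df['Region'].str.extract(r'(Europe|America)', expand=False)
result = pandas.crosstab(index=df['Continent'], columns=df['Answer'])
print(result)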
