Grouping indexes in a Pandas crosstable
Question
I have a dataframe in Pandas that looks like this:
import pandas

df = pandas.DataFrame({
    'Age': [21, 22, 21, 23, 23, 21, 21],
    'Region': ['North America', 'Europe East', 'Europe West', 'South America',
               'North America', 'North America', 'Europe West'],
    'Answer': ['yes', 'yes', 'no', 'yes', 'no', 'no', 'yes']})

   Age         Region  Answer
0   21  North America     yes
1   22    Europe East     yes
2   21    Europe West      no
3   23  South America     yes
4   23  North America      no
5   21  North America      no
6   21    Europe West     yes

And I need a way to produce a cross or pivot table like this:

Answer     no  yes
Continent
Europe      1    2
America     2    2

Using the Pandas crosstab function I managed to produce this table:
ct = pandas.crosstab(index=df['Region'], columns=df['Answer'])

Answer         no  yes
Region
Europe East     0    1
Europe West     1    1
North America   2    1
South America   0    1

But then I don't know how to group the index labels that have part of the string in common.
Is there any way to do it?
Answer 1
Score: 1
You can use a regex to extract the continent name from the region.
ct.groupby(
    ct.index.str.extract(r'(Europe|America)', expand=False).rename('Continent'),
    sort=False,
).sum()

Answer     no  yes
Continent
Europe      1    2
America     2    2
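
As a minimal self-contained sketch of the regex approach above (not part of the original answer), the following rebuilds the question's Region and Answer columns and adds one hypothetical region, 'Asia Pacific', that the pattern does not match. Labels that match neither alternative come back as NaN and would be dropped by groupby by default, so the sketch assigns them an assumed fallback label, 'Other'. It also groups by a Series aligned on the crosstab's index rather than by the extracted Index, which is a choice made here, not something the answer requires.

```python
import pandas as pd

regions = ['North America', 'Europe East', 'Europe West', 'South America',
           'North America', 'North America', 'Europe West',
           'Asia Pacific']  # hypothetical extra row, not in the question
answers = ['yes', 'yes', 'no', 'yes', 'no', 'no', 'yes', 'no']
df = pd.DataFrame({'Region': regions, 'Answer': answers})

ct = pd.crosstab(index=df['Region'], columns=df['Answer'])

# Extract the continent from each index label; non-matching labels give NaN.
continent = (
    pd.Series(ct.index, index=ct.index)
    .str.extract(r'(Europe|America)', expand=False)
    .fillna('Other')  # assumed fallback label for unmatched regions
    .rename('Continent')
)

# Group the crosstab rows by continent and add up the counts.
by_continent = ct.groupby(continent, sort=False).sum()
print(by_continent)
```

With sort=False the groups appear in the order they are first seen in the crosstab's (alphabetically sorted) index.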
Answer 2
Score: 0
You could create a function to get the continent name from a region, and use the resulting series as the index for the data. It would look something like this:
import pandas as pd

def get_continent(region):
    # Return the continent whose name appears in the region string
    if 'America' in region:
        return 'America'
    if 'Europe' in region:
        return 'Europe'
    return 'Unknown Continent'

continent = df['Region'].apply(get_continent)
ct = pd.crosstab(index=continent, columns=df['Answer'], rownames=['Continent'])
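
If the frame is large, the same if-chain can probably be written without calling a Python function once per row. The sketch below is one possible vectorized variant, not the answer's own code: it assumes NumPy is available and uses numpy.select, which checks the conditions in order and takes the first match, with 'Unknown Continent' as the default.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Region': ['North America', 'Europe East', 'Europe West', 'South America',
               'North America', 'North America', 'Europe West'],
    'Answer': ['yes', 'yes', 'no', 'yes', 'no', 'no', 'yes'],
})

region = df['Region']

# One boolean mask per rule; the order mirrors the if-chain in get_continent.
conditions = [
    region.str.contains('America').to_numpy(dtype=bool),
    region.str.contains('Europe').to_numpy(dtype=bool),
]
choices = ['America', 'Europe']

continent = pd.Series(
    np.select(conditions, choices, default='Unknown Continent'),
    index=df.index,
    name='Continent',
)

ct = pd.crosstab(index=continent, columns=df['Answer'])
print(ct)
```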
Answer 3
Score: 0
You can use .map() before the crosstab. I imagine this is more flexible if you have regions like "China", which you can then map to "Asia" instead of having to create a special string-matching rule.
import pandas as pd

# Explicit lookup from each region to its continent
region_to_continent = {
    'Europe East': 'Europe',
    'Europe West': 'Europe',
    'North America': 'America',
    'South America': 'America',
}
pd.crosstab(
    index=df['Region'].map(region_to_continent).rename('Continent'),
    columns=df['Answer'],
)

Answer     no  yes
Continent
America     2    2
Europe      1    2
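
One thing to watch with a lookup dictionary: a region that is missing from it maps to NaN, and crosstab leaves NaN keys out of the result by default. The sketch below is a hedged illustration of checking for that before building the table. The extra 'China' and 'Japan' rows, the 'China' entry in the dictionary, and the 'Unmapped' label are all assumptions added for the example.

```python
import pandas as pd

region_to_continent = {
    'Europe East': 'Europe',
    'Europe West': 'Europe',
    'North America': 'America',
    'South America': 'America',
    'China': 'Asia',  # hypothetical entry, following the "China" remark above
}

df = pd.DataFrame({
    'Region': ['North America', 'Europe East', 'Europe West', 'South America',
               'North America', 'North America', 'Europe West',
               'China', 'Japan'],  # last two rows are hypothetical
    'Answer': ['yes', 'yes', 'no', 'yes', 'no', 'no', 'yes', 'yes', 'no'],
})

mapped = df['Region'].map(region_to_continent)

# Regions with no dictionary entry come back as NaN; list them explicitly.
unmapped = sorted(df.loc[mapped.isna(), 'Region'].unique())
print('Regions without a mapping:', unmapped)

# Keep unmapped rows visible under an assumed placeholder label.
ct = pd.crosstab(
    index=mapped.fillna('Unmapped').rename('Continent'),
    columns=df['Answer'],
)
print(ct)
```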
Answer 4
Score: -1
I tried to use groupby and a pivot table, but it doesn't work because of duplicates.
However, you can try this code: first extract the common part of the Region column, then build the crosstab.
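import pandas
# Extract the continent name, i.e. the part the regions have in common.
# (A plain '(\w+)' would capture only the first word, such as 'North'.)
df['Continent'] = df['Region'].str.extract(r'(Europe|America)', expand=False)
result = pandas.crosstab(index=df['Continent'], columns=df['Answer'])
print(result)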
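
For comparison, groupby can produce the same counts once a column holding the continent name exists. The sketch below is only an illustration under that assumption: it builds the Continent column with the alternation pattern (Europe|America), then counts rows per (Continent, Answer) pair and spreads the answers into columns.

```python
import pandas as pd

df = pd.DataFrame({
    'Age': [21, 22, 21, 23, 23, 21, 21],
    'Region': ['North America', 'Europe East', 'Europe West', 'South America',
               'North America', 'North America', 'Europe West'],
    'Answer': ['yes', 'yes', 'no', 'yes', 'no', 'no', 'yes'],
})

# Assumed helper column: the continent name found inside each region string.
df['Continent'] = df['Region'].str.extract(r'(Europe|America)', expand=False)

# size() counts rows in each (Continent, Answer) group, so duplicate
# combinations are aggregated rather than causing an error;
# unstack() then turns the Answer level into columns.
counts = df.groupby(['Continent', 'Answer']).size().unstack(fill_value=0)
print(counts)
```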