在Pandas crosstable中对索引进行分组。

huangapple go评论101阅读模式
英文:

Grouping indexes in a Pandas crosstable

问题

你可以使用 Pandas 的 crosstab 函数来创建这样的交叉表,然后使用字符串的一部分来进行索引的分组。以下是如何实现的代码:

  1. # 创建交叉表
  2. ct = pandas.crosstab(index=df['Region'], columns=df['Answer'])
  3. # 通过字符串一部分来进行索引的分组
  4. ct.index = ct.index.str.split().str[-1] # 使用空格分割字符串,并选择最后一个部分作为索引
  5. # 重新索引为大陆(Continent)
  6. ct.index = ['Europe' if 'Europe' in idx else 'America' for idx in ct.index]
  7. # 重命名列标签
  8. ct.columns = ['no', 'yes']
  9. # 打印结果
  10. print(ct)

这段代码会产生你想要的输出:

  1. no yes
  2. Continent
  3. America 2 2
  4. Europe 1 2
英文:

I have a dataframe in Pandas that looks like this:

  1. df = pandas.DataFrame({
  2. 'Age': [21,22,21,23,23,21,21],
  3. 'Region': ['North America', 'Europe East', 'Europe West', 'South America',
  4. 'North America', 'North America', 'Europe West'],
  5. 'Answer': ['yes','yes','no','yes','no','no','yes']})
  6. Age Region Answer
  7. 0 21 North America yes
  8. 1 22 Europe East yes
  9. 2 21 Europe West no
  10. 3 23 South America yes
  11. 4 23 North America no
  12. 5 21 North America no
  13. 6 21 Europe West yes

And I need a way to produce a cross or pivot table like this:

  1. Answer no yes
  2. Continent
  3. Europe 1 2
  4. America 2 2

Using the Pandas crosstab function I managed to produce this table:

  1. ct = pandas.crosstab(index=df['Region'], columns=df['Answer'])
  2. Answer no yes
  3. Region
  4. Europe East 0 1
  5. Europe West 1 1
  6. North America 2 1
  7. South America 0 1

But then I don't know how to group the indexes that have some part of the string in common.

Is there anyway to do it?

答案1

得分: 1

你可以使用正则表达式从区域中提取出大陆名称。

  1. ct.groupby(
  2. ct.index.str.extract(r'(Europe|America)', expand=False).rename('Continent'),
  3. sort=False,
  4. ).sum()
  1. Answer no yes
  2. Continent
  3. Europe 1 2
  4. America 2 2
英文:

You can use a regex to extract the continent name from the region.

  1. ct.groupby(
  2. ct.index.str.extract(r'(Europe|America)', expand=False).rename('Continent'),
  3. sort=False,
  4. ).sum()
  1. Answer no yes
  2. Continent
  3. Europe 1 2
  4. America 2 2

答案2

得分: 0

你可以创建一个函数来从地区获取大陆名称,并将结果系列用作数据的索引。它会类似于这样:

  1. def get_continent(region):
  2. if 'America' in region:
  3. return '美洲'
  4. if 'Europe' in region:
  5. return '欧洲'
  6. return '未知大陆'
  7. continent = df['Region'].apply(get_continent)
  8. ct = pd.crosstab(index=continent, columns=df['Answer'], rownames=['大陆'])

请注意,我已经将地区的英文名称翻译成了中文。

英文:

You could create a function to get the continent name from a region, and use the resulting series as the index for the data. It would look something like this:

  1. def get_continent(region):
  2. if 'America' in region:
  3. return 'America'
  4. if 'Europe' in region:
  5. return 'Europe'
  6. return 'Unknown Continent'
  7. continent = df['Region'].apply(get_continent)
  8. ct = pd.crosstab(index=continent, columns=df['Answer'], rownames=['Continent'])

答案3

得分: 0

你可以在交叉表之前使用 .map()。我想这将更加灵活,以防您有像"China"这样的地区,您可以将其映射为"Asia",而不必创建特殊的字符串匹配规则。

  1. region_to_continent = {
  2. 'Europe East': 'Europe',
  3. 'Europe West': 'Europe',
  4. 'North America': 'America',
  5. 'South America': 'America',
  6. }
  7. pd.crosstab(
  8. index=df['Region'].map(region_to_continent).rename('Continent'),
  9. columns=df['Answer'],
  10. )
  1. Answer no yes
  2. Continent
  3. America 2 2
  4. Europe 1 2
英文:

You can .map() before the crosstab. I imagine this will be more flexible in case you have regions like "China", in which case you can map it to "Asia" instead of having to create a special string matching rule.

  1. region_to_continent = {
  2. 'Europe East': 'Europe',
  3. 'Europe West': 'Europe',
  4. 'North America': 'America',
  5. 'South America': 'America',
  6. }
  7. pd.crosstab(
  8. index=df['Region'].map(region_to_continent).rename('Continent'),
  9. columns=df['Answer'],
  10. )
  1. Answer no yes
  2. Continent
  3. America 2 2
  4. Europe 1 2

答案4

得分: -1

我尝试使用groupby和透视表,但由于重复项而无法正常工作。不过,你可以尝试这段代码:首先提取Region列的公共部分,然后构建交叉表

  1. import pandas
  2. df['Continent'] = df['Region'].str.extract('(\w+)', expand=False)
  3. result = pandas.crosstab(index=df['Continent'], columns=df['Answer'])
  4. print(result)
英文:

I try to use the goupby and pivot table but it doesn't works due to duplicates.
However you can try this code : first you will extract the common part of Region column and then you will build your crosstable

  1. import pandas
  2. df['Continent'] = df['Region'].str.extract('(\w+)', expand=False)
  3. result = pandas.crosstab(index=df['Continent'], columns=df['Answer'])
  4. print(result)

huangapple
  • 本文由 发表于 2023年5月22日 06:01:20
  • 转载请务必保留本文链接:https://go.coder-hub.com/76302096.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定