Making an sns.lmplot scatter plot from two groups of columns, each summed row by row
Question
Basically, my dataset looks like this.
      Private Sector Expenditure  Public Sector Expenditure  \
year
2001                     20.4502                    11.8767
2002                     20.5501                    13.1333
2003                     20.5362                    13.4328
2004                     25.6956                    14.7190
2005                     25.6956                    15.5087
2006                     32.8184                    17.1671
2007                     42.2216                    21.0410
2008                     51.0546                    21.0410
2009                     36.9461                    23.1826
2010                     37.7380                    25.4141
2011                     44.5643                    28.1917
2012                     42.4928                    28.2885
2013                     43.3318                    30.6922
2014                     50.0689                    33.0973
2015                     55.1194                    37.2753
2016                     53.4095                    37.9928
2017                     53.8613                    36.7543

      Content Services Revenue  Hardware Revenue  IT Services Revenue  \
year
2001                      13.2              29.8                 32.0
2002                      16.0              27.4                 36.5
2003                       9.0              61.7                 25.6
2004                      12.0              61.1                 26.8
2005                       6.3              64.9                 25.4
2006                       6.3              41.5                 29.1
2007                       9.8              52.1                 44.2
2008                      10.9              61.6                 62.4
2009                      13.0             161.0                 71.0
2010                      15.0             137.0                 75.0
2011                      22.0             139.0                 67.0
2012                      15.0             139.0                 75.0
2013                      19.0             159.0                 75.0
2014                      21.0             170.0                100.0
2015                      21.0             205.0                102.0
2016                      17.0             193.0                106.0
2017                       0.0             188.0                207.0

      Software Revenue  Telecommunication Services Revenue  \
year
2001               9.0                                58.5
2002              10.2                                60.7
2003              37.6                                16.6
2004              32.8                                16.4
2005              45.9                                15.8
2006              16.8                                54.9
2007              16.9                                58.3
2008              21.3                                72.0
2009              64.0                                94.0
2010              30.0                               106.0
2011              33.0                                97.0
2012              33.0                                92.0
2013              97.0                               108.0
2014             105.0                               110.0
2015             102.0                                99.0
2016              74.0                                90.0
2017              69.0                                79.0

      Total Mobile Subscriptions
year
2001                   2877017.0
2002                   3067033.0
2003                   3358817.0
2004                   3675142.0
2005                   4090633.0
2006                   4391733.0
2007                   5073833.0
2008                   6112742.0
2009                   6576875.0
2010                   7058117.0
2011                   7540733.0
2012                   7868608.0
2013                   8235317.0
2014                   8273658.0
2015                   8140783.0
2016                   8312475.0
2017                   8427542.0
I am trying to make a seaborn.lmplot with the sum of ['Private Sector Expenditure', 'Public Sector Expenditure'] on the x-axis and the sum of ['Content Services Revenue', 'Hardware Revenue', 'IT Services Revenue', 'Software Revenue', 'Telecommunication Services Revenue'] on the y-axis, where each group of columns is summed row by row to give one value per year on each axis.
filled_revenue = df_final.groupby(sum(['Private Sector Expenditure', 'Public Sector Expenditure']))
filled_expenditure = df_final.groupby(sum(['Content Services Revenue', 'Hardware Revenue',
'IT Services Revenue', 'Software Revenue', 'Telecommunication Services Revenue']))
sns.lmplot(data = df_final, x = filled_expenditure, y = filled_revenue)
I tried this, but clearly something is wrong, and I'm not experienced enough to understand how to subset the data and sum it per row.
Answer 1
Score: 1
I think you want to sum "Private Sector Expenditure" and "Public Sector Expenditure" for the x-axis, and "Content Services Revenue", "Hardware Revenue", "IT Services Revenue", "Software Revenue" and "Telecommunication Services Revenue" for the y-axis.
For your data, the final code would be:
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
df = pd.DataFrame({
"Private Sector Expenditure": [20.4502, 20.5501, 20.5362, 25.6956, 25.6956, 32.8184, 42.2216, 51.0546, 36.9461, 37.7380, 44.5643, 42.4928, 43.3318, 50.0689, 55.1194, 53.4095, 53.8613],
"Public Sector Expenditure": [11.8767, 13.1333, 13.4328, 14.7190, 15.5087, 17.1671, 21.0410, 21.0410, 23.1826, 25.4141, 28.1917, 28.2885, 30.6922, 33.0973, 37.2753, 37.9928, 36.7543],
"Content Services Revenue": [13.2, 16.0, 9.0, 12.0, 6.3, 6.3, 9.8, 10.9, 13.0, 15.0, 22.0, 15.0, 19.0, 21.0, 21.0, 17.0, 0.0],
"Hardware Revenue": [29.8, 27.4, 61.7, 61.1, 64.9, 41.5, 52.1, 61.6, 161.0, 137.0, 139.0, 139.0, 159.0, 170.0, 205.0, 193.0, 188.0],
"IT Services Revenue": [32.0, 36.5, 25.6, 26.8, 25.4, 29.1, 44.2, 62.4, 71.0, 75.0, 67.0, 75.0, 75.0, 100.0, 102.0, 106.0, 207.0],
"Software Revenue": [9.0, 10.2, 37.6, 32.8, 45.9, 16.8, 16.9, 21.3, 64.0, 30.0, 33.0, 33.0, 97.0, 105.0, 102.0, 74.0, 69.0],
"Telecommunication Services Revenue": [58.5, 60.7, 16.6, 16.4, 15.8, 54.9, 58.3, 72.0, 94.0, 106.0, 97.0, 92.0, 108.0, 110.0, 99.0, 90.0, 79.0],
"Total Mobile Subscriptions": [2877017.0, 3067033.0, 3358817.0, 3675142.0, 4090633.0, 4391733.0, 5073833.0, 6112742.0, 6576875.0, 7058117.0, 7540733.0, 7868608.0, 8235317.0, 8273658.0, 8140783.0, 8312475.0, 8427542.0]
}, index=[2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017])
df["Expenditure"] = df[["Private Sector Expenditure", "Public Sector Expenditure"]].sum(axis=1)
df["Revenue"] = df[["Content Services Revenue", "Hardware Revenue", "IT Services Revenue", "Software Revenue", "Telecommunication Services Revenue"]].sum(axis=1)
sns.lmplot(data=df, x="Expenditure", y="Revenue")
plt.show()
The result is a scatter plot of Revenue against Expenditure with a fitted regression line.
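As a side note on why the attempt in the question fails: the built-in sum() applied to a list of column names tries to add a string to the integer 0 and raises a TypeError before groupby is even called, and groupby would in any case split rows into groups rather than add columns together. Below is a minimal sketch of the difference, using a three-row excerpt of the data. The short column names "private" and "public" are placeholders chosen for this sketch, not the real column names.

```python
import pandas as pd

# Three-row excerpt of the question's data; the short column names are
# placeholders for this sketch only.
df = pd.DataFrame(
    {
        "private": [20.4502, 20.5501, 20.5362],
        "public": [11.8767, 13.1333, 13.4328],
    },
    index=pd.Index([2001, 2002, 2003], name="year"),
)

# The built-in sum() starts from the integer 0, so adding a string fails.
error_message = None
try:
    sum(["private", "public"])
except TypeError as exc:
    error_message = str(exc)
print("built-in sum failed:", error_message)

# Selecting the columns and summing along axis=1 adds them row by row,
# which gives one value per year.
expenditure = df[["private", "public"]].sum(axis=1)
print(expenditure)
```

Because the summed values are stored as ordinary columns in the answer's code, lmplot can then refer to them by name through its data, x and y arguments.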