Extended Describe Pandas and beyond
Question
I am new to Python and pandas. My question is related to this one:
<https://stackoverflow.com/questions/23959207/advanced-describe-pandas>
Is it possible to add more functions to the answer by noobie there, such as
geometric mean, weighted mean, harmonic mean, geometric standard deviation, etc.?
import pandas as pd

def describex(data):
    data = pd.DataFrame(data)
    stats = data.describe()
    skewness = data.skew()
    kurtosis = data.kurtosis()
    skewness_df = pd.DataFrame({'skewness': skewness}).T
    kurtosis_df = pd.DataFrame({'kurtosis': kurtosis}).T
    # DataFrame.append was removed in pandas 2.0; pd.concat is the replacement.
    return pd.concat([stats, kurtosis_df, skewness_df])
So basically I am interested in adding statistics that, unlike the functions above, do not come from pandas, for example from scipy.stats. I want much more information from descriptive statistics than the standard describe offers. So far I have added more functions from pandas, and that works fine, but I wasn't able to attach functions from outside pandas.
How do I do it, please?
Answer 1
Score: 0
There are a couple of things you could do.
One suggestion is to use the pandas-profiling library (renamed ydata-profiling in later releases), which generates a comprehensive report on the data, including basic statistics, correlation analysis, data type analysis, missing-value analysis, and more. It is a useful tool for quickly getting an overview of a dataset.
Another suggestion is to use scipy.stats to add advanced statistics to your custom function. It covers several of the statistics you list: gmean (geometric mean), hmean (harmonic mean), and gstd (geometric standard deviation).
For example:
import pandas as pd
import numpy as np
from scipy.stats import gmean

# Values start at 1 because the geometric mean is 0 whenever a column contains a 0.
df = pd.DataFrame(np.random.randint(1, 100, size=(100, 4)), columns=list('ABCD'))

def describex(data):
    data = pd.DataFrame(data)
    stats = data.describe()
    skewness = data.skew()
    kurtosis = data.kurtosis()
    skewness_df = pd.DataFrame({'skewness': skewness}).T
    kurtosis_df = pd.DataFrame({'kurtosis': kurtosis}).T
    # Use the function argument `data` here, not the global `df`.
    gmean_df = pd.DataFrame({'gmean': data.apply(gmean)}).T
    # DataFrame.append was removed in pandas 2.0; pd.concat does the same job.
    return pd.concat([stats, kurtosis_df, skewness_df, gmean_df])

print(describex(df))
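The same pattern can be stretched to cover the other statistics named in the question. What follows is only a rough sketch, not a definitive implementation: the names `describe_extended` and `weights` are made up for illustration, and it assumes every value is strictly positive, which the geometric and harmonic statistics require. It uses `scipy.stats.gmean`, `scipy.stats.hmean` and `scipy.stats.gstd`, plus `numpy.average` for the weighted mean, and stacks the rows with `pd.concat`, since `DataFrame.append` is no longer available in pandas 2.0 and later.

```python
import numpy as np
import pandas as pd
from scipy import stats


def describe_extended(data, weights=None):
    """Return describe() plus extra rows from pandas, SciPy and NumPy.

    `describe_extended` and `weights` are hypothetical names for this sketch.
    The geometric and harmonic statistics assume strictly positive values.
    """
    data = pd.DataFrame(data).select_dtypes(include="number")

    # Each entry maps a row label to a function that takes one column (a Series).
    extra = {
        "skewness": lambda col: col.skew(),
        "kurtosis": lambda col: col.kurtosis(),
        "gmean": lambda col: stats.gmean(col.dropna()),
        "hmean": lambda col: stats.hmean(col.dropna()),
        "gstd": lambda col: stats.gstd(col.dropna()),
    }
    if weights is not None:
        # Weighted arithmetic mean; `weights` needs one entry per row.
        extra["wmean"] = lambda col: np.average(col, weights=weights)

    # Applying each function column-wise gives one Series per statistic;
    # after transposing, each statistic becomes a row like in describe().
    extra_rows = pd.DataFrame(
        {name: data.apply(func) for name, func in extra.items()}
    ).T
    return pd.concat([data.describe(), extra_rows])


# Example usage with random positive integers (assumed data, for illustration).
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.integers(1, 100, size=(100, 4)), columns=list("ABCD"))
print(describe_extended(df, weights=np.arange(1, 101)))
```

Keeping the extra statistics in a dictionary means adding another one is a single new entry, as long as the function accepts a column and returns a single number.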
Hope this helps!