Extended Describe Pandas and beyond

Question

I am new to Python and pandas. My question is related to this one:
<https://stackoverflow.com/questions/23959207/advanced-describe-pandas>

Is it possible to add more functions to the reply by noobie there, such as
geometric mean, weighted mean, harmonic mean, geometric standard deviation, etc.?

import pandas as pd

def describex(data):
    data = pd.DataFrame(data)
    stats = data.describe()
    skewness = data.skew()
    kurtosis = data.kurtosis()
    # Turn each Series into a one-row DataFrame labelled with the statistic name
    skewness_df = pd.DataFrame({'skewness': skewness}).T
    kurtosis_df = pd.DataFrame({'kurtosis': kurtosis}).T
    # DataFrame.append was removed in pandas 2.0; pd.concat stacks the rows instead
    return pd.concat([stats, kurtosis_df, skewness_df])

So basically I am interested in adding statistics that, unlike the functions above, do not come from pandas itself, for example from scipy.stats. I want much more information from the descriptive statistics than the standard describe offers. So far I have added more functions from pandas, and that works fine, but I was not able to attach functions from outside pandas.
How do I do that, please?

Answer 1

Score: 0

There are a couple of things you could do.

One option is the pandas-profiling library (since renamed ydata-profiling), which generates a comprehensive report on the data, including basic statistics, correlation analysis, data type analysis, missing values analysis, and more. It is a useful tool for quickly getting an overview of a dataset.

Another option is to use scipy.stats to add advanced statistics to your custom function. scipy.stats likely has a function for most of the statistics you are looking for.

For example,

import pandas as pd
import numpy as np
from scipy.stats import gmean

# Values start at 1 because the geometric mean is only meaningful for positive data
df = pd.DataFrame(np.random.randint(1, 100, size=(100, 4)), columns=list('ABCD'))

def describex(data):
    data = pd.DataFrame(data)
    stats = data.describe()
    skewness = data.skew()
    kurtosis = data.kurtosis()
    skewness_df = pd.DataFrame({'skewness': skewness}).T
    kurtosis_df = pd.DataFrame({'kurtosis': kurtosis}).T
    # Apply the SciPy function column by column (use data, not the global df)
    gmean_df = pd.DataFrame({'gmean': data.apply(gmean, axis=0)}).T
    # DataFrame.append was removed in pandas 2.0; pd.concat stacks the rows instead
    return pd.concat([stats, kurtosis_df, skewness_df, gmean_df])

print(describex(df))
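
The same pattern should extend to the other statistics mentioned in the question. The following is only a sketch under some assumptions: the function name `describe_extended`, the `weights` parameter, and the row labels are made up for illustration, the data is assumed to be strictly positive (the geometric and harmonic statistics need that), and the weighted mean is computed with `numpy.average` because scipy.stats has no dedicated function for it.

```python
import numpy as np
import pandas as pd
from scipy import stats


def describe_extended(data, weights=None):
    """describe() plus extra rows; names here are illustrative, not a pandas API."""
    data = pd.DataFrame(data)
    numeric = data.select_dtypes(include="number")

    # Each entry maps a row label to a function that takes one column (a Series)
    # and returns a single number.
    extra_funcs = {
        "skewness": lambda col: col.skew(),
        "kurtosis": lambda col: col.kurtosis(),
        "gmean": lambda col: stats.gmean(col.dropna()),  # geometric mean
        "hmean": lambda col: stats.hmean(col.dropna()),  # harmonic mean
        "gstd": lambda col: stats.gstd(col.dropna()),    # geometric std deviation
    }
    if weights is not None:
        # Weighted arithmetic mean; weights must have one entry per row.
        extra_funcs["wmean"] = lambda col: np.average(col, weights=weights)

    # Apply every function to every column, then transpose so that each
    # statistic becomes a row, matching the layout of describe().
    extra = pd.DataFrame(
        {name: numeric.apply(func) for name, func in extra_funcs.items()}
    ).T

    return pd.concat([numeric.describe(), extra])


rng = np.random.default_rng(0)
df = pd.DataFrame(rng.integers(1, 100, size=(100, 4)), columns=list("ABCD"))
w = rng.uniform(0.5, 1.5, size=len(df))  # hypothetical per-row weights

result = describe_extended(df, weights=w)
print(result)
```

Keeping the extra statistics in a dictionary means a new one can be added with a single line, as long as the function reduces a column to one number.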

Hope this helps!

huangapple
  • Posted on 2023-02-16 02:31:03