PyPy: pandas correlation is slower than with CPython
Question
I wanted to give PyPy a try for pandas operations, thinking some parts of my code might run faster with it, but apparently it is slower than CPython.
What is the reason for that?
Here is my code sample; it just reads example data from a CSV file and computes the correlation.
With CPython: 7 minutes
With PyPy: 8.5 minutes
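import pandas as pd
import time

t = time.time()
df = pd.read_csv('./dfn.csv', index_col=0)  # load the example data; first column is the index
df.T.corr()  # pairwise correlation between rows (i.e. columns of the transpose)
print(time.time() - t)  # total elapsed seconds for both steps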
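To see which step dominates, it can help to time the CSV parsing and the correlation separately and to print which interpreter is running. The sketch below is an assumption-laden stand-in for the original script: since dfn.csv is not available, it writes a synthetic DataFrame of arbitrary size to a temporary CSV file, and the helper name time_steps is hypothetical. It uses numpy for the random data, which pandas already depends on.

```python
import os
import platform
import tempfile
import time

import numpy as np
import pandas as pd


def time_steps(n_rows: int = 200, n_cols: int = 500, seed: int = 0) -> dict:
    """Time read_csv and df.T.corr() separately on synthetic data.

    Hypothetical helper: the data shape is an arbitrary assumption because
    the original dfn.csv is not available.
    """
    rng = np.random.default_rng(seed)
    data = pd.DataFrame(rng.standard_normal((n_rows, n_cols)))
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "synthetic.csv")
        data.to_csv(path)  # the index is written as the first column

        t0 = time.perf_counter()
        df = pd.read_csv(path, index_col=0)  # step 1: CSV parsing
        t1 = time.perf_counter()
        corr = df.T.corr()  # step 2: row-vs-row correlation, shape (n_rows, n_rows)
        t2 = time.perf_counter()

    return {
        "interpreter": platform.python_implementation(),  # e.g. 'CPython' or 'PyPy'
        "read_csv_s": t1 - t0,
        "corr_s": t2 - t1,
        "corr_shape": corr.shape,
    }


if __name__ == "__main__":
    print(time_steps())
```

Running the same file once with python and once with pypy gives a per-step comparison; no timings are quoted here because they depend on the machine, the data and the library versions.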
Answer 1
Score: 1
Much of the scientific Python software stack is actually written in C/C++. So when you use pandas routines like read_csv or T.corr(), you are not running Python code but compiled code, and PyPy cannot speed that code up much. Additionally, the interfaces to the C/C++ code are currently written using the CPython C-API. For PyPy to use that code, it must emulate the CPython C-API, which is slow. See this blog post for the reasons. We hope HPy will change that situation and make C/C++ interop on PyPy (and other Python implementations) faster.
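By contrast, PyPy's JIT targets code that stays in Python, such as hot loops over plain lists. As an illustration only (not a claim about any particular speedup), the hypothetical helper pearson_matrix below computes the same kind of row-vs-row Pearson correlation matrix as df.T.corr(), but with plain Python loops; this is the sort of code PyPy can optimize, whereas the pandas call spends its time in compiled routines reached through the emulated C-API. Unlike pandas, this sketch assumes there are no missing values and no constant (zero-variance) rows.

```python
from __future__ import annotations

import math
import platform
import random
import time


def pearson_matrix(rows: list[list[float]]) -> list[list[float]]:
    """Pairwise Pearson correlation between rows, in pure Python.

    Hypothetical helper for illustration; assumes no missing values and
    that no row is constant.
    """
    # Center each row and precompute its Euclidean norm once.
    centered = []
    norms = []
    for row in rows:
        mean = sum(row) / len(row)
        c = [x - mean for x in row]
        centered.append(c)
        norms.append(math.sqrt(sum(v * v for v in c)))

    n = len(rows)
    result = [[0.0] * n for _ in range(n)]
    for i in range(n):
        result[i][i] = 1.0
        for j in range(i + 1, n):
            # Correlation = dot product of centered rows / product of norms.
            dot = sum(a * b for a, b in zip(centered[i], centered[j]))
            r = dot / (norms[i] * norms[j])
            result[i][j] = result[j][i] = r  # the matrix is symmetric
    return result


if __name__ == "__main__":
    random.seed(0)
    data = [[random.gauss(0, 1) for _ in range(500)] for _ in range(50)]
    t = time.perf_counter()
    pearson_matrix(data)
    print(platform.python_implementation(), f"{time.perf_counter() - t:.3f} s")
```

Running this file under both interpreters is one way to see the difference between JIT-friendly pure-Python code and pandas calls that dispatch to compiled extensions; actual numbers will vary by machine and version.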