Question

I have a series of files to process, and I need DataFrames with the same names as the files. Because there are many files, I wanted to parallelize the processing with joblib. Unfortunately, joblib does not accept exec inside the function it runs. Is there a better solution to this problem?

The script that processes the files looks like this:

import pandas as pd
from pathlib import Path

files = Path(".").glob("*.out")

for output in files:
    df_name = output.stem  # file name without the ".out" suffix
    exec(df_name + " = pd.DataFrame(columns=['col_1', 'col_2', 'col_3'])")
    exec(df_name + "_tmp = pd.DataFrame(columns=['col_1', 'col_2', 'col_3'])")
    . . .

I need a way to initialize DataFrames from file names that works with joblib.
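One detail worth checking when deriving the names: str(output).strip(".out") does not remove a suffix. str.strip treats its argument as a set of characters and removes any of them from both ends, so "output.out" becomes "p". Path.stem, or str.removesuffix on Python 3.9+, removes only the extension. A small standalone comparison (the helper names are illustrative only):

```python
from pathlib import Path


def name_via_strip(path):
    # str.strip(".out") treats ".out" as the SET of characters {'.', 'o', 'u', 't'}
    # and removes any of them from both ends of the string.
    return str(path).strip(".out")


def name_via_stem(path):
    # Path.stem removes only the final suffix, e.g. "output.out" -> "output".
    return path.stem


def name_via_removesuffix(path):
    # str.removesuffix (Python 3.9+) removes the exact suffix if present.
    return str(path).removesuffix(".out")


for p in (Path("run1.out"), Path("output.out"), Path("test.out")):
    # Columns: original name, strip result, stem result, removesuffix result
    print(p, name_via_strip(p), name_via_stem(p), name_via_removesuffix(p))
```

"run1.out" happens to survive strip intact, which is why the problem can go unnoticed until a file name starts or ends with one of those characters.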

Answer 1

Score: 1

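You can load the files in parallel with joblib and then create the variables dynamically with globals() in the main process: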

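import pandas as pd
from pathlib import Path
from joblib import Parallel, delayed

files = Path('.').glob('*.out')

def load_file(filename):
    # Runs in a worker process: read the file and return it with its path
    return (filename, pd.read_csv(filename))

with Parallel(n_jobs=10) as parallel:
    # [:20] only processes the first 20 files; remove it to load all of them
    results = parallel(delayed(load_file)(file) for file in list(files)[:20])

# Back in the main process: create one variable per file, named after the file
for name, df in results:
    globals()[name.stem] = df
    globals()[f'{name.stem}_tmp'] = df.copy()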
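If the DataFrames only need to be looked up by file name, a dictionary avoids both exec and globals(). A minimal sketch, assuming the .out files are comma-separated text that pd.read_csv can parse (the same assumption the answer makes); the load_all helper, the temporary directory, and the sample files are hypothetical and only there to make the example runnable:

```python
import tempfile
from pathlib import Path

import pandas as pd
from joblib import Parallel, delayed


def load_file(filename):
    # Runs in a worker: it only reads the file and returns the result,
    # so no exec or globals() is needed inside the parallel function.
    return filename, pd.read_csv(filename)


def load_all(directory, n_jobs=2):
    # Hypothetical helper: load every *.out file in `directory` in parallel.
    files = sorted(Path(directory).glob("*.out"))
    results = Parallel(n_jobs=n_jobs)(delayed(load_file)(f) for f in files)
    # Key each DataFrame by its file name without the ".out" suffix,
    # plus an independent copy under "<name>_tmp".
    frames = {}
    for path, df in results:
        frames[path.stem] = df
        frames[f"{path.stem}_tmp"] = df.copy()
    return frames


# Demo with hypothetical sample files that use the question's column names.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("run1", "run2"):
        (Path(tmp) / f"{name}.out").write_text("col_1,col_2,col_3\n1,2,3\n")
    frames = load_all(tmp)

print(sorted(frames))  # → ['run1', 'run1_tmp', 'run2', 'run2_tmp']
print(frames["run1"].columns.tolist())  # → ['col_1', 'col_2', 'col_3']
```

Code that would have used a variable such as run1 uses frames["run1"] instead, which keeps the names out of the module namespace and makes it easy to loop over every loaded file.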
