Using filename to initialize new DataFrame without exec
Question
I have a series of files to process, and I need DataFrames that have the same names as the files. The number of files is large, so I wanted to parallelize the processing with joblib. Unfortunately, joblib does not accept exec as part of the function it executes. Is there a better solution to this problem?
The script to process the files looks like this:
import pandas as pd
from pathlib import Path

files = Path(".").glob("*.out")
for output in files:
    df_name = output.stem  # Path.stem drops the ".out" suffix
    exec(df_name + " = pd.DataFrame(columns=['col_1', 'col_2', 'col_3'])")
    exec(df_name + "_tmp = pd.DataFrame(columns=['col_1', 'col_2', 'col_3'])")
    . . .
I need a way to initialize DataFrames named after the files that also works with joblib.
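When deriving a name from a filename, str.strip(".out") looks like it removes the suffix, but it treats ".out" as a set of characters and removes any leading or trailing ".", "o", "u" or "t". Path.stem removes only the final suffix. A small stdlib-only check, using made-up file names:

```python
from pathlib import Path

# Made-up file names chosen to show where str.strip goes wrong.
names = ["output.out", "total.out", "data.out"]

# For each name, compare str.strip(".out") (strips a *set* of characters from
# both ends) with Path.stem (drops only the final suffix).
comparison = {name: (name.strip(".out"), Path(name).stem) for name in names}

for name, (stripped, stem) in comparison.items():
    print(f"{name}: strip -> {stripped!r}, stem -> {stem!r}")
# output.out: strip -> 'p', stem -> 'output'
# total.out: strip -> 'al', stem -> 'total'
# data.out: strip -> 'data', stem -> 'data'
```

On Python 3.9 and later, str.removesuffix(".out") is another way to drop the literal suffix from a plain string.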
Answer 1
Score: 1
You can use globals() to create variables dynamically:
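import pandas as pd
from pathlib import Path
from joblib import Parallel, delayed

files = Path('.').glob('*.out')

def load_file(filename):
    # Runs in a worker: return the data so the parent process receives it
    return (filename, pd.read_csv(filename))

# [:20] limits this example to the first 20 files; drop it to process all of them
with Parallel(n_jobs=10) as parallel:
    results = parallel(delayed(load_file)(file) for file in list(files)[:20])

# Create the names in the parent process, where the rest of the script runs
for name, df in results:
    globals()[name.stem] = df
    globals()[f'{name.stem}_tmp'] = df.copy()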
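globals() accepts any string as a key, so the loop above stores a DataFrame even for a file such as run-2.out. However, only stems that are valid, non-keyword Python identifiers can afterwards be written as plain variable names. A small stdlib-only sketch, with hypothetical stems and placeholder strings standing in for DataFrames:

```python
import keyword

# Hypothetical stems, as Path(...).stem would return them for files named
# "sample_a.out", "run-2.out" and "2023.out"; strings stand in for DataFrames.
stems = ["sample_a", "run-2", "2023"]

usable = {}
for stem in stems:
    # globals() is the module namespace dict, so this assignment always succeeds...
    globals()[stem] = f"placeholder for {stem}"
    # ...but only valid, non-keyword identifiers can later be written as bare names.
    usable[stem] = stem.isidentifier() and not keyword.iskeyword(stem)

print(usable)              # {'sample_a': True, 'run-2': False, '2023': False}
print(sample_a)            # a valid identifier: usable directly
print(globals()["run-2"])  # not an identifier: reachable only through globals()
```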
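A commonly suggested alternative (not the answer's own method) is to keep the DataFrames in a dict keyed by file stem instead of creating variables. With joblib's default process-based backend, any names that exec or globals() create inside a worker stay in that worker process, so in either approach the worker should return the data and the parent should store it. This sketch assumes, like the answer, that the .out files are CSV readable by pd.read_csv; the file names and contents are made up, and prefer="threads" is used only to keep the demo light (omit it to use joblib's default process-based backend).

```python
import tempfile
from pathlib import Path

import pandas as pd
from joblib import Parallel, delayed


def load_file(path: Path) -> tuple[str, pd.DataFrame]:
    # Worker side: return (name, data) rather than creating variables here.
    return path.stem, pd.read_csv(path)


def load_all(directory: Path, n_jobs: int = 2) -> dict[str, pd.DataFrame]:
    files = sorted(directory.glob("*.out"))
    # prefer="threads" is a lightweight choice for this demo only.
    results = Parallel(n_jobs=n_jobs, prefer="threads")(
        delayed(load_file)(f) for f in files
    )
    frames = {}
    for stem, df in results:
        frames[stem] = df
        frames[f"{stem}_tmp"] = df.copy()  # mirrors the *_tmp frames in the question
    return frames


# Demo with two small, made-up CSV files in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    tmp_dir = Path(tmp)
    (tmp_dir / "run_a.out").write_text("col_1,col_2,col_3\n1,2,3\n")
    (tmp_dir / "run_b.out").write_text("col_1,col_2,col_3\n4,5,6\n7,8,9\n")
    frames = load_all(tmp_dir)

print(sorted(frames))
print(frames["run_b"])
```

Each DataFrame is then reached as frames["run_a"] rather than as a bare variable, which works for any filename, including ones that are not valid Python identifiers.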