Using filename to initialize new DataFrame without exec

Question

I have a series of files to process, and I need DataFrames with the same names as the files. Because the number of files is large, I want to parallelize the processing with joblib. Unfortunately, joblib does not accept exec inside the function it is asked to execute. Is there a better solution to this problem?

The script to process the files looks like this:

import pandas as pd
from pathlib import Path

files = Path(".").glob("*.out")

for output in files:
    # .stem removes only the ".out" suffix from the file name
    df_name = output.stem
    exec(df_name + " = pd.DataFrame(columns = ['col_1', 'col_2', 'col_3'])")
    exec(df_name + "_tmp = pd.DataFrame(columns = ['col_1', 'col_2', 'col_3'])")
    # ... further processing ...

I need a way to initialize DataFrames named after the files that works with joblib.
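A note on deriving the name from the file: str.strip(".out") treats its argument as a set of characters to remove from both ends of the string, not as a suffix, so it can also eat letters that belong to the file name. pathlib's Path.stem (or str.removesuffix on Python 3.9+) removes only the extension. A small sketch; the helper names and sample file names are made up for illustration:

```python
from pathlib import Path


def name_via_strip(path: Path) -> str:
    # str.strip removes any of the characters '.', 'o', 'u', 't' from BOTH ends
    return str(path).strip(".out")


def name_via_stem(path: Path) -> str:
    # Path.stem drops only the final suffix (here ".out")
    return path.stem


for p in (Path("output.out"), Path("run_1.out")):
    print(name_via_strip(p), name_via_stem(p))
# output.out -> "p" vs "output"   (strip also ate "out" from the start and "ut" before the dot)
# run_1.out  -> "run_1" vs "run_1" (strip only happens to work for this name)
```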

Answer 1

Score: 1

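You can use globals() to create variables dynamically. By default, joblib runs each call in a separate worker process, so variables a worker creates (with exec or otherwise) never reach the main process. Instead, have the worker return the data and create the named variables in the main process: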

import pandas as pd
from pathlib import Path
from joblib import Parallel, delayed

files = Path('.').glob('*.out')

def load_file(filename):
    # Runs in a worker process: read the file and return it with its path
    return (filename, pd.read_csv(filename))

with Parallel(n_jobs=10) as parallel:
    results = parallel(delayed(load_file)(file) for file in files)

# Back in the main process: bind each DataFrame to a global named after the file
for name, df in results:
    globals()[name.stem] = df
    globals()[f'{name.stem}_tmp'] = df.copy()
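A variation on the same idea, sketched under some assumptions: instead of writing into globals(), the results can be collected into a dict keyed by the file stem. This also sidesteps stems that are not valid Python identifiers (a file called run-1.out gives the key "run-1", which globals() will store but which can never be used as a bare variable name). The worker below only creates the empty DataFrames from the question; init_frames, init_all and COLUMNS are hypothetical names, and the real per-file processing would go where the comment indicates:

```python
import pandas as pd
from pathlib import Path
from joblib import Parallel, delayed

# Hypothetical column list taken from the question's example
COLUMNS = ["col_1", "col_2", "col_3"]


def init_frames(path):
    """Worker: build the DataFrames for one file and return them with its name."""
    name = path.stem
    df = pd.DataFrame(columns=COLUMNS)
    df_tmp = pd.DataFrame(columns=COLUMNS)
    # ... per-file processing that fills df / df_tmp would go here ...
    return name, df, df_tmp


def init_all(paths, n_jobs=2):
    """Run init_frames in parallel and gather the results in the main process."""
    results = Parallel(n_jobs=n_jobs)(delayed(init_frames)(p) for p in paths)
    frames = {}
    for name, df, df_tmp in results:
        # Dict keys play the role of the dynamically created variable names
        frames[name] = df
        frames[f"{name}_tmp"] = df_tmp
    return frames


if __name__ == "__main__":
    frames = init_all(Path(".").glob("*.out"))
    print(sorted(frames))
```

Code that previously referred to a generated variable such as run1_tmp would then use frames["run1_tmp"].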

huangapple
  • Published on March 8, 2023 18:12:05
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75671713.html