How to query from a csv repeatedly throughout the day in a simulation code
Question
I'm trying to write simulation code in Python. The simulation relies on inputs from a large CSV file, and there is a separate CSV file for each day in the simulation. I need to make numerous queries each simulation day (the queries are based on time, which is a column in the CSV file).
I'm thinking of using pandas.read_csv to read the file in as a dataframe, store the result, and then query from this dataframe. One coding requirement is that I don't want the dataframe stored at the query site.
I think the easiest way to do this is with a class, e.g.,
import pandas as pd

class DailyCSVLoader:
    def __init__(self, filepath):
        self.df = pd.read_csv(filepath)

    def query(self, time):
        # return the rows corresponding to time
        ...
with usage:
import datetime

path = "/path/to/csv/file/filename.csv"
time = datetime.datetime(year=2020, month=1, day=1, hour=12, minute=2, second=0)
loader = DailyCSVLoader(path)
loader.query(time)
However, for my particular codebase, it might be slightly easier to do this outside of a class and with just a function and perhaps a static variable that holds the dataframe, e.g.,
import pandas as pd

# because I don't want the calling site to store df, I decided to keep it as a static variable here
def daily_csv_loader(filepath):
    daily_csv_loader.df = pd.read_csv(filepath)

def query(time, df):
    # return rows from df corresponding to time
    ...
with usage:
import datetime

path = "/path/to/csv/file/filename.csv"
time = datetime.datetime(year=2020, month=1, day=1, hour=12, minute=2, second=0)
daily_csv_loader(path)
query(time, daily_csv_loader.df)
Are there any other approaches here, preferably a functional approach (I would prefer not to use OOP here as alluded to previously)? Is there a functional approach that can be done with a single function, perhaps with nested functions?
Answer 1
Score: 2
Here is a functional-style example that uses nested functions. It doesn't expose the underlying csv/dataframe object at the call site, and everything is ultimately invoked via a single function:
import csv
import datetime as dt
import io
from typing import Any, Dict

def load_queryable_csv(csv_str):
    # a quick & easy setup, could have used pandas instead and/or generalized parsing
    rows = csv.DictReader(io.StringIO(csv_str))
    parsed_rows = [
        {'a': row['a'], 'b': int(row['b']), 'time': dt.datetime.fromisoformat(row['time'])}
        for row in rows
    ]

    # "querying"
    def query_csv(**column_matchers):
        def matcher(row: Dict[str, Any]):
            return all(row[col] == val for col, val in column_matchers.items())

        return list(filter(matcher, parsed_rows))

    return query_csv

query_csv = load_queryable_csv("""\
a,b,time
x,2,2020-01-01 12:02:00
y,4,2020-01-01 12:02:01
""")

time = dt.datetime(year=2020, month=1, day=1, hour=12, minute=2, second=0)
query_csv(time=time)
# => [{'a': 'x', 'b': 2, 'time': datetime.datetime(2020, 1, 1, 12, 2)}]
query_csv(a='y')
# => [{'a': 'y', 'b': 4, 'time': datetime.datetime(2020, 1, 1, 12, 2, 1)}]
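Since the question mentions numerous time-based queries per day, one possible variation on the same closure idea is to group the rows by timestamp once, so each query becomes a dictionary lookup instead of a scan over every row. This is only a rough sketch: the name load_time_indexed_csv, the time_column parameter, and the assumption that the timestamps are ISO-formatted strings are all made up for illustration and are not part of the original answer.

```python
import csv
import datetime as dt
import io
from collections import defaultdict


def load_time_indexed_csv(csv_file, time_column="time"):
    """Parse the CSV once and return a closure that looks rows up by time.

    `time_column` is an assumed column name holding ISO-format timestamps.
    """
    rows_by_time = defaultdict(list)
    for row in csv.DictReader(csv_file):
        # Convert the timestamp text to a datetime so it can serve as a key.
        row[time_column] = dt.datetime.fromisoformat(row[time_column])
        rows_by_time[row[time_column]].append(row)

    def query(time):
        # Dictionary lookup, so the full row list is never scanned.
        # A new list is returned so appending to the result leaves the index alone.
        return list(rows_by_time.get(time, []))

    return query


# Example with in-memory data; for a real file, pass open(path, newline="").
query = load_time_indexed_csv(io.StringIO(
    "a,b,time\n"
    "x,2,2020-01-01 12:02:00\n"
    "y,4,2020-01-01 12:02:01\n"
))
matches = query(dt.datetime(2020, 1, 1, 12, 2, 0))
print(matches)
```

Only the time column is converted here; the other columns stay as strings, so any numeric parsing would still need to be added for the real file. The trade-off is that this version only supports exact matches on time, whereas the filter-based query_csv above can match on any column.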