2023-05-18 02:12:31

How to query from a csv repeatedly throughout the day in a simulation code

Question

I'm trying to write simulation code in Python. The simulation relies on inputs from a large CSV file, and there is a separate CSV file for each day in the simulation. I need to make numerous queries each simulation day; the queries are based on time, which is a column in the CSV file.

I'm thinking of using pandas.read_csv to read each file into a dataframe, store the result, and then query that dataframe. One coding requirement is that I don't want the dataframe stored at the query site.

I think the easiest way to do this is with a class, e.g.,

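import pandas as pd

class DailyCSVLoader:
  def __init__(self, filepath):
    # read the day's file once; assumes the time column is named "time"
    self.df = pd.read_csv(filepath, parse_dates=["time"])

  def query(self, time):
    # return the rows corresponding to time
    return self.df[self.df["time"] == time]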

with usage:

import datetime

path = "/path/to/csv/file/filename.csv"
time = datetime.datetime(year=2020, month=1, day=1, hour=12, minute=2, second=0)
loader = DailyCSVLoader(path)
loader.query(time)

However, for my particular codebase, it might be slightly easier to do this outside of a class and with just a function and perhaps a static variable that holds the dataframe, e.g.,

import pandas as pd

# because I don't want the calling site to store df, I decided to keep it as a static variable here
def daily_csv_loader(filepath):
  # assumes the time column is named "time"
  daily_csv_loader.df = pd.read_csv(filepath, parse_dates=["time"])


def query(time, df):
  # return rows from df corresponding to time
  return df[df["time"] == time]

with usage:

import datetime

path = "/path/to/csv/file/filename.csv"
time = datetime.datetime(year=2020, month=1, day=1, hour=12, minute=2, second=0)
daily_csv_loader(path)
query(time, daily_csv_loader.df)

Are there any other approaches here, preferably a functional approach (I would prefer not to use OOP here as alluded to previously)? Is there a functional approach that can be done with a single function, perhaps with nested functions?

Answer 1

Score: 2

Here is a functional-style example built from nested functions. It doesn't expose the underlying csv/dataframe object at the call site, and everything is ultimately invoked through a single function:

import csv
import datetime as dt
import io
from typing import Any, Dict

def load_queryable_csv(csv_str):
    # a quick & easy setup, could have used pandas instead and/or generalized parsing
    rows = csv.DictReader(io.StringIO(csv_str))
    parsed_rows = [
        {'a': row['a'], 'b': int(row['b']), 'time': dt.datetime.fromisoformat(row['time'])}
        for row in rows
    ]

    # "querying"
    def query_csv(**column_matchers):
        def matcher(row: Dict[str, Any]):
            return all(row[col] == val for col, val in column_matchers.items())

        return list(filter(matcher, parsed_rows))

    return query_csv

query_csv = load_queryable_csv("""\
a,b,time
x,2,2020-01-01 12:02:00
y,4,2020-01-01 12:02:01
""")

time = dt.datetime(year=2020, month=1, day=1, hour=12, minute=2, second=0)

query_csv(time=time)
# => [{'a': 'x', 'b': 2, 'time': datetime.datetime(2020, 1, 1, 12, 2)}]

query_csv(a='y')
# => [{'a': 'y', 'b': 4, 'time': datetime.datetime(2020, 1, 1, 12, 2, 1)}]
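
Since the question mentions pandas, the same closure pattern could plausibly be written on top of `pandas.read_csv`. This is a minimal sketch, not the answer's original code: the `time_column` parameter and its default of `"time"` are assumptions made for illustration.

```python
import io

import pandas as pd


def load_queryable_csv(csv_source, time_column="time"):
    """Read the CSV once and return a function that queries it by timestamp.

    `csv_source` is anything pandas.read_csv accepts (a path or a file-like object).
    `time_column` is an assumed column name, not something fixed by the question.
    """
    df = pd.read_csv(csv_source, parse_dates=[time_column])

    def query(time):
        # Boolean mask over the captured dataframe; the caller never sees `df`.
        return df[df[time_column] == pd.Timestamp(time)]

    return query


query = load_queryable_csv(io.StringIO("""\
a,b,time
x,2,2020-01-01 12:02:00
y,4,2020-01-01 12:02:01
"""))

rows = query("2020-01-01 12:02:00")
print(rows["a"].tolist())  # prints ['x']
```

The dataframe lives only in the closure, so the call site holds just the returned `query` function. For a real daily file, a path would be passed in place of the `io.StringIO` object.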
