Implementing __mul__ for timeseries in Python for different items

Question

I am trying to implement a piecewise constant function class in python.
I am wondering what is the best way to implement these __mul__ and __add__ functions.
Essentially, I want to be able to handle two cases: multiply by a constant or multiply by another timeseries.

For multiplying or adding a constant, I am just going to return a new series with the existing time points and the values multiplied by (or added to) the constant. For multiplying by a timeseries, I will construct a new series whose time points are the union of both series' time points, and multiply/add the values one by one.
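One way to sketch the "union of the time points" step is with NumPy alone: `np.union1d` returns the sorted union of both time grids, and `np.searchsorted(..., side="right") - 1` finds, for each query time, the last breakpoint at or before it, which matches what `__getitem__` in the code below returns (times before the first breakpoint fall back to `values[0]`). The helper names `evaluate` and `combine` are hypothetical, and the sketch assumes `times` are strictly increasing 1-D NumPy arrays with matching `values`.

```python
import operator

import numpy as np


def evaluate(times, values, query):
    """Evaluate a piecewise constant function at a time (or array of times)."""
    # index of the last breakpoint <= query; queries before the first
    # breakpoint give -1, which is raised to 0 (i.e. values[0])
    idx = np.searchsorted(times, query, side="right") - 1
    return values[np.maximum(idx, 0)]


def combine(times_a, values_a, times_b, values_b, op=operator.mul):
    """Apply a binary operator to two piecewise constant functions."""
    times = np.union1d(times_a, times_b)  # sorted union, duplicates removed
    return times, op(evaluate(times_a, values_a, times),
                     evaluate(times_b, values_b, times))
```

Because it evaluates both series on the full union, this keeps every breakpoint of both inputs, including those of the longer series after the shorter one ends.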

Whilst I know how to do both, should I just do

if type(other) == int or type(other) == float:
    # implementation
else:
    # implementation

This looks very ugly to me. In C++, one could overload the operator for different input types. What is the Pythonic way of doing this? In particular, I am worried that a check like this one might miss an eligible type:

if type(other) == int or type(other) == float:

Also, the implementations of the add, mul, sub and div functions will be almost identical (except for the operator). Is there a way of writing one function that each of them would just call?

Essentially I was thinking about a method like

def operation_by_const(self, operator, other):
    return PiecewiseConstant(self.times, self.values .... other) 

where .... should be replaced by applying the add/mul/div/sub operator of np.array to the constant, but I am not sure what the correct syntax for this is in Python. Even though this function does not save much ink in the constant case, for a timeseries multiplied by a timeseries it would save a lot of copy-pasting.

Here is the rest of the code

from bisect import bisect_left
from functools import cached_property
import numpy as np


class PiecewiseConstant:
    def __init__(self, times: np.ndarray, values: np.ndarray):

        # times and values must have the same length and be non-empty
        assert(len(times) == len(values))
        assert(len(times))

        for i, j in zip(times[0:-1], times[1:]):
            if i >= j:
                raise ValueError("Time series needs to have increasing indices")

        self.times = times
        self.values = values

    def __getitem__(self, time):

        if time < self.times[0]:
            return self.values[0]

        time_index = bisect_left(self.times, time)

        # if time is greater than all our time points or less than the time corresponding to the index
        if time_index == len(self.times) or time < self.times[time_index]:
            return self.values[time_index - 1]

        return self.values[time_index]
    
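    def __setitem__(self, time, value):
        # __setitem__ receives the key and the value; without `value`, ts[t] = v
        # would fail with a TypeError instead of this NotImplementedError
        raise NotImplementedError("Setting values of timeseries through indexing is not allowed")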
    
    def __mul__(self, other):
        pass

    def __add__(self, other):
        pass

EDIT: I have now done this. Would appreciate some feedback.

def apply_binary_operator(self, operator, other):
    assert isinstance(other, PiecewiseConstant)
    if self.times == other.times:
        return PiecewiseConstant(self.times, f"self.values.{operator}(other.values)")
    
    else:
        times = []
        values = []

        self_it = 0
        other_it = 0
        self_time = self.times[self_it]
        other_time = other.values[other_it]

        while self_it != len(self.times) and other_it != len(other.times):
            if self_time <= other_time:
                times.append(self_time)
                values.append(eval(f"self.values[self_it].{operator}(other[self_time])"))
                self_it += 1
                if self_time == other_time:
                    other_it += 1
                    other_time = other.times[other_it]
                self_time = self.times[self_it]
                
            else:
                times.append(other_time)
                values.append(eval(f"other.values[other_it].{operator}(self[other_time])"))
                other_it += 1 

def __mul__(self, other):
    if isinstance(other, PiecewiseConstant):
        return self.apply_binary_operator("__mul__", other)

Answer 1

Score: 2

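In Python there is no syntax-level overloading by argument type as you describe, although similar behavior can be achieved with decorators. functools.singledispatch is one of them, but it dispatches on the type of the first argument, which for a method is self, so it does not work directly on methods. functools.singledispatchmethod (Python 3.8+) dispatches on the first argument after self instead; beyond that, you would have to build your own or use a third-party library.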
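For comparison, a minimal sketch of type-based dispatch with `functools.singledispatchmethod` (Python 3.8+), the method-friendly variant of `functools.singledispatch` that dispatches on the first argument after `self`. `Series` is a hypothetical stand-in for `PiecewiseConstant` that only stores a list of values. Because a class name does not exist inside its own class body, the series-by-series overload is registered after the class is defined.

```python
from functools import singledispatchmethod
from numbers import Number


class Series:
    """Hypothetical stand-in for PiecewiseConstant: just a list of values."""

    def __init__(self, values):
        self.values = list(values)

    @singledispatchmethod
    def __mul__(self, other):
        # fallback for unregistered types: Python then tries other.__rmul__
        # and raises TypeError if that is not supported either
        return NotImplemented

    @__mul__.register
    def _(self, other: Number):
        # chosen when other is an int, float, Fraction, ... (any Number)
        return Series(v * other for v in self.values)


def _mul_series(self, other):
    # element-wise product of two series of equal length
    return Series(a * b for a, b in zip(self.values, other.values))


# Series exists only now, so its overload is registered here;
# __dict__ yields the singledispatchmethod object itself, not a bound method
Series.__dict__["__mul__"].register(Series, _mul_series)
```

Whether this reads better than an if/elif block is a matter of taste; it mainly pays off when many operand types need separate handling.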

Anyway, this is more or less a digression: the more common pattern is an if/elif block inside your method that checks the argument types, as you found out.

You also found out about isinstance, which is much better than type(obj) == mytype. You can do from numbers import Number and then use isinstance(obj, Number) to match all numeric types (int, float, complex, Fraction, Decimal, and any custom class that subclasses or registers with numbers.Number).
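A minimal sketch of the `isinstance(obj, Number)` check inside an operator method. `Series` is again a hypothetical stand-in with a `values` list. Returning `NotImplemented` for unsupported operands lets Python try the other operand's reflected method and raise `TypeError` if nothing matches, and `__rmul__ = __mul__` makes `2 * series` work as well (valid here only because multiplication is commutative).

```python
from numbers import Number


class Series:
    """Hypothetical stand-in for PiecewiseConstant: just a list of values."""

    def __init__(self, values):
        self.values = list(values)

    def __mul__(self, other):
        if isinstance(other, Number):  # int, float, complex, Fraction, Decimal, ...
            return Series(v * other for v in self.values)
        if isinstance(other, Series):
            return Series(a * b for a, b in zip(self.values, other.values))
        # unsupported operand: let Python try other.__rmul__, then raise TypeError
        return NotImplemented

    # multiplication is commutative, so 2 * s can reuse __mul__
    __rmul__ = __mul__
```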

Also, from Python 3.10 there is the match/case construct if one happens to dislike if/elif. As it is more verbose, and doesn't really add functionality in this case, keeping if/elif is simpler.

Now, getting into the "good parts": you don't have to pass the operator around as a string and then resort to eval to perform your computations. Functions are first-class objects, remember? You can either pass op = lambda x, y: x * y as the operator to the method or, even more elegantly and a bit more performantly, do from operator import mul and pass that around.
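A minimal sketch of the question's `operation_by_const` idea using functions from the `operator` module instead of strings plus eval. It assumes, as in the question, that `values` is a NumPy array, so `operator.mul(values, 2)` is the same as `values * 2` and broadcasts the constant. The class below is a stripped-down, hypothetical version of `PiecewiseConstant` without validation or `__getitem__`.

```python
import operator

import numpy as np


class PiecewiseConstant:
    """Stripped-down stand-in for the question's class."""

    def __init__(self, times, values):
        self.times = np.asarray(times)
        self.values = np.asarray(values)

    def operation_by_const(self, op, other):
        # op is a plain function such as operator.add; NumPy broadcasts the constant
        return PiecewiseConstant(self.times, op(self.values, other))

    def __add__(self, other):
        return self.operation_by_const(operator.add, other)

    def __sub__(self, other):
        return self.operation_by_const(operator.sub, other)

    def __mul__(self, other):
        return self.operation_by_const(operator.mul, other)

    def __truediv__(self, other):
        return self.operation_by_const(operator.truediv, other)
```

A lambda such as `lambda x, y: x * y` can be passed as `op` in exactly the same way.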

Also, the assert at the beginning of your function is not really common practice in Python: it fails at runtime, not at compile time (and assert statements are skipped entirely when Python runs with -O). So you can just leave the guard off and let the code fail whenever the passed-in objects don't meet one of the interface requirements in the code itself. You will get an AttributeError or TypeError instead, but with the advantage that if someone builds a type that is compatible with your timeseries for the purposes of this operation, it will be able to benefit from this code: that is called "duck typing".
A better way to check what is passed to the function than the assert statement is to annotate the whole code and use a static type checker such as mypy.

It is a bit bureaucratic for 90% of uses of the language (as opposed to writing library/framework code that will be used by others), but then, so is inserting assert statements.

That said, I rewrote some parts of your code in more idiomatic Python.
One change I was unsure about is using "except StopIteration" around the large block instead of an explicit "while" condition: if you prefer the latter, next() accepts a default value that it returns instead of raising once the iterator is exhausted, and you can test for it in the while condition (check the edit history of this post to see the example).
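A sketch of that alternative: `next(iterator, default)` returns `default` instead of raising `StopIteration`, so the loop can test a sentinel in its `while` condition. Here the sentinel is `math.inf`, which assumes all time points are finite numbers, and `merged_times` is a hypothetical helper that only merges two strictly increasing sequences of time points. Because the loop runs until both inputs are exhausted, it also keeps the trailing time points of the longer series.

```python
import math


def merged_times(times_a, times_b):
    """Merge two strictly increasing sequences into their sorted union."""
    a, b = iter(times_a), iter(times_b)
    # next() with a default never raises StopIteration; inf marks "exhausted"
    ta, tb = next(a, math.inf), next(b, math.inf)
    result = []
    while ta != math.inf or tb != math.inf:
        t = min(ta, tb)
        result.append(t)
        # advance every sequence whose current time was emitted (both on a tie)
        if ta == t:
            ta = next(a, math.inf)
        if tb == t:
            tb = next(b, math.inf)
    return result
```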

from __future__ import annotations  # allows "PiecewiseConstant" in annotations inside its own class

from typing import Callable
from operator import mul, add

import numpy as np


class PiecewiseConstant:
    # __init__ and __getitem__ as in the question

    # below, lightweight type hints in the parameter annotations, which are already useful
    # for a human glancing at the code.
    def apply_binary_operator(self, operator: Callable, other: PiecewiseConstant) -> PiecewiseConstant:

        # np.array_equal instead of "==": comparing two NumPy arrays with "=="
        # gives an element-wise array, which cannot be used directly in an `if`
        if np.array_equal(self.times, other.times):
            # if self.values is a NumPy array or a custom class, the functions from the
            # operator module will call their respective dunder methods (__mul__ and __add__)
            return PiecewiseConstant(self.times, operator(self.values, other.values))
            # otherwise, if ".values" is a plain list, this would work instead:
            # return PiecewiseConstant(self.times, [operator(own_v, other_v) for own_v, other_v in zip(self.values, other.values)])
            # both, of course, assume your __init__ receives an actual object
            # and not a string to be evaluated

        # there is a "return" when the `if` matches above. No need for
        # an extra indentation level with an `else` block here:
        times = []
        values = []

        # Using the iterator protocol
        # This replaces all the "index increment" and "fetch element at index" code
        # occurrences with a single "next" call: your example was missing
        # one "fetch element at index" in one of the occurrences.
        self_times = enumerate(self.times)
        other_times = enumerate(other.times)

        try:
            self_it, self_time = next(self_times)
            other_it, other_time = next(other_times)
            while True:  # we exit when one of the series is exhausted
                if self_time <= other_time:
                    time = self_time
                    op1 = self.values[self_it]
                    op2 = other[self_time]
                else:
                    time = other_time
                    # keep self's value as the left operand so that
                    # non-commutative operators (sub, truediv) stay correct
                    op1 = self[other_time]
                    op2 = other.values[other_it]

                # record the point *before* advancing: next() may raise
                # StopIteration, and times and values must keep the same length.
                # Refactoring to a single place is optional, of course,
                # but whenever there is code "doing things" I prefer
                # to have it in a single place if there is an option.
                times.append(time)
                values.append(operator(op1, op2))

                # advance each series whose time point was just used
                # (both on a tie, so the same time is not emitted twice).
                # I don't know about the precision/noise you
                # have in your data, but you might want to check
                # "math.isclose" instead of "==" here: https://docs.python.org/3/library/math.html#math.isclose
                if self_time == time:
                    self_it, self_time = next(self_times)
                if other_time == time:
                    other_it, other_time = next(other_times)
        except StopIteration:
            # whenever a series tries to advance beyond its end,
            # a StopIteration is raised when "next" is called.
            #
            # this is the _same_ protocol that is used in Python's
            # native "for" loops. The custom stepping of this task
            # requires us to make the handling explicit.
            # Note: time points of the longer series that come after
            # the shorter one is exhausted are not included.
            pass
        # the return statement is missing in your snippet; guessing it would be:
        return self.__class__(np.array(times), np.array(values))
        # using "self.__class__" allows the code to work effectively if your class
        # is inherited: you return instances of the subclass, instead of
        # hardcoding the class where this method is defined.

    def __mul__(self, other):
        if isinstance(other, PiecewiseConstant):
            return self.apply_binary_operator(mul, other)
        ...

    def __add__(self, other):
        if isinstance(other, PiecewiseConstant):
            return self.apply_binary_operator(add, other)
        ...
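The answer's `__mul__` and `__add__` differ only in the operator they pass to `apply_binary_operator`, which also speaks to the question's wish to write add, mul, sub and div only once. A sketch of generating all four methods from one factory; `_binary_method` is a hypothetical helper, and to keep the example short the stand-in class only handles scalars and series sharing the same time grid (on different grids, the same spot would call `apply_binary_operator`).

```python
import operator
from numbers import Number

import numpy as np


def _binary_method(op):
    """Build a dunder method that applies `op` to the series' values."""
    def method(self, other):
        if isinstance(other, Number):
            return type(self)(self.times, op(self.values, other))
        if isinstance(other, type(self)) and np.array_equal(self.times, other.times):
            return type(self)(self.times, op(self.values, other.values))
        # unsupported operand (or, in this sketch, a different time grid)
        return NotImplemented
    method.__name__ = f"__{op.__name__}__"  # e.g. "__truediv__"
    return method


class PiecewiseConstant:
    """Stripped-down stand-in for the question's class."""

    def __init__(self, times, values):
        self.times = np.asarray(times)
        self.values = np.asarray(values)

    __add__ = _binary_method(operator.add)
    __sub__ = _binary_method(operator.sub)
    __mul__ = _binary_method(operator.mul)
    __truediv__ = _binary_method(operator.truediv)
```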

huangapple
  • Posted on April 4, 2023 18:20:53
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75928216.html