How do apps that work with splitting data into days deal with timezone changes?

Question

Let's say an app tracks the number of times you bless people who sneeze each day.

Now this app scores you every day, after the day is over, based on the proportion of blessed sneezes versus unblessed sneezes. Let's say you get a little prize for any day on which your bless success rate is 90% or higher.

But then what happens if you log a sneeze at 2am on the 23rd of March and then fly across the world into a different timezone where it's 11pm on the 22nd of March? Now suddenly you're logging sneezes for the previous day in local time. You're modifying the score for a day that has already passed in the timezone you were previously in, and that already has a score.

Every solution I can think of to this issue has major drawbacks.

For example, if you store every sneeze event in UTC and have the display adjust according to the new timezone, that might change previous daily scores. All the sneezes would be offset, and so some of your rewards might be retroactively retracted.

If instead you associate a sneeze with the date (e.g., 22/03/2023), then when you fly to another timezone, that day might go back one.

It seems impossible to solve. Yet, it must be something that many app developers encounter as there are many apps that fit this theme (e.g. Duolingo).

How can this problem best be approached?

Answer 1

Score: 1

Travelling from one timezone to another during a daylight saving change is not a common use case. You should design your system for the majority of use cases (to keep it as simple as possible) and then iteratively refine the handling of edge cases based on input from product people.

Also, please bear in mind that every design decision comes with a sacrifice. To better support quality attribute X, you might "cause harm" to quality attribute Y. Based on the requirements, you should do a trade-off analysis of the viable options and choose the alternative whose trade-off is acceptable from both a technical and a business perspective.


UPDATE #1

> Any time you travel to a different timezone (e.g., from USA to England), your local time on your phone will change. Your apps will have to deal with this change in some way. For apps that have "daily records" and such, they would have to have a way to deal with the change. My question was what the standard approach to solving this problem is with popular applications.

From the perspective of the reported event, it does not matter which timezone you are in. The following events happened at exactly the same time:

| UTC                 | Offset | Local               |
|---------------------|--------|---------------------|
| 23/11/2022 11:00:01 | GMT+2  | 23/11/2022 13:00:01 |
| 23/11/2022 11:00:01 | GMT+3  | 23/11/2022 14:00:01 |

The offset is stored so that events can be displayed in local time. From the daily-aggregate perspective, only the UTC timestamp matters.
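A minimal sketch of this storage model, assuming a hypothetical `SneezeEvent` record (the class, its field names, and `daily_totals` are illustrative and not from any particular library). Each event keeps its UTC instant plus the device's offset at logging time; the offset is used only for display, while days are bucketed by UTC date.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class SneezeEvent:  # hypothetical record type, for illustration only
    utc_time: datetime    # the instant, always stored in UTC
    offset_minutes: int   # device's UTC offset when the event was logged
    blessed: bool

    def local_time(self) -> datetime:
        """Wall-clock time the user saw; used for display only."""
        tz = timezone(timedelta(minutes=self.offset_minutes))
        return self.utc_time.astimezone(tz)


def daily_totals(events):
    """Count events per UTC calendar day, ignoring the stored offsets."""
    return Counter(e.utc_time.date() for e in events)


# The two rows from the table above: same instant, different offsets.
instant = datetime(2022, 11, 23, 11, 0, 1, tzinfo=timezone.utc)
events = [
    SneezeEvent(instant, 120, blessed=True),   # logged at GMT+2
    SneezeEvent(instant, 180, blessed=False),  # logged at GMT+3
]
totals = daily_totals(events)
```

Because the bucket key is derived from `utc_time` alone, changing the device's timezone later cannot move an already stored event into a different day.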


UPDATE #2

Based on the new comments, I assume we are talking about late events. From the processing perspective, a given event arrived later than expected. In stream processing we usually have windows. A late event arrives in W<sub>T+1</sub> but belongs to W<sub>T</sub>. In this case we have a grace period during which we are still willing to recompute W<sub>T</sub>. For instance, if you create a new tumbling window every minute, then the grace period could be half a minute.

You have used the term "daily report". Based on that, I assume we are talking about batch processing rather than stream processing. Let's say we calculate the previous day's daily reports at 3am. In this case we already have a grace period of 3 hours.

Let's say the batch processing is already done but we still receive new events that should trigger reprocessing. You have multiple options; here are the two most common:
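The window-plus-grace-period rule can be sketched as follows. This is a simplified illustration using integer epoch seconds, not the API of any real stream processor; `window_start` and `accepts` are hypothetical helper names, and the 60 s / 30 s values are the ones from the example above.

```python
WINDOW_SECONDS = 60  # one tumbling window per minute
GRACE_SECONDS = 30   # how long after a window closes we still recompute it


def window_start(event_ts: int) -> int:
    """Start (epoch seconds) of the tumbling window the event belongs to."""
    return event_ts - event_ts % WINDOW_SECONDS


def accepts(event_ts: int, arrival_ts: int) -> bool:
    """True if the event's window is still open or within its grace period."""
    window_end = window_start(event_ts) + WINDOW_SECONDS
    return arrival_ts < window_end + GRACE_SECONDS
```

An event stamped at second 59 belongs to the window [0, 60). If it arrives at second 75 it is still accepted, because the grace period runs until second 90; if it arrives at second 95 it is dropped.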

  • Flag or queue the jobs that need to be reprocessed
  • Update the accumulated value as new events arrive

The latter can only work if we can assume the following:

  • There is a method which can be used to calculate the new aggregate from the already accumulated data and a new event
  • The method is not computation- and/or resource-heavy
  • The anticipated volume of late arrivals is low
  • etc.
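For the bless rate from the question, the first assumption holds if the accumulated state is kept as two counters rather than as a finished percentage. A minimal sketch, assuming a hypothetical `DailyScore` accumulator (the 90% threshold is the one from the question):

```python
from dataclasses import dataclass


@dataclass
class DailyScore:  # hypothetical accumulator, for illustration only
    blessed: int = 0
    total: int = 0

    def add(self, blessed: bool) -> None:
        """Fold one (possibly late) sneeze event into the running counts."""
        self.total += 1
        if blessed:
            self.blessed += 1

    def earns_prize(self, threshold_percent: int = 90) -> bool:
        """Integer comparison avoids floating-point rounding at the boundary."""
        return self.total > 0 and self.blessed * 100 >= threshold_percent * self.total


score = DailyScore(blessed=9, total=10)
prize_before = score.earns_prize()  # 9 of 10 is exactly 90%
score.add(blessed=False)            # a late, unblessed sneeze arrives
prize_after = score.earns_prize()   # 9 of 11 is below 90%
```

Each late event costs one constant-time update, but as the last two lines show, a previously awarded prize can flip, so the product still has to decide whether rewards may be retracted.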

The former assumes fewer things. When a new event arrives that requires reprocessing a finished job, we should mark that job as dirty, either by flagging the job, by putting the job's id into a reprocess queue, or by some similar mechanism. The main point is that you support reprocessing, but only once (at the end of the grace period).

If you have done the first batch processing at 3am, you can do the second at 8am, for instance. This is a trade-off, of course (timeliness vs. accuracy). In a reward system like the one you mentioned, the achievement / trophy would appear on the user's profile at the end of the second processing run. (You can see the same delayed badges here on Stack Overflow.)

Yet another alternative is regional batch processing. Rather than computing all the reports at the exact same time, different data centres can compute them at their local 3am. This follow-the-sun strategy works well if your customers are also region-aware.
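One way to sketch the dirty-flag idea is an in-memory tracker keyed by report day. The `ReprocessTracker` class and its method names are hypothetical; a real system would more likely persist this state in a database or a message queue.

```python
from datetime import date


class ReprocessTracker:  # hypothetical helper, for illustration only
    def __init__(self):
        self._finished = set()  # days whose first batch run is done
        self._dirty = set()     # finished days that received late events
        self._closed = set()    # days already reprocessed once; now final

    def mark_finished(self, day: date) -> None:
        self._finished.add(day)

    def on_event(self, day: date) -> None:
        """A late event for a finished, not yet closed day marks it dirty."""
        if day in self._finished and day not in self._closed:
            self._dirty.add(day)

    def drain(self):
        """Call at the end of the grace period: return the days to reprocess."""
        days = sorted(self._dirty)
        self._closed.update(days)
        self._dirty.clear()
        return days
```

Several late events for the same day collapse into a single reprocessing run, and once a day has been drained, later events no longer reopen it, which matches the "only once" rule.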
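A rough sketch of how a scheduler might pick the regions that are due, assuming hypothetical region names and fixed UTC offsets. Fixed offsets ignore daylight saving; a real deployment would more likely use IANA timezone names.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical regions with fixed UTC offsets in hours (assumption: no DST).
REGION_OFFSETS_HOURS = {"eu-central": 1, "us-east": -5, "ap-northeast": 9}
BATCH_HOUR = 3  # run each region's daily batch at its local 3am


def regions_due(now_utc: datetime):
    """Return the regions whose local clock currently shows the batch hour."""
    due = []
    for region, offset in REGION_OFFSETS_HOURS.items():
        local = now_utc.astimezone(timezone(timedelta(hours=offset)))
        if local.hour == BATCH_HOUR:
            due.append(region)
    return due
```

Called once an hour with the current UTC time, this returns `["eu-central"]` at 02:00 UTC and `["us-east"]` at 08:00 UTC under the offsets assumed above.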

huangapple
  • Published on March 23, 2023, 09:24:04
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75818537.html