Parsing datetime string with UTC offset to time_t

Question

This question is hinted at in another question, but the answer there doesn't address this at all, and I've found conflicting suggestions and hints scattered around.

My problem is relatively simple, but in digging into it, I'm getting a bit tripped up.

Suppose I have a string in a format like this: 2023-06-07 03:04:56 -0700

The goal is to normalize this into an epoch timestamp (time_t in C). I assumed this would be simple enough, but it seems not. The gotcha here seems to be the -0700 at the end.

It seems that strptime(3) ignores the %z specifier, possibly, maybe (again, I've found conflicting reports about how it is handled in different implementations). FWIW, I'm using Linux/glibc, so I care more about whether it works there than about whether it's in the C standard.

Playing around with it a little, it seemed to me that strptime does ignore the timezone offset. The hour in the struct tm is simply the hour in the string; it isn't modified based on the timezone offset at all. Supposedly that's what the non-standard tm_gmtoff member is for, but when I read it I just get a gigantic value that is definitely much larger than any UTC offset in seconds, so I'm not sure what to make of that either.

As an example:

#define _XOPEN_SOURCE

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

int main()
{
    struct tm tm;
    time_t epoch;
    char buf[40];

    strcpy(buf, "2023-06-07 03:04:56 -0700");
    memset(&tm, 0, sizeof(tm));

    strptime(buf, "%Y-%m-%d %H:%M:%S %z", &tm);
    printf("Parsed datetime %s (hour %d, offset %lu)\n", buf, tm.tm_hour, tm.__tm_gmtoff);
    tm.tm_isdst = -1;
    setenv("TZ", "US/Eastern", 1);
    epoch = mktime(&tm);
    printf("Parsed datetime -> epoch %lu\n", epoch); // 7:04AM UTC
    epoch = timegm(&tm);
    printf("Parsed datetime -> epoch %lu\n", epoch); // 3:04AM UTC
    return 0;
}

When run on https://www.onlinegdb.com/online_c_compiler, this gives:

Parsed datetime 2023-06-07 03:04:56 -0700 (hour 3, offset 18446744073709526416)
Parsed datetime -> epoch 1686121496
Parsed datetime -> epoch 1686107096

Note that -0700 offset in the string is arbitrary, and the local time zone on the system is also arbitrary. For example, -0700 is Pacific Time, but the system could be in Eastern Time, which is actually completely irrelevant to the problem (i.e. the local time zone should not be used in the conversion, since it's irrelevant - the time zone of the offset should be used instead - and importantly, the local time zone should not mess up the answer).

Above, the correct answer is 10:04AM UTC (what the string obviously should convert to). Blindly using mktime gives the wrong answer, and timegm is even more off. The problem seems to be that the offset is not taken into account here. The second answer using timegm would be correct, if the struct tm had +7 hours added to it for the offset, or if timegm added +7 hours to the answer based on something in the struct tm, such as tm_gmtoff. But neither of those things seems to happen.
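To make the arithmetic concrete: a -0700 offset is -25200 seconds, and subtracting it from the timegm result (1686107096, i.e. 03:04:56 read as if it were UTC) gives 1686132296, which is 10:04:56 UTC. A minimal sketch of that calculation follows; the helper names hhmm_to_seconds and utc_from_naive are made up for illustration and are not part of any library.

```c
#include <time.h>

/* Hypothetical helper: convert a numeric offset written as +/-hhmm
 * (e.g. -700 for "-0700") into seconds east of UTC. */
long hhmm_to_seconds(int hhmm)
{
    int hours = hhmm / 100;     /* C division truncates toward zero */
    int minutes = hhmm % 100;   /* remainder keeps the sign of hhmm */
    return hours * 3600L + minutes * 60L;
}

/* Hypothetical helper: naive_utc is the epoch obtained by treating the
 * wall-clock fields as UTC; subtracting the offset yields the real epoch,
 * because local time = UTC + offset. */
time_t utc_from_naive(time_t naive_utc, long offset_seconds)
{
    return naive_utc - offset_seconds;
}
```

With the values from the example, utc_from_naive(1686107096, hhmm_to_seconds(-700)) evaluates to 1686132296.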

Short of writing a manual function to parse the %z in the time string and manually add this offset to the time_t, is there a better "builtin" way of doing this with standard functions? (Portability isn't super important here, as long as it works in glibc.) Given this would seem to be a very common type of conversion, I'm thinking there must be a way to do this properly without manually doing calculations, using gmtime. I thought this was what tm_gmtoff was for but it seems otherwise - am I missing something here?

Answer 1

Score: 0

A few issues:

  1. __tm_gmtoff is signed, so printing it with %lu shows a wrong (huge) value.
  2. __tm_gmtoff is set correctly by strptime (here, -7 * 3600 = -25200).
  3. Doing setenv("TZ", ...) did not work for me: mktime used the local timezone set by the system. The -0700 in the string would be US/Pacific during DST, but I got -0400 (US/Eastern during DST).
  4. timegm ignores __tm_gmtoff.
  5. On Linux/glibc, the member is named tm_gmtoff (AFAICT).
  6. It is better to use timegm and then apply tm_gmtoff manually (subtract it) to get the correct UTC time.
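Points 1 and 2 fit together: the value 18446744073709526416 is exactly 2^64 - 25200, which is what -25200 looks like when it goes through an unsigned 64-bit conversion. A small sketch of this, assuming a 64-bit long as on x86-64 Linux (the helper names as_unsigned64 and show_offset are made up for illustration):

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: reinterpret a signed offset the way an unsigned
 * 64-bit printf conversion would show it. */
uint64_t as_unsigned64(int64_t seconds)
{
    return (uint64_t)seconds;   /* conversion wraps modulo 2^64 */
}

/* Print the same offset with a signed and an unsigned conversion. */
void show_offset(int64_t seconds)
{
    printf("signed:   %" PRId64 "\n", seconds);
    printf("unsigned: %" PRIu64 "\n", as_unsigned64(seconds));
}
```

Calling show_offset(-7 * 3600) prints -25200 on the first line and 18446744073709526416 on the second, so the field itself was fine and only the format specifier was wrong.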

Here is the somewhat corrected code (in stages). It may still be broken. It is important to read the comments:

//#define _XOPEN_SOURCE
#define _GNU_SOURCE

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

void
sepline(const char *tag)
{

	printf("\n");
	for (int col = 1;  col <= 80;  ++col)
		putchar('-');
	printf("\n");
	printf("%s:\n",tag);
	printf("\n");
}

void
tmshow(const struct tm *tm,const char *tag)
{

	printf("TMX: %4.4d/%2.2d/%2.2d-%2.2d:%2.2d:%2.2d (%ld/%ld) (from %s)\n",
		tm->tm_year + 1900,tm->tm_mon + 1,tm->tm_mday,
		tm->tm_hour,tm->tm_min,tm->tm_sec,tm->tm_gmtoff,tm->tm_gmtoff / 3600,
		tag);
}

void
todshow(time_t tod,int gmtflg,const char *tag)
{
	struct tm tm;

	if (gmtflg)
		gmtime_r(&tod,&tm);
	else
		localtime_r(&tod,&tm);

	printf("\n");
	printf("TOD: %ld (from %s)\n",tod,tag);
	tmshow(&tm,tag);
}

void
orig(const char *buf)
{
	struct tm tm;
	memset(&tm, 0, sizeof(tm));

	sepline("ORIG");

	printf("BUF: %s\n",buf);
	strptime(buf, "%Y-%m-%d %H:%M:%S %z", &tm);
	printf("Parsed datetime %s (hour %d, offset %lu)\n",
		buf, tm.tm_hour, tm.tm_gmtoff);

	tm.tm_isdst = -1;
	setenv("TZ", "US/Eastern", 1);
	tmshow(&tm,"strptime");

	time_t epoch_mktime = mktime(&tm);
	printf("Parsed mktime -> epoch %lu\n", epoch_mktime);	// 7:04AM UTC

	time_t epoch_timegm = timegm(&tm);
	printf("Parsed timegm -> epoch %lu\n", epoch_timegm);	// 3:04AM UTC

	time_t diff = epoch_mktime - epoch_timegm;
	printf("diff = %ld (%.3f)\n",diff,diff / 3600.0);
}

void
fix1(const char *buf)
{
	struct tm tm;
	memset(&tm, 0, sizeof(tm));

	sepline("FIX1");

	printf("BUF: %s\n",buf);
	strptime(buf, "%Y-%m-%d %H:%M:%S %z", &tm);
#if 0
	printf("Parsed datetime %s (hour %d, offset %ld/%ld)\n",
		buf, tm.tm_hour, tm.tm_gmtoff, tm.tm_gmtoff / 3600);
#endif

	tm.tm_isdst = -1;
	//setenv("TZ", "US/Eastern", 1);
	unsetenv("TZ");
	tmshow(&tm,"strptime");

	time_t epoch_mktime = mktime(&tm);
	//printf("Parsed mktime -> epoch %lu\n", epoch_mktime);	// 7:04AM UTC
	todshow(epoch_mktime,0,"mktime");

	time_t epoch_timegm = timegm(&tm);
	//printf("Parsed timegm -> epoch %lu\n", epoch_timegm);	// 3:04AM UTC
	todshow(epoch_timegm,1,"timegm");

	time_t diff = epoch_mktime - epoch_timegm;
	printf("diff = %ld (%.3f)\n",diff,diff / 3600.0);
}

void
fix2(const char *buf)
{
	struct tm tm;
	memset(&tm, 0, sizeof(tm));

	sepline("FIX2");

	printf("BUF: %s\n",buf);
	strptime(buf, "%Y-%m-%d %H:%M:%S %z", &tm);
	tmshow(&tm,"strptime");

	// NOTE: timegm ignores this -- so remember it
	time_t offset = tm.tm_gmtoff;

	//tm.tm_gmtoff = 0;
	time_t epoch_timegm = timegm(&tm);
	todshow(epoch_timegm,1,"timegm");

	// adjust for timezone -- this produces correct GMT
	todshow(epoch_timegm - offset,1,"timegm+offset");

// NOTE/BUG: setting TZ does _not_ work
#if 0
	time_t epoch_mktime = epoch_timegm;
	epoch_mktime -= offset;
	setenv("TZ", "US/Pacific", 1);
	localtime_r(&epoch_mktime,&tm);
#endif
#if 1
	time_t epoch_mktime = epoch_timegm;
	//epoch_mktime += offset;
	//epoch_mktime += offset;
	gmtime_r(&epoch_mktime,&tm);
	tm.tm_gmtoff += offset;
	//tm.tm_gmtoff += offset;
#endif

	//printf("Parsed mktime -> epoch %lu\n", epoch_mktime);	// 7:04AM UTC
	tmshow(&tm,"localtime_r");

	time_t diff = epoch_mktime - epoch_timegm;
	printf("diff = %ld (%.3f)\n",diff,diff / 3600.0);
}

int
main()
{
	char buf[40];

	// Pacific time???
	strcpy(buf, "2023-06-07 03:04:56 -0700");

	orig(buf);
	fix1(buf);
	fix2(buf);

	return 0;
}

Here is the program output:


--------------------------------------------------------------------------------
ORIG:

BUF: 2023-06-07 03:04:56 -0700
Parsed datetime 2023-06-07 03:04:56 -0700 (hour 3, offset 18446744073709526416)
TMX: 2023/06/07-03:04:56 (-25200/-7) (from strptime)
Parsed mktime -> epoch 1686121496
Parsed timegm -> epoch 1686107096
diff = 14400 (4.000)

--------------------------------------------------------------------------------
FIX1:

BUF: 2023-06-07 03:04:56 -0700
TMX: 2023/06/07-03:04:56 (-25200/-7) (from strptime)

TOD: 1686121496 (from mktime)
TMX: 2023/06/07-03:04:56 (-14400/-4) (from mktime)

TOD: 1686107096 (from timegm)
TMX: 2023/06/07-03:04:56 (0/0) (from timegm)
diff = 14400 (4.000)

--------------------------------------------------------------------------------
FIX2:

BUF: 2023-06-07 03:04:56 -0700
TMX: 2023/06/07-03:04:56 (-25200/-7) (from strptime)

TOD: 1686107096 (from timegm)
TMX: 2023/06/07-03:04:56 (0/0) (from timegm)

TOD: 1686132296 (from timegm+offset)
TMX: 2023/06/07-10:04:56 (0/0) (from timegm+offset)
TMX: 2023/06/07-03:04:56 (-25200/-7) (from localtime_r)
diff = 0 (0.000)
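The FIX2 stage boils down to three steps: let strptime fill in tm_gmtoff, call timegm on the parsed fields, and subtract the saved offset. A condensed sketch of that idea follows, assuming glibc (where strptime handles %z and struct tm exposes tm_gmtoff under _GNU_SOURCE); the function name parse_with_offset is made up for illustration, and other C libraries may behave differently.

```c
#define _GNU_SOURCE   /* must precede the includes: exposes strptime, timegm, tm_gmtoff */
#include <string.h>
#include <time.h>

/* Hypothetical helper: parse "YYYY-MM-DD HH:MM:SS +hhmm" into a UTC epoch.
 * Returns (time_t)-1 if the string does not match the format. Note that
 * (time_t)-1 is also a valid timestamp, so this sketch cannot represent
 * 1969-12-31 23:59:59 UTC. */
time_t parse_with_offset(const char *s)
{
    struct tm tm;
    memset(&tm, 0, sizeof tm);

    const char *end = strptime(s, "%Y-%m-%d %H:%M:%S %z", &tm);
    if (end == NULL || *end != '\0')
        return (time_t)-1;          /* parse failure or trailing garbage */

    long offset = tm.tm_gmtoff;     /* save it: timegm does not use it */
    time_t naive = timegm(&tm);     /* treats the fields as if they were UTC */
    if (naive == (time_t)-1)
        return (time_t)-1;

    return naive - offset;          /* local = UTC + offset, so UTC = local - offset */
}
```

For the string from the question, parse_with_offset("2023-06-07 03:04:56 -0700") should return 1686132296, matching the timegm+offset line in the output above. Because the local timezone is never consulted, the result does not depend on TZ.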

Posted on June 8, 2023, 08:51:41. Source: https://go.coder-hub.com/76427956.html