Getting ValueError: time data doesn't match format "%Y-%m-%d %H:%M:%S.%f%z" error

Question


I am trying to subtract start_time_ns from end_time_ns in a pandas DataFrame, whose values are given in nanoseconds, using:
df['time'] = pd.to_datetime(df['end_time_ns']) - pd.to_datetime(df['start_time_ns'])
I am reading the CSV with pd.read_csv(filename, parse_dates=[2, 3], chunksize=chunksize), where columns 2 and 3 (0-based) are start_time_ns and end_time_ns, respectively.
The subtraction works for the first chunk, but I get an error when processing the full ~30 GB CSV file. The error is:

Traceback (most recent call last):
  File "2rg.py", line 17, in <module>
    df['time'] = pd.to_datetime(df['end_time_ns']) - pd.to_datetime(df['start_time_ns'])
  File "/home/nnazarov/.local/lib/python3.8/site-packages/pandas/core/tools/datetimes.py", line 1050, in to_datetime
    values = convert_listlike(arg._values, format)
  File "/home/nnazarov/.local/lib/python3.8/site-packages/pandas/core/tools/datetimes.py", line 453, in _convert_listlike_datetimes
    return _array_strptime_with_fallback(arg, name, utc, format, exact, errors)
  File "/home/nnazarov/.local/lib/python3.8/site-packages/pandas/core/tools/datetimes.py", line 484, in _array_strptime_with_fallback
    result, timezones = array_strptime(arg, fmt, exact=exact, errors=errors, utc=utc)
  File "pandas/_libs/tslibs/strptime.pyx", line 530, in pandas._libs.tslibs.strptime.array_strptime
  File "pandas/_libs/tslibs/strptime.pyx", line 351, in pandas._libs.tslibs.strptime.array_strptime
ValueError: time data "2023-06-20 20:41:11+00:00" doesn't match format "%Y-%m-%d %H:%M:%S.%f%z", at position 816780. You might want to try:
    - passing `format` if your strings have a consistent format;
    - passing `format='ISO8601'` if your strings are all ISO8601 but not necessarily in exactly the same format;
    - passing `format='mixed'`, and the format will be inferred for each element individually. You might want to use `dayfirst` alongside this.

For reference, here is line 816780 along with the surrounding lines:

NGN,NGN,2023-06-20 20:30:08.305255+00:00,2023-06-20 20:41:08.317472+00:00,131.243.51.211,144.195.208.70,49851,8801,active,UDP,503,0.000107876,"[0, 52, 46, 405, 0, 0, 0, 0]"
NGN,NGN,2023-06-20 20:40:53.903338+00:00,2023-06-20 20:41:11+00:00,2001:400:0:40::200:205,2001:400:211:81::d1,161,56640,idle,UDP,503,0.0001688016,"[0, 0, 0, 503, 0, 0, 0, 0]"
NGN,NGN,2023-06-20 20:40:53.890268+00:00,2023-06-20 20:41:10.986850+00:00,2001:400:211:81::d1,2001:400:0:40::200:205,56640,161,idle,UDP,503,4.6164e-05,"[0, 503, 0, 0, 0, 0, 0, 0]"
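The format in the error, "%Y-%m-%d %H:%M:%S.%f%z", requires fractional seconds, but the middle row's end_time_ns value, 2023-06-20 20:41:11+00:00, has none. To check how many such values a column contains, a quick diagnostic could list them. This is a minimal sketch, not part of the original post: the helper name rows_without_fraction and the regex are hypothetical, and the regex only checks the "YYYY-MM-DD HH:MM:SS.ffffff" shape seen in the pasted rows.

```python
import pandas as pd

# Matches a timestamp that has a fractional-seconds part,
# e.g. "2023-06-20 20:41:08.317472+00:00".
HAS_FRACTION = r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+"


def rows_without_fraction(values: pd.Series) -> pd.Series:
    """Return the entries that lack fractional seconds (missing values included)."""
    # na=False makes missing values count as "no match", so they are reported too.
    has_fraction = values.str.match(HAS_FRACTION, na=False)
    return values[~has_fraction]


# end_time_ns values from the three rows pasted above
end_times = pd.Series([
    "2023-06-20 20:41:08.317472+00:00",
    "2023-06-20 20:41:11+00:00",
    "2023-06-20 20:41:10.986850+00:00",
])
print(rows_without_fraction(end_times))
```

On a real chunk, calling rows_without_fraction(chunk["end_time_ns"]) shows whether the value at position 816780 is an isolated case.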

How can I resolve the issue?

Answer 1

Score: 2


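IIUC, your columns mix datetime formats, e.g. values with and without fractional seconds (as at position 816780) or with and without a UTC offset. A minimal reproducible example with a missing offset: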

import pandas as pd
print(pd.to_datetime(["2023-06-20 20:41:11+00:00",
                      "2023-06-20 20:41:11",
                      "2023-06-20 20:41:11.890268+00:00"]))

raises:

ValueError: time data "2023-06-20 20:41:11" doesn't match format "%Y-%m-%d %H:%M:%S%z", at position 1. You might want to try:
    - passing `format` if your strings have a consistent format;
    - passing `format='ISO8601'` if your strings are all ISO8601 but not necessarily in exactly the same format;
    - passing `format='mixed'`, and the format will be inferred for each element individually. You might want to use `dayfirst` alongside this.

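With pandas v2, you can combine the keywords utc=True and format="ISO8601" to avoid the error. format="ISO8601" accepts ISO 8601 strings that are not all in exactly the same format, and utc=True converts every value to a single UTC-aware dtype: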

print(pd.__version__)
# 2.0.3

print(
      pd.to_datetime(["2023-06-20 20:41:11+00:00",
                      "2023-06-20 20:41:11",
                      "2023-06-20 20:41:11.890268+00:00"],
                     format="ISO8601", utc=True)
)

DatetimeIndex([       '2023-06-20 20:41:11+00:00',
                      '2023-06-20 20:41:11+00:00',
               '2023-06-20 20:41:11.890268+00:00'],
              dtype='datetime64[ns, UTC]', freq=None)
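The traceback in the question goes through _array_strptime_with_fallback, which suggests that in the failing chunk the two columns were still strings when pd.to_datetime was called, i.e. parse_dates had not converted them. Below is a minimal sketch of applying the ISO8601 + utc=True fix to the whole file chunk by chunk. It assumes pandas >= 2.0 and a header row containing start_time_ns and end_time_ns. iter_durations is a hypothetical helper name, and the inline CSV only reuses the three timestamps pasted in the question.

```python
import io

import pandas as pd


def iter_durations(source, chunksize=1_000_000):
    """Yield end_time_ns - start_time_ns as Timedeltas, one Series per CSV chunk."""
    reader = pd.read_csv(
        source,
        usecols=["start_time_ns", "end_time_ns"],  # only read the two columns needed
        chunksize=chunksize,
    )
    for chunk in reader:
        # Parse explicitly instead of relying on parse_dates: "ISO8601" accepts
        # values with or without fractional seconds, and utc=True yields one
        # tz-aware dtype even if some values have no offset.
        start = pd.to_datetime(chunk["start_time_ns"], format="ISO8601", utc=True)
        end = pd.to_datetime(chunk["end_time_ns"], format="ISO8601", utc=True)
        yield end - start


# Small stand-in for the real file, built from the pasted rows.
sample = io.StringIO(
    "start_time_ns,end_time_ns\n"
    "2023-06-20 20:30:08.305255+00:00,2023-06-20 20:41:08.317472+00:00\n"
    "2023-06-20 20:40:53.903338+00:00,2023-06-20 20:41:11+00:00\n"
    "2023-06-20 20:40:53.890268+00:00,2023-06-20 20:41:10.986850+00:00\n"
)
durations = pd.concat(list(iter_durations(sample, chunksize=2)))
print(durations)
```

Parsing inside the loop means every chunk is either converted with the same rules or raises immediately, rather than a chunk possibly reaching the subtraction as strings.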

huangapple
  • Published on July 7, 2023 at 07:41:42
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76633121.html