Create pandas data from one date and time strings without colons

Question

I want to read times from a file that includes [GNSS][1] times, among a lot of other data. The expected result is a pandas array (Index or Series) with a datetime dtype, with the date of the dataset applied.

In an intermediate step, I have a list of timestamps in the format `hhmmss` with some invalid data mixed in:

```python
import datetime as dt
import pandas as pd

date = dt.date(2023, 5, 9)
times_from_file = [",,,,,,", "123456", "123457", "123458", "123459", "123500"]
```

I can get the desired output with this lengthy code snippet:

```python
datetimes = pd.to_datetime(
    times_from_file, format="%H%M%S", errors="coerce"
).map(
    lambda datetime: pd.NaT
    if pd.isnull(datetime)
    else dt.datetime.combine(date, datetime.time())
)
```

Output:

```
DatetimeIndex([                'NaT', '2023-05-09 12:34:56',
               '2023-05-09 12:34:57', '2023-05-09 12:34:58',
               '2023-05-09 12:34:59', '2023-05-09 12:35:00'],
              dtype='datetime64[ns]', freq=None)
```

However, this looks overly complicated. I was hoping this could be solved with [`pd.to_timedelta`](https://pandas.pydata.org/docs/reference/api/pandas.to_timedelta.html) instead, but unfortunately that doesn't allow passing a format string. Even the `na_action` keyword of [`pandas.Index.map`][2] is ignored; that's why I used `if pd.isnull(datetime)` instead.

Is there a simpler way to do this, preferably leveraging purpose-built pandas functions or methods?

  [1]: https://en.wikipedia.org/wiki/Satellite_navigation
  [2]: https://pandas.pydata.org/docs/reference/api/pandas.Index.map.html
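The timedelta idea mentioned in the question can be approximated even though `pd.to_timedelta` takes no format string. The following is only a sketch and is not part of the original post: it parses the `hhmmss` strings with `pd.to_datetime`, removes the placeholder date with `DatetimeIndex.normalize()`, and adds the remaining time of day to the dataset date. The names `parsed` and `time_of_day` are made up for illustration.

```python
import datetime as dt

import pandas as pd

date = dt.date(2023, 5, 9)
times_from_file = [",,,,,,", "123456", "123457", "123458", "123459", "123500"]

# Parse the time-only strings; pandas fills in a placeholder date, and
# entries that do not match the format become NaT because of errors="coerce".
parsed = pd.to_datetime(times_from_file, format="%H%M%S", errors="coerce")

# Subtracting each value's own midnight leaves only the time of day
# as a TimedeltaIndex, regardless of which placeholder date was used.
time_of_day = parsed - parsed.normalize()

# Adding the offsets to the dataset date yields a DatetimeIndex; NaT propagates.
datetimes = pd.Timestamp(date) + time_of_day
print(datetimes)
```

This avoids the per-element `lambda`, at the cost of one extra intermediate index.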


# Answer 1
**Score**: 1

Convert `times_from_file` to a Series if it isn't one already:

```python
>>> pd.to_datetime('2023-05-09 ' + pd.Series(times_from_file), format="%Y-%m-%d %H%M%S", errors='coerce')
0                   NaT
1   2023-05-09 12:34:56
2   2023-05-09 12:34:57
3   2023-05-09 12:34:58
4   2023-05-09 12:34:59
5   2023-05-09 12:35:00
dtype: datetime64[ns]
```
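If the date lives in a variable rather than a hard-coded string, the same approach might look like the sketch below. It assumes the `date` and `times_from_file` values from the question; the name `stamped` is made up for illustration.

```python
import datetime as dt

import pandas as pd

date = dt.date(2023, 5, 9)
times_from_file = [",,,,,,", "123456", "123457", "123458", "123459", "123500"]

# Prefix every hhmmss string with the ISO-formatted date ("2023-05-09").
stamped = date.isoformat() + " " + pd.Series(times_from_file)

# Parse date and time in one pass; rows that do not match the format become NaT.
datetimes = pd.to_datetime(stamped, format="%Y-%m-%d %H%M%S", errors="coerce")
print(datetimes)
```

Because the date and time are parsed together, no separate combine step is needed, and invalid entries are handled by `errors="coerce"` alone.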

huangapple
  • Published on May 11, 2023, 19:31:48
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76227172.html