Create pandas data from one date and timestrings without colons
Question
I want to read times from a file that includes [GNSS][1] times, among a lot of other data. The expected result is a pandas array (Index or Series) with a datetime dtype, with the date of the dataset applied.
In an intermediate step, I have a list of timestamps in the format `hhmmss` with some invalid data mixed in:
```python
import datetime as dt
import pandas as pd

date = dt.date(2023, 5, 9)
times_from_file = [",,,,,,", "123456", "123457", "123458", "123459", "123500"]
```
I can get the desired output with this lengthy code snippet:
```python
datetimes = pd.to_datetime(
    times_from_file, format="%H%M%S", errors="coerce"
).map(
    lambda datetime: pd.NaT
    if pd.isnull(datetime)
    else dt.datetime.combine(date, datetime.time())
)
```
Output:
```
DatetimeIndex([                'NaT', '2023-05-09 12:34:56',
               '2023-05-09 12:34:57', '2023-05-09 12:34:58',
               '2023-05-09 12:34:59', '2023-05-09 12:35:00'],
              dtype='datetime64[ns]', freq=None)
```
However, this looks overly complicated. I was hoping this could be solved with [`pd.to_timedelta`](https://pandas.pydata.org/docs/reference/api/pandas.to_timedelta.html) instead, but unfortunately that doesn't allow passing a format string. Even the `na_action` keyword of [`pandas.Index.map`][2] is ignored, which is why I used `if pd.isnull(datetime)` instead.
Is there a simpler way to do this, preferably leveraging purpose-built pandas functions or methods?
[1]: https://en.wikipedia.org/wiki/Satellite_navigation
[2]: https://pandas.pydata.org/docs/reference/api/pandas.Index.map.html
# Answer 1
**Score**: 1
Convert `times_from_file` to a Series if it isn't one already, prepend the date string, and parse the combined strings in a single `pd.to_datetime` call:
```python
>>> pd.to_datetime('2023-05-09 ' + pd.Series(times_from_file), format="%Y-%m-%d %H%M%S", errors='coerce')
0                   NaT
1   2023-05-09 12:34:56
2   2023-05-09 12:34:57
3   2023-05-09 12:34:58
4   2023-05-09 12:34:59
5   2023-05-09 12:35:00
dtype: datetime64[ns]
```
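The answer hardcodes the date as a string literal. As a minimal sketch, the same idea can be driven from the `date` variable in the question, and a second variant can avoid string concatenation by keeping only the time-of-day part of the parsed values and adding it to the target date. The function names `combine_via_prefix` and `combine_via_offset` are hypothetical and not part of pandas, and the sample list is assumed to match the one in the question.

```python
import datetime as dt

import pandas as pd

# Sample data, assumed to match the question: one invalid entry plus five times.
date = dt.date(2023, 5, 9)
times_from_file = [",,,,,,", "123456", "123457", "123458", "123459", "123500"]


def combine_via_prefix(date, times):
    """Prepend the ISO date to every time string and parse in one call.

    Hypothetical helper: entries that do not match the format become NaT.
    """
    prefixed = date.isoformat() + " " + pd.Series(times)
    return pd.to_datetime(prefixed, format="%Y-%m-%d %H%M%S", errors="coerce")


def combine_via_offset(date, times):
    """Parse the times alone, keep the time-of-day offset, add the date.

    Hypothetical helper: subtracting the normalized (midnight) value leaves
    a timedelta since midnight, so the placeholder date that the parser
    fills in does not matter. NaT entries stay NaT throughout.
    """
    parsed = pd.to_datetime(pd.Series(times), format="%H%M%S", errors="coerce")
    time_of_day = parsed - parsed.dt.normalize()
    return time_of_day + pd.Timestamp(date)


via_prefix = combine_via_prefix(date, times_from_file)
via_offset = combine_via_offset(date, times_from_file)
print(via_prefix)
print(via_offset)
```

Both variants return a Series. The offset variant may be preferable when the times are already parsed, while the prefix variant stays closest to the answer above.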