July 10, 2023 18:21:06

How to fill missing seconds in pandas dataframe

Question

I have a DataFrame and want to fill in the missing seconds in its Time column. How can I do that?
This is my data:

import pandas as pd

df = pd.DataFrame({
    'sec': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    'Date': ['7/7', '0', '0', '7/7', '7/7', '0', '7/7', '7/7', '0', '0'],
    'Time': ['12:47:30', '0', '0', '12:47:33', '12:47:34', '0', '12:47:36', '12:47:37', '0', '0'],
    'rpm': [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
})

In the Time column, each '0' should be replaced by the previous time plus one second; for example, the '0' right after 12:47:30 should become 12:47:31. In other words, my expected output is:

df = pd.DataFrame({
    'sec': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    'Date': ['7/7', '0', '0', '7/7', '7/7', '0', '7/7', '7/7', '0', '0'],
    'Time': ['12:47:30', '12:47:31', '12:47:32', '12:47:33', '12:47:34', '12:47:35', '12:47:36', '12:47:37', '12:47:38', '12:47:39'],
    'rpm': [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
})

Answer 1

Score: 2

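First create a DatetimeIndex, then use DataFrame.resample (http://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.resample.html), and finally set the column values: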

import pandas as pd

# Date + Time -> datetime; the '0' placeholders become NaT
df.index = pd.to_datetime(df['Date'] + df['Time'].astype(str),
                          format='%m/%d%H:%M:%S',
                          errors='coerce')
# 's' = one-second bins (the uppercase 'S' alias is deprecated in pandas 2.2)
out = df.resample('s').first()
out['Time'] = out.index.time
out['Date'] = out.index.strftime('%m/%d')
out['rpm'] = out['rpm'].fillna(0)
out['sec'] = out.groupby('Date').cumcount().add(1)
print(out)
                     sec   Date      Time  rpm
1900-07-07 12:47:30    1  07/07  12:47:30  0.0
1900-07-07 12:47:31    2  07/07  12:47:31  0.0
1900-07-07 12:47:32    3  07/07  12:47:32  0.0
1900-07-07 12:47:33    4  07/07  12:47:33  0.0
1900-07-07 12:47:34    5  07/07  12:47:34  0.0
1900-07-07 12:47:35    6  07/07  12:47:35  0.0
1900-07-07 12:47:36    7  07/07  12:47:36  0.0
1900-07-07 12:47:37    8  07/07  12:47:37  0.0

out = out.reset_index(drop=True)
print (out)
   sec   Date      Time  rpm
0    1  07/07  12:47:30  0.0
1    2  07/07  12:47:31  0.0
2    3  07/07  12:47:32  0.0
3    4  07/07  12:47:33  0.0
4    5  07/07  12:47:34  0.0
5    6  07/07  12:47:35  0.0
6    7  07/07  12:47:36  0.0
7    8  07/07  12:47:37  0.0
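As the output shows, resample only spans the range between the first and last valid timestamps, so the two trailing rows (12:47:38 and 12:47:39) are not produced. A minimal sketch of one way to recover them, assuming every row is exactly one second apart, is to reindex over len(df) one-second steps starting at the first valid timestamp. The names stamps, valid, resampled and full_range are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    'sec': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    'Date': ['7/7', '0', '0', '7/7', '7/7', '0', '7/7', '7/7', '0', '0'],
    'Time': ['12:47:30', '0', '0', '12:47:33', '12:47:34', '0',
             '12:47:36', '12:47:37', '0', '0'],
    'rpm': [0.0] * 10,
})

# Parse Date + Time; the '0' placeholders become NaT
stamps = pd.to_datetime(df['Date'] + df['Time'], format='%m/%d%H:%M:%S',
                        errors='coerce')
mask = stamps.notna()

# Keep only rows that carry a real timestamp, indexed by that timestamp
valid = df.loc[mask, ['rpm']].copy()
valid.index = pd.DatetimeIndex(stamps[mask])

# resample covers only first..last valid timestamp: 12:47:30-12:47:37 (8 rows)
resampled = valid.resample('s').first()

# Assumption: one row per second, so extend to len(df) steps from the start
full_range = pd.date_range(start=resampled.index[0], periods=len(df), freq='s')
out = resampled.reindex(full_range)
out['rpm'] = out['rpm'].fillna(0)
out.insert(0, 'sec', list(range(1, len(out) + 1)))
out.insert(1, 'Date', out.index.strftime('%m/%d'))
out.insert(2, 'Time', out.index.strftime('%H:%M:%S'))
out = out.reset_index(drop=True)
print(out)
```

The reindex step is what adds the rows that resample cannot see; if the real data can have rows that are not one second apart, the periods=len(df) assumption no longer holds.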

Another solution: forward-fill the parsed datetimes with Series.ffill, then add the number of seconds elapsed since the last valid time, counted with GroupBy.cumcount and converted with to_timedelta:

dates = pd.to_datetime(df['Date'] + df['Time'].astype(str),
                       format='%m/%d%H:%M:%S',
                       errors='coerce')
sec = pd.to_timedelta(df.groupby(dates.notna().cumsum()).cumcount(), unit='s')
df['Time'] = dates.ffill().add(sec).dt.strftime('%H:%M:%S')
print(df)
   sec Date      Time  rpm
0    1  7/7  12:47:30  0.0
1    2    0  12:47:31  0.0
2    3    0  12:47:32  0.0
3    4  7/7  12:47:33  0.0
4    5  7/7  12:47:34  0.0
5    6    0  12:47:35  0.0
6    7  7/7  12:47:36  0.0
7    8  7/7  12:47:37  0.0
8    9    0  12:47:38  0.0
9   10    0  12:47:39  0.0
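To see why the grouping works, the intermediate values can be printed for the Time column alone. This sketch parses only Time (so the date defaults to 1900-01-01, which does not affect the HH:MM:SS result); group_key and offset are illustrative names, not part of the answer above.

```python
import pandas as pd

# Time column from the question; '0' marks a missing second
time_str = pd.Series(['12:47:30', '0', '0', '12:47:33', '12:47:34', '0',
                      '12:47:36', '12:47:37', '0', '0'])
dates = pd.to_datetime(time_str, format='%H:%M:%S', errors='coerce')

# Each valid timestamp starts a new group; the NaT rows after it join that group
group_key = dates.notna().cumsum()
# Position within the group = seconds elapsed since the last valid timestamp
offset = time_str.groupby(group_key).cumcount()

print(group_key.tolist())  # [1, 1, 1, 2, 3, 3, 4, 5, 5, 5]
print(offset.tolist())     # [0, 1, 2, 0, 0, 1, 0, 0, 1, 2]

# Forward-fill the last valid time and add the per-row offset in seconds
filled = dates.ffill() + pd.to_timedelta(offset, unit='s')
print(filled.dt.strftime('%H:%M:%S').tolist())
```

Because the offset restarts at 0 on every valid row, a real timestamp in the data always overrides the running count instead of being recomputed.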

Answer 2

Score: 2

Another possible solution, which uses linear interpolation to fill the null times:

import pandas as pd
from scipy.interpolate import interp1d

df['Time'] = pd.to_datetime(df['Time'], format='%H:%M:%S', errors='coerce')
df_nonan = df[['sec', 'Time']].dropna()
f = interp1d(df_nonan.iloc[:, 0], df_nonan.iloc[:, 1],
             fill_value='extrapolate')
# Round to whole seconds to absorb any floating-point error from the interpolation
df['Time'] = pd.to_datetime(f(df['sec'])).round('s')
df['Time'] = df['Time'].dt.time

Output:

   sec Date      Time  rpm
0    1  7/7  12:47:30  0.0
1    2    0  12:47:31  0.0
2    3    0  12:47:32  0.0
3    4  7/7  12:47:33  0.0
4    5  7/7  12:47:34  0.0
5    6    0  12:47:35  0.0
6    7  7/7  12:47:36  0.0
7    8  7/7  12:47:37  0.0
8    9    0  12:47:38  0.0
9   10    0  12:47:39  0.0
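If SciPy is not available, and assuming Time is exactly linear in sec (true for this data), the same result can be sketched with plain pandas arithmetic by drawing one straight line through the first and last known points. Note that this is a single line, not the piecewise interpolation interp1d performs, so it only matches when the data has no jumps.

```python
import pandas as pd

df = pd.DataFrame({
    'sec': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    'Time': ['12:47:30', '0', '0', '12:47:33', '12:47:34', '0',
             '12:47:36', '12:47:37', '0', '0'],
})

times = pd.to_datetime(df['Time'], format='%H:%M:%S', errors='coerce')
known = times.notna()

# Assumption: Time is exactly linear in sec, so the first and last
# known points define the whole line
sec_known = df.loc[known, 'sec']
t_known = times[known]
sec0, sec1 = sec_known.iloc[0], sec_known.iloc[-1]
t0, t1 = t_known.iloc[0], t_known.iloc[-1]
step = (t1 - t0) / (sec1 - sec0)  # Timedelta per unit of 'sec' (1 s here)

# Evaluate the line at every 'sec' value (fills gaps and extrapolates the tail)
filled = (df['sec'] - sec0) * step + t0
df['Time'] = filled.dt.round('s').dt.strftime('%H:%M:%S')
print(df)
```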

Answer 3

Score: 1

Convert the Time column with pd.to_datetime, then replace each missing value with the previous time plus one second, as shown below:

Code:

from datetime import timedelta

import pandas as pd

# Convert 'Time' column to datetime and '0' values to NaT (Not a Time)
df['Time'] = pd.to_datetime(df['Time'], format='%H:%M:%S', errors='coerce')
# Iterate over the 'Time' column and replace each NaT value
# with the previous time plus one second
# (assumes the first row holds a valid time and df has a default RangeIndex)
previous_time = None
for i, time in enumerate(df['Time']):
    if pd.isnull(time):
        new_time = previous_time + timedelta(seconds=1)
        df.at[i, 'Time'] = new_time
        previous_time = new_time
    else:
        previous_time = time
df['Time'] = df['Time'].apply(lambda x: x.strftime('%H:%M:%S'))

Output:

   sec Date      Time  rpm
0    1  7/7  12:47:30  0.0
1    2    0  12:47:31  0.0
2    3    0  12:47:32  0.0
3    4  7/7  12:47:33  0.0
4    5  7/7  12:47:34  0.0
5    6    0  12:47:35  0.0
6    7  7/7  12:47:36  0.0
7    8  7/7  12:47:37  0.0
8    9    0  12:47:38  0.0
9   10    0  12:47:39  0.0

Answer 4

Score: 0

Since your column always increments by one second, you can simply "create" it with pd.date_range.

The following line gives the desired output.

df['Time'] = pd.date_range(start='12:47:30', end='12:47:39', freq='s').strftime('%H:%M:%S')

If you have a big dataset, instead of specifying the end, you can pass the number of values to create with the periods parameter, for example periods=len(df).
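A sketch of the periods variant, assuming the first row holds a valid start time and every row is exactly one second apart. The check before the assignment is an addition: it raises if any existing time disagrees with that assumption, instead of silently overwriting it.

```python
import pandas as pd

df = pd.DataFrame({
    'sec': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    'Time': ['12:47:30', '0', '0', '12:47:33', '12:47:34', '0',
             '12:47:36', '12:47:37', '0', '0'],
})

# Assumption: row 0 holds the start time and rows are exactly 1 s apart
generated = pd.date_range(start=df['Time'].iloc[0], periods=len(df),
                          freq='s').strftime('%H:%M:%S')

# Check the assumption against every real (non-'0') time before overwriting
known = (df['Time'] != '0').to_numpy()
if df['Time'][known].tolist() != list(generated[known]):
    raise ValueError("existing times are not consecutive seconds")

df['Time'] = list(generated)
print(df)
```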

Answer 5

Score: 0

Here is the code that you want:

import pandas as pd
df = pd.DataFrame({
    'sec': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    'Date': ['7/7', '0', '0', '7/7', '7/7', '0', '7/7', '7/7', '0', '0'],
    'Time': ['12:47:30', '0', '0', '12:47:33', '12:47:34', '0', '12:47:36', '12:47:37', '0', '0'],
    'rpm': [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
})
# Create a mask for rows with '0' time values
mask = df['Time'] == '0'
# Find the index of the first non-zero time value
first_nonzero_idx = df.loc[~mask, 'Time'].index[0]
# Convert the 'Time' column to a list for easier manipulation
times = df['Time'].tolist()
# Fill in the missing time values by incrementing from the previous non-zero time value
for i in range(first_nonzero_idx + 1, len(times)):
    if mask[i]:
        prev_time = pd.to_datetime(times[i-1])
        times[i] = (prev_time + pd.DateOffset(seconds=1)).strftime('%H:%M:%S')
# Update the 'Time' column in the dataframe
df['Time'] = times
print(df)

Output:

   sec Date      Time  rpm
0    1  7/7  12:47:30  0.0
1    2    0  12:47:31  0.0
2    3    0  12:47:32  0.0
3    4  7/7  12:47:33  0.0
4    5  7/7  12:47:34  0.0
5    6    0  12:47:35  0.0
6    7  7/7  12:47:36  0.0
7    8  7/7  12:47:37  0.0
8    9    0  12:47:38  0.0
9   10    0  12:47:39  0.0
