How to fill missing seconds in a pandas DataFrame
Question
I have a DataFrame and want to fill in the missing seconds in its Time column. How can I do that?
This is my data:
df = pd.DataFrame({
'sec': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
'Date': ['7/7', '0', '0', '7/7', '7/7', '0', '7/7', '7/7', '0', '0'],
'Time': ['12:47:30', '0', '0', '12:47:33', '12:47:34', '0', '12:47:36', '12:47:37', '0', '0'],
'rpm': [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
})
In the Time column, each 0 should be replaced by the previous time plus one second, so the 0 after 12:47:30 becomes 12:47:31. In other words, my expected output is:
df = pd.DataFrame({
'sec': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
'Date': ['7/7', '0', '0', '7/7', '7/7', '0', '7/7', '7/7', '0', '0'],
'Time': ['12:47:30', '12:47:31', '12:47:32', '12:47:33', '12:47:34', '12:47:35', '12:47:36', '12:47:37', '12:47:38', '12:47:39'],
'rpm': [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
})
Answer 1
Score: 2
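First create a DatetimeIndex, then use [`DataFrame.resample`](http://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.resample.html), and finally set the column values. Note that resampling only fills gaps between the first and last valid timestamps, so the trailing rows with missing times are not reconstructed (the output below has 8 rows, not 10):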
df.index = pd.to_datetime(df['Date'] + df['Time'].astype(str),
format='%m/%d%H:%M:%S',
errors='coerce')
# 's' = one-second bins (the uppercase 'S' alias is deprecated in recent pandas)
out = df.resample('s').first()
out['Time'] = out.index.time
out['Date'] = out.index.strftime('%m/%d')
out['rpm'] = out['rpm'].fillna(0)
out['sec'] = out.groupby('Date').cumcount().add(1)
print (out)
sec Date Time rpm
1900-07-07 12:47:30 1 07/07 12:47:30 0.0
1900-07-07 12:47:31 2 07/07 12:47:31 0.0
1900-07-07 12:47:32 3 07/07 12:47:32 0.0
1900-07-07 12:47:33 4 07/07 12:47:33 0.0
1900-07-07 12:47:34 5 07/07 12:47:34 0.0
1900-07-07 12:47:35 6 07/07 12:47:35 0.0
1900-07-07 12:47:36 7 07/07 12:47:36 0.0
1900-07-07 12:47:37 8 07/07 12:47:37 0.0
out = out.reset_index(drop=True)
print (out)
sec Date Time rpm
0 1 07/07 12:47:30 0.0
1 2 07/07 12:47:31 0.0
2 3 07/07 12:47:32 0.0
3 4 07/07 12:47:33 0.0
4 5 07/07 12:47:34 0.0
5 6 07/07 12:47:35 0.0
6 7 07/07 12:47:36 0.0
7 8 07/07 12:47:37 0.0
Another solution forward-fills the parsed datetimes with Series.ffill and then adds seconds to the rows that had no time. The per-row offsets are built with GroupBy.cumcount and to_timedelta:
dates = pd.to_datetime(df['Date'] + df['Time'].astype(str),
format='%m/%d%H:%M:%S',
errors='coerce')
sec = pd.to_timedelta(df.groupby(dates.notna().cumsum()).cumcount(), unit='s')
df['Time'] = dates.ffill().add(sec).dt.strftime('%H:%M:%S')
print (df)
sec Date Time rpm
0 1 7/7 12:47:30 0.0
1 2 0 12:47:31 0.0
2 3 0 12:47:32 0.0
3 4 7/7 12:47:33 0.0
4 5 7/7 12:47:34 0.0
5 6 0 12:47:35 0.0
6 7 7/7 12:47:36 0.0
7 8 7/7 12:47:37 0.0
8 9 0 12:47:38 0.0
9 10 0 12:47:39 0.0
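To see why the second approach works, it can help to look at the intermediate values. The sketch below rebuilds the relevant columns from the question and prints the group key and the per-row offset next to the filled time. The names `group`, `offset` and `filled` are illustrative only and do not appear in the answer above.

```python
import pandas as pd

df = pd.DataFrame({
    'Date': ['7/7', '0', '0', '7/7', '7/7', '0', '7/7', '7/7', '0', '0'],
    'Time': ['12:47:30', '0', '0', '12:47:33', '12:47:34', '0',
             '12:47:36', '12:47:37', '0', '0'],
})

# Rows whose Date/Time is '0' do not match the format and become NaT.
dates = pd.to_datetime(df['Date'] + df['Time'],
                       format='%m/%d%H:%M:%S', errors='coerce')

# Each valid timestamp starts a new group; the NaT rows that follow share its id.
group = dates.notna().cumsum()

# Position inside the group = number of seconds to add to the group's timestamp.
offset = df.groupby(group).cumcount()

# Forward-fill the last valid timestamp, then shift it by the offset.
filled = (dates.ffill() + pd.to_timedelta(offset, unit='s')).dt.strftime('%H:%M:%S')

print(pd.DataFrame({'group': group, 'offset': offset, 'filled': filled}))
```

Each valid row gets offset 0 and every following missing row counts up from 1, which is why this approach also fills the trailing rows that resample leaves out. It does assume the first row holds a valid time; a leading missing value would have nothing to forward-fill from.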
Answer 2
Score: 2
Another possible solution, which uses linear interpolation to fill the null times:
from scipy.interpolate import interp1d
df['Time'] = pd.to_datetime(df['Time'], format='%H:%M:%S', errors='coerce')
df_nonan = df[['sec', 'Time']].dropna()
f = interp1d(df_nonan.iloc[:, 0], df_nonan.iloc[:, 1],
fill_value='extrapolate')
df['Time'] = pd.to_datetime(f(df['sec']))
df['Time'] = df['Time'].dt.time
Output:
sec Date Time rpm
0 1 7/7 12:47:30 0.0
1 2 0 12:47:31 0.0
2 3 0 12:47:32 0.0
3 4 7/7 12:47:33 0.0
4 5 7/7 12:47:34 0.0
5 6 0 12:47:35 0.0
6 7 7/7 12:47:36 0.0
7 8 7/7 12:47:37 0.0
8 9 0 12:47:38 0.0
9 10 0 12:47:39 0.0
Answer 3
Score: 1
Convert the Time column with pd.to_datetime, then replace each missing value with the previous time plus one second, as shown below.
Code:
from datetime import timedelta

# Convert 'Time' column to datetime and '0' values to NaT (Not a Time)
df['Time'] = pd.to_datetime(df['Time'], format='%H:%M:%S', errors='coerce')
# Iterate over the 'Time' column and replace NaT values
# with the previous time plus one second.
# Assumes the first row holds a valid time and df has the default integer index.
previous_time = None
for i, time in enumerate(df['Time']):
    if pd.isnull(time):
        new_time = previous_time + timedelta(seconds=1)
        df.at[i, 'Time'] = new_time
        previous_time = new_time
    else:
        previous_time = time
# Format the filled timestamps back to HH:MM:SS strings
df['Time'] = df['Time'].apply(lambda x: x.strftime('%H:%M:%S'))
Output:
sec Date Time rpm
0 1 7/7 12:47:30 0.0
1 2 0 12:47:31 0.0
2 3 0 12:47:32 0.0
3 4 7/7 12:47:33 0.0
4 5 7/7 12:47:34 0.0
5 6 0 12:47:35 0.0
6 7 7/7 12:47:36 0.0
7 8 7/7 12:47:37 0.0
8 9 0 12:47:38 0.0
9 10 0 12:47:39 0.0
Answer 4
Score: 0
Since your column always increments by one second, you can simply create it with pd.date_range.
The following line gives the desired output:
df['Time'] = pd.date_range(start='12:47:30', end='12:47:39', freq='s')
If you have a big dataset, you can pass the number of values to create with the periods parameter instead of specifying the end.
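A minimal sketch of the `periods` variant, assuming the first row holds a valid time and every row is exactly one second after the previous one. Because `pd.date_range` returns full timestamps (a time-only start string is combined with the current date), the sketch also formats the result back to `HH:MM:SS` strings; that formatting step is an addition and not part of the answer above.

```python
import pandas as pd

df = pd.DataFrame({
    'sec': list(range(1, 11)),
    'Time': ['12:47:30', '0', '0', '12:47:33', '12:47:34', '0',
             '12:47:36', '12:47:37', '0', '0'],
})

# Start from the first row's time and generate one value per row,
# so no end time has to be worked out by hand.
times = pd.date_range(start=df['Time'].iloc[0], periods=len(df), freq='s')

# date_range yields full timestamps; keep only the time part as text.
df['Time'] = times.strftime('%H:%M:%S')

print(df)
```

This overwrites every value in the column, including the valid ones, so it is only appropriate when the existing times really are one second apart.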
Answer 5
Score: 0
Here is the code that you want:
import pandas as pd
df = pd.DataFrame({
'sec': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
'Date': ['7/7', '0', '0', '7/7', '7/7', '0', '7/7', '7/7', '0', '0'],
'Time': ['12:47:30', '0', '0', '12:47:33', '12:47:34', '0', '12:47:36', '12:47:37', '0', '0'],
'rpm': [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
})
# Create a mask for rows with '0' time values
mask = df['Time'] == '0'
# Find the index of the first non-zero time value
first_nonzero_idx = df.loc[~mask, 'Time'].index[0]
# Convert the 'Time' column to a list for easier manipulation
times = df['Time'].tolist()
# Fill in the missing time values by incrementing from the previous non-zero time value
for i in range(first_nonzero_idx + 1, len(times)):
    if mask[i]:
        prev_time = pd.to_datetime(times[i-1])
        times[i] = (prev_time + pd.DateOffset(seconds=1)).strftime('%H:%M:%S')
# Update the 'Time' column in the dataframe
df['Time'] = times
print(df)
Output:
sec Date Time rpm
0 1 7/7 12:47:30 0.0
1 2 0 12:47:31 0.0
2 3 0 12:47:32 0.0
3 4 7/7 12:47:33 0.0
4 5 7/7 12:47:34 0.0
5 6 0 12:47:35 0.0
6 7 7/7 12:47:36 0.0
7 8 7/7 12:47:37 0.0
8 9 0 12:47:38 0.0
9 10 0 12:47:39 0.0