How to fill missing seconds in a pandas DataFrame

Question

I have a DataFrame and want to fill in the missing seconds in its Time column. How can I do that?
This is my data:

df = pd.DataFrame({
'sec': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
'Date': ['7/7', '0', '0', '7/7', '7/7', '0', '7/7', '7/7', '0', '0'],
'Time': ['12:47:30', '0', '0', '12:47:33', '12:47:34', '0', '12:47:36', '12:47:37', '0', '0'],
'rpm': [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
})

In the Time column, each 0 after 12:47:30 should be replaced by the next second, starting with 12:47:31. In other words, my expected output is:

df = pd.DataFrame({
'sec': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
'Date': ['7/7', '0', '0', '7/7', '7/7', '0', '7/7', '7/7', '0', '0'],
'Time': ['12:47:30', '12:47:31', '12:47:32', '12:47:33', '12:47:34', '12:47:35', '12:47:36', '12:47:37', '12:47:38', '12:47:39'],
'rpm': [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
})

Answer 1

Score: 2

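Create a DatetimeIndex first, then use DataFrame.resample, and finally set the column values. Rows whose Date and Time cannot be parsed get a NaT index and do not appear in the resampled result, which is why the output below ends at 12:47:37, the last valid timestamp:

# Build a DatetimeIndex from Date + Time; unparseable rows ('0' + '0') become NaT
df.index = pd.to_datetime(df['Date'] + df['Time'].astype(str),
                          format='%m/%d%H:%M:%S',
                          errors='coerce')

# 's' is the seconds alias; the uppercase 'S' is deprecated in recent pandas versions
out = df.resample('s').first()

out['Time'] = out.index.time
out['Date'] = out.index.strftime('%m/%d')
out['rpm'] = out['rpm'].fillna(0)
out['sec'] = out.groupby('Date').cumcount().add(1)
print(out)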
                     sec   Date      Time  rpm
1900-07-07 12:47:30    1  07/07  12:47:30  0.0
1900-07-07 12:47:31    2  07/07  12:47:31  0.0
1900-07-07 12:47:32    3  07/07  12:47:32  0.0
1900-07-07 12:47:33    4  07/07  12:47:33  0.0
1900-07-07 12:47:34    5  07/07  12:47:34  0.0
1900-07-07 12:47:35    6  07/07  12:47:35  0.0
1900-07-07 12:47:36    7  07/07  12:47:36  0.0
1900-07-07 12:47:37    8  07/07  12:47:37  0.0

out = out.reset_index(drop=True)
print(out)
   sec   Date      Time  rpm
0    1  07/07  12:47:30  0.0
1    2  07/07  12:47:31  0.0
2    3  07/07  12:47:32  0.0
3    4  07/07  12:47:33  0.0
4    5  07/07  12:47:34  0.0
5    6  07/07  12:47:35  0.0
6    7  07/07  12:47:36  0.0
7    8  07/07  12:47:37  0.0
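
Resampling is most natural when the rows for the missing seconds are absent altogether. The following is a minimal sketch of that case, assuming a hypothetical three-row frame (not the question's data) in which 12:47:31 and 12:47:32 have no rows at all:

```python
import pandas as pd

# Hypothetical data: the rows for 12:47:31 and 12:47:32 are absent entirely.
df = pd.DataFrame({
    'Time': ['12:47:30', '12:47:33', '12:47:34'],
    'rpm': [0.0, 1.5, 2.0],
})

# Parse the strings and use them as the index (the date defaults to 1900-01-01).
df.index = pd.to_datetime(df['Time'], format='%H:%M:%S')

# Resample to a one-second grid; bins without a row yield NaN.
out = df[['rpm']].resample('s').first()

# Rebuild the Time strings from the index and fill rpm for the new rows.
out['Time'] = out.index.strftime('%H:%M:%S')
out['rpm'] = out['rpm'].fillna(0)
out = out.reset_index(drop=True)
print(out)
```

Here resample creates the two missing rows, so the result has five rows instead of three. Filling rpm with 0 mirrors the answer above and is a choice, not a requirement.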

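Another solution: forward-fill the parsed datetimes with Series.ffill, then add to each missing row the number of seconds since the last valid time, built with GroupBy.cumcount and to_timedelta. Unlike the resample approach, this keeps every row, including the trailing ones:

# Parse Date + Time; rows that cannot be parsed become NaT
dates = pd.to_datetime(df['Date'] + df['Time'].astype(str),
                       format='%m/%d%H:%M:%S',
                       errors='coerce')

# Each valid timestamp starts a new group; cumcount gives the offset within it
sec = pd.to_timedelta(df.groupby(dates.notna().cumsum()).cumcount(), unit='s')

df['Time'] = dates.ffill().add(sec).dt.strftime('%H:%M:%S')
print(df)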
   sec Date      Time  rpm
0    1  7/7  12:47:30  0.0
1    2    0  12:47:31  0.0
2    3    0  12:47:32  0.0
3    4  7/7  12:47:33  0.0
4    5  7/7  12:47:34  0.0
5    6    0  12:47:35  0.0
6    7  7/7  12:47:36  0.0
7    8  7/7  12:47:37  0.0
8    9    0  12:47:38  0.0
9   10    0  12:47:39  0.0
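
To see why this works, it can help to look at the intermediate values. The sketch below uses a shortened, hypothetical Time series and only the Time column (no Date); the names group_id and offset are introduced here for illustration:

```python
import pandas as pd

time = pd.Series(['12:47:30', '0', '0', '12:47:33', '0'])

# '0' cannot be parsed, so errors='coerce' turns it into NaT.
dates = pd.to_datetime(time, format='%H:%M:%S', errors='coerce')

# Each valid timestamp starts a new group: 1, 1, 1, 2, 2
group_id = dates.notna().cumsum()

# Position inside each group: 0, 1, 2, 0, 1
offset = time.groupby(group_id).cumcount()

# Last valid timestamp plus the offset in seconds.
filled = dates.ffill() + pd.to_timedelta(offset, unit='s')
result = filled.dt.strftime('%H:%M:%S')
print(result.tolist())
```

Rows with a valid time get an offset of 0 and are left unchanged, while each missing row counts up from the last valid time before it.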

Answer 2

Score: 2

Another possible solution, which uses linear interpolation to fill the null times:

from scipy.interpolate import interp1d

df['Time'] = pd.to_datetime(df['Time'], format='%H:%M:%S', errors='coerce')
df_nonan = df[['sec', 'Time']].dropna()
f = interp1d(df_nonan.iloc[:, 0], df_nonan.iloc[:, 1],
             fill_value='extrapolate')
df['Time'] = pd.to_datetime(f(df['sec']))
df['Time'] = df['Time'].dt.time

Output:

   sec Date      Time  rpm
0    1  7/7  12:47:30  0.0
1    2    0  12:47:31  0.0
2    3    0  12:47:32  0.0
3    4  7/7  12:47:33  0.0
4    5  7/7  12:47:34  0.0
5    6    0  12:47:35  0.0
6    7  7/7  12:47:36  0.0
7    8  7/7  12:47:37  0.0
8    9    0  12:47:38  0.0
9   10    0  12:47:39  0.0

Answer 3

Score: 1

Convert the Time column with pd.to_datetime, then add one second to the previous time wherever a value is missing, as shown below.

Code:

from datetime import timedelta

# Convert the 'Time' column to datetime; '0' values become NaT (Not a Time)
df['Time'] = pd.to_datetime(df['Time'], format='%H:%M:%S', errors='coerce')

# Iterate over the 'Time' column and replace NaT values 
# with the time by adding one second to the previous time
previous_time = None
for i, time in enumerate(df['Time']):
    if pd.isnull(time):
        new_time = (previous_time + timedelta(seconds=1))
        df.at[i, 'Time'] = new_time
        previous_time = new_time
    else:
        previous_time = time

df['Time'] = df['Time'].apply(lambda x: x.strftime('%H:%M:%S'))

Output:

   sec Date      Time  rpm
0    1  7/7  12:47:30  0.0
1    2    0  12:47:31  0.0
2    3    0  12:47:32  0.0
3    4  7/7  12:47:33  0.0
4    5  7/7  12:47:34  0.0
5    6    0  12:47:35  0.0
6    7  7/7  12:47:36  0.0
7    8  7/7  12:47:37  0.0
8    9    0  12:47:38  0.0
9   10    0  12:47:39  0.0
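
The loop above assumes the first row holds a valid time; otherwise previous_time is still None when the first addition is attempted. A possible variant is sketched below as a standalone function that works on the raw strings. The name fill_missing_seconds and its missing parameter are hypothetical, and leaving leading missing entries unchanged is an assumption about the desired behavior:

```python
import pandas as pd

def fill_missing_seconds(times, missing='0'):
    """Return a list where each missing entry is the previous time plus one second.

    Hypothetical helper for illustration; leading missing entries are left
    unchanged because there is no earlier time to count from.
    """
    filled = []
    previous = None
    for value in times:
        if value == missing:
            if previous is None:
                filled.append(value)  # nothing to count from yet
                continue
            previous = previous + pd.Timedelta(seconds=1)
        else:
            previous = pd.to_datetime(value, format='%H:%M:%S')
        filled.append(previous.strftime('%H:%M:%S'))
    return filled

print(fill_missing_seconds(['0', '12:47:30', '0', '0', '12:47:33']))
```

With the question's data this could be applied as df['Time'] = fill_missing_seconds(df['Time']).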

Answer 4

Score: 0

Since your column always increments by one second, you can simply create it with pd.date_range.

The following line gives the desired times. pd.date_range returns full timestamps (the current date is used when only a time is given), so strftime is applied to keep just the time part:

df['Time'] = pd.date_range(start='12:47:30', end='12:47:39', freq='s').strftime('%H:%M:%S')

If you have a big dataset, you can pass the number of values to create with the periods parameter instead of specifying the end.
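
A minimal sketch of the periods variant follows. It assumes that the first row holds a valid time and that every row is exactly one second after the previous one, as in the question's sample:

```python
import pandas as pd

df = pd.DataFrame({
    'sec': range(1, 11),
    'Time': ['12:47:30', '0', '0', '12:47:33', '12:47:34',
             '0', '12:47:36', '12:47:37', '0', '0'],
})

# Assumption: the first row holds a valid time and rows are one second apart.
start = df['Time'].iloc[0]

# periods=len(df) produces exactly one timestamp per row; no end time is needed.
stamps = pd.date_range(start=start, periods=len(df), freq='s')

# date_range returns full timestamps, so keep only the time part as a string.
df['Time'] = stamps.strftime('%H:%M:%S')
print(df['Time'].tolist())
```

This overwrites the whole column, so it is only appropriate when the existing valid times already follow the one-second pattern.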

Answer 5

Score: 0

Here is the code that you want:

import pandas as pd

df = pd.DataFrame({
    'sec': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    'Date': ['7/7', '0', '0', '7/7', '7/7', '0', '7/7', '7/7', '0', '0'],
    'Time': ['12:47:30', '0', '0', '12:47:33', '12:47:34', '0', '12:47:36', '12:47:37', '0', '0'],
    'rpm': [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
})

# Create a mask for rows with '0' time values
mask = df['Time'] == '0'

# Find the index of the first non-zero time value
first_nonzero_idx = df.loc[~mask, 'Time'].index[0]

# Convert the 'Time' column to a list for easier manipulation
times = df['Time'].tolist()

# Fill in the missing time values by incrementing from the previous non-zero time value
for i in range(first_nonzero_idx + 1, len(times)):
    if mask[i]:
        prev_time = pd.to_datetime(times[i-1])
        times[i] = (prev_time + pd.DateOffset(seconds=1)).strftime('%H:%M:%S')

# Update the 'Time' column in the dataframe
df['Time'] = times

print(df)

Output:

   sec Date      Time  rpm
0    1  7/7  12:47:30  0.0
1    2    0  12:47:31  0.0
2    3    0  12:47:32  0.0
3    4  7/7  12:47:33  0.0
4    5  7/7  12:47:34  0.0
5    6    0  12:47:35  0.0
6    7  7/7  12:47:36  0.0
7    8  7/7  12:47:37  0.0
8    9    0  12:47:38  0.0
9   10    0  12:47:39  0.0

huangapple
  • Posted on July 10, 2023, 18:21:06
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76652806.html