How to fill missing seconds in a pandas DataFrame

Question

I have a DataFrame and want to fill in the missing seconds in its Time column. How can I do that?
This is my data:

```python
df = pd.DataFrame({
    'sec': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    'Date': ['7/7', '0', '0', '7/7', '7/7', '0', '7/7', '7/7', '0', '0'],
    'Time': ['12:47:30', '0', '0', '12:47:33', '12:47:34', '0', '12:47:36', '12:47:37', '0', '0'],
    'rpm': [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
})
```

In the Time column, each 0 should become the next second; for example, the 0 right after 12:47:30 should be 12:47:31. In other words, my expected output is:

```python
df = pd.DataFrame({
    'sec': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    'Date': ['7/7', '0', '0', '7/7', '7/7', '0', '7/7', '7/7', '0', '0'],
    'Time': ['12:47:30', '12:47:31', '12:47:32', '12:47:33', '12:47:34', '12:47:35', '12:47:36', '12:47:37', '12:47:38', '12:47:39'],
    'rpm': [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
})
```

Answer 1

Score: 2

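First create a `DatetimeIndex`, then use [`DataFrame.resample`](http://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.resample.html), and finally set the column values: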

```python
import pandas as pd

# Combine Date and Time; rows with '0' placeholders become NaT
df.index = pd.to_datetime(df['Date'] + df['Time'].astype(str),
                          format='%m/%d%H:%M:%S',
                          errors='coerce')
# One row per second from the first to the last valid timestamp
# ('s' is the seconds alias; the older 'S' is deprecated since pandas 2.2)
out = df.resample('s').first()
out['Time'] = out.index.time
out['Date'] = out.index.strftime('%m/%d')
out['rpm'] = out['rpm'].fillna(0)
out['sec'] = out.groupby('Date').cumcount().add(1)
print(out)
```

```
                     sec   Date      Time  rpm
1900-07-07 12:47:30    1  07/07  12:47:30  0.0
1900-07-07 12:47:31    2  07/07  12:47:31  0.0
1900-07-07 12:47:32    3  07/07  12:47:32  0.0
1900-07-07 12:47:33    4  07/07  12:47:33  0.0
1900-07-07 12:47:34    5  07/07  12:47:34  0.0
1900-07-07 12:47:35    6  07/07  12:47:35  0.0
1900-07-07 12:47:36    7  07/07  12:47:36  0.0
1900-07-07 12:47:37    8  07/07  12:47:37  0.0
```

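```python
# Optionally drop the datetime index
out = out.reset_index(drop=True)
print(out)
```

```
   sec   Date      Time  rpm
0    1  07/07  12:47:30  0.0
1    2  07/07  12:47:31  0.0
2    3  07/07  12:47:32  0.0
3    4  07/07  12:47:33  0.0
4    5  07/07  12:47:34  0.0
5    6  07/07  12:47:35  0.0
6    7  07/07  12:47:36  0.0
7    8  07/07  12:47:37  0.0
```

Note that `resample` only spans up to the last valid timestamp (12:47:37), so the two trailing placeholder rows (12:47:38 and 12:47:39 in the expected output) are not produced: the result has 8 rows instead of 10.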

Another solution, starting again from the original `df`: forward-fill the dates with `Series.ffill`, then add one second per missing row, with the offsets built by `GroupBy.cumcount` and `to_timedelta`:

```python
# Same parsing as above; placeholder rows become NaT
dates = pd.to_datetime(df['Date'] + df['Time'].astype(str),
                       format='%m/%d%H:%M:%S',
                       errors='coerce')
# Each valid timestamp starts a new group; cumcount numbers the rows 0, 1, 2, ... within it
sec = pd.to_timedelta(df.groupby(dates.notna().cumsum()).cumcount(), unit='s')
# Carry the last valid timestamp forward and add the per-row offset
df['Time'] = dates.ffill().add(sec).dt.strftime('%H:%M:%S')
print(df)
```

```
   sec Date      Time  rpm
0    1  7/7  12:47:30  0.0
1    2    0  12:47:31  0.0
2    3    0  12:47:32  0.0
3    4  7/7  12:47:33  0.0
4    5  7/7  12:47:34  0.0
5    6    0  12:47:35  0.0
6    7  7/7  12:47:36  0.0
7    8  7/7  12:47:37  0.0
8    9    0  12:47:38  0.0
9   10    0  12:47:39  0.0
```
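
The forward-fill idea from this answer can be packaged as a small reusable function. This is a sketch under a few assumptions: the placeholder is the string `'0'`, the first row holds a real time, and each missing row is exactly one second after the row before it. The function name `fill_missing_seconds` is hypothetical, and unlike the answer it parses only the `Time` column, so a `'0'` in `Date` cannot turn a valid time into `NaT`.

```python
import pandas as pd


def fill_missing_seconds(df: pd.DataFrame, col: str = "Time") -> pd.DataFrame:
    """Hypothetical helper: replace placeholder times with last valid time + n seconds."""
    out = df.copy()
    # '0' (or anything else that does not match HH:MM:SS) becomes NaT
    times = pd.to_datetime(out[col], format="%H:%M:%S", errors="coerce")
    if times.isna().iloc[0]:
        raise ValueError("the first row must contain a valid time")
    # Each valid time starts a new block; cumcount numbers rows 0, 1, 2, ... per block
    block = times.notna().cumsum()
    offset = pd.to_timedelta(out.groupby(block).cumcount(), unit="s")
    # Carry the last valid time forward and add the per-row offset
    out[col] = (times.ffill() + offset).dt.strftime("%H:%M:%S")
    return out


df = pd.DataFrame({
    "sec": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    "Date": ["7/7", "0", "0", "7/7", "7/7", "0", "7/7", "7/7", "0", "0"],
    "Time": ["12:47:30", "0", "0", "12:47:33", "12:47:34", "0",
             "12:47:36", "12:47:37", "0", "0"],
    "rpm": [0.0] * 10,
})
result = fill_missing_seconds(df)
print(result["Time"].tolist())
```

Working on a copy keeps the original frame untouched, and the explicit check on the first row makes the "no earlier time to count from" case fail loudly instead of silently leaving `NaT`.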

Answer 2

Score: 2

Another possible solution, which uses linear interpolation to fill the null times:

```python
from scipy.interpolate import interp1d

# '0' placeholders become NaT
df['Time'] = pd.to_datetime(df['Time'], format='%H:%M:%S', errors='coerce')
# Fit only on the rows whose time is known
df_nonan = df[['sec', 'Time']].dropna()
# Linear fit of Time against sec; 'extrapolate' also fills rows after the last known time
f = interp1d(df_nonan.iloc[:, 0], df_nonan.iloc[:, 1],
             fill_value='extrapolate')
df['Time'] = pd.to_datetime(f(df['sec']))
df['Time'] = df['Time'].dt.time
```

Output:

```
   sec Date      Time  rpm
0    1  7/7  12:47:30  0.0
1    2    0  12:47:31  0.0
2    3    0  12:47:32  0.0
3    4  7/7  12:47:33  0.0
4    5  7/7  12:47:34  0.0
5    6    0  12:47:35  0.0
6    7  7/7  12:47:36  0.0
7    8  7/7  12:47:37  0.0
8    9    0  12:47:38  0.0
9   10    0  12:47:39  0.0
```
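
In this data `sec` advances by exactly one per row, so the line that `interp1d` fits has a slope of one second per unit of `sec`. Under that assumption the same result can be sketched with pandas alone, by offsetting every row from a single anchor row that has a real time. This is an illustration rather than a general replacement: if the recorded times ever drift relative to `sec`, this version follows `sec` from the anchor and ignores later recorded times, whereas `interp1d` passes through every known point. It does also cover a missing first row, since the offset can be negative.

```python
import pandas as pd

df = pd.DataFrame({
    "sec": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    "Date": ["7/7", "0", "0", "7/7", "7/7", "0", "7/7", "7/7", "0", "0"],
    "Time": ["12:47:30", "0", "0", "12:47:33", "12:47:34", "0",
             "12:47:36", "12:47:37", "0", "0"],
    "rpm": [0.0] * 10,
})

# '0' placeholders become NaT
times = pd.to_datetime(df["Time"], format="%H:%M:%S", errors="coerce")
# Use the first row with a real time as the anchor
anchor = times.first_valid_index()
# Assumption: one unit of 'sec' equals one second of 'Time'
delta = pd.to_timedelta(df["sec"] - df.loc[anchor, "sec"], unit="s")
df["Time"] = (times[anchor] + delta).dt.strftime("%H:%M:%S")
print(df["Time"].tolist())
```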

Answer 3

Score: 1

Time列转换为.to_datetime,并在前一个时间上添加一秒,如下所示:

代码:

  1. # 将'Time'列转换为datetime,将'0'值转换为NaT(不是时间)
  2. df['Time'] = pd.to_datetime(df['Time'], format='%H:%M:%S', errors='coerce')
  3. # 遍历'Time'列,将NaT值替换为前一个时间加一秒
  4. previous_time = None
  5. for i, time in enumerate(df['Time']):
  6. if pd.isnull(time):
  7. new_time = (previous_time + timedelta(seconds=1))
  8. df.at[i, 'Time'] = new_time
  9. previous_time = new_time
  10. else:
  11. previous_time = time
  12. df['Time'] = df['Time'].apply(lambda x: x.strftime('%H:%M:%S'))

输出:

  1. sec Date Time rpm
  2. 0 1 7/7 12:47:30 0.0
  3. 1 2 0 12:47:31 0.0
  4. 2 3 0 12:47:32 0.0
  5. 3 4 7/7 12:47:33 0.0
  6. 4 5 7/7 12:47:34 0.0
  7. 5 6 0 12:47:35 0.0
  8. 6 7 7/7 12:47:36 0.0
  9. 7 8 7/7 12:47:37 0.0
  10. 8 9 0 12:47:38 0.0
  11. 9 10 0 12:47:39 0.0
英文:

Convert the Time column with `pd.to_datetime`, then replace each missing value with the previous time plus one second, as shown below:

Code:

```python
from datetime import timedelta

import pandas as pd

# Convert 'Time' column to datetime and '0' values to NaT (Not a Time)
df['Time'] = pd.to_datetime(df['Time'], format='%H:%M:%S', errors='coerce')
# Iterate over the 'Time' column and replace NaT values
# with the time obtained by adding one second to the previous time.
# Assumes the first row holds a valid time (otherwise previous_time is still None)
# and the default RangeIndex (df.at uses i as a label).
previous_time = None
for i, time in enumerate(df['Time']):
    if pd.isnull(time):
        new_time = previous_time + timedelta(seconds=1)
        df.at[i, 'Time'] = new_time
        previous_time = new_time
    else:
        previous_time = time
df['Time'] = df['Time'].apply(lambda x: x.strftime('%H:%M:%S'))
```

Output:

```
   sec Date      Time  rpm
0    1  7/7  12:47:30  0.0
1    2    0  12:47:31  0.0
2    3    0  12:47:32  0.0
3    4  7/7  12:47:33  0.0
4    5  7/7  12:47:34  0.0
5    6    0  12:47:35  0.0
6    7  7/7  12:47:36  0.0
7    8  7/7  12:47:37  0.0
8    9    0  12:47:38  0.0
9   10    0  12:47:39  0.0
```

Answer 4

Score: 0

Since your column always increments by one second, you can simply "create" it with `pd.date_range`.

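The following line gives the desired output; `strftime` keeps only the time of day, because `pd.date_range` otherwise produces full timestamps that include a date: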

```python
df['Time'] = pd.date_range(start='12:47:30', end='12:47:39', freq='s').strftime('%H:%M:%S')
```

If you have a big dataset, instead of specifying the end, you can simply pass the number of values to create with the periods parameter.
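
Rather than hard-coding `start` and `end`, both can be derived from the data. A minimal sketch, assuming every row is exactly one second after the previous one (the premise of this answer) and that `'0'` marks a missing time: it takes the first real time, steps back to where row 0 would be, and passes `periods=len(df)`.

```python
import pandas as pd

df = pd.DataFrame({
    "sec": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    "Date": ["7/7", "0", "0", "7/7", "7/7", "0", "7/7", "7/7", "0", "0"],
    "Time": ["12:47:30", "0", "0", "12:47:33", "12:47:34", "0",
             "12:47:36", "12:47:37", "0", "0"],
    "rpm": [0.0] * 10,
})

# Parse the placeholders ('0') as NaT so only real times remain
times = pd.to_datetime(df["Time"], format="%H:%M:%S", errors="coerce")
# Locate the first row with a real time and its position in the frame
first = times.first_valid_index()
pos = df.index.get_loc(first)
# Step back to row 0, assuming exactly one second per row
start = times[first] - pd.Timedelta(seconds=pos)
# One timestamp per row; keep only the time of day as strings
df["Time"] = pd.date_range(start=start, periods=len(df), freq="s").strftime("%H:%M:%S")
print(df["Time"].tolist())
```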

Answer 5

Score: 0

Here is the code that you want:

```python
import pandas as pd

df = pd.DataFrame({
    'sec': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    'Date': ['7/7', '0', '0', '7/7', '7/7', '0', '7/7', '7/7', '0', '0'],
    'Time': ['12:47:30', '0', '0', '12:47:33', '12:47:34', '0', '12:47:36', '12:47:37', '0', '0'],
    'rpm': [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
})
# Create a mask for rows with '0' time values
mask = df['Time'] == '0'
# Find the index of the first non-zero time value
first_nonzero_idx = df.loc[~mask, 'Time'].index[0]
# Convert the 'Time' column to a list for easier manipulation
times = df['Time'].tolist()
# Fill in the missing time values by incrementing from the previous time value
for i in range(first_nonzero_idx + 1, len(times)):
    if mask[i]:
        prev_time = pd.to_datetime(times[i-1])
        times[i] = (prev_time + pd.DateOffset(seconds=1)).strftime('%H:%M:%S')
# Update the 'Time' column in the dataframe
df['Time'] = times
print(df)
```

Output:

```
   sec Date      Time  rpm
0    1  7/7  12:47:30  0.0
1    2    0  12:47:31  0.0
2    3    0  12:47:32  0.0
3    4  7/7  12:47:33  0.0
4    5  7/7  12:47:34  0.0
5    6    0  12:47:35  0.0
6    7  7/7  12:47:36  0.0
7    8  7/7  12:47:37  0.0
8    9    0  12:47:38  0.0
9   10    0  12:47:39  0.0
```

huangapple
  • Published on July 10, 2023 at 18:21:06
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76652806.html