2023年6月19日 07:02:56go评论102阅读模式

英文:

How to flatten a digital signal whose baseline jumps up and down with Python?

问题

我正在分析SciPy中的心电图（EKG）内置数据集，其中一部分如下所示：

上述数据的一个问题是EKG的基线上下跳动很大。如果您对EKG或心跳分析不熟悉，它们应该是平的，只有一些"QRS复合体"（也称为实际心跳）的尖峰，如下所示：

制造EKG监测器的医疗设备公司使用一些平坦化和滤波函数，使EKG更加平滑和易于阅读，因为自然的人体运动是不可避免的，会导致EKG信号像上面显示的那样跳动。但我不知道他们使用哪些滤波器/函数。

我如何使用SciPy或编写自定义函数来使上述SciPy EKG数据集平滑？

我尝试过的方法

我已经尝试阅读SciPy信号处理文档，但尚未找到任何可以"平滑"数据或将其调整到基线的函数。我在数字信号处理方面是新手，想知道是否有更好或官方的方法来做到这一点。

我尝试了通过一些移动平均值将值上下移动，但简单的Y值加法或减法肯定不是正确的方法。那太粗糙了。

感谢您的帮助。

问题澄清：修复"基线漂移"

在评论中，另一位用户（Christoph Rackwitz）建议使用高通滤波器。在搜索高通滤波器时，我找到了一篇关于心电图的研究论文中的一幅图像：

他们说第一幅图像具有"基线漂移"，这正是我尝试回答的问题。如何修复基线漂移？

两篇有用的研究论文

考虑到我在上一部分提到的研究论文，以及我刚找到的另一篇关于修复心电图中基线漂移的论文，我将阅读这些论文，看看我能找到什么信息。

英文:

I'm analyzing the electrocardiogram (EKG) built-in dataset from SciPy, a segment of which looks like below:

One problem with the data above is that the baseline of the EKG jumps up and down a lot. If you're not familiar with EKGs or heartbeat analysis, they're supposed to be flat with a few spikes of the "QRS complex" (AKA the actual heartbeat), like below:

Medical device companies that make EKG monitors use some flattening and filtering functions to make EKGs smoother and easier to read, since natural human body movements are unavoidable and will cause the EKG signal to jump around as shown above. But I don't know which filters/functions they use.

How can I use SciPy or write a custom function to flatten the SciPy EKG dataset above?

What I have tried

I have tried reading the SciPy documentation for signal processing, but have not yet found any functions that "flatten" data or bring it to baseline. I'm a novice in digital signals processing and want to know if there is a better or official way to do this.

I have tried shifting the values up or down by some moving average, but there's no way that simple addition or subtraction of Y-values is the correct way. That is too hacky.

Thanks for your help.

Question clarification: fixing "baseline wandering"

Another user in the comments (Christoph Rackwitz) suggested a high-pass filter. While Googling high-pass filters, I found an image from a research paper on ECGs:

They said the first image had "baseline wandering," which is what I'm really trying to answer with this question. How do I fix baseline wandering?

Two useful research papers

Considering the research paper I mentioned in the last section, as well as another one I just found about fixing baseline wandering in ECGs, I'll read these and see what I find.

答案1

得分: 5

以下是一些低通滤波器的展示。

您需要了解“因果滤波器”与“非因果滤波器”的区别，以及有限脉冲响应（FIR）和无限脉冲响应（IIR）之间的差异。

加载数据：

signal = scipy.datasets.electrocardiogram()
fs = 360 # 假设是文档中提到的采样率
time = np.arange(signal.size) / fs # 仅用于绘图

探索信号：

fig, axs = plt.subplots(3, 1, figsize=(15, 15))
axs[0].plot(time[30000:31000], signal[30000:31000])
axs[1].plot(time[30000:40000], signal[30000:40000])
axs[2].plot(time, signal)
axs[0].set_xlabel('时间（秒）')
axs[1].set_xlabel('时间（秒）')
axs[2].set_xlabel('时间（秒）')
plt.show()

尝试几种滤波器：

# 巴特沃斯滤波器，一阶，0.5 Hz 截止频率
lowpass = scipy.signal.butter(1, 0.5, btype='lowpass', fs=fs, output='sos')
lowpassed = scipy.signal.sosfilt(lowpass, signal)
highpassed = signal - lowpassed
fig, axs = plt.subplots(2, 1, figsize=(15, 10))
axs[0].plot(time[30000:32000], signal[30000:32000])
axs[0].plot(time[30000:32000], lowpassed[30000:32000])
axs[1].plot(time[30000:32000], highpassed[30000:32000])
axs[0].set_xlabel('时间（秒）')
axs[1].set_xlabel('时间（秒）')
axs[0].set_ylim([-3, +3])
axs[1].set_ylim([-3, +3])
plt.show()

# 注意这些系数：
>>> scipy.signal.butter(1, 0.5, btype='lowpass', fs=fs, output='ba')
(array([0.00434, 0.00434]), array([ 1.     , -0.99131]))
# 与以下内容进行比较...

# 几乎相同的内容，不同的公式：“指数平均”
# y += (x - y) * alpha  # 应用于信号的每个值X，以产生新的Y
alpha = 1/100 # 取决于采样率和期望的截止频率
lowpassed = scipy.signal.lfilter([alpha], [1, -(1-alpha)], signal)
highpassed = signal - lowpassed
fig, axs = plt.subplots(2, 1, figsize=(15, 10))
axs[0].plot(time[30000:32000], signal[30000:32000])
axs[0].plot(time[30000:32000], lowpassed[30000:32000])
axs[1].plot(time[30000:32000], highpassed[30000:32000])
axs[0].set_xlabel('时间（秒）')
axs[1].set_xlabel('时间（秒）')
axs[0].set_ylim([-3, +3])
axs[1].set_ylim([-3, +3])
plt.show()

# 前两个滤波器是“因果”的，即仅使用过去的样本。
# 缺点：滞后、相位偏移，即低通滤波器不完全匹配/跟踪信号。
# “非因果”滤波器可以使用未来的样本。
# 这允许去除相位偏移，但处理引入了延迟。
# 对于离线处理或者如果被认为“足够小”，则此延迟无关紧要。
# 以下是非因果的例子。

# 中值滤波器。将峰值视为离群值，因此这显示了可以被视为“基线”的任何内容。
# 如果核窗口仅覆盖过去的部分，那么可以是因果的，但这会引入滞后（当信号明显漂移时可见）。
# 可能需要在减去之前对中值滤波器进行另一轮平滑。
# 中值滤波可以很便宜，如果使用正确的数据结构。scipy 的实现似乎不够智能，需要明显的时间。
lowpassed = scipy.signal.medfilt(signal, kernel_size=fs+1)
highpassed = signal - lowpassed
fig, axs = plt.subplots(2, 1, figsize=(15, 10))
axs[0].plot(time[30000:32000], signal[30000:32000])
axs[0].plot(time[30000:32000], lowpassed[30000:32000])
axs[1].plot(time[30000:32000], highpassed[30000:32000])
axs[0].set_xlabel('时间（秒）')
axs[1].set_xlabel('时间（秒）')
axs[0].set_ylim([-3, +3])
axs[1].set_ylim([-3, +3])
plt.show()

lowpassed = scipy.ndimage.gaussian_filter1d(signal, sigma=0.2 * fs, order=0)
highpassed = signal - lowpassed
fig, axs = plt.subplots(2, 1, figsize=(15, 10))
axs[0].plot(time[30000:32000], signal[30000:32000])
axs[0].plot(time[30000:32000], lowpassed[30000:32000])
axs[1].plot(time[30000:32000], highpassed[30000:32000])
axs[0].set_xlabel('时间（秒）')
axs[1].set_xlabel('时间（秒）')
axs[0].set_ylim([-3, +3])
axs[1].set_ylim([-3, +3])
plt.show()

英文:

Here's a showcase of some lowpass filters.

You'll want to read up on "causal" filters vs non-causal filters, as well as the difference between FIR and IIR.

Loading the data:

signal = scipy.datasets.electrocardiogram()
fs = 360 # say the docs
time = np.arange(signal.size) / fs # for plotting only

Explore the signal:

fig, axs = plt.subplots(3, 1, figsize=(15, 15))
axs[0].plot(time[30000:31000], signal[30000:31000])
axs[1].plot(time[30000:40000], signal[30000:40000])
axs[2].plot(time, signal)
axs[0].set_xlabel(&#39;Time (s)&#39;)
axs[1].set_xlabel(&#39;Time (s)&#39;)
axs[2].set_xlabel(&#39;Time (s)&#39;)
plt.show()

Trying a few filters:

# Butterworth, first order, 0.5 Hz cutoff
lowpass = scipy.signal.butter(1, 0.5, btype=&#39;lowpass&#39;, fs=fs, output=&#39;sos&#39;)
lowpassed = scipy.signal.sosfilt(lowpass, signal)
highpassed = signal - lowpassed
fig, axs = plt.subplots(2, 1, figsize=(15, 10))
axs[0].plot(time[30000:32000], signal[30000:32000])
axs[0].plot(time[30000:32000], lowpassed[30000:32000])
axs[1].plot(time[30000:32000], highpassed[30000:32000])
axs[0].set_xlabel(&#39;Time (s)&#39;)
axs[1].set_xlabel(&#39;Time (s)&#39;)
axs[0].set_ylim([-3, +3])
axs[1].set_ylim([-3, +3])
plt.show()

# take note of these coefficients:
&gt;&gt;&gt; scipy.signal.butter(1, 0.5, btype=&#39;lowpass&#39;, fs=fs, output=&#39;ba&#39;)
(array([0.00434, 0.00434]), array([ 1.     , -0.99131]))
# and compare to the following...

# Almost the same thing, different formulation: &quot;exponential average&quot;
# y += (x - y) * alpha  # applied to each value X of the signal to produce a new Y
alpha = 1/100 # depends on fs and desired cutoff frequency
lowpassed = scipy.signal.lfilter([alpha], [1, -(1-alpha)], signal)
highpassed = signal - lowpassed
fig, axs = plt.subplots(2, 1, figsize=(15, 10))
axs[0].plot(time[30000:32000], signal[30000:32000])
axs[0].plot(time[30000:32000], lowpassed[30000:32000])
axs[1].plot(time[30000:32000], highpassed[30000:32000])
axs[0].set_xlabel(&#39;Time (s)&#39;)
axs[1].set_xlabel(&#39;Time (s)&#39;)
axs[0].set_ylim([-3, +3])
axs[1].set_ylim([-3, +3])
plt.show()

# the first two filters were &quot;causal&quot;, i.e. only using past samples.
# downside: lag, phase shift, i.e. the lowpass doesn&#39;t quite match/track the signal.
# &quot;non-causal&quot; filters can use future samples.
# this allows to remove the phase shift but the processing introduces a delay instead.
# this delay is irrelevant for offline processing or if it&#39;s considered &quot;small enough&quot;.
# the following are non-causal.

# median filter. interpret peaks as outliers, so this reveals whatever can be considered &quot;baseline&quot;.
# can be causal if the kernel window only covers the past but that introduces lag (noticeable when the signal drifts actively).
# might need another pass of smoothing, on the median filter, before subtracting.
# median filtering CAN be cheap, if using the right data structure. scipy implementation seems less smart, takes noticeable time.
lowpassed = scipy.signal.medfilt(signal, kernel_size=fs+1)
highpassed = signal - lowpassed
fig, axs = plt.subplots(2, 1, figsize=(15, 10))
axs[0].plot(time[30000:32000], signal[30000:32000])
axs[0].plot(time[30000:32000], lowpassed[30000:32000])
axs[1].plot(time[30000:32000], highpassed[30000:32000])
axs[0].set_xlabel(&#39;Time (s)&#39;)
axs[1].set_xlabel(&#39;Time (s)&#39;)
axs[0].set_ylim([-3, +3])
axs[1].set_ylim([-3, +3])
plt.show()

lowpassed = scipy.ndimage.gaussian_filter1d(signal, sigma=0.2 * fs, order=0)
highpassed = signal - lowpassed
fig, axs = plt.subplots(2, 1, figsize=(15, 10))
axs[0].plot(time[30000:32000], signal[30000:32000])
axs[0].plot(time[30000:32000], lowpassed[30000:32000])
axs[1].plot(time[30000:32000], highpassed[30000:32000])
axs[0].set_xlabel(&#39;Time (s)&#39;)
axs[1].set_xlabel(&#39;Time (s)&#39;)
axs[0].set_ylim([-3, +3])
axs[1].set_ylim([-3, +3])
plt.show()

答案2

得分: 1

以下是您要翻译的内容：

要补充克里斯托弗·拉克维茨（Christoph Rackwitz）相当详细的回答（我从中提取了绘图代码），您还可以使用非因果（保持相位）的IIR高通滤波器来去除基线漂移。这实际上与使用低通滤波器来计算移动基线，然后从原始信号中减去它是一样的：

from scipy import signal, datasets
import matplotlib.pyplot as plt
import numpy as np
ecg = datasets.electrocardiogram()
fs = 360 
time = np.arange(ecg.size) / fs 
[b,a] = signal.butter(4, 0.5, btype='highpass', fs=fs)
filtered = signal.filtfilt(b, a, ecg)
fig, axs = plt.subplots(2, 1, figsize=(15, 10))
axs[0].plot(time[30000:32000], ecg[30000:32000])
axs[1].plot(time[30000:32000], filtered[30000:32000])
axs[0].set_xlabel('Time (s)')
axs[1].set_xlabel('Time (s)')
axs[0].set_ylim([-3, +3])
axs[1].set_ylim([-3, +3])
plt.show()

请务必使用filtfilt而不是filt（或类似的sosfilt等），否则IIR滤波器（例如Butterworth滤波器）的非线性相位响应将改变您的信号形状，可能破坏进一步的QRS分析。FIR滤波器不会这样做，但它也会将原始信号的时间向后移动与滤波器系数一样多的样本，对于一个紧密的频率响应，这将会有很多（多达400+）个。

请注意使用filtfilt而不是filt（或类似的sosfilt），否则IIR滤波器（例如Butterworth滤波器）的非线性相位响应将改变信号的形状，可能会损害进一步的QRS分析。FIR滤波器不会这样做，但它也会将原始信号的时间向后移动与滤波器系数一样多的样本，这将是很多的（多达400+）以获得良好的频率响应。

英文:

To add to Christoph Rackwitz's quite extensive answer (from which I pinched the plotting code), you can also use a non-causal (phase preserving) IIR high pass filter to remove the baseline drift. This is effectively the same as using a lowpass to compute the moving baseline and then subtracting it from the original signal:

from scipy import signal, datasets
import matplotlib.pyplot as plt
import numpy as np
ecg = datasets.electrocardiogram()
fs = 360 
time = np.arange(ecg.size) / fs 
[b,a] = signal.butter(4, 0.5, btype=&#39;highpass&#39;, fs=fs)
filtered = signal.filtfilt(b, a, ecg)
fig, axs = plt.subplots(2, 1, figsize=(15, 10))
axs[0].plot(time[30000:32000], ecg[30000:32000])
axs[1].plot(time[30000:32000], filtered[30000:32000])
axs[0].set_xlabel(&#39;Time (s)&#39;)
axs[1].set_xlabel(&#39;Time (s)&#39;)
axs[0].set_ylim([-3, +3])
axs[1].set_ylim([-3, +3])
plt.show()

Be sure to use filtfilt and not filt (or a variety like sosfilt) or else the nonlinear phase response of an IIR filter (e.g. a butterworth filter) will change the shape of your signal and perhaps ruin further QRS analysis. An FIR filter will not do this, but it will also time shift your original signal by as many samples as you have filter coefficients---which will be many (as many as 400+) for a nice tight frequency response.

通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库，让每个人都能够通过互相帮助和分享经验来进步。

如何使用Python平整一个基线上下跳动的数字信号？

问题

我尝试过的方法

问题澄清：修复"基线漂移"

两篇有用的研究论文

What I have tried

Question clarification: fixing "baseline wandering"

Two useful research papers

答案1

答案2

如何一次保存多个CSV文件，并更改它们的标题？

Could not broadcast input array from shape (300,) into shape when mapping FastText word vectors with word in the dataset

if first value is zero in one dataframe set previous values to 1 in another dataframe on condition

There is a Python function that allows me to sum the last goals of a team in a dataframe.

如何在Playwright视觉比较中屏蔽多个定位器？

在C++中，可以使用可变模板参数来检索类型的内部类型。

selenium.common.exceptions.StaleElementReferenceException: Message: stale element reference: stale element not found

Creating and opening a URL to log in to Website via Basic Auth with Robot Framework/Selenium (Python)

AG Grid 在上下文菜单中以大文本形式打开

What's the correct way to type hint an empty list as a literal in python?

如何在Highcharts Gantt中更改本地化的星期名称

如何在同一个流中使用多个过滤器和映射函数？

如何使用Map/Set来将代码优化到O(n)？

.NET MAUI Android在GitHub Actions上构建失败，错误代码为1。