Latency in detecting key press (hotkey) due to non-event-thread activity
Question
In my application I am using a non-event thread to do some fairly heavy processing: using the docx2python module to extract text content from multiple .docx files and then re-assembling the lines in 10-line batches to form "Lucene Documents" for the purpose of indexing with Elasticsearch.
I have a hotkey, configured on a menu item, which lets me toggle between viewing the initial frame and a 2nd frame which represents what's going on in terms of executing tasks. This toggling had been working instantaneously... until I started actually working on the real tasks.
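For readers who want the batching step spelled out, here is a minimal sketch of grouping extracted lines into 10-line batches. The name `batch_lines` is hypothetical and not taken from my real code, and the input is assumed to be a plain iterable of strings.

```python
from typing import Iterable, Iterator, List


def batch_lines(lines: Iterable[str], batch_size: int = 10) -> Iterator[List[str]]:
    """Yield consecutive groups of at most batch_size lines (hypothetical helper)."""
    batch: List[str] = []
    for line in lines:
        batch.append(line)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        # The final batch may be shorter than batch_size.
        yield batch
```

Each yielded list would then be turned into one "Lucene Document" before indexing.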
I can tell from the log messages that the toggle method is not the culprit: there can be a gap (about 1 s) between me pressing the key and seeing the log message at the start of the toggle method.
Just to show that I'm not making some textbook mistake in my use of threads, please have a look at this MRE below (F4 toggles). The trouble is, this MRE does NOT in fact illustrate the problem. I tried using various crunchy tasks which might potentially "congest" thread-handling, but nothing so far reproduces the phenomenon.
I thought of putting QApplication.processEvents() in the worker thread at a point where it would be called frequently. This changed nothing.
Is anyone familiar with how hotkeys are handled? It feels like I'm missing some obvious "yield" technique (i.e. yield to force detection of hotkey activity...).
import sys, time, logging, pathlib
from PyQt5 import QtWidgets, QtCore

class MainWindow(QtWidgets.QMainWindow):
    def __init__(self):
        super().__init__()
        self.setWindowTitle('Blip')
        self.create_initial_gui_components()
        self.add_standard_menu_components()
        self.task_mgr = TaskManager()
        self.task_mgr.thread.start()
        self.alternate_view = False

    def create_initial_gui_components(self):
        self.resize(1200, 600)
        central_widget = QtWidgets.QFrame(self)
        self.setCentralWidget(central_widget)
        central_widget.setLayout(QtWidgets.QVBoxLayout(central_widget))
        central_widget.setStyleSheet('border: 5px solid green;')
        # initially showing frame
        self.frame1 = QtWidgets.QFrame()
        central_widget.layout().addWidget(self.frame1)
        self.frame1.setStyleSheet('border: 5px solid blue;')
        # initially hidden frame
        self.frame2 = QtWidgets.QFrame()
        central_widget.layout().addWidget(self.frame2)
        self.frame2.setStyleSheet('border: 5px solid red;')
        self.frame2.hide()

    def add_standard_menu_components(self):
        self.menubar = QtWidgets.QMenuBar(self)
        self.setMenuBar(self.menubar)
        self.main_menu = QtWidgets.QMenu('&Main', self.menubar)
        self.menubar.addMenu(self.main_menu)
        # make the menu item with hotkey
        self.main_menu.toggle_frame_view_action = self.make_new_menu_item_action(
            '&Toggle view tasks', 'F4', self.toggle_frame_view, self.main_menu, enabled=True)

    def toggle_frame_view(self):
        # this is the message which occurs after pressing F4, after a very bad gap, in the "real world" app
        print('toggle...')
        if self.alternate_view:
            self.frame2.hide()
            self.frame1.show()
        else:
            self.frame2.show()
            self.frame1.hide()
        self.alternate_view = not self.alternate_view

    def make_new_menu_item_action(self, text, shortcut, connect_method, menu, enabled=True):
        action = QtWidgets.QAction(text, menu)
        action.setShortcut(shortcut)
        menu.addAction(action)
        action.triggered.connect(connect_method)
        action.setEnabled(enabled)
        return action

class TaskManager():
    def __init__(self):
        self.thread = QtCore.QThread()
        self.task = LongTask()
        self.task.moveToThread(self.thread)
        self.thread.started.connect(self.task.run)

class LongTask(QtCore.QObject):
    def run(self):
        out_file_path = pathlib.Path('out.txt')
        if out_file_path.is_file():
            out_file_path.unlink()
        # None of this works (has the desired effect of causing toggling latency)...
        s = ''
        for i in range(20):
            print(f'{i} blip')
            for j in range(1, 1000001):
                for k in range(1, 1000001):
                    l = j/k
                    # performing I/O operations makes no difference
                    with open('out.txt', 'w') as f:
                        f.write(f'l: {l} len(s) {len(s)}\n')
                s += f'{{once upon a {j}}}\n'
        print('here BBB')

def main():
    app = QtWidgets.QApplication(sys.argv)
    window = MainWindow()
    window.show()
    app.exec()

if __name__ == '__main__':
    main()
Later
Suspicion about the answer. When a "long task" is started, some gui elements have to be constructed and added to frame 2 (progress bar and stop button on a small strip of a QFrame - one for each live task). Will have to do some tests. But it seems fairly unlikely that this could have such a dramatic effect (1s of latency).
Answer 1
Score: 0
I think I finally detected the problem ... it's classic and it may be of interest to someone baffled by a similar situation.
In my worker thread (non-event thread) I had multiple time.sleep(.0001) commands, the purpose of which is to allow other threads (including the event thread, aka "main thread") to wrest control ... as previously explained here by Musicamante in reply to an earlier question of mine.
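A minimal sketch of that yielding pattern, with hypothetical names (`process_in_batches`, `handle`) rather than my real code: the worker does a small batch of work, then calls time.sleep() briefly, which releases the GIL and gives the event thread a chance to run.

```python
import time


def process_in_batches(items, handle, batch_size=10, pause=0.0001):
    """Call handle() on every item, sleeping briefly after each batch.

    time.sleep() releases the GIL while it waits, so other threads
    (for example the GUI event thread) can run during the pause.
    Returns the number of pauses taken. All names here are illustrative.
    """
    pauses = 0
    for start in range(0, len(items), batch_size):
        for item in items[start:start + batch_size]:
            handle(item)
        time.sleep(pause)  # cooperative "let go" point
        pauses += 1
    return pauses
```

The catch is that this only helps between batches: if a single call inside `handle` runs for a long time, there is no point at which the pause can happen.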
The purpose of my worker thread, as mentioned, is to split apart Word (.docx) documents for analysis, using docx2python, and the analysis itself is "CPU-intensive". I was slightly surprised to find that, even for a very large file, this command took very little time:
docx_obj = docx2python.docx2python(docx_file_path)
... but what I finally found, with copious logging, is that this line was the real culprit:
if not self.generate_ldocs_from_structure_list(docx_obj.body):
...
... but it wasn't the call to (my method) generate_ldocs_from_structure_list. It turns out that docx_obj.body, although it seems to be just a reference to a property, actually involves the execution of a very lengthy operation. For the largest file in my test set, with 44,000 words, this operation took over 1 second! This was always the first document in my test set, and so it occurred about 1 second into the run.
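The kind of logging that exposed this can be sketched as follows. `timed_access` and `FakeDocument` are hypothetical stand-ins (the latter only imitates an object whose property does real work on every access); in the real app the call would be something like `timed_access(docx_obj, "body")`.

```python
import time


def timed_access(obj, attr_name):
    """Return (value, seconds) for a single attribute access."""
    start = time.perf_counter()
    value = getattr(obj, attr_name)  # a property may run arbitrary code here
    elapsed = time.perf_counter() - start
    return value, elapsed


class FakeDocument:
    """Stand-in for a document object; NOT the docx2python class."""

    def __init__(self, n_lines):
        self._n_lines = n_lines

    @property
    def body(self):
        # Looks like a plain attribute to the caller, but rebuilds
        # a nested structure each time it is read.
        return [[f"line {i}"] for i in range(self._n_lines)]


body, seconds = timed_access(FakeDocument(100_000), "body")
print(f"body access took {seconds:.4f} s for {len(body)} lines")
```

Timing the attribute access separately from the method that consumes it is what separates "my code is slow" from "the property is slow".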
Any attempt to toggle, using F4, while that is happening is therefore likely to be met with a very slow response. And short of going multiprocessing (as recommended by ekhumoro in his comment), there appears to be no solution to this: docx_obj.body won't let go.
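I have not implemented the multiprocessing route, but a rough sketch, assuming the extraction can be wrapped in a picklable top-level function, might look like the following. `extract_lines` is a hypothetical stand-in that just reads a text file; in the real app it would wrap the docx2python call and return the body. The helper accepts any concurrent.futures executor.

```python
from concurrent.futures import Executor, as_completed
from pathlib import Path
from typing import Dict, Iterable, List


def extract_lines(path: str) -> List[str]:
    """Hypothetical stand-in for the expensive extraction step.

    Must be a top-level function so a process pool can pickle it.
    """
    return Path(path).read_text(encoding="utf-8").splitlines()


def extract_all(paths: Iterable[str], executor: Executor) -> Dict[str, List[str]]:
    """Submit one extraction per path and collect results as they finish."""
    futures = {executor.submit(extract_lines, p): p for p in paths}
    return {futures[done]: done.result() for done in as_completed(futures)}


# Intended use from the worker thread (under `if __name__ == '__main__':`
# protection in the main script, as multiprocessing requires):
#
#     from concurrent.futures import ProcessPoolExecutor
#     with ProcessPoolExecutor(max_workers=1) as pool:
#         results = extract_all(docx_paths, pool)
```

With a process pool the heavy work runs under another interpreter's GIL, and the worker thread merely blocks waiting for the result, which does release the GIL. The cost is that results have to be pickled back to the parent process.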
Having said that, I just saw two interesting vids about the GIL: one is a lecture by David Beazley, forensically analysing the GIL and its operation. NB I do appreciate that PyQt itself is not running Python code, but C++ code: however, it appears that using worker threads in a Pythonic manner can cause mayhem if they can't be engineered to "let go" (docx2python might be able to be engineered to do that: however, I looked at the source code and found that the entire lengthy operation uses a copy.deepcopy operation, so there is no way of inserting time.sleep() in there...).
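To get a feel for this outside Qt, here is a small pure-Python sketch (all names hypothetical) that measures how late a thread wakes from a short sleep while another thread spins on Python bytecode. It only illustrates the general effect and is not a reproduction of the docx2python case.

```python
import threading
import time


def _spin(stop: threading.Event) -> None:
    """CPU-bound loop that never sleeps voluntarily."""
    counter = 0
    while not stop.is_set():
        counter += 1


def worst_wakeup_delay(samples: int = 20, nap: float = 0.005,
                       with_busy_thread: bool = True) -> float:
    """Return the largest observed overshoot (seconds) past a short sleep."""
    stop = threading.Event()
    worker = threading.Thread(target=_spin, args=(stop,), daemon=True)
    if with_busy_thread:
        worker.start()
    worst = 0.0
    try:
        for _ in range(samples):
            start = time.perf_counter()
            time.sleep(nap)
            overshoot = time.perf_counter() - start - nap
            worst = max(worst, overshoot)
    finally:
        stop.set()
        if with_busy_thread:
            worker.join()
    return worst


print("idle:", worst_wakeup_delay(with_busy_thread=False))
print("busy:", worst_wakeup_delay(with_busy_thread=True))
```

The numbers depend heavily on the machine, the Python version and the build, so treat them as indicative only.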
The other is this one, excitingly entitled "how to bypass the GIL", and explaining why doing that, and ultimately sticking with threads, might be preferable to going all multiprocess. Unfortunately the Python code numba can handle in a "jitted" function with "nogil=True" appears limited to numerical operations.