Datajoint locking the table? Populating on multiple machines
Question
I have a table that runs a fairly heavy computation (about 5 minutes per key). I want to reserve jobs and run it on multiple machines. I noticed that the other computers get locked out of the table as soon as one machine starts processing a job: they effectively have to wait until one of the jobs has finished before they can start their own or get a chance to grab a job. Where does this behavior come from? When a job takes too long, I run into "Lock wait timeout exceeded" errors on machines other than the one currently processing it.
@schema
class HeavyComputation(dj.Computed):
    definition = """
    # ...
    -> Table1
    class_label : varchar(25)
    -> Table2.proj(somekey2="somekey")
    ---
    analyzed : longblob
    """
I am running .populate() on the table with the following settings:
settings = {"display_progress": True,
            "reserve_jobs": True,
            "suppress_errors": True,
            "order": "random"}
Answer 1
Score: 1
Yes, this is a tricky problem with how transaction serialization works. I will explain in more detail and provide additional background, but the solution is to reorder the primary key attributes in the table:
@schema
class HeavyComputation(dj.Computed):
    definition = """
    # ...
    -> Table1
    -> Table2.proj(somekey2="somekey")
    class_label : varchar(25)
    ---
    analyzed : longblob
    """
Again, I will provide a detailed explanation later since it will take some time to write up. I did not want to make you wait.
Answer 2
Score: 1
The problem turned out to be a .delete() call inside a subfunction of my make function. I am keeping track of temporary files in another (unrelated) table and wanted them to be cleaned up once the make routine finishes. However, this .delete() call was running into a table lock, which prevented the .populate() call from finishing.
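For anyone who wants to see this kind of lock contention in isolation, here is a minimal sketch using Python's built-in sqlite3 module. It is only an analogy, not DataJoint or MySQL code: SQLite locks the whole database file, whereas MySQL/InnoDB locks rows or index ranges, and the table name temp_files, the file names, and the function demo_lock_contention are made up for illustration. The assumption being illustrated is that a delete issued inside a long-running transaction (such as one wrapping a slow make call) holds its lock until that transaction ends, so a second worker times out in the meantime.

```python
import os
import sqlite3
import tempfile


def demo_lock_contention():
    """Return (blocked, remaining): whether worker B was blocked while
    worker A's transaction was open, and how many rows are left at the end."""
    path = os.path.join(tempfile.mkdtemp(), "demo.db")
    # isolation_level=None: autocommit mode, transactions are started explicitly.
    # timeout=0.2: give up on a locked database after 0.2 seconds.
    worker_a = sqlite3.connect(path, timeout=0.2, isolation_level=None)
    worker_b = sqlite3.connect(path, timeout=0.2, isolation_level=None)

    # Hypothetical bookkeeping table for temporary files.
    worker_a.execute("CREATE TABLE temp_files (name TEXT PRIMARY KEY)")
    worker_a.execute("INSERT INTO temp_files VALUES ('a.tmp'), ('b.tmp')")

    # Worker A: open a transaction (standing in for a long-running make)
    # and delete a row inside it without committing yet.
    worker_a.execute("BEGIN IMMEDIATE")
    worker_a.execute("DELETE FROM temp_files WHERE name = 'a.tmp'")

    # Worker B: tries its own cleanup while A's transaction is still open.
    blocked = False
    try:
        worker_b.execute("DELETE FROM temp_files WHERE name = 'b.tmp'")
    except sqlite3.OperationalError:
        # SQLite's counterpart of a lock wait timeout ("database is locked").
        blocked = True

    # Once A commits, the lock is released and B's delete goes through.
    worker_a.execute("COMMIT")
    worker_b.execute("DELETE FROM temp_files WHERE name = 'b.tmp'")
    remaining = worker_b.execute("SELECT COUNT(*) FROM temp_files").fetchone()[0]

    worker_a.close()
    worker_b.close()
    return blocked, remaining


if __name__ == "__main__":
    print(demo_lock_contention())
```

If the same mechanism is at play in your pipeline, a possible way out is to run the cleanup after the populate call has returned, so the delete does not sit inside the long computation's transaction.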