英文:
Django upload and process multiple files failing with libreoffice
问题
我正在开发一个Django应用程序,用于处理Excel文件。它只能处理xlsx
文件,但如果上传的是xls
或ods
文件,我会先将其转换为xlsx
,然后再处理。我的应用程序支持在表单中同时上传多个文件。所有上传的文件都会成功上传并保存到数据库模型中,其中一个字段为status = 'Created'
。然后,一个模型函数会在保存后触发一个新的线程,用于在后台处理这些文件。处理完成后,文件的状态会更改为'Error'
或'Processed'
。我还添加了一个额外的选项来重新处理文件。
问题出在当我尝试上传多个不是xlsx
的文件时,这些文件需要在我的处理之前转换为xlsx
。为此,我使用了Python的子进程执行libreoffice --convert-to xlsx filename --headless
。这在一次或两次同时上传一个或两个文件时运行正常。但是,如果同时上传多个文件,有些会失败,而有些文件会成功处理,而且没有任何明显的模式。所有测试文件都可以正常工作,如果我一个接一个地上传它们,甚至重新处理这些文件也没问题。
错误是由libreoffice引发的,因为如果上传多个已经是xlsx
文件的文件,它们也会成功处理。当发生这种情况时,libreoffice返回1,没有标准输出或标准错误信息。
你的问题可能出在并发处理多个文件时,由于libreoffice
的调用,可能会导致一些竞争条件或资源争用。你可以尝试添加一些同步机制来确保libreoffice
的调用不会冲突。例如,使用Python的threading.Lock
来保护libreoffice
的调用,以确保一次只有一个线程在调用libreoffice
。
你可以在处理文件之前添加一个锁,如下所示:
# 在导入中添加
import threading
# 在process_file_function中添加
libreoffice_lock = threading.Lock()
if libreoffice_lock.acquire(timeout=60): # 等待锁的时间
try:
if filename.endswith('.ods') or filename.endswith('.xls'):
import os
print(os.stat(filename))
output = subprocess.run(['libreoffice', '--convert-to', 'xlsx', filename, '--headless', '--outdir', '/tmp/sage/'], capture_output=True)
print(output)
filename = f"/tmp/sage/{filename.split('/')[-1].replace('xls', 'xlsx').replace('.ods', '.xlsx')}"
finally:
libreoffice_lock.release()
这样可以确保一次只有一个线程在调用libreoffice
,以避免竞争条件。希望这对你有帮助。
英文:
I'm working on a Django application that works with excel files. It only works with xlsx
files but if you upload an xls
or an ods
file I convert it previously to xlsx
in order to work with that processed file. My application supports multiple file upload in the form. All files uploaded are uploaded successfully and saved into a model in Database with a field status = 'Created'
. A post-save model function triggers a new Thread that process files for processing those files in background. After files are processed them are saved as status = 'Error'
or status = 'Processed'
. I have also added an extra option to reprocess files.
The problem comes when I try to upload multiple files which are not xlsx
those files need to be converted to xlsx
before my own processing stuff. For that purpose I'm using libreoffice --convert-to xlsx filename --headless
in a python subprocess. This is working fine with one or two file upload at the same time. But if I upload multiple files at the same time, some are failing and some files are being processed successfully, and there aren't any pattern with the files. All files for testing works properly if I upload them one by one, or even if I reprocess those files.
The error is given by libreoffice, because if I upload multiple files which are already xlsx
files are being processed successfully too. When this happens, libreoffice returns 1 an no stdout nor stderr.
models.py
class Document(models.Model):
docfile = models.FileField(upload_to='documents/%Y/%m/%d')
date_creation = models.DateTimeField(auto_now_add=True)
document_type = models.TextField(max_length=256)
status = models.TextField(max_length=256, default="Created")
bank_account = models.TextField(max_length=256, null=True)
def filename(self):
return os.path.basename(self.docfile.name)
@receiver(models.signals.post_save, sender=Document)
def process_file(sender, instance, **kwargs):
t = threading.Thread(target=process_file_function,args=[sender,instance],daemon=True)
t.start()
functions.py
def process_file_function(sender, instance, **kwargs):
from accounting.models import Asiento, Apunte, FiltroBanco
import pytz
if instance.status == "Created" or instance.status == "Reprocess":
filename = file = instance.docfile.name
instance.status='Processing'
instance.save(update_fields=['status'])
print(f"Starting processing file: {file}")
try:
if filename.endswith('.ods') or filename.endswith('xls'):
import os
print(os.stat(filename))
output = subprocess.run(["libreoffice", "--convert-to", "xlsx", filename, "--headless", "--outdir", "/tmp/sage/"], capture_output=True)
print(output)
filename = f"/tmp/sage/{filename.split('/')[-1].replace('xls', 'xlsx').replace('.ods', '.xlsx')}"
wb = load_workbook(filename=filename, data_only=True)
# Do my stuff
instance.status='Processed'
instance.save()
print(f"Finished processing file: {file}")
except Exception as e:
instance.status='Error'
instance.save()
Otuput example of a successful file:
Starting processing file: documents/2023/02/19/filename02.ods
os.stat_result(st_mode=33188, st_ino=901900, st_dev=40, st_nlink=1, st_uid=1000, st_gid=1000, st_size=29771, st_atime=1676805630, st_mtime=1676805630, st_ctime=1676805630)
CompletedProcess(args=['libreoffice', '--convert-to', 'xlsx', 'documents/2023/02/19/filename02.ods', '--headless', '--outdir', '/tmp/sage/'], returncode=0, stdout=b'convert /home/ajulian/Documents/code/python/facturasweb/documents/2023/02/19/filename02.ods -> /tmp/sage/filename02.xlsx using filter : Calc Office Open XML\n', stderr=b'')
Finished processing file: documents/2023/02/19/filename02.ods
Output example of a error file:
Starting processing file: documents/2023/02/19/filename01.ods
os.stat_result(st_mode=33188, st_ino=901899, st_dev=40, st_nlink=1, st_uid=1000, st_gid=1000, st_size=21469, st_atime=1676805630, st_mtime=1676805630, st_ctime=1676805630)
CompletedProcess(args=['libreoffice', '--convert-to', 'xlsx', 'documents/2023/02/19/filename01.ods', '--headless', '--outdir', '/tmp/sage/'], returncode=1, stdout=b'', stderr=b'')
ERROR: Error processing file: documents/2023/02/19/filename01.ods
-----
Traceback (most recent call last):
File "/home/ajulian/Documents/code/python/facturasweb/accounting/functions.py", line 50, in process_file_function
wb = load_workbook(filename=filename, data_only=True)
File "/usr/local/lib/python3.9/dist-packages/openpyxl/reader/excel.py", line 315, in load_workbook
reader = ExcelReader(filename, read_only, keep_vba,
File "/usr/local/lib/python3.9/dist-packages/openpyxl/reader/excel.py", line 124, in __init__
self.archive = _validate_archive(fn)
File "/usr/local/lib/python3.9/dist-packages/openpyxl/reader/excel.py", line 96, in _validate_archive
archive = ZipFile(filename, 'r')
File "/usr/lib/python3.9/zipfile.py", line 1239, in __init__
self.fp = io.open(file, filemode)
FileNotFoundError: [Errno 2] No such file or directory: '/tmp/sage/filename01.xlsx'
Notice the difference between the output of libreoffice subprocess. Don't blame filename01.ods because this file in other executions was converted successfully. It only happens on multiple file upload and not to all files.
What could be the problem? Why happens this sometimes and sometimes not? Why libreoffice only returns 1 without any output?
Thanks in advance.
答案1
得分: 0
解决了这个问题。问题发生在LibreOffice尝试同时打开相同用户配置时。通过为每个文件创建一个新的用户空间来解决:"-env:UserInstallation=file://{tmpfile}"
tmpfile = f"/tmp/sage/sessions/{filename.split('/')[-1].split('.')[0]}"
subprocess.run(["libreoffice", "--convert-to", "xlsx", filename, "--headless", "--outdir", "/tmp/sage/", f"-env:UserInstallation=file://{tmpfile}"], capture_output=True)
英文:
Solved this issue. The problem happens when libreoffice tries to open same user configuration at the same time. Solved by creating a new userspace for each file: "-env:UserInstallation=file://{tmpfile}"
tmpfile = f"/tmp/sage/sessions/{filename.split('/')[-1].split('.')[0]}"
subprocess.run(["libreoffice", "--convert-to", "xlsx", filename, "--headless", "--outdir", "/tmp/sage/", f"-env:UserInstallation=file://{tmpfile}"], capture_output=True)
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论