Django upload and process multiple files failing with libreoffice

Question

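The failures are probably caused by a race condition or resource contention between concurrent libreoffice calls when several files are processed at once. A synchronization mechanism can keep those calls from colliding: for example, a threading.Lock ensures that only one thread calls libreoffice at a time.

The lock has to be shared by all threads, so create it once at module level and acquire it around the conversion:

# Add to the imports
import os
import subprocess
import threading

# Create once at module level so that every thread shares the same lock
libreoffice_lock = threading.Lock()

# Inside process_file_function
if libreoffice_lock.acquire(timeout=60):  # wait up to 60 seconds for the lock
    try:
        if filename.endswith('.ods') or filename.endswith('.xls'):
            print(os.stat(filename))
            output = subprocess.run(['libreoffice', '--convert-to', 'xlsx', filename, '--headless', '--outdir', '/tmp/sage/'], capture_output=True)
            print(output)
            filename = f"/tmp/sage/{filename.split('/')[-1].replace('xls', 'xlsx').replace('.ods', '.xlsx')}"
    finally:
        libreoffice_lock.release()

This ensures that only one thread in the process calls libreoffice at a time, which avoids the race. I hope this helps.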

I'm working on a Django application that works with Excel files. It only handles xlsx files, but if you upload an xls or ods file, I first convert it to xlsx and then work with the converted file. My application supports uploading multiple files in the same form. All uploaded files are saved successfully to a database model with a field status = 'Created'. A post-save function on the model starts a new thread that processes the files in the background. Once a file has been processed, it is saved with status = 'Error' or status = 'Processed'. I have also added an extra option to reprocess files.

The problem comes when I upload multiple files that are not xlsx, since those files need to be converted to xlsx before my own processing. For that purpose I run libreoffice --convert-to xlsx filename --headless in a Python subprocess. This works fine when one or two files are uploaded at the same time. But if I upload more files at the same time, some fail and some are processed successfully, and there is no pattern in which files fail. All the test files work properly if I upload them one by one, or even if I reprocess them.

The error comes from libreoffice, because if I upload multiple files that are already xlsx, they are all processed successfully. When the failure happens, libreoffice returns 1 with nothing on stdout or stderr.

models.py

class Document(models.Model):
    docfile = models.FileField(upload_to='documents/%Y/%m/%d')
    date_creation = models.DateTimeField(auto_now_add=True)
    document_type = models.TextField(max_length=256)
    status = models.TextField(max_length=256, default="Created")
    bank_account = models.TextField(max_length=256, null=True)

    def filename(self):
        return os.path.basename(self.docfile.name)


@receiver(models.signals.post_save, sender=Document)
def process_file(sender, instance, **kwargs):
    t = threading.Thread(target=process_file_function,args=[sender,instance],daemon=True)
    t.start()

functions.py

def process_file_function(sender, instance, **kwargs):
    from accounting.models import Asiento, Apunte, FiltroBanco
    import pytz

    if instance.status == "Created" or instance.status == "Reprocess":
        filename = file = instance.docfile.name
        instance.status='Processing'
        instance.save(update_fields=['status'])

        print(f"Starting processing file: {file}")

        try:
            if filename.endswith('.ods') or filename.endswith('xls'):
                import os
                print(os.stat(filename))
                output = subprocess.run(["libreoffice", "--convert-to", "xlsx", filename, "--headless", "--outdir", "/tmp/sage/"], capture_output=True)
                print(output)
                filename = f"/tmp/sage/{filename.split('/')[-1].replace('xls', 'xlsx').replace('.ods', '.xlsx')}"

            wb = load_workbook(filename=filename, data_only=True)

            # Do my stuff

            instance.status='Processed'
            instance.save()
            print(f"Finished processing file: {file}")
        except Exception as e:
            instance.status='Error'
            instance.save()

Output example of a successful file:

Starting processing file: documents/2023/02/19/filename02.ods
os.stat_result(st_mode=33188, st_ino=901900, st_dev=40, st_nlink=1, st_uid=1000, st_gid=1000, st_size=29771, st_atime=1676805630, st_mtime=1676805630, st_ctime=1676805630)
CompletedProcess(args=['libreoffice', '--convert-to', 'xlsx', 'documents/2023/02/19/filename02.ods', '--headless', '--outdir', '/tmp/sage/'], returncode=0, stdout=b'convert /home/ajulian/Documents/code/python/facturasweb/documents/2023/02/19/filename02.ods -> /tmp/sage/filename02.xlsx using filter : Calc Office Open XML\n', stderr=b'')
Finished processing file: documents/2023/02/19/filename02.ods

Output example of a failing file:

Starting processing file: documents/2023/02/19/filename01.ods
os.stat_result(st_mode=33188, st_ino=901899, st_dev=40, st_nlink=1, st_uid=1000, st_gid=1000, st_size=21469, st_atime=1676805630, st_mtime=1676805630, st_ctime=1676805630)
CompletedProcess(args=['libreoffice', '--convert-to', 'xlsx', 'documents/2023/02/19/filename01.ods', '--headless', '--outdir', '/tmp/sage/'], returncode=1, stdout=b'', stderr=b'')
ERROR: Error processing file: documents/2023/02/19/filename01.ods
-----
Traceback (most recent call last):
  File "/home/ajulian/Documents/code/python/facturasweb/accounting/functions.py", line 50, in process_file_function
    wb = load_workbook(filename=filename, data_only=True)
  File "/usr/local/lib/python3.9/dist-packages/openpyxl/reader/excel.py", line 315, in load_workbook
    reader = ExcelReader(filename, read_only, keep_vba,
  File "/usr/local/lib/python3.9/dist-packages/openpyxl/reader/excel.py", line 124, in __init__
    self.archive = _validate_archive(fn)
  File "/usr/local/lib/python3.9/dist-packages/openpyxl/reader/excel.py", line 96, in _validate_archive
    archive = ZipFile(filename, 'r')
  File "/usr/lib/python3.9/zipfile.py", line 1239, in __init__
    self.fp = io.open(file, filemode)
FileNotFoundError: [Errno 2] No such file or directory: '/tmp/sage/filename01.xlsx'

Notice the difference in the output of the libreoffice subprocess. Don't blame filename01.ods: in other runs this same file was converted successfully. The failure only happens on multiple-file uploads, and not to all files.
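The difference described above can be caught in code before openpyxl is ever reached. The following is a minimal sketch and not part of the original post: expected_output_path and check_conversion are hypothetical helper names, and it assumes libreoffice names the converted file after the source file's stem, as the successful log suggests.

```python
import subprocess
from pathlib import Path


def expected_output_path(src, outdir="/tmp/sage/"):
    """Return where libreoffice is expected to write the converted file."""
    # Swap only the final extension; chained str.replace calls could also
    # rewrite an "xls" that appears elsewhere in the file name.
    return Path(outdir) / (Path(src).stem + ".xlsx")


def check_conversion(result, target):
    """Raise a descriptive error if the conversion did not produce target."""
    if result.returncode != 0:
        raise RuntimeError(
            f"libreoffice exited with {result.returncode}; "
            f"stdout={result.stdout!r} stderr={result.stderr!r}"
        )
    if not Path(target).exists():
        raise RuntimeError(f"libreoffice reported success but {target} is missing")
```

Calling check_conversion(output, expected_output_path(filename)) right after subprocess.run would turn the late FileNotFoundError into an error that names the failing conversion.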

What could be the problem? Why does this happen only sometimes? Why does libreoffice return 1 without any output?

Thanks in advance.

Answer 1

Score: 0

Solved this issue. The problem happens when libreoffice tries to open the same user configuration from several conversions at the same time. I solved it by giving each file its own user installation (profile) directory: "-env:UserInstallation=file://{tmpfile}"

tmpfile = f"/tmp/sage/sessions/{filename.split('/')[-1].split('.')[0]}"
subprocess.run(["libreoffice", "--convert-to", "xlsx", filename, "--headless", "--outdir", "/tmp/sage/", f"-env:UserInstallation=file://{tmpfile}"], capture_output=True)
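A slightly more defensive variant of the same idea is sketched below. It is not the code from the answer: the helper names build_convert_command and convert_to_xlsx are hypothetical, and it assumes a throwaway profile directory created with tempfile is acceptable. Using a fresh temporary directory per call also avoids a collision that a profile path derived from the file name could hit when two uploads share a base name.

```python
import subprocess
import tempfile
from pathlib import Path


def build_convert_command(src, outdir, profile_dir):
    """Build the libreoffice command line with a private user profile."""
    # Path.as_uri() needs an absolute path and yields e.g. file:///tmp/profile
    profile_uri = Path(profile_dir).resolve().as_uri()
    return [
        "libreoffice",
        f"-env:UserInstallation={profile_uri}",
        "--headless",
        "--convert-to", "xlsx",
        "--outdir", str(outdir),
        str(src),
    ]


def convert_to_xlsx(src, outdir="/tmp/sage/"):
    """Convert src to xlsx and return the path of the converted file."""
    # A fresh profile per call, so concurrent conversions never share one.
    with tempfile.TemporaryDirectory(prefix="lo_profile_") as profile_dir:
        cmd = build_convert_command(src, outdir, profile_dir)
        result = subprocess.run(cmd, capture_output=True)
    target = Path(outdir) / (Path(src).stem + ".xlsx")
    if result.returncode != 0 or not target.exists():
        raise RuntimeError(f"libreoffice failed on {src}: {result!r}")
    return str(target)
```

In process_file_function, the conversion branch could then become filename = convert_to_xlsx(filename). The temporary profile is deleted when the with block exits, so no session directories accumulate under /tmp.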

huangapple
  • Posted on February 19, 2023, 19:29:56
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75499814.html