July 24, 2023 18:27:58

Convert all text documents inside a directory to HTML documents - Python

Question

I have close to 10,000 text documents that I need to convert to .html files and then to PDF using Python.

I tried a package called 'texttohtml', which can be installed with:

    pip install texttohtml

Then, from the terminal, I ran:

    python -m texttohtml.convert C:\Users\User\Downloads\AAPL -o C:\Users\User\Downloads\AAPL_pdfs

It did not work. I ran it twice; it neither reported any errors nor created any HTML documents.

Answer 1

Score: 0

I take no responsibility for any lost data.

This should loop through your directory and try to convert every file:

    import os
    import subprocess
    import sys

    from_directory = "C:\\Users\\User\\Downloads\\AAPL"
    to_directory = "C:\\Users\\User\\Downloads\\AAPL_pdfs"
    os.makedirs(to_directory, exist_ok=True)  # make sure the output folder exists

    for file in os.listdir(from_directory):
        source = os.path.join(from_directory, file)
        if os.path.isfile(source):
            target = os.path.join(to_directory, os.path.splitext(file)[0] + ".html")
            # A list of arguments keeps paths with spaces intact
            command = [sys.executable, "-m", "texttohtml.convert", source, "-o", target]
            try:
                # check=True raises CalledProcessError on a non-zero exit code
                subprocess.run(command, check=True, capture_output=True, text=True)
            except subprocess.CalledProcessError as e:
                print(f"Failed to convert file: {file} Error: {e}")
                if "No data" in (e.stderr or ""): print(f"File: {file} was empty.")

The -o option means that, on success, the output is written to a file instead of stdout.

I suggest testing it on a few files before running it on all of them.

Also, this only converts the files to HTML, not to PDF.
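Since the original command failed silently, it may help to check afterwards which inputs actually produced an output. The following is a minimal sketch, not part of the answer above: `find_missing_outputs` is a hypothetical helper name, and it assumes each output is named after its input with the last extension replaced by `.html`.

```python
from pathlib import Path


def find_missing_outputs(from_directory, to_directory, suffix=".html"):
    """Return the names of input files that have no matching output file.

    Assumption: the output for "report.txt" is expected at
    "<to_directory>/report.html" (input stem plus the given suffix).
    """
    source_dir = Path(from_directory)
    target_dir = Path(to_directory)
    missing = []
    for source in sorted(source_dir.iterdir()):
        if not source.is_file():
            continue  # skip sub-directories
        expected = target_dir / (source.stem + suffix)
        if not expected.is_file():
            missing.append(source.name)
    return missing
```

Calling `find_missing_outputs(from_directory, to_directory)` after the loop and printing the result should make a silent failure visible: an empty list means every input has a counterpart, while a list as long as the input folder suggests the converter wrote nothing at all.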
Answer 2

Score: 0

```python
import os

def change_file_extension(old_extension, new_extension):
    # Get a list of all files in the current directory
    files = os.listdir()
    # Rename every file that ends with the old extension
    for file in files:
        if file.endswith(old_extension):
            # Swap only the trailing extension; str.replace would also
            # change matches that appear earlier in the file name
            new_name = file[:-len(old_extension)] + new_extension
            os.rename(file, new_name)
            print(f"Renamed {file} to {new_name}")

if __name__ == "__main__":
    old_extension = ".txt"
    new_extension = ".html"
    change_file_extension(old_extension, new_extension)
```

This code changes the file extension from .txt to .html, which worked for me as the conversion to HTML. I still need to work on converting each HTML document to PDF.
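Renaming changes only the extension, so the file contents stay plain text without any markup. If real HTML is needed before the PDF step, one option is to wrap each file in a minimal page using only the standard library. This is a rough sketch under stated assumptions, not the method used by either answer or by the texttohtml package: `text_to_html`, `convert_directory` and `PAGE_TEMPLATE` are hypothetical names, the inputs are assumed to be UTF-8 `.txt` files, and the text is placed in a `<pre>` element to keep its line breaks.

```python
import html
from pathlib import Path

# Minimal page skeleton; {title} and {body} are filled in per file.
PAGE_TEMPLATE = """<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<title>{title}</title>
</head>
<body>
<pre>{body}</pre>
</body>
</html>
"""


def text_to_html(text, title):
    """Wrap plain text in a minimal HTML page, escaping <, > and &."""
    return PAGE_TEMPLATE.format(title=html.escape(title), body=html.escape(text))


def convert_directory(from_directory, to_directory, encoding="utf-8"):
    """Write one .html file per .txt file; the originals are left untouched."""
    source_dir = Path(from_directory)
    target_dir = Path(to_directory)
    target_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for source in sorted(source_dir.glob("*.txt")):
        # errors="replace" keeps going if a file has undecodable bytes
        text = source.read_text(encoding=encoding, errors="replace")
        target = target_dir / (source.stem + ".html")
        target.write_text(text_to_html(text, source.stem), encoding="utf-8")
        written.append(target)
    return written
```

Unlike the rename approach, this writes new files into a separate folder, so a failed run does not alter the source documents. It would be called as, for example, `convert_directory("C:\\Users\\User\\Downloads\\AAPL", "C:\\Users\\User\\Downloads\\AAPL_html")`, where the second path is only an illustrative choice.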
