将目录中的所有文本文档转换为HTML文档 – Python

huangapple go评论123阅读模式
英文:

Convert all text document inside a directory to html document - python

问题

我有近10000个文本文档,我需要使用Python将其转换/保存为.html文件,然后将其转换为PDF文件。

我尝试了一个叫做'texttohtml'的包,可以使用以下命令进行安装:

  1. pip install texttohtml

在终端中使用以下命令进行转换:

  1. python -m texttohtml.convert C:\Users\User\Downloads\AAPL -o C:\Users\User\Downloads\AAPL_pdfs

但它没有工作。我尝试了两次,但它既没有报错,也没有创建任何HTML文档。

将目录中的所有文本文档转换为HTML文档 – Python

英文:

I have a close to 10000 text documents which I need to convert/save it into .html and convert it to pdf using python.

I tried a package called 'texttohtml'

which can be installed using

  1. pip install texttohtml

from terminal:

  1. python -m texttohtml.convert C:\Users\User\Downloads\AAPL -o C:\Users\User\Downloads\AAPL_pdfs

It did not work. I ran twice still it did not give me any errors or did it create any html documents.

将目录中的所有文本文档转换为HTML文档 – Python

答案1

得分: 0

我不对任何丢失的数据负责。

这应该遍历您的目录并尝试转换每个文件:

  1. import os, subprocess
  2. from_directory = "C:\\Users\\User\\Downloads\\AAPL"
  3. to_directory = "C:\\Users\\User\\Downloads\\AAPL_pdfs"
  4. for file in os.listdir(from_directory):
  5. if os.path.isfile(os.path.join(from_directory, file)):
  6. try:
  7. subprocess.run(f"python -m texttohtml.convert {os.path.join(from_directory, file)} -o {os.path.join(to_directory, file.split('.')[0]+'.html')}")
  8. except Exception as e:
  9. print(f"Failed to convert file: {file} Error: {e}")
  10. if "No data" in str(e): print(f"File: {file} was empty.")
  11. # -o 表示成功时,将内容写入文件,而不是写入标准输出。
  12. 我建议您在尝试对所有文件进行转换之前进行测试
  13. 此外这将仅将文件转换为HTML格式而不是PDF
  14. <details>
  15. <summary>英文:</summary>
  16. I take no responsibility for any lost data.
  17. This should run through your directory and try to convert every file:
  18. import os, subprocess
  19. from_directory = &quot;C:\\Users\\User\\Downloads\\AAPL&quot;
  20. to_directory = &quot;C:\\Users\\User\\Downloads\\AAPL_pdfs&quot;
  21. for file in os.listdir(from_directory):
  22. if os.path.isfile(os.path.join(from_directory, file)):
  23. try:
  24. subprocess.run(f&quot;python -m texttohtml.convert {os.path.join(from_directory, file)} -o {os.path.join(to_directory, file.split(&#39;.&#39;)[0]+&#39;.html&#39;)}&quot;)
  25. except Exception as e:
  26. print(f&quot;Failed to convert file: {file} Error: {e}&quot;)
  27. if &quot;No data&quot; in e: print(f&quot;File: {file} was empty.&quot;)
  28. -o means, that on success, instead of writing to stdout, it will write to a file.
  29. I suggest you test it before trying it on all the files.
  30. Also, this will only convert to HTML files and not PDF.
  31. </details>
  32. # 答案2
  33. **得分**: 0
  34. ```python
  35. import os
  36. def change_file_extension(old_extension, new_extension):
  37. # 获取当前目录中的所有文件
  38. files = os.listdir()
  39. # 遍历文件并将具有旧扩展名的文件重命名为新扩展名
  40. for file in files:
  41. if file.endswith(old_extension):
  42. new_name = file.replace(old_extension, new_extension)
  43. os.rename(file, new_name)
  44. print(f"将 {file} 重命名为 {new_name}")
  45. if __name__ == "__main__":
  46. old_extension = ".txt"
  47. new_extension = ".html"
  48. change_file_extension(old_extension, new_extension)

这段代码用于将文件扩展名从 .txt 更改为 .html。您需要进一步处理将每个 HTML 文档转换为 PDF 的工作。

英文:
  1. import os
  2. def change_file_extension(old_extension, new_extension):
  3. # Get a list of all files in the current directory
  4. files = os.listdir()
  5. # Iterate through the files and rename those with the old extension to the new extension
  6. for file in files:
  7. if file.endswith(old_extension):
  8. new_name = file.replace(old_extension, new_extension)
  9. os.rename(file, new_name)
  10. print(f&quot;Renamed {file} to {new_name}&quot;)
  11. if __name__ == &quot;__main__&quot;:
  12. old_extension = &quot;.txt&quot;
  13. new_extension = &quot;.html&quot;
  14. change_file_extension(old_extension, new_extension)

This worked for me to convert to html. I need to further work on converting each html document to pdf

huangapple
  • 本文由 发表于 2023年7月24日 18:27:58
  • 转载请务必保留本文链接:https://go.coder-hub.com/76753573.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定