Looking for a way to run a Python script multiple times while converting txt files to csv


# Question

I am trying to convert multiple txt files to csv. Here's my code:

```python
import pandas as pd
import pathlib

path = pathlib.Path(folderpathtotxtfiles)

def create_csv(filename):
    try:
        # sep=r'\s+' splits on any run of whitespace; malformed lines are skipped
        df = pd.read_csv(filename, sep=r'\s+', on_bad_lines='skip')
        df.to_csv(f'{filename}.csv', header=None, index=False)
    except Exception:
        print(filename, 'is empty')

for filename in path.glob('*.txt'):
    create_csv(filename)
```

This gives me the required output, but is there any way I could run this code multiple times so that each run takes only 10 files from my path and converts them to csv? That is, if I run it multiple times simultaneously, it should take the first 10 files, then the next 10 files, and so on.

  1. <details>
  2. <summary>英文:</summary>
  3. I am trying to convert multiple txt files to csv.
  4. Here&#39;s my code:

import pandas as pd
import pathlib
path=pathlib.Path(folderpathtotxtfiles)
def create_csv(filename):
try:
df=pd.read_csv(filename,sep='\s+',on_bad_lines='skip')
df.to_csv(f'{filename}.csv,header=None,index=False)
except:
print(filename,'is empty')
for filename in path.glob('*.txt'):
create_csv(filename)

  1. This gives me the required output,but is there anyway that I could run this code multiple times and each time it should only take 10 files from my path and convert it to csv. Like if I run it multiple times simultaneously it should take first 10 files then next 10 files and so on
  2. </details>
# Answer 1

**Score**: 2
You need some way to ensure that files processed in one run of the program are not processed in second and subsequent invocations.

One way to do this is to create a directory to which you move the files that have already been processed. That way they won't be "seen" on subsequent runs.

For optimum performance you should consider either multiprocessing or multithreading. In this case the former is probably most appropriate.

Something like this:

```python
import pandas
import os
import sys
import shutil
import glob
import re
from concurrent.futures import ProcessPoolExecutor

SOURCE_DIR = '/Volumes/G-Drive/src'               # location of source files
SAVE_DIR = os.path.join(SOURCE_DIR, 'processed')  # location of saved/processed files
BATCH_SIZE = 10
SUFFIX = 'txt'

def move(filename):
    # Move a processed file out of SOURCE_DIR so later runs don't pick it up again
    try:
        os.makedirs(SAVE_DIR, exist_ok=True)
        target = os.path.join(SAVE_DIR, os.path.basename(filename))
        shutil.move(filename, target)
    except Exception as e:
        print(e, file=sys.stderr)

def create_csv(filename):
    try:
        df = pandas.read_csv(filename, sep=' ')
        # Replace the trailing 'txt' suffix with 'csv' for the output name
        csv = re.sub(fr'{SUFFIX}$', 'csv', filename)
        df.to_csv(csv, index=False)
        move(filename)
    except Exception as e:
        print(e, file=sys.stderr)

def main():
    # Convert at most BATCH_SIZE files per invocation, one worker process per file
    with ProcessPoolExecutor() as ppe:
        files = glob.glob(os.path.join(SOURCE_DIR, f'*.{SUFFIX}'))
        ppe.map(create_csv, files[:BATCH_SIZE])

if __name__ == '__main__':
    main()
```
  42. <details>
  43. <summary>英文:</summary>
  44. You need some way to ensure that files processed in one run of the program are not processed in second and subsequent invocations.
  45. One way to do this is to create a directory to where you move the files that have already been processed. In that way they won&#39;t be &quot;seen&quot; on subsequent runs.
  46. For optimum performance you should consider either multiprocessing or multithreading. In this case the former is probably most appropriate.
  47. Something like this:
  48. import pandas
  49. import os
  50. import sys
  51. import shutil
  52. import glob
  53. import re
  54. from concurrent.futures import ProcessPoolExecutor
  55. SOURCE_DIR = &#39;/Volumes/G-Drive/src&#39; # location of source files
  56. SAVE_DIR = os.path.join(SOURCE_DIR, &#39;processed&#39;) # location of saved/processed files
  57. BATCH_SIZE = 10
  58. SUFFIX = &#39;txt&#39;
  59. def move(filename):
  60. try:
  61. os.makedirs(SAVE_DIR, exist_ok=True)
  62. target = os.path.join(SAVE_DIR, os.path.basename(filename))
  63. shutil.move(filename, target)
  64. except Exception as e:
  65. print(e, file=sys.stderr)
  66. def create_csf(filename):
  67. try:
  68. df = pandas.read_csv(filename, sep=&#39; &#39;)
  69. csv = re.sub(fr&#39;{SUFFIX}$&#39;, &#39;csv&#39;, filename)
  70. df.to_csv(csv, index=False)
  71. move(filename)
  72. except Exception as e:
  73. print(e, file=sys.stderr)
  74. def main():
  75. with ProcessPoolExecutor() as ppe:
  76. files = glob.glob(os.path.join(SOURCE_DIR, f&#39;*.{SUFFIX}&#39;))
  77. ppe.map(create_csf, list(files)[:BATCH_SIZE])
  78. if __name__ == &#39;__main__&#39;:
  79. main()
  80. </details>
# Answer 2

**Score**: 0
Yes, you can modify your code to run multiple times, processing 10 files at a time.

```python
import pandas as pd
import pathlib

folder_path = pathlib.Path(folder_path_to_txt_files)

def create_csv(filename):
    try:
        df = pd.read_csv(filename, sep=r'\s+', on_bad_lines='skip')
        df.to_csv(f'{filename}.csv', header=None, index=False)
    except Exception:
        print(filename, 'is empty')

# Get the list of all txt files in the folder
file_list = list(folder_path.glob('*.txt'))

# Number of files to process at a time
batch_size = 10

# Work through the list in batches of batch_size until it is exhausted
while file_list:
    batch = file_list[:batch_size]
    file_list = file_list[batch_size:]
    for filename in batch:
        create_csv(filename)
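One thing to note: the loop above converts every file in a single invocation. If the goal is instead for each *separate run* of the script to pick up only the next 10 unprocessed files, one simple option is to record finished filenames in a small ledger file between runs. A minimal sketch under that assumption follows; the `processed.log` ledger name is illustrative, `folder_path_to_txt_files` is the same placeholder as in the answer, and truly simultaneous runs would still need the move-aside approach from Answer 1.

```python
import pathlib

folder_path = pathlib.Path(folder_path_to_txt_files)  # placeholder, as in the answer
ledger = folder_path / 'processed.log'                # hypothetical record of finished files
batch_size = 10

# Names converted by earlier runs (empty on the first run).
done = set(ledger.read_text().splitlines()) if ledger.exists() else set()

# Take only the next batch_size files not seen before;
# sorting keeps the batch order stable across runs.
todo = [f for f in sorted(folder_path.glob('*.txt')) if f.name not in done][:batch_size]

for filename in todo:
    create_csv(filename)  # create_csv as defined in the answer above

# Append this run's batch so the next invocation skips these files.
with ledger.open('a') as fh:
    fh.writelines(f'{f.name}\n' for f in todo)
```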

I hope this helps.


huangapple
  • Published 2023-06-05 13:07:30
  • Please retain this link when reposting: https://go.coder-hub.com/76403610.html