Implementing multithreading in Python to read the lines in a file and check whether a line matches a given string

Question

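I am trying to implement multithreading in Python 3 to read the lines in a particular file and check whether a line matches a given string. This is causing me some confusion. Please help me with this, and also let me know whether multiprocessing should be used here instead of multithreading. Please check the code below.

One more thing: I am trying to implement this on a Linux-based system.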

    import threading
    from subprocess import check_output

    def thread_task(line):
        try:
            if line == "hello123":
                print("Found")
                init_.close()
                exit()
        except:
            pass

    def check_point(lock, file1):
        lock.acquire()
        print(file1)
        with open(file1, "r", encoding="utf-8", errors="ignore") as data:
            for line in data:
                line = line[:-1]
                thread_task(line)
        lock.release()

    def main_task(wordlist):
        lock = threading.Lock()
        # creating threads
        t = list(range(0, len(div)))
        for i in t:
            t[i] = threading.Thread(target=check_point, args=(lock, div[i],))
        # start threads
        for i in range(0, len(div)):
            t[i].start()
        # wait until threads finish their job
        for i in range(0, len(div)):
            t[i].join()

    if __name__ == "__main__":
        wordlist = input("Enter the File location : ")
        div = check_output(['split', '-l', '100000', wordlist, "Temp/"])
        div = check_output(["ls", 'Temp/']).decode("utf").split("\n")
        div = div[:-1]
        main_task(div)
        print("String Not found")


Answer 1

Score: 1

On most operating systems the bottleneck is disk access. Splitting your large file into many small files in order to read them simultaneously is therefore just a waste of time.

Instead, read the file in your Python script one line at a time and check whether the word is there. Multiprocessing won't help you here.

If I were you, I'd write something like this:

    if __name__ == "__main__":
        wordlist_name = input("Enter the File location: ")
        # Read the whole file and split it on whitespace into a set of words
        with open(wordlist_name) as fin:
            wordlist = set(fin.read().split())
        # Set membership is a constant-time lookup on average
        if 'hello123' in wordlist:
            print("Found")
        else:
            print("String Not found")
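The snippet above loads the whole file into memory and matches whitespace-separated words. As a minimal sketch of the "one line at a time" variant described in the prose, the following streams the file and stops at the first matching line. The function name `file_contains_line` and the temporary demo file are assumptions made for illustration, not part of the original answer.

```python
import os
import tempfile


def file_contains_line(path, target):
    """Return True as soon as one line of the file equals target."""
    # Iterating over the file object yields one line at a time, so memory
    # use stays small regardless of the file size.
    with open(path, "r", encoding="utf-8", errors="ignore") as fin:
        for line in fin:
            # Strip only the line terminator, so a last line without a
            # trailing newline is still compared in full.
            if line.rstrip("\r\n") == target:
                return True
    return False


if __name__ == "__main__":
    # Small self-contained demo on a temporary file (hypothetical data).
    with tempfile.NamedTemporaryFile(
        "w", suffix=".txt", delete=False, encoding="utf-8"
    ) as tmp:
        tmp.write("foo\nhello123\nbar")
        demo_path = tmp.name
    try:
        if file_contains_line(demo_path, "hello123"):
            print("Found")
        else:
            print("String Not found")
    finally:
        os.remove(demo_path)
```

Returning from inside the loop means the rest of the file is never read once a match is found, which is probably the cheapest optimisation available for this task.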

Answer 2

Score: -1

If you have to choose between multiprocessing and multithreading, you may be better off using multiprocessing in this case. Your program would then look something like this:

    from multiprocessing import Pool

    def process_line(line):
        # True when the line, ignoring surrounding whitespace, is the target
        return line.strip() == "hello123"

    if __name__ == "__main__":
        wordlist = input("Enter the File location : ")
        # The with-block shuts the worker pool down once the results are in
        with Pool(4) as pool:
            with open(wordlist) as source_file:
                # Lines are handed to the 4 workers in chunks of 4
                results = pool.map(process_line, source_file, 4)
        if True in results:
            print("Found")
        else:
            print("string not found")

Depending on your system, though, this might not help at all: since you are reading from a file, the task tends to be I/O bound.
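For comparison, the threaded variant the question was aiming for could look roughly like the sketch below. It assumes the large file has already been split into chunk files, as in the question. The names `search_file`, `search_files`, and `max_workers` are hypothetical and do not come from either answer. Instead of one global lock, which would make the threads run one after another, and `exit()` inside a worker, it uses a shared `threading.Event` so the other threads can stop early. Because the work is I/O bound and CPython's GIL serialises the string comparisons, this is not expected to beat a plain sequential loop.

```python
import os
import tempfile
import threading
from concurrent.futures import ThreadPoolExecutor


def search_file(path, target, found):
    """Scan one chunk file; set the shared event if a line equals target."""
    with open(path, "r", encoding="utf-8", errors="ignore") as data:
        for line in data:
            if found.is_set():
                return  # another thread already found the target
            if line.rstrip("\n") == target:
                found.set()
                return


def search_files(paths, target, max_workers=4):
    """Return True if target is a whole line in any of the given files."""
    found = threading.Event()
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        futures = [executor.submit(search_file, p, target, found) for p in paths]
        for future in futures:
            future.result()  # re-raises any exception from the worker
    return found.is_set()


if __name__ == "__main__":
    # Demo on three small temporary chunk files (hypothetical data).
    with tempfile.TemporaryDirectory() as tmp_dir:
        paths = []
        for i, text in enumerate(["a\nb\n", "c\nhello123\n", "d\ne\n"]):
            path = os.path.join(tmp_dir, f"chunk{i}.txt")
            with open(path, "w", encoding="utf-8") as out:
                out.write(text)
            paths.append(path)
        if search_files(paths, "hello123"):
            print("Found")
        else:
            print("String Not found")
```

Timing this against a single sequential pass over the unsplit file on the target machine is the simplest way to check whether the extra machinery pays off.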


huangapple
  • Posted on January 6, 2020 at 16:50:33
  • When reposting, please keep the link to this article: https://go.coder-hub.com/59609084.html