Implementing multithreading in Python to read the lines of a file and check whether a line matches a given string
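Question
I am trying to implement multithreading in Python 3 to read the lines of a particular file and check whether a line matches a given string. This is causing me some confusion. Please help me with this, and also let me know whether multiprocessing should be used here instead of multithreading. Please check the code below.
One more thing: I am trying to implement this on a Linux-based system.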
import threading
from subprocess import check_output

def thread_task(line):
    try:
        if line=="hello123":
            print("Found")
            init_.close()
            exit()
    except:
        pass

def check_point(lock,file1):
    lock.acquire()
    print(file1)
    with open(file1,"r",encoding="utf-8",errors = "ignore") as data:
        for line in data:
            line = line[:-1]
            thread_task(line)
    lock.release()

def main_task(wordlist):
    lock = threading.Lock()
    # creating threads
    t = list(range(0,len(div)))
    for i in t:
        t[i] = threading.Thread(target=check_point, args=(lock,div[i],))
    # start threads
    for i in range(0,len(div)):
        t[i].start()
    # wait until threads finish their job
    for i in range(0,len(div)):
        t[i].join()

if __name__ == "__main__":
    wordlist = input("Enter the File location : ")
    div = check_output(['split','-l','100000',wordlist,"Temp/"])
    div = check_output(["ls",'Temp/']).decode("utf").split("\n")
    div = div[:-1]
    main_task(div)
    print("String Not found")
Answer 1
Score: 1
On most operating systems the bottleneck is disk access, so splitting your large file into many small files in order to read them simultaneously is just a waste of time.
Instead, read the file once in your Python script and check whether the word is in it. Multiprocessing won't help you here.
If I were you, I'd write something like this:
if __name__ == "__main__":
    wordlist_name = input("Enter the File location: ")
    # Read the whole file once and keep its whitespace-separated words in a set.
    with open(wordlist_name) as fin:
        wordlist = set(fin.read().split())
    if 'hello123' in wordlist:
        print("Found")
    else:
        print("String Not found")
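The snippet above loads the entire file into memory and splits it on whitespace, so it matches words rather than whole lines. If the file is very large, or an exact whole-line match is wanted as in the question, a streaming variant may be preferable. The following is only a minimal sketch under those assumptions: the helper name `contains_line` is made up for illustration, and the demo file contents are invented.

```python
import os
import tempfile


def contains_line(path, target):
    """Return True as soon as a line equal to `target` is found in `path`."""
    with open(path, "r", encoding="utf-8", errors="ignore") as fin:
        for line in fin:
            # Strip only the line terminator so that the comparison stays exact.
            if line.rstrip("\r\n") == target:
                return True
    return False


if __name__ == "__main__":
    # Small self-contained demo on a temporary file (contents are made up).
    with tempfile.NamedTemporaryFile(
        "w", suffix=".txt", delete=False, encoding="utf-8"
    ) as tmp:
        tmp.write("alpha\nhello123\nbeta\n")
    try:
        if contains_line(tmp.name, "hello123"):
            print("Found")
        else:
            print("String Not found")
    finally:
        os.remove(tmp.name)
```

Because the file object is iterated lazily, only one line is held in memory at a time, and the function returns at the first match instead of reading the rest of the file.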
Answer 2
Score: -1
If you need to choose between multiprocessing and multithreading, you may be better off using multiprocessing in this case. Your program would then look something like this:
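from multiprocessing import Pool

def process_line(line):
    # True when the line, ignoring surrounding whitespace, is the target string.
    return line.strip() == "hello123"

if __name__ == "__main__":
    wordlist = input("Enter the File location : ")
    # The context managers close the file and shut down the pool on exit.
    with Pool(4) as pool, open(wordlist) as source_file:
        # The third argument is the chunksize: lines are sent to workers 4 at a time.
        results = pool.map(process_line, source_file, 4)
    if True in results:
        print("Found")
    else:
        print("string not found")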
Depending on your system, though, this might not help at all: because you are reading from a file, the task tends to be I/O bound.
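For comparison, if threads are still wanted as in the question, the sketch below shows one possible shape for it. It is an illustration, not a recommendation: the names `scan_file`, `search_files` and `max_workers` are hypothetical, and it assumes the file has already been split into chunk files. Each thread reads its own file, so no lock is needed, and a shared `threading.Event` lets the other workers stop once a match is found. In standard CPython the string comparisons still run under the GIL, so a real speedup over a plain sequential scan should not be expected.

```python
import os
import tempfile
import threading
from concurrent.futures import ThreadPoolExecutor


def scan_file(path, target, found):
    """Scan one file; set the shared `found` event on the first exact match."""
    with open(path, "r", encoding="utf-8", errors="ignore") as data:
        for line in data:
            if found.is_set():
                # Another worker already found the target, so stop early.
                return
            if line.rstrip("\r\n") == target:
                found.set()
                return


def search_files(paths, target, max_workers=4):
    """Return True if any file in `paths` contains a line equal to `target`."""
    found = threading.Event()
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # list() waits for all workers and re-raises any exception they hit.
        list(pool.map(lambda p: scan_file(p, target, found), paths))
    return found.is_set()


if __name__ == "__main__":
    # Self-contained demo: two made-up chunk files in a temporary directory.
    with tempfile.TemporaryDirectory() as tmp_dir:
        chunks = {"xaa": "foo\nbar\n", "xab": "baz\nhello123\n"}
        paths = []
        for name, text in chunks.items():
            path = os.path.join(tmp_dir, name)
            with open(path, "w", encoding="utf-8") as out:
                out.write(text)
            paths.append(path)
        if search_files(paths, "hello123"):
            print("Found")
        else:
            print("String Not found")
```

Unlike the code in the question, the result is returned to the caller rather than signalled by calling `exit()` inside a worker, which a bare `except:` would swallow anyway.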