Implementing multithreading in Python to read the lines in a file and check whether a line matches a given string

Question

I am trying to implement multithreading in Python 3 to read the lines of a particular file and check whether any line matches a given string, and it is causing me some confusion. Please help me with this, and also let me know whether multiprocessing should be used here instead of multithreading. Please check the code below.

One more thing: I am trying to implement this on a Linux-based system.

import threading
from subprocess import check_output

def thread_task(line):
    try:
        if line=="hello123":
            print("Found")
            init_.close()
            exit()
    except:
        #print("Somethings gotta")
        pass

def check_point(lock,file1):
    lock.acquire()
    print(file1)
    with open(file1,"r",encoding="utf-8",errors = "ignore") as data:
        for line in data:
            line = line[:-1]
            thread_task(line)
    lock.release()

def main_task(wordlist):
    lock = threading.Lock()

    # creating threads
    t = list(range(0,len(div)))
    for i in t:
        t[i] = threading.Thread(target=check_point, args=(lock,div[i],))

    # start threads
    for i in range(0,len(div)):
        t[i].start()

    # wait until threads finish their job
    for i in range(0,len(div)):
        t[i].join()

if __name__ == "__main__":
    wordlist = input("Enter the File location : ")
    div = check_output(['split','-l','100000',wordlist,"Temp/"])
    div = check_output(["ls",'Temp/']).decode("utf").split("\n")
    div = div[:-1]
    main_task(div)
    print("String Not found")

Answer 1

Score: 1

On most operating systems the bottleneck is disk access, so splitting your large file into many small files in order to read them simultaneously is just a waste of time.

Read the file in your Python script, one line at a time, and check whether the word is present. Multiprocessing won't help you here.


If I were you, I'd write something like this:

if __name__ == "__main__":
    wordlist_name = input("Enter the File location: ")
    # Read the whole file and keep its whitespace-separated words in a set
    with open(wordlist_name) as fin:
        wordlist = set(fin.read().split())

    if 'hello123' in wordlist:
        print("Found")
    else:
        print("String Not found")
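
The advice above to read the file one line at a time can be sketched as follows. This is only a minimal illustration, not the answerer's code: the function name `file_contains_line` and its `target` parameter are hypothetical, and it assumes a match means the whole line equals the string once the trailing newline is removed (as in the question's `line == "hello123"` test).

```python
def file_contains_line(path, target):
    """Return True as soon as a line equal to `target` is found in `path`."""
    # errors="ignore" mirrors the question's open() call
    # (assumption: undecodable bytes may safely be skipped).
    with open(path, "r", encoding="utf-8", errors="ignore") as fin:
        for line in fin:  # the file object yields one line at a time
            if line.rstrip("\n") == target:
                return True  # stop reading at the first match
    return False
```

Calling `file_contains_line(wordlist_name, "hello123")` would then replace the set lookup. Unlike the set-based version, this never holds the whole file in memory and stops at the first match, but it compares whole lines rather than whitespace-separated words.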

Answer 2

Score: -1

If you need to choose between multiprocessing and multithreading, you may be better off using multiprocessing in this case. Your program would then look something like this:

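from multiprocessing import Pool

def process_line(line):
    # True when the line, ignoring surrounding whitespace, equals the target
    return line.strip() == "hello123"

if __name__ == "__main__":
    wordlist = input("Enter the File location : ")
    # The with-block shuts the pool down once the results are collected
    with Pool(4) as pool, open(wordlist) as source_file:
        # chunksize=4: lines are sent to the workers four at a time
        results = pool.map(process_line, source_file, 4)
    if any(results):
        print("Found")
    else:
        print("string not found")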

Though, depending on your system, this might not help at all: because you're reading from a file, you tend to be I/O bound.
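
If threads are still preferred, the question's approach might be sketched as below. This is a hedged illustration under stated assumptions rather than a recommended fix: `scan_file`, `search_files`, and `max_workers` are hypothetical names, and the input is assumed to be already split into several files, as the question does with `split`. The sketch drops the lock that the question holds for the whole of `check_point` (which makes the threads run one after another) and uses a shared `threading.Event` so the other workers stop once one of them finds the string.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def scan_file(path, target, found):
    """Scan one file; set the shared event on the first matching line."""
    with open(path, "r", encoding="utf-8", errors="ignore") as data:
        for line in data:
            if found.is_set():  # another worker already found the string
                return
            if line.rstrip("\n") == target:
                found.set()
                return


def search_files(paths, target, max_workers=4):
    """Return True if any file in `paths` has a line equal to `target`."""
    found = threading.Event()
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # list() drains the iterator so worker exceptions are re-raised here
        list(executor.map(lambda p: scan_file(p, target, found), paths))
    return found.is_set()
```

In standard CPython the string comparisons themselves do not run in parallel across threads because of the global interpreter lock, so any gain would have to come from overlapping file reads. As noted above, on an I/O-bound workload that gain may be small or absent.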

huangapple
  • Posted on January 6, 2020 at 16:50:33
  • When reposting, please keep the link to this article: https://go.coder-hub.com/59609084.html