Find text matching a pattern in a Google Lens response using regex
Question
I am trying to get a learner's license number from Google Lens by uploading an image, but my regex is not working, because the license number appears in the following patterns:
- `KL 14 /0000007/2023`
- `KL14 /0000007/2023`
- `KL 14/0000007/2023`
- `KL 14 /0000007/ 2023`

and so on. That is, there may or may not be whitespace between the parts.
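The variants listed above differ only in where whitespace appears. As a quick illustrative check (not part of the original question), removing every whitespace run with `re.sub` should collapse all four to the same string; the `variants` list is simply copied from the examples above.

```python
import re

# The four variants listed in the question
variants = [
    "KL 14 /0000007/2023",
    "KL14 /0000007/2023",
    "KL 14/0000007/2023",
    "KL 14 /0000007/ 2023",
]

# Remove every run of whitespace (\s+) and collect the distinct results
normalized = {re.sub(r"\s+", "", v) for v in variants}
print(normalized)  # {'KL14/0000007/2023'}
```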
My regex is `KL *[0-9]{1}.*\/.*[0-9]{1}.*\/.*[0-9]{1}.*`, but it does not work.

My code:
```python
from lxml.html import soupparser
import re
import os
import requests

folder_dir = os.getcwd()
for images in os.listdir(folder_dir):
    try:
        # check if the image ends with png or jpg or jpeg
        if (images.endswith(".png") or images.endswith(".jpg") \
                or images.endswith(".jpeg")):
            proxy = '127.0.0.1:8080'
            os.environ['http_proxy'] = proxy
            os.environ['HTTP_PROXY'] = proxy
            os.environ['https_proxy'] = proxy
            os.environ['HTTPS_PROXY'] = proxy
            os.environ['REQUESTS_CA_BUNDLE'] = "C:\\Users\\User\\Desktop\\cacert.pem"
            print("-------------------------------------------------------------------------------------")
            print(images)
            print("\n")
            # upload the image to Google Lens
            captchaurl = 'https://lens.google.com/upload?ep=ccm&s=csp&st=1653142987619'
            encoded_image = {'encoded_image': open(images, 'rb')}
            burp0cap_headers = {"Cache-Control": "max-age=0", "Upgrade-Insecure-Requests": "1",
                                "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/101.0.4951.67 Safari/537.36",
                                "Origin": "null",
                                "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9",
                                "Sec-Gpc": "1", "Sec-Fetch-Site": "none",
                                "Sec-Fetch-Mode": "navigate", "Sec-Fetch-User": "?1",
                                "Sec-Fetch-Dest": "document", "Accept-Encoding": "gzip, deflate",
                                "Accept-Language": "en-GB,en-US;q=0.9,en;q=0.8"}
            rlens = requests.post(captchaurl, files=encoded_image, headers=burp0cap_headers,
                                  allow_redirects=True)
            DATA000 = str(rlens.content)
            # print(DATA000)
            # follow the meta-refresh redirect to the results page
            root = soupparser.fromstring(DATA000)
            result_url = root.xpath('//meta[@http-equiv="refresh"]/@content')
            result_url = str(result_url[0])
            url2 = result_url.split('URL=')
            finalurl = str(url2[1])
            # print(finalurl)
            burp1cap_headers = {
                "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/101.0.4951.67 Safari/537.36",
                "Accept-Encoding": "gzip, deflate",
                "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9",
                "Cache-Control": "max-age=0", "Upgrade-Insecure-Requests": "1", "Origin": "null",
                "Sec-Gpc": "1", "Sec-Fetch-Site": "none", "Sec-Fetch-Mode": "navigate",
                "Sec-Fetch-User": "?1", "Sec-Fetch-Dest": "document",
                "Accept-Language": "en-GB,en-US;q=0.9,en;q=0.8"}
            r2 = requests.get(finalurl, headers=burp1cap_headers)
            r3 = str(r2.text)
            r4 = r3.replace('"', '')
            # print(r4)
            # search the results page for the license number
            phoneNumRegex2 = re.compile(r'KL *[0-9]{1}.*\/.*[0-9]{1}.*\/.*[0-9]+')
            mo = phoneNumRegex2.search(str(r4))
            print(mo.group())
    except Exception as e:
        print(e)
```
The Google Lens response is:

`"text:0:e90nKYDCi5I\u003d"],6,[]]],[[],3]]]],[]],[[],null,null,"en",[[["FORM 3 [See Rule 3(a) a","LEARNER'S L","Application No... 394442223","Learner's Licence","KL 14 /0002707/2023","Issue Date.....","1. Name","SATHEESAN U","2. Father's Name","CHOUKAR K","Date of Birth","07-03-1984"]],"Ad7f3FjZKr2A8ovUoig+fwJqhVKxG6sbvcjciTQV+KzOBTZf2VGydPYtpIkEMPU6sQyWL+Ad8/Vjl0/OV0izP/oXCluFA2xNbzAktl3KxaOVnfyvyS3kTwHv",[1678139279,21105500`
The response contains something like the text above. I need to get the learner's license number from it, but the output is `None`.
Sample images are attached: [sample image 1](https://i.stack.imgur.com/QclbX.jpg), [sample image 2](https://i.stack.imgur.com/xOOAl.jpg).
Answer 1
Score: 1
This regex allows whitespace between any of the elements:

`KL\s*\d+\s*/\s*\d+\s*/\s*\d+`

`\s*` means zero or more whitespace characters. Then `\d+` matches all the digits (one or more); your regex incorrectly matched only a single digit with `[0-9]{1}`.
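A minimal sketch that runs the answer's regex against the four variants from the question and against a trimmed excerpt of the Google Lens response shown there. The names `LICENSE_RE`, `variants`, and `response` are illustrative only, and the excerpt is shortened from the question's sample rather than taken from a live response.

```python
import re

# Regex from the answer: \s* allows optional whitespace around each part,
# \d+ matches one or more digits in each numeric group.
LICENSE_RE = re.compile(r"KL\s*\d+\s*/\s*\d+\s*/\s*\d+")

variants = [
    "KL 14 /0000007/2023",
    "KL14 /0000007/2023",
    "KL 14/0000007/2023",
    "KL 14 /0000007/ 2023",
]

for text in variants:
    match = LICENSE_RE.search(text)
    # search() returns None when nothing matches, so check before calling group()
    print(match.group() if match else None)

# Trimmed excerpt of the Google Lens response from the question
response = (
    '"Application No... 394442223","Learner\'s Licence",'
    '"KL 14 /0002707/2023","Issue Date....."'
)
found = LICENSE_RE.findall(response)
print(found)  # ['KL 14 /0002707/2023']
```

Calling `.group()` directly on the result of `search()` raises `AttributeError` when there is no match, which is why the loop checks the match first. `findall` returns every match as a list (empty if there are none), which avoids the `None` case altogether.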