Find text matching a pattern in a Google Lens response using regex

Question

I am trying to extract a learner's licence number from a Google Lens response after uploading an image, but my regex is not working. The licence number appears in patterns such as:

``KL 14 /0000007/2023``

``KL14 /0000007/2023``

``KL 14/0000007/2023``

``KL 14 /0000007/ 2023``

In other words, there may or may not be spaces between the parts.

My regex is `KL *[0-9]{1}.*\/.*[0-9]{1}.*\/.*[0-9]{1}.*`, but it does not work.

My code:
```python
from lxml.html import soupparser
import re
import os
import requests

folder_dir = os.getcwd()
for images in os.listdir(folder_dir):
    try:
        # check if the image ends with png, jpg or jpeg
        if images.endswith((".png", ".jpg", ".jpeg")):

            proxy = '127.0.0.1:8080'
            os.environ['http_proxy'] = proxy
            os.environ['HTTP_PROXY'] = proxy
            os.environ['https_proxy'] = proxy
            os.environ['HTTPS_PROXY'] = proxy
            os.environ['REQUESTS_CA_BUNDLE'] = "C:\\Users\\User\\Desktop\\cacert.pem"

            print("-------------------------------------------------------------------------------------")
            print(images)
            print("\n")
            captchaurl = 'https://lens.google.com/upload?ep=ccm&s=csp&st=1653142987619'
            encoded_image = {'encoded_image': open(images, 'rb')}
            burp0cap_headers = {"Cache-Control": "max-age=0", "Upgrade-Insecure-Requests": "1",
                                "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/101.0.4951.67 Safari/537.36",
                                "Origin": "null",
                                "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9",
                                "Sec-Gpc": "1", "Sec-Fetch-Site": "none",
                                "Sec-Fetch-Mode": "navigate", "Sec-Fetch-User": "?1",
                                "Sec-Fetch-Dest": "document", "Accept-Encoding": "gzip, deflate",
                                "Accept-Language": "en-GB,en-US;q=0.9,en;q=0.8"}
            rlens = requests.post(captchaurl, files=encoded_image, headers=burp0cap_headers,
                                  allow_redirects=True)
            DATA000 = str(rlens.content)
            # print(DATA000)
            root = soupparser.fromstring(DATA000)
            result_url = root.xpath('//meta[@http-equiv="refresh"]/@content')
            result_url = str(result_url[0])
            url2 = result_url.split('URL=')
            finalurl = str(url2[1])
            # print(finalurl)
            burp1cap_headers = {
                "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/101.0.4951.67 Safari/537.36",
                "Accept-Encoding": "gzip, deflate",
                "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9",
                "Cache-Control": "max-age=0", "Upgrade-Insecure-Requests": "1", "Origin": "null",
                "Sec-Gpc": "1", "Sec-Fetch-Site": "none", "Sec-Fetch-Mode": "navigate",
                "Sec-Fetch-User": "?1", "Sec-Fetch-Dest": "document",
                "Accept-Language": "en-GB,en-US;q=0.9,en;q=0.8"}
            r2 = requests.get(finalurl, headers=burp1cap_headers)
            r3 = str(r2.text)

            r4 = r3.replace('"', '')
            # print(r4)

            phoneNumRegex2 = re.compile(r'KL *[0-9]{1}.*\/.*[0-9]{1}.*\/.*[0-9]+')

            mo = phoneNumRegex2.search(str(r4))
            print(mo.group())

    except Exception as e:
        print(e)
```

The response from Google Lens is:

"text:0:e90nKYDCi5I\u003d"],6,[]]],[[],3]]]],[]],[[],null,null,"en",[[["FORM 3 [See Rule 3(a) a","LEARNER'S L","Application No... 394442223","Learner's Licence","KL 14 /0002707/2023","Issue Date.....","1. Name","SATHEESAN U","2. Father's Name","CHOUKAR K","Date of Birth","07-03-1984"]],"Ad7f3FjZKr2A8ovUoig+fwJqhVKxG6sbvcjciTQV+KzOBTZf2VGydPYtpIkEMPU6sQyWL+Ad8/Vjl0/OV0izP/oXCluFA2xNbzAktl3KxaOVnfyvyS3kTwHv",[1678139279,21105500

The response contains something like the above, and I need to extract the learner's licence number from it.

The output is `None`.

Sample images are attached: [image 1](https://i.stack.imgur.com/QclbX.jpg) [image 2](https://i.stack.imgur.com/xOOAl.jpg)

Answer 1

Score: 1

This regex allows optional whitespace between any of the elements:

`KL\s*\d+\s*/\s*\d+\s*/\s*\d+`

`\s*` means zero or more whitespace characters, and `\d+` means one or more digits, so each numeric part is matched in full. Your regex, by contrast, matches only a single digit with `[0-9]{1}` and leaves the rest to `.*`, which matches any characters at all.
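As a quick check, the pattern can be run against the four spacing variants listed in the question. This is only a minimal sketch; the names `LICENCE_RE`, `samples`, and `matches` are made up for the illustration.

```python
import re

# Pattern from the answer: optional whitespace around every part.
LICENCE_RE = re.compile(r"KL\s*\d+\s*/\s*\d+\s*/\s*\d+")

# The four spacing variants listed in the question.
samples = [
    "KL 14 /0000007/2023",
    "KL14 /0000007/2023",
    "KL 14/0000007/2023",
    "KL 14 /0000007/ 2023",
]

# fullmatch() succeeds only if the whole string fits the pattern.
matches = [LICENCE_RE.fullmatch(s) is not None for s in samples]
print(matches)
```

Using `fullmatch()` here rather than `search()` confirms that the pattern accounts for every character of each variant, not just a prefix.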

Regex101 playground/explanation
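When applying the pattern to the Lens response, it may help to check for a failed match before calling `.group()`, since `re.search()` returns `None` when nothing matches. The sketch below uses a shortened copy of the response from the question. `extract_licence` is a hypothetical helper, the field names are guesses, and the space-free output format is an assumption rather than something the question asks for.

```python
import re
from typing import Optional

# Same pattern as in the answer, with capture groups around the numeric parts.
LICENCE_RE = re.compile(r"KL\s*(\d+)\s*/\s*(\d+)\s*/\s*(\d+)")


def extract_licence(text: str) -> Optional[str]:
    """Return the first licence number found in text, without spaces, or None."""
    mo = LICENCE_RE.search(text)
    if mo is None:
        # No match: report it instead of failing on mo.group().
        return None
    # Field names are guesses; the question does not say what each part means.
    region, serial, year = mo.groups()
    return f"KL{region}/{serial}/{year}"


# Shortened copy of the response text shown in the question.
response = '"Learner\'s Licence","KL 14 /0002707/2023","Issue Date....."'

print(extract_licence(response))
print(extract_licence("no licence number here"))
```

Returning `None` on a failed match lets the caller decide how to handle an image in which no licence number was recognized.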

huangapple
  • Posted on March 7, 2023, 02:14:28
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75654406.html