March 7, 2023 02:14:28

Find text matching a pattern in a Google Lens response using regex

Question


I am trying to extract a learner's licence number from a Google Lens response after uploading an image, but my regex is not working. The licence number appears in patterns such as:

`KL 14 /0000007/2023`

`KL14 /0000007/2023`

`KL 14/0000007/2023`

`KL 14 /0000007/ 2023`

In other words, there may or may not be spaces between the parts.

My regex is `KL *[0-9]{1}.*\/.*[0-9]{1}.*\/.*[0-9]{1}.*`, but it is not working.

My code:

```python
from lxml.html import soupparser
import re
import os
import requests

folder_dir = os.getcwd()
for images in os.listdir(folder_dir):
    try:
        # check if the image ends with png or jpg or jpeg
        if (images.endswith(".png") or images.endswith(".jpg")
                or images.endswith(".jpeg")):

            proxy = '127.0.0.1:8080'
            os.environ['http_proxy'] = proxy
            os.environ['HTTP_PROXY'] = proxy
            os.environ['https_proxy'] = proxy
            os.environ['HTTPS_PROXY'] = proxy
            os.environ['REQUESTS_CA_BUNDLE'] = "C:\\Users\\User\\Desktop\\cacert.pem"

            print("-------------------------------------------------------------------------------------")
            print(images)
            print("\n")
            captchaurl = 'https://lens.google.com/upload?ep=ccm&s=csp&st=1653142987619'
            encoded_image = {'encoded_image': open(images, 'rb')}
            burp0cap_headers = {"Cache-Control": "max-age=0", "Upgrade-Insecure-Requests": "1",
                                "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/101.0.4951.67 Safari/537.36",
                                "Origin": "null",
                                "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9",
                                "Sec-Gpc": "1", "Sec-Fetch-Site": "none",
                                "Sec-Fetch-Mode": "navigate", "Sec-Fetch-User": "?1",
                                "Sec-Fetch-Dest": "document", "Accept-Encoding": "gzip, deflate",
                                "Accept-Language": "en-GB,en-US;q=0.9,en;q=0.8"}
            # upload the image; the reply is an HTML page with a meta refresh redirect
            rlens = requests.post(captchaurl, files=encoded_image, headers=burp0cap_headers,
                                  allow_redirects=True)
            DATA000 = str(rlens.content)
            # print(DATA000)
            root = soupparser.fromstring(DATA000)
            result_url = root.xpath('//meta[@http-equiv="refresh"]/@content')
            result_url = str(result_url[0])
            url2 = result_url.split('URL=')
            finalurl = str(url2[1])
            # print(finalurl)
            burp1cap_headers = {
                "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/101.0.4951.67 Safari/537.36",
                "Accept-Encoding": "gzip, deflate",
                "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9",
                "Cache-Control": "max-age=0", "Upgrade-Insecure-Requests": "1", "Origin": "null",
                "Sec-Gpc": "1", "Sec-Fetch-Site": "none", "Sec-Fetch-Mode": "navigate",
                "Sec-Fetch-User": "?1", "Sec-Fetch-Dest": "document",
                "Accept-Language": "en-GB,en-US;q=0.9,en;q=0.8"}
            # follow the redirect to the results page
            r2 = requests.get(finalurl, headers=burp1cap_headers)
            r3 = str(r2.text)

            # strip double quotes before searching
            r4 = r3.replace('"', '')
            # print(r4)

            phoneNumRegex2 = re.compile(r'KL *[0-9]{1}.*\/.*[0-9]{1}.*\/.*[0-9]+')

            mo = phoneNumRegex2.search(str(r4))
            print(mo.group())

    except Exception as e:
        print(e)
```

The Google Lens response is:

"text:0:e90nKYDCi5I\u003d"],6,[]]],[[],3]]]],[]],[[],null,null,"en",[[["FORM 3 [See Rule 3(a) a","LEARNER'S L","Application No... 394442223","Learner's Licence","KL 14 /0002707/2023","Issue Date.....","1. Name","SATHEESAN U","2. Father's Name","CHOUKAR K","Date of Birth","07-03-1984"]],"Ad7f3FjZKr2A8ovUoig+fwJqhVKxG6sbvcjciTQV+KzOBTZf2VGydPYtpIkEMPU6sQyWL+Ad8/Vjl0/OV0izP/oXCluFA2xNbzAktl3KxaOVnfyvyS3kTwHv",[1678139279,21105500

The response contains something like the above, and I need to extract the learner's licence number from it.

The output is None.

Sample images are attached: [sample image 1](https://i.stack.imgur.com/QclbX.jpg) [sample image 2](https://i.stack.imgur.com/xOOAl.jpg)

Answer 1

Score: 1


This regex allows optional whitespace between any of the elements:

`KL\s*\d+\s*/\s*\d+\s*/\s*\d+`

`\s*` means zero or more whitespace characters. Each run of digits is then matched with `\d+`, which means one or more digits. Your regex uses `[0-9]{1}`, which matches exactly one digit, and relies on `.*` to consume everything else.

Regex101 playground/explanation
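
As a rough sketch of how the pattern above could be used in Python, the snippet below runs it against the four spacing variants from the question and against a shortened excerpt of the response. The names `LICENCE_RE` and `find_licence`, and the step that strips whitespace from the match, are illustrative assumptions and not part of the original code. Since `re.search` returns `None` when nothing matches, the helper checks for that before calling `.group()`.

```python
import re

# Pattern from the answer: optional whitespace around each "/"-separated part.
LICENCE_RE = re.compile(r"KL\s*\d+\s*/\s*\d+\s*/\s*\d+")

# The four spacing variants listed in the question.
variants = [
    "KL 14 /0000007/2023",
    "KL14 /0000007/2023",
    "KL 14/0000007/2023",
    "KL 14 /0000007/ 2023",
]


def find_licence(text):
    """Return the first licence-like match with whitespace removed, or None."""
    match = LICENCE_RE.search(text)
    if match is None:
        # re.search returns None when there is no match, so guard before .group()
        return None
    # Assumed normalisation step (not in the original): drop all whitespace.
    return re.sub(r"\s+", "", match.group())


for variant in variants:
    print(repr(variant), "->", find_licence(variant))

# Shortened excerpt of the response shown in the question.
sample_response = (
    '"Application No... 394442223","Learner\'s Licence",'
    '"KL 14 /0002707/2023","Issue Date.....","1. Name"'
)
print(find_licence(sample_response))
print(find_licence("no licence here"))
```

In the question's loop, the same guard would replace the bare `print(mo.group())`, so that an image without a recognisable licence number is reported explicitly instead of raising inside the `try` block.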
