I keep getting an IndexError that is affecting my web-scraping script

Question

Hi there, I keep getting this IndexError in my code when web scraping and I am not quite sure how to solve it, so any help would be much appreciated. Below are the error message and a code sample; the traceback shows which lines of the script are affected.

Traceback (most recent call last):
    results = getPageResults(postcode, page)
  in getPageResults
    address1.append(a.select("div.govuk-body")[0].text)
IndexError: list index out of range
from bs4 import BeautifulSoup
from urllib.request import urlopen
import requests
import pandas as pd
from time import sleep
from random import randint

name = []
address1 = []
address2 = []

def getPageResults(postcode, page):
    url = 'https://www.ukrlp.co.uk/ukrlp/ukrlp_provider.page_pls_searchProviders'
    url += '?pv_pc=' + postcode
    url += '&pn_pc_d=5'  # set up distance here: 0, 1, 5 or 10 miles
    url += '&pn_pageNo=' + str(page)
    url += '&pv_layout=SEARCH'

    page = urlopen(url)
    html = page.read().decode("utf-8")
    soup = BeautifulSoup(html, "html.parser")

    results = False

    for a in soup.findAll("div", attrs={"class": "govuk-grid-row"}):
        result = a.select("h2")
        if len(result) > 0:
            name.append(a.select("h2")[0].text)

        address1.append(a.select("div.govuk-body")[0].text)
        address2.append(a.select("div.govuk-body")[0].text)
        results = True

    return results

print("__________________________")

postcodes = [
    # "BB0", "BB1", "BB10", "BB11", "BB12", "BB18", "BB2", "BB3", "BB4", "BB5", "BB6", "BB7",
    # "BB8", "BB9", "BB94", "BD23", "BL0", "BL6", "BL7", "BL8", "BL9", "BN1", "BN10", "BN2", "BN20", "BN21", "BN22", "BN23", "BN24", "BN25", "BN26", "BN27"
    # "BN3",
    # "BN4", "BN41", "BN42", "BN45", "BN50", "BN51", "BN52", "BN6", "BN7", "BN8", "BN88", "BN9", "BR1", "BR3"
    # "DE1", "DE11", "DE12", "DE13", "DE14", "DE15", "DE2", "DE21", "DE22", "DE23", "DE24",
    # "DE3", "DE4", "DE45", "DE5", "DE55", "DE56",
    # "DE6", "DE65", "DE7", "DE72", "DE73", "DE74", "DE75", "DE99", "DN10", "DN11", "DN21", "DN22", "DN9"
    # "FY0", "FY1", "FY2", "FY3", "FY4", "FY5", "FY6", "FY7", "FY8"
    # "HA0", "HA1",
    # "HA3", "HA7", "HA8", "HA9", "HU1", "HU11", "HU12", "HU13", "HU2", "HU3", "HU4", "HU5", "HU6", "HU7", "HU8", "HU9",
    "L31",
    # "L33", "L37", "L39", "L40", "LA1", "LA2", "LA3", "LA4", "LA5", "LA6", "LA7", "LE12", "LE14", "LE6", "LE65", "LN1", "LN6"
    # "N10", "N11", "N13", "N15", "N17", "N18", "N2", "N22", "N4", "N6", "N8", "N81", "NG1", "NG10", "NG11", "NG12", "NG13", "NG14", "NG15",
    # "NG16", "NG17", "NG18", "NG19", "NG2", "NG20", "NG21", "NG22", "NG23", "NG24", "NG25", "NG3", "NG4", "NG5", "NG6", "NG7", "NG70", "NG8",
    # "NG80", "NG9", "NG90", "NW10", "NW2", "NW26", "NW6", "NW8", "NW9", "OL12", "OL13", "OL14", "PR0", "PR1", "PR11", "PR2", "PR25", "PR26",
    # "PR3", "PR4", "PR5", "PR6", "PR7", "PR8", "PR9", "RH15", "RH16", "RH17", "RH18", "RH19",
    # "S1", "S11", "S12", "S17", "S18", "S19", "S21", "S26", "S30", "S31", "S32", "S33", "S40", "S41", "S42", "S43", "S44", "S45", "S49", "S8",
    # "S80", "S81", "SE10", "SE12", "SE13", "SE14", "SE15", "SE16", "SE23", "SE26", "SE3", "SE4", "SE6", "SE8", "SE9", "SK12", "SK13", "SK14",
    # "SK17", "SK22", "SK23", "ST14", "TN18", "TN19", "TN2", "TN20", "TN21", "TN22", "TN3", "TN31", "TN32", "TN33", "TN34", "TN35", "TN36",
    # "TN37", "TN38", "TN39", "TN40", "TN5", "TN6", "TN7", "TN8", "UB6", "W10", "W9", "WA11", "WN5", "WN6", "WN8"
]

for postcode in postcodes:
    print(postcode)

    page = 1

    while True:
        results = getPageResults(postcode, page)
        sleep(randint(1, 3))

        if results == False:
            break
        page += 1
        print(page)

serve = pd.DataFrame({
    "name": name,
    "address1": address1,
    "address2": address2
})

df = pd.DataFrame(columns=["name", "address1", "address2"])

df = df.append(serve)

df.to_excel("Five_miles_9.xlsx", index=False)

Answer 1

Score: 1

When there are no elements on a page you are not breaking out of the loop, and that is why you get an IndexError at `a.select("div.govuk-body")[0]` when there are no elements, since `result = a.select("h2")` returns an empty list:

for a in soup.findAll("div", attrs={"class": "govuk-grid-row"}):
    result = a.select("h2")
    if len(result) > 0:
        # Only append when length > 0
        name.append(result[0].text)
        address1.append(a.select("div.govuk-body")[0].text)
        address2.append(a.select("div.govuk-body")[0].text)
        results = True
    else:
        # Since there are no more results, break out
        results = False
        break
return results

Also, below I have updated the logic to break out of the pagination loop:

for postcode in postcodes:
    print(postcode)

    page = 1
    while True:
        print(page)
        results = getPageResults(postcode, page)
        sleep(randint(1, 3))
        page += 1

        if not results:
            break

Try this updated code; your Excel file should be generated:

from bs4 import BeautifulSoup
from urllib.request import urlopen
import pandas as pd
from time import sleep
from random import randint

name = []
address1 = []
address2 = []


def getPageResults(postcode, page):
    url = 'https://www.ukrlp.co.uk/ukrlp/ukrlp_provider.page_pls_searchProviders'
    url += '?pv_pc=' + postcode
    url += '&pn_pc_d=5'  # set up distance here: 0, 1, 5 or 10 miles
    url += '&pn_pageNo=' + str(page)
    url += '&pv_layout=SEARCH'

    page = urlopen(url)
    html = page.read().decode("utf-8")
    soup = BeautifulSoup(html, "html.parser")

    results = False

    for a in soup.findAll("div", attrs={"class": "govuk-grid-row"}):
        result = a.select("h2")
        if len(result) > 0:
            name.append(result[0].text)
            address1.append(a.select("div.govuk-body")[0].text)
            address2.append(a.select("div.govuk-body")[0].text)
            results = True
        else:
            results = False
            break
    return results


print("__________________________")

postcodes = [
    # "BB0", "BB1", "BB10", "BB11", "BB12", "BB18", "BB2", "BB3", "BB4", "BB5", "BB6", "BB7",
    # "BB8", "BB9", "BB94", "BD23", "BL0", "BL6", "BL7", "BL8", "BL9", "BN1", "BN10", "BN2", "BN20", "BN21", "BN22", "BN23", "BN24", "BN25", "BN26", "BN27"
    # "BN3",
    # "BN4", "BN41", "BN42", "BN45", "BN50", "BN51", "BN52", "BN6", "BN7", "BN8", "BN88", "BN9", "BR1", "BR3"
    # "DE1", "DE11", "DE12", "DE13", "DE14", "DE15", "DE2", "DE21", "DE22", "DE23", "DE24",
    # "DE3", "DE4", "DE45", "DE5", "DE55", "DE56",
    # "DE6", "DE65", "DE7", "DE72", "DE73", "DE74", "DE75", "DE99", "DN10", "DN11", "DN21", "DN22", "DN9"
    # "FY0", "FY1", "FY2", "FY3", "FY4", "FY5", "FY6", "FY7", "FY8"
    # "HA0", "HA1",
    # "HA3", "HA7", "HA8", "HA9", "HU1", "HU11", "HU12", "HU13", "HU2", "HU3", "HU4", "HU5", "HU6", "HU7", "HU8", "HU9",
    "L31",
    # "L33", "L37", "L39", "L40", "LA1", "LA2", "LA3", "LA4", "LA5", "LA6", "LA7", "LE12", "LE14", "LE6", "LE65", "LN1", "LN6"
    # "N10", "N11", "N13", "N15", "N17", "N18", "N2", "N22", "N4", "N6", "N8", "N81", "NG1", "NG10", "NG11", "NG12", "NG13", "NG14", "NG15",
    # "NG16", "NG17", "NG18", "NG19", "NG2", "NG20", "NG21", "NG22", "NG23", "NG24", "NG25", "NG3", "NG4", "NG5", "NG6", "NG7", "NG70", "NG8",
    # "NG80", "NG9", "NG90", "NW10", "NW2", "NW26", "NW6", "NW8", "NW9", "OL12", "OL13", "OL14", "PR0", "PR1", "PR11", "PR2", "PR25", "PR26",
    # "PR3", "PR4", "PR5", "PR6", "PR7", "PR8", "PR9", "RH15", "RH16", "RH17", "RH18", "RH19",
    # "S1", "S11", "S12", "S17", "S18", "S19", "S21", "S26", "S30", "S31", "S32", "S33", "S40", "S41", "S42", "S43", "S44", "S45", "S49", "S8",
    # "S80", "S81", "SE10", "SE12", "SE13", "SE14", "SE15", "SE16", "SE23", "SE26", "SE3", "SE4", "SE6", "SE8", "SE9", "SK12", "SK13", "SK14",
    # "SK17", "SK22", "SK23", "ST14", "TN18", "TN19", "TN2", "TN20", "TN21", "TN22", "TN3", "TN31", "TN32", "TN33", "TN34", "TN35", "TN36",
    # "TN37", "TN38", "TN39", "TN40", "TN5", "TN6", "TN7", "TN8", "UB6", "W10", "W9", "WA11", "WN5", "WN6", "WN8"
]

for postcode in postcodes:
    print(postcode)

    page = 1
    while True:
        print(page)
        results = getPageResults(postcode, page)
        sleep(randint(1, 3))
        page += 1

        if not results:
            break

serve = pd.DataFrame({
    "name": name,
    "address1": address1,
    "address2": address2
})

df = pd.DataFrame(columns=["name", "address1", "address2"])

df = pd.concat([df, serve])

df.to_excel("Five_miles_9.xlsx", index=False)
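
One detail the updated code keeps from the original: `address1` and `address2` both copy the text of the same first `div.govuk-body` element. If each result row actually renders two separate address lines (an assumption; the live UKRLP markup is not quoted in this thread), a slightly more defensive version of the parsing loop could look like the sketch below, using `select_one`, which returns `None` instead of raising.

# A defensive variant of the row-parsing loop (a sketch, not verified
# against the live UKRLP markup). It assumes each result row may contain
# an <h2> and up to two div.govuk-body elements.
for a in soup.findAll("div", attrs={"class": "govuk-grid-row"}):
    heading = a.select_one("h2")  # select_one returns None rather than raising
    if heading is None:
        results = False
        break  # no more result rows on this page

    bodies = a.select("div.govuk-body")  # may be empty or hold several divs
    name.append(heading.text.strip())
    # Take the first two address lines when present, otherwise pad with ""
    # so the three lists stay the same length for the final DataFrame.
    address1.append(bodies[0].text.strip() if len(bodies) > 0 else "")
    address2.append(bodies[1].text.strip() if len(bodies) > 1 else "")
    results = True

Padding with empty strings keeps the three lists the same length, which the final pd.DataFrame(...) call requires.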

Answer 2

Score: 0

It seems that on the website you have as `url` there is no `govuk-body` under `govuk-grid-row`; `govuk-body` is in a different div that is also a child of `govuk-width-container`. Perhaps what you meant to scrape is `govuk-body-1`. Since the selector can't find anything, `a.select("div.govuk-body")` returns an empty list, as @DarkKnight says in the comment above.
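
If that guess about the class name is right (it is an assumption here, since the page markup is not quoted in the thread), the fix is confined to the selector, for example:

# Hypothetical: use this only if the address text really lives in
# div.govuk-body-1 rather than div.govuk-body on the result page.
bodies = a.select("div.govuk-body-1")
if bodies:  # guard against an empty selection
    address1.append(bodies[0].text)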

To make this easier to debug in the future, make sure the selection doesn't have a length of zero before indexing into it. You can do this by adding `assert len(a.select("div.govuk-body")) != 0`: the assertion will fail before anything is appended, telling you about the problem before the next step executes.
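
In context, the check would sit just before the appends; a minimal sketch using the variable names from the question:

for a in soup.findAll("div", attrs={"class": "govuk-grid-row"}):
    selection = a.select("div.govuk-body")
    # Fail fast with a readable message instead of a bare IndexError later on.
    assert len(selection) != 0, "no div.govuk-body inside this govuk-grid-row"
    address1.append(selection[0].text)
    address2.append(selection[0].text)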
