Question

I'm trying to scrape tables from websites using pandas, and I get the error below for one of the links.

Here's a snippet that causes the crash:

import pandas as pd
website_df = pd.read_html("https://ballotpedia.org/Roger_Wicker")
print(website_df)

Below is the error I get. Does anyone know how to fix this?

Traceback (most recent call last):
  File "C:\Users\miniconda3\lib\site-packages\pandas\io\parsers\python_parser.py", line 700, in _next_line
    line = self._check_comments([self.data[self.pos]])[0]
IndexError: list index out of range

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Users\miniconda3\lib\site-packages\pandas\io\parsers\python_parser.py", line 385, in _infer_columns
    line = self._next_line()
  File "C:\Users\miniconda3\lib\site-packages\pandas\io\parsers\python_parser.py", line 713, in _next_line
    raise StopIteration
StopIteration

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "c:\Users\legislators-current.py", line 15, in <module>
    website_df = pd.read_html("https://ballotpedia.org/Roger_Wicker")
  File "C:\Users\miniconda3\lib\site-packages\pandas\util\_decorators.py", line 331, in wrapper
    return func(*args, **kwargs)
  File "C:\Users\miniconda3\lib\site-packages\pandas\io\html.py", line 1205, in read_html
    return _parse(
  File "C:\Users\miniconda3\lib\site-packages\pandas\io\html.py", line 1011, in _parse
    df = _data_to_frame(data=table, **kwargs)
  File "C:\Users\miniconda3\lib\site-packages\pandas\io\html.py", line 890, in _data_to_frame
    with TextParser(body, header=header, **kwargs) as tp:
  File "C:\Users\miniconda3\lib\site-packages\pandas\io\parsers\readers.py", line 1876, in TextParser
    return TextFileReader(*args, **kwds)
  File "C:\Users\miniconda3\lib\site-packages\pandas\io\parsers\readers.py", line 1442, in __init__
    self._engine = self._make_engine(f, self.engine)
  File "C:\Users\miniconda3\lib\site-packages\pandas\io\parsers\readers.py", line 1753, in _make_engine
    return mapping[engine](f, **self.options)
  File "C:\Users\miniconda3\lib\site-packages\pandas\io\parsers\python_parser.py", line 122, in __init__
    ) = self._infer_columns()
  File "C:\Users\miniconda3\lib\site-packages\pandas\io\parsers\python_parser.py", line 395, in _infer_columns
    raise ValueError(
ValueError: Passed header=[1,2], len of 2, but only 2 lines in file

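Answer 1

Score: 0

Set header=0. The header=[1,2] in the error message was not passed by your code, so it was presumably inferred by read_html from one of the tables on the page; passing header=0 explicitly overrides that. You're going to get a lot of dataframes back, but you can filter them to get what you need.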

website_df = pd.read_html("https://ballotpedia.org/Roger_Wicker", header=0)
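
As a rough sketch of that filtering step (not part of the original answer): read_html returns a plain list of DataFrames, so one option is to keep only the ones whose columns include the names you care about. The helper name pick_tables and the column names below are hypothetical, and the sample list stands in for the real result so the snippet runs without network access.

```python
import pandas as pd


def pick_tables(tables, required_columns):
    """Return only the DataFrames whose columns include every required name."""
    required = set(required_columns)
    # Column labels are converted to str so non-string labels do not break the check.
    return [df for df in tables if required.issubset(map(str, df.columns))]


# Stand-in for the list returned by pd.read_html(url, header=0).
# These column names are made up for illustration, not taken from the page.
tables = [
    pd.DataFrame({"Candidate": ["A", "B"], "Votes": [10, 20]}),
    pd.DataFrame({"Year": [2018], "Office": ["U.S. Senate"]}),
]

election_tables = pick_tables(tables, ["Candidate", "Votes"])
print(len(election_tables))
```

With the real page you would pass website_df in place of tables. If you already know some text that appears in the table you want, the match argument of read_html may narrow the list before any filtering is needed.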
