Pandas Webscraping Errors
Question
I'm currently trying to scrape tables from websites using pandas, and I get this error for one of the links.
Here's a snippet that causes the crash:
import pandas as pd
website_df = pd.read_html("https://ballotpedia.org/Roger_Wicker")
print(website_df)
Below is the error I get. Does anyone know how to fix this?
Traceback (most recent call last):
File "C:\Users\miniconda3\lib\site-packages\pandas\io\parsers\python_parser.py", line 700, in _next_line
line = self._check_comments([self.data[self.pos]])[0]
IndexError: list index out of range
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "C:\Users\miniconda3\lib\site-packages\pandas\io\parsers\python_parser.py", line 385, in _infer_columns
line = self._next_line()
File "C:\Users\miniconda3\lib\site-packages\pandas\io\parsers\python_parser.py", line 713, in _next_line
raise StopIteration
StopIteration
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "c:\Users\legislators-current.py", line 15, in <module>
website_df = pd.read_html("https://ballotpedia.org/Roger_Wicker")
File "C:\Users\miniconda3\lib\site-packages\pandas\util\_decorators.py", line 331, in wrapper
return func(*args, **kwargs)
File "C:\Users\miniconda3\lib\site-packages\pandas\io\html.py", line 1205, in read_html
return _parse(
File "C:\Users\miniconda3\lib\site-packages\pandas\io\html.py", line 1011, in _parse
df = _data_to_frame(data=table, **kwargs)
File "C:\Users\miniconda3\lib\site-packages\pandas\io\html.py", line 890, in _data_to_frame
with TextParser(body, header=header, **kwargs) as tp:
File "C:\Users\miniconda3\lib\site-packages\pandas\io\parsers\readers.py", line 1876, in TextParser
return TextFileReader(*args, **kwds)
File "C:\Users\miniconda3\lib\site-packages\pandas\io\parsers\readers.py", line 1442, in __init__
self._engine = self._make_engine(f, self.engine)
File "C:\Users\miniconda3\lib\site-packages\pandas\io\parsers\readers.py", line 1753, in _make_engine
return mapping[engine](f, **self.options)
File "C:\Users\miniconda3\lib\site-packages\pandas\io\parsers\python_parser.py", line 122, in __init__
) = self._infer_columns()
File "C:\Users\miniconda3\lib\site-packages\pandas\io\parsers\python_parser.py", line 395, in _infer_columns
raise ValueError(
ValueError: Passed header=[1,2], len of 2, but only 2 lines in file
Answer 1
Score: 0
Set header=0. You're going to get a lot of DataFrames, but you can parse them to get what you need.
website_df = pd.read_html("https://ballotpedia.org/Roger_Wicker", header=0)
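Since read_html returns a list of DataFrames, one per table on the page, a small filter can pick out the ones you care about. The following is only a sketch: load_tables and tables_with_columns are hypothetical helper names, and the column names and sample frames are made-up stand-ins, not the actual contents of the Ballotpedia page.

```python
import pandas as pd


def load_tables(url):
    """Read every <table> on the page, using the first row as the header."""
    # header=0 overrides the header rows pandas would otherwise infer.
    return pd.read_html(url, header=0)


def tables_with_columns(tables, required):
    """Keep only the DataFrames that contain every column name in `required`."""
    wanted = set(required)
    return [df for df in tables if wanted.issubset(map(str, df.columns))]


# Hand-built stand-ins for the list that read_html returns (hypothetical data).
sample_tables = [
    pd.DataFrame({"Candidate": ["A", "B"], "Votes": [10, 20]}),
    pd.DataFrame({"Year": [2018], "Office": ["Senate"]}),
]

matches = tables_with_columns(sample_tables, ["Candidate", "Votes"])
print(len(matches))
print(matches[0])
```

On the real page you would call load_tables("https://ballotpedia.org/Roger_Wicker"), print each frame's columns once to see what is there, and then pass the column names you need to tables_with_columns.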