How to extract table from website using python3
Question
I want to get/export the table from https://www.ethernodes.org/nodes to a txt file so I can access it from a bash script.
OpenAI helped me with this Python 3 code, but it gets nothing:
import requests
from bs4 import BeautifulSoup

url = 'https://www.ethernodes.org/nodes?page=8'
response = requests.get(url)
soup = BeautifulSoup(response.text, 'html.parser')

host_ips = []
node_list = soup.find('ul', class_='nodes-list')
if node_list is not None:
    for li in node_list.find_all('li'):
        host_ip = li.find('div', class_='node-host').text.strip()
        host_ips.append(host_ip)

print(host_ips)
Answer 1
Score: 1
Here's how you can get the data and dump it to a .csv file:
import time

import pandas as pd
import requests

url = "https://www.ethernodes.org/data?"

payload = {
    "draw": "2",
    "columns[0][data]": "id",
    "columns[0][name]": "",
    "columns[0][searchable]": "true",
    "columns[0][orderable]": "true",
    "columns[0][search][value]": "",
    "columns[0][search][regex]": "false",
    "columns[1][data]": "host",
    "columns[1][name]": "",
    "columns[1][searchable]": "true",
    "columns[1][orderable]": "true",
    "columns[1][search][value]": "",
    "columns[1][search][regex]": "false",
    "columns[2][data]": "isp",
    "columns[2][name]": "",
    "columns[2][searchable]": "true",
    "columns[2][orderable]": "true",
    "columns[2][search][value]": "",
    "columns[2][search][regex]": "false",
    "columns[3][data]": "country",
    "columns[3][name]": "",
    "columns[3][searchable]": "true",
    "columns[3][orderable]": "true",
    "columns[3][search][value]": "",
    "columns[3][search][regex]": "false",
    "columns[4][data]": "client",
    "columns[4][name]": "",
    "columns[4][searchable]": "true",
    "columns[4][orderable]": "true",
    "columns[4][search][value]": "",
    "columns[4][search][regex]": "false",
    "columns[5][data]": "clientVersion",
    "columns[5][name]": "",
    "columns[5][searchable]": "true",
    "columns[5][orderable]": "true",
    "columns[5][search][value]": "",
    "columns[5][search][regex]": "false",
    "columns[6][data]": "os",
    "columns[6][name]": "",
    "columns[6][searchable]": "true",
    "columns[6][orderable]": "true",
    "columns[6][search][value]": "",
    "columns[6][search][regex]": "false",
    "columns[7][data]": "lastUpdate",
    "columns[7][name]": "",
    "columns[7][searchable]": "true",
    "columns[7][orderable]": "true",
    "columns[7][search][value]": "",
    "columns[7][search][regex]": "false",
    "columns[8][data]": "inSync",
    "columns[8][name]": "",
    "columns[8][searchable]": "true",
    "columns[8][orderable]": "true",
    "columns[8][search][value]": "",
    "columns[8][search][regex]": "false",
    "order[0][column]": "0",
    "order[0][dir]": "asc",
    "start": "0",
    "length": "100",
    "search[value]": "",
    "search[regex]": "false",
    "_": time.time(),
}
headers = {
    "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/112.0.0.0 Safari/537.36 Edg/112.0.1722.48",
    "Accept": "application/json, text/javascript, */*; q=0.01",
    "X-Requested-With": "XMLHttpRequest",
}

data = requests.get(url, headers=headers, params=payload).json()["data"]
df = pd.DataFrame(data)
df.to_csv("nodes.csv", index=False)
Output:
And if all you need is the host IPs, add this:

hosts = df["host"].values
with open("hosts.txt", "w") as f:
    f.write("\n".join(hosts))

Then you can run:
$ cat hosts.txt
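Note that the payload above only asks for the first 100 rows (start=0, length=100). Assuming the endpoint follows the usual DataTables server-side convention already visible in the payload, you can fetch later pages by bumping start by length on each request. A minimal sketch of just the paging-parameter step (merge the result into the existing payload before each request):

def paging_params(page, length=100):
    # DataTables-style paging: `start` is the zero-based row offset,
    # `length` is the page size. Values are strings, matching the payload.
    return {"start": str(page * length), "length": str(length)}

# Example: parameters for the third page of 100 rows,
# used as payload.update(paging_params(2)) before the GET.
print(paging_params(2))  # {'start': '200', 'length': '100'}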