Capture all data in tr tag within binance.com using Python Selenium

Question

I am unable to read all the data in the tbody tag on the Binance futures page using Python Selenium. I am trying to scrape this link: https://www.binance.com/en/futures-activity/leaderboard/user/um?encryptedUid=14507FCBFF9FBE584EDDEC628C4593B8

I used the command below:

tr = driver.find_elements(By.TAG_NAME,'tbody')

but there is no text output.

I'm trying to get all the data in the tr tags under the tbody tag in an array or an list object. I also need to know how many tr tag in the link.

Answer 1

Score: 1


To get all the data in the &lt;tr&gt; tags within the &lt;tbody&gt; tag in a list object you need to induce WebDriverWait for visibility_of_all_elements_located() and you can use either of the following Locator Strategies:

  • Using CSS_SELECTOR and get_attribute("textContent"):

    driver.get(&quot;https://www.binance.com/en/futures-activity/leaderboard/user/um?encryptedUid=14507FCBFF9FBE584EDDEC628C4593B8&quot;)
    elements = WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, &quot;tbody.bn-table-tbody tr&quot;)))
    for element in elements:
    	print(element.get_attribute(&quot;textContent&quot;))
    driver.quit()
    
  • Using XPATH and get_attribute("textContent"):

    driver.get(&quot;https://www.binance.com/en/futures-activity/leaderboard/user/um?encryptedUid=14507FCBFF9FBE584EDDEC628C4593B8&quot;)
    elements = WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.XPATH, &quot;//tbody[@class=&#39;bn-table-tbody&#39;]//tr&quot;)))
    for element in elements:
    	print(element.get_attribute(&quot;textContent&quot;))
    driver.quit()
    
  • Console Output:

    SOLUSDT Perpetual Short20x703621.952020.65609,118.79 (125.4862%)2023-03-03 19:31:49Trade
    ETHUSDT Perpetual Short30x385.3831,562.541,568.19-2,176.72 (-10.8052%)2023-03-05 04:03:30Trade
    EOSUSDT Perpetual Short20x138526.51.2721.2078,996.67 (107.6456%)2023-03-04 05:12:13Trade
    COCOSUSDT Perpetual Short10x33878.52.2631201.58500022,973.69 (427.8359%)2023-03-03 06:50:52Trade
    SSVUSDT Perpetual Short10x1010.344.25224938.0900006,225.72 (161.7813%)2023-03-03 20:05:15Trade
    
  • Note: You have to add the following imports:

    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
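
If you also need the number of rows, the same list that the wait returns can be measured with len(). The snippet below is a minimal end-to-end sketch rather than a verified script: it assumes Selenium 4 (so webdriver.Chrome() resolves the driver via Selenium Manager) and that the tbody.bn-table-tbody selector used above still matches the page.

    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.support import expected_conditions as EC

    driver = webdriver.Chrome()  # Selenium 4+: Selenium Manager fetches a matching chromedriver
    driver.get("https://www.binance.com/en/futures-activity/leaderboard/user/um?encryptedUid=14507FCBFF9FBE584EDDEC628C4593B8")

    # wait until the JavaScript-rendered rows are visible
    rows = WebDriverWait(driver, 20).until(
        EC.visibility_of_all_elements_located((By.CSS_SELECTOR, "tbody.bn-table-tbody tr"))
    )

    # one text entry per <tr>, plus the row count asked for in the question
    row_texts = [row.get_attribute("textContent") for row in rows]
    print(len(row_texts), "rows")
    for text in row_texts:
        print(text)

    driver.quit()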
    

Answer 2

Score: 1


For your task, you can use selenium + BeautifulSoup. Open the page in selenium, wait for the page to load, and then use the received data as a 'soup' object. First we find 'tbody', then we search for all 'tr', and for each 'tr' we find all 'td'. We extract the data and write it to a list. The first element is 'Symbol', the second is the total number of 'td' elements in the row, and then all the data from the table. Code:

from bs4 import BeautifulSoup
from selenium import webdriver
import time

url = 'https://www.binance.com/en/futures-activity/leaderboard/user/um?encryptedUid=14507FCBFF9FBE584EDDEC628C4593B8'

# kept for reference; Selenium itself does not send custom request headers
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) "
                  "Chrome/93.0.4577.82 Safari/537.36",
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,"
              "application/signed-exchange;v=b3;q=0.9",
}


def get_result(url, headers):
    # configure a headless Chrome instance
    options = webdriver.ChromeOptions()
    options.add_argument('--headless')
    options.add_argument('--no-sandbox')
    driver = webdriver.Chrome(options=options, executable_path=".../chromedriver_linux64/chromedriver")  # the path to your chromedriver
    driver.get(url)
    time.sleep(10)  # give the JavaScript-rendered table time to load
    html = driver.page_source
    driver.quit()

    soup = BeautifulSoup(html, "lxml")
    tbody = soup.find('tbody', class_='bn-table-tbody')
    trs = tbody.find_all('tr')
    data = list()
    for tr in trs:
        tr_key = tr.get('data-row-key')
        if tr_key is None:
            continue  # skip rows that carry no symbol key
        mid_data = list()
        mid_data.append(f'Symbol - {tr_key}')
        tds = tr.find_all('td')
        mid_data.append(f'td_count - {len(tds)}')
        for count, td in enumerate(tds, start=1):
            mid_data.append(f'td_{count} - {td.text}')
        print(mid_data)
        data.append(mid_data)
    return data


def main():
    get_result(url=url, headers=headers)


if __name__ == "__main__":
    main()

Will return:

[&#39;Symbol - SOLUSDT&#39;, &#39;td_count - 7&#39;, &#39;td_1 - SOLUSDT Perpetual Short20x&#39;, &#39;td_2 - 7036&#39;, &#39;td_3 - 21.9520&#39;, &#39;td_4 - 20.4050&#39;, &#39;td_5 - 10,884.64\xa0(151.6288%)&#39;, &#39;td_6 - 2023-03-03 17:01:49&#39;, &#39;td_7 - Trade&#39;]
[&#39;Symbol - ETHUSDT&#39;, &#39;td_count - 7&#39;, &#39;td_1 - ETHUSDT Perpetual Short30x&#39;, &#39;td_2 - 385.383&#39;, &#39;td_3 - 1,562.54&#39;, &#39;td_4 - 1,564.50&#39;, &#39;td_5 - -754.66\xa0(-3.7549%)&#39;, &#39;td_6 - 2023-03-05 01:33:30&#39;, &#39;td_7 - Trade&#39;]
[&#39;Symbol - EOSUSDT&#39;, &#39;td_count - 7&#39;, &#39;td_1 - EOSUSDT Perpetual Short20x&#39;, &#39;td_2 - 138526.5&#39;, &#39;td_3 - 1.272&#39;, &#39;td_4 - 1.175&#39;, &#39;td_5 - 13,383.85\xa0(164.4547%)&#39;, &#39;td_6 - 2023-03-04 02:42:13&#39;, &#39;td_7 - Trade&#39;]
[&#39;Symbol - COCOSUSDT&#39;, &#39;td_count - 7&#39;, &#39;td_1 - COCOSUSDT Perpetual Short10x&#39;, &#39;td_2 - 33878.5&#39;, &#39;td_3 - 2.263120&#39;, &#39;td_4 - 1.534000&#39;, &#39;td_5 - 24,701.49\xa0(475.3063%)&#39;, &#39;td_6 - 2023-03-03 04:20:52&#39;, &#39;td_7 - Trade&#39;]
[&#39;Symbol - SSVUSDT&#39;, &#39;td_count - 7&#39;, &#39;td_1 - SSVUSDT Perpetual Short10x&#39;, &#39;td_2 - 1010.3&#39;, &#39;td_3 - 44.252249&#39;, &#39;td_4 - 38.808423&#39;, &#39;td_5 - 5,499.90\xa0(140.2743%)&#39;, &#39;td_6 - 2023-03-03 17:35:15&#39;, &#39;td_7 - Trade&#39;]

You can process the final data as you like.
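
If you want the rows as structured records rather than printed lists, the data returned by get_result() can be reshaped into dictionaries. The sketch below is only an illustration: rows_to_records is a hypothetical helper, and the names in COLUMNS are assumptions about the leaderboard's visible headers, not values read from the page markup.

# hypothetical helper: turn each ['Symbol - X', 'td_count - 7', 'td_1 - ...', ...] row
# produced by get_result() into a dict keyed by assumed column names
COLUMNS = ["position", "size", "entry_price", "mark_price", "pnl", "time", "action"]  # assumed header labels

def rows_to_records(data):
    records = []
    for row in data:
        symbol = row[0].split(" - ", 1)[1]                      # 'Symbol - SOLUSDT' -> 'SOLUSDT'
        values = [item.split(" - ", 1)[1] for item in row[2:]]  # drop the 'td_N - ' prefixes
        records.append({"symbol": symbol, **dict(zip(COLUMNS, values))})
    return records

# usage: records = rows_to_records(get_result(url=url, headers=headers))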
