Question

How can I combine the full lists into a DataFrame? When I print it, it seems to print only the first record, and it also includes \n and other redundancies like ' etc.

    import requests
    from requests_html import HTML, HTMLSession
    from bs4 import BeautifulSoup
    import pandas as pd
    import csv
    import json
    
    url = 'https://lehighsports.com/sports/mens-soccer/schedule/2018'
    lehigh = requests.get(url).text
    soup = BeautifulSoup(lehigh,'lxml')
    
    for opp in soup.find_all('div',class_="sidearm-schedule-game-opponent-text"):
        opp_list = []
        opp_list.append(opp.text)
    #    print(opp_list)
    
    for conf in soup.find_all('div',class_="sidearm-schedule-game-conference-conference"):
        conf_list = []
        conf_list.append(conf.text)
    #    print(conf_list)
    
    dict = {'opponent':[opp_list],'conference':[conf_list]}
    df = pd.DataFrame(dict)
    print(df)

Answer 1

Score: 1

You are resetting opp_list and conf_list to [] on every iteration, so each list ends up holding only one item. Initialize them once, before the loop. Also, you don't need to wrap the lists in brackets when creating the dictionary: {'opponent':opp_list,'conference':conf_list}
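A minimal offline sketch of both problems, using a few hypothetical stand-in strings instead of the live page: the first loop recreates the list on every pass, and the extra brackets turn the whole list into a single DataFrame cell.

```python
import pandas as pd

# Hypothetical stand-in for the scraped strings (no network access needed)
scraped = ["at UConn", "vs Drexel", "at Rider"]

# Pattern from the question: the list is recreated on every pass,
# so only the last element survives the loop
for item in scraped:
    buggy = []
    buggy.append(item)

# Fixed pattern: create the list once, before the loop
fixed = []
for item in scraped:
    fixed.append(item)

print(buggy)  # ['at Rider']
print(fixed)  # ['at UConn', 'vs Drexel', 'at Rider']

# Wrapping the list in brackets gives one row whose cell holds the whole list
one_row = pd.DataFrame({'opponent': [fixed]})
# Passing the list directly gives one row per element
many_rows = pd.DataFrame({'opponent': fixed})

print(one_row.shape)    # (1, 1)
print(many_rows.shape)  # (3, 1)
```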

To remove the surrounding whitespace (such as the \n characters), use the .get_text() method with the strip=True and separator= parameters.
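A minimal offline sketch of what these parameters do, assuming a hypothetical snippet of markup loosely modeled on one schedule cell (the live page's HTML may differ). It uses Python's built-in 'html.parser', so lxml is not required.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: text split across child tags, with newlines and indentation
html = """
<div class="sidearm-schedule-game-opponent-text">
    <span>at</span>
    <a href="#">UConn</a>
</div>
"""

soup = BeautifulSoup(html, 'html.parser')
div = soup.find('div', class_="sidearm-schedule-game-opponent-text")

raw = div.text                                   # keeps the newlines and indentation
glued = div.get_text(strip=True)                 # strips each piece, joins with ''
clean = div.get_text(strip=True, separator=' ')  # strips each piece, joins with ' '

print(repr(raw))
print(repr(glued))  # 'atUConn'
print(repr(clean))  # 'at UConn'
```

Without a separator, text from different child tags runs together, which is presumably why the opponent text gets separator=' ' while the conference text does not.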

For example:

import requests
import pandas as pd
from bs4 import BeautifulSoup

url = 'https://lehighsports.com/sports/mens-soccer/schedule/2018'
lehigh = requests.get(url).text
soup = BeautifulSoup(lehigh, 'lxml')

# Create the list once, outside the loop, so every opponent is kept
opp_list = []
for opp in soup.find_all('div', class_="sidearm-schedule-game-opponent-text"):
    # strip=True trims whitespace around each text piece;
    # separator=' ' joins the pieces with a single space
    opp_list.append(opp.get_text(strip=True, separator=' '))

conf_list = []
for conf in soup.find_all('div', class_="sidearm-schedule-game-conference-conference"):
    conf_list.append(conf.get_text(strip=True))

# Pass the lists directly (no extra brackets) so each element becomes a row;
# named 'data' to avoid shadowing the built-in dict
data = {'opponent': opp_list, 'conference': conf_list}
df = pd.DataFrame(data)
print(df)

Prints:

                         opponent       conference
0                        at UConn                 
1                       vs Drexel                 
2            at George Washington                 
3                   at St. John's                 
4                   vs Binghamton                 
5                        at Rider                 
6                         vs Penn                 
7                         at Army  Patriot League*
8                      vs Cornell                 
9                     at Boston U  Patriot League*
10                 vs #20 Colgate  Patriot League*
11                        vs Navy  Patriot League*
12                   at Lafayette  Patriot League*
13                   at Dartmouth                 
14                    vs American  Patriot League*
15                    at Bucknell  Patriot League*
16                at Loyola (Md.)  Patriot League*
17     vs Holy Cross Senior Night  Patriot League*
18  vs No. 3 Colgate (Semifinals)
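This works because the page yields one conference entry per opponent, so both lists have the same length. If a game ever lacked a conference div, pd.DataFrame would raise a ValueError for a dict of plain lists. A sketch with hypothetical lists showing the error and one workaround, wrapping each list in a pd.Series so the shorter column is padded with NaN. Note that the padding only aligns by position, so it cannot tell which game is actually missing the value.

```python
import pandas as pd

# Hypothetical lists: the second game has no conference entry
opp_list = ["at UConn", "vs Drexel"]
conf_list = [""]

caught = False
try:
    # A dict of plain lists requires every list to have the same length
    pd.DataFrame({'opponent': opp_list, 'conference': conf_list})
except ValueError as err:
    caught = True
    print("ValueError:", err)

# Series are aligned on their index, so the shorter column is padded with NaN
padded = pd.DataFrame({'opponent': pd.Series(opp_list),
                       'conference': pd.Series(conf_list)})
print(padded)
```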
