January 3, 2020, 21:12:13

Searching a list for multiple substrings in Python?

Question

I have a list containing 10-15 links, and I want to find the links that contain either 'sen_floor' or 'asm_floor'.

This is my code so far (ca_data is the original link):

import requests
from bs4 import BeautifulSoup
import re

ca = requests.get(ca_data)
soup = BeautifulSoup(ca.content, 'html.parser')
links = []

for link in soup.findAll('a', attrs={'href': re.compile("^http://")}):
    links.append(link.get('href'))

r = re.compile(".*vote")
newlist = list(filter(r.match, links))
print(newlist)

subs = 'sen_floor'
sen_votes = list(filter(lambda x: subs in x, newlist))
print(str(sen_votes))

This returns a list of all links containing sen_floor. Ideally I'd like a separate list for asm_floor as well. I tried repeating the last block:

sub = 'asm_floor'
asm_votes = list(filter(lambda x: sub in x, newlist))
print(str(asm_votes))

But it doesn't work; it just returns the same result as the sen_floor search.

Help?

Answer 1

Score: 1

import requests
from bs4 import BeautifulSoup

r = requests.get(
    "http://www.legislature.ca.gov/cgi-bin/port-postquery?bill_number=ab_2&sess=CUR&house=B&author=alejo_%3Calejo%3E")

soup = BeautifulSoup(r.text, 'html.parser')

sen = []
asm = []
for item in soup.findAll("a", {'href': True}):
    item = item.get("href")
    if 'sen_floor' in item:
        sen.append(item)
    elif 'asm_floor' in item:
        asm.append(item)
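
The same loop can be generalized to any number of substrings. Here is a minimal sketch that works on a plain list of links, so it needs no network access. The helper name bucket_links and the example.com URLs are hypothetical, made up for illustration; they are not from the real page.

```python
def bucket_links(links, substrings):
    """Group links by the first substring each one contains."""
    buckets = {sub: [] for sub in substrings}
    for link in links:
        for sub in substrings:
            if sub in link:
                buckets[sub].append(link)
                break  # first match wins, mirroring the if/elif chain
    return buckets


# Hypothetical example URLs, for illustration only.
example_links = [
    "http://example.com/vote/sen_floor_1.html",
    "http://example.com/vote/asm_floor_1.html",
    "http://example.com/bill/text.html",
]

result = bucket_links(example_links, ["sen_floor", "asm_floor"])
print(result["sen_floor"])
print(result["asm_floor"])
```

Links that contain none of the substrings are simply dropped, as in the loop above.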

Answer 2

Score: 0

Use the CSS attribute "contains" operator (*=) with the Or syntax (a comma-separated selector list) to match several substrings in the href values. This returns only the hrefs that contain at least one of the specified substrings. If you are checking multiple pages, run it in a loop and make sure you rebuild the soup object for each page.

matches = [i['href'] for i in soup.select('[href*=asm_floor],[href*=sen_floor]')]

To build separate lists instead:

import requests
from bs4 import BeautifulSoup

r = requests.get("http://www.legislature.ca.gov/cgi-bin/port-postquery?bill_number=ab_2&sess=CUR&house=B&author=alejo_%3Calejo%3E")
soup = BeautifulSoup(r.text, 'html.parser')
sen = [i['href'] for i in soup.select('[href*=sen_floor]')]
asm = [i['href'] for i in soup.select('[href*=asm_floor]')]
print('sen: ', sen)
print('asm:', asm)
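
Both selector forms can be tried offline before pointing them at the live page. This is a minimal sketch, assuming bs4 is installed. The HTML snippet and example.com URLs are made up for illustration and do not come from the real page.

```python
from bs4 import BeautifulSoup

# Made-up HTML standing in for the downloaded page (illustration only).
html = """
<a href="http://example.com/vote/sen_floor_1.html">Senate floor</a>
<a href="http://example.com/vote/asm_floor_1.html">Assembly floor</a>
<a href="http://example.com/bill/text.html">Bill text</a>
"""
soup = BeautifulSoup(html, "html.parser")

# A comma-separated selector list matches elements that satisfy either selector.
either = [a["href"] for a in soup.select('a[href*="asm_floor"], a[href*="sen_floor"]')]

# One selector per substring gives one list per substring.
sen = [a["href"] for a in soup.select('a[href*="sen_floor"]')]
asm = [a["href"] for a in soup.select('a[href*="asm_floor"]')]

print(either)
print(sen)
print(asm)
```

Quoting the attribute value, as in [href*="sen_floor"], is optional for simple identifiers like these but is safer when the substring contains characters such as dots or slashes.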
