searching a list for multiple substrings in python?

Question
So I have a list containing 10-15 links, and I want to search for the links that contain either 'sen_floor' or 'asm_floor'.

This is my code so far (ca_data is the original link):
import requests
from bs4 import BeautifulSoup
import re

ca = requests.get(ca_data)
soup = BeautifulSoup(ca.content, 'html.parser')

links = []
for link in soup.findAll('a', attrs={'href': re.compile("^http://")}):
    links.append(link.get('href'))

r = re.compile(".*vote")
newlist = list(filter(r.match, links))
print(newlist)

subs = 'sen_floor'
sen_votes = list(filter(lambda x: subs in x, newlist))
print(str(sen_votes))
This effectively returns a list of all links containing sen_floor. Ideally I'd like to have a separate list with asm_floor. I tried repeating the last paragraph:
sub = 'asm_floor'
asm_votes = list(filter(lambda x: sub in x, newlist))
print(str(asm_votes))
but it doesn't work; it just returns the same result as the sen_floor search.

Help?
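For what it's worth, filtering a plain list for two different substrings does work when each filter checks its own substring. A minimal, self-contained sketch with hypothetical URLs (the real list would come from the scrape above):

```python
# Hypothetical stand-ins for the scraped links (not real legislature URLs).
newlist = [
    "http://example.gov/ab_2_sen_floor_vote",
    "http://example.gov/ab_2_asm_floor_vote",
    "http://example.gov/ab_2_committee_vote",
]

# One comprehension per substring; each closes over its own literal,
# so the two results cannot collide.
sen_votes = [x for x in newlist if 'sen_floor' in x]
asm_votes = [x for x in newlist if 'asm_floor' in x]

print(sen_votes)
print(asm_votes)
```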
Answer 1

Score: 1
import requests
from bs4 import BeautifulSoup

r = requests.get(
    "http://www.legislature.ca.gov/cgi-bin/port-postquery?bill_number=ab_2&sess=CUR&house=B&author=alejo_%3Calejo%3E")
soup = BeautifulSoup(r.text, 'html.parser')

sen = []
asm = []
for item in soup.findAll("a", {'href': True}):
    item = item.get("href")
    if 'sen_floor' in item:
        sen.append(item)
    elif 'asm_floor' in item:
        asm.append(item)
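The if/elif partition above can be checked offline on a small sample of hrefs, independent of the live page (sample values are assumed, since the real page may change):

```python
def partition_votes(hrefs):
    # Same if/elif partition as the answer, factored into a function:
    # each href lands in at most one of the two lists.
    sen, asm = [], []
    for href in hrefs:
        if 'sen_floor' in href:
            sen.append(href)
        elif 'asm_floor' in href:
            asm.append(href)
    return sen, asm

# Hypothetical sample hrefs standing in for the scraped links.
sample = [
    "http://example.gov/votes/sen_floor_1",
    "http://example.gov/votes/asm_floor_1",
    "http://example.gov/bill_text",
]
sen, asm = partition_votes(sample)
print(sen)
print(asm)
```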
Answer 2

Score: 0
Just use the attribute "contains" operator (*=) with the comma ("or") syntax to specify multiple substrings to match on in hrefs. This returns only hrefs containing either of the specified substrings. If you are checking multiple pages, run it in a loop and make sure you update the soup object each time.
matches = [i['href'] for i in soup.select('[href*=asm_floor],[href*=sen_floor]')]
Or, to build separate lists instead:
import requests
from bs4 import BeautifulSoup
r = requests.get("http://www.legislature.ca.gov/cgi-bin/port-postquery?bill_number=ab_2&sess=CUR&house=B&author=alejo_%3Calejo%3E")
soup = BeautifulSoup(r.text, 'html.parser')
sen = [i['href'] for i in soup.select('[href*=sen_floor]')]
asm = [i['href'] for i in soup.select('[href*=asm_floor]')]
print('sen: ', sen)
print('asm:', asm)
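A pure-Python equivalent of the combined selector, for a list of hrefs you already have in hand (sample data assumed, since the live page may change):

```python
# Hypothetical hrefs; the real ones come from the BeautifulSoup scrape.
links = [
    "http://example.gov/ab_2_sen_floor.html",
    "http://example.gov/ab_2_asm_floor.html",
    "http://example.gov/ab_2_history.html",
]

# any() mirrors the comma ("or") in the CSS selector: keep a link if it
# contains either substring.
matches = [h for h in links
           if any(s in h for s in ('asm_floor', 'sen_floor'))]
print(matches)
```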
Comments