beautiful soup extract data from script
Question
I have this code and need to extract the latitude and longitude from the script tag.
```JavaScript
<script>
var loadPoints = '/Map/Points';
var mapDetails = {"point":{"latitude":-34.023418,"longitude":18.331407,"title":"Sandy Bay","location":null,"subject":"P","link":"/Explore/South-Africa/Western-Cape/Sandy-Bay"},"bounds":null,"moveMarkerCallback":null,"changeBoundsCallback":null};
requireExploreMap(loadPoints, mapDetails);
</script>
```
I can see all the HTML content in soup, but when I try it this way:
```python
def get_textchunk(word1, word2, text):
    if not (word1 in text and word2 in text):
        return ''
    return text.split(word1)[-1].split(word2)[0]

lat = get_textchunk('latitude":', ',"longitude', soup.get_text(' '))
```
it doesn't return anything.
How can I fix it?
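A likely explanation, assuming Beautiful Soup 4.9.0 or later with html.parser or lxml: get_text() generally does not treat the contents of script tags as text, so the helper never sees the word "latitude". The helper itself appears to work when it is given the raw page source instead. The sketch below uses a shortened, hypothetical stand-in for response.text; the variable names raw_html, lat_text and lon_text are not from the original post.

```python
def get_textchunk(word1, word2, text):
    """Return the text between the last word1 and the next word2, or ''."""
    if not (word1 in text and word2 in text):
        return ''
    return text.split(word1)[-1].split(word2)[0]

# Stand-in for response.text (the raw page source); the real page is longer.
raw_html = """
<script>
var loadPoints = '/Map/Points';
var mapDetails = {"point":{"latitude":-34.023418,"longitude":18.331407,"title":"Sandy Bay"},"bounds":null};
requireExploreMap(loadPoints, mapDetails);
</script>
"""

# Search the raw source rather than soup.get_text(), which skips script contents.
lat_text = get_textchunk('latitude":', ',"longitude', raw_html)
lon_text = get_textchunk('"longitude":', ',"title', raw_html)
lat, lon = float(lat_text), float(lon_text)
print(lat, lon)
```

String splitting like this depends on the exact key order in the page, so parsing the object as JSON is usually the sturdier choice.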
UPDATE
This is my code:
```python
import json
import re
import time

import requests
from bs4 import BeautifulSoup

with open('urls.txt', 'r', encoding="utf-8") as inf:
    with open('data2.csv', 'w', encoding="utf-8") as outf:
        outf.write('Titre,add,art,club,tel,\n')
        for row in inf:
            url = row.strip()
            response = requests.get(url)
            if response.ok:
                print("ok")
                soup = BeautifulSoup(response.text, 'html.parser')
                print(soup)
                stag = soup.find("script")
                obj = json.loads(re.search(r"mapDetails\s*= \s*({.*});", str(stag)).group(1))
                lat, lon = obj["point"]["latitude"], obj["point"]["longitude"]
            # Pause between requests
            time.sleep(2)
```
The problem is that Beautiful Soup finds the first script tag, and the information I need is not in that tag.
Thanks a lot for your help.
The page I am trying to scrape:
https://worldbeachlist.com/Explore/Australia/Victoria/Bells-Beach
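One way to handle a page with several script tags is to look at every script body and keep the one that mentions mapDetails. The sketch below does this with the standard library only, on a hypothetical page source (the first two script tags and the shortened mapDetails object are invented for illustration). With Beautiful Soup, looping over soup.find_all("script") and applying the same check to each tag should follow the same idea.

```python
import json
import re

# Hypothetical page source with several <script> tags; only the last one
# defines mapDetails, mirroring the situation described in the question.
page_source = """
<html><head>
<script src="/js/site.js"></script>
<script>var analyticsId = 'demo';</script>
</head><body>
<script>
var loadPoints = '/Map/Points';
var mapDetails = {"point":{"latitude":-34.023418,"longitude":18.331407,"title":"Sandy Bay"},"bounds":null};
requireExploreMap(loadPoints, mapDetails);
</script>
</body></html>
"""

# Collect the body of every <script> element (DOTALL lets "." span newlines).
script_bodies = re.findall(r"<script\b[^>]*>(.*?)</script>", page_source,
                           flags=re.DOTALL | re.IGNORECASE)

# Keep the first script that actually mentions mapDetails.
target = next(body for body in script_bodies if "mapDetails" in body)

match = re.search(r"mapDetails\s*=\s*(\{.*\});", target)
obj = json.loads(match.group(1))
lat, lon = obj["point"]["latitude"], obj["point"]["longitude"]
print(lat, lon)
```

A regular expression is a rough way to split HTML and is used here only to keep the example dependency-free; an HTML parser is the safer tool on real pages.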
Answer 1
Score: 1
Try this:
```python
import json
import re

from bs4 import BeautifulSoup

sample_script = """
<script>
var loadPoints = '/Map/Points';
var mapDetails = {"point":{"latitude":-34.023418,"longitude":18.331407,"title":"Sandy Bay","location":null,"subject":"P","link":"/Explore/South-Africa/Western-Cape/Sandy-Bay"},"bounds":null,"moveMarkerCallback":null,"changeBoundsCallback":null};
requireExploreMap(loadPoints, mapDetails);
</script>
"""

# .string returns the text inside the <script> tag
script_text = BeautifulSoup(sample_script, 'html.parser').find('script').string
# Capture everything between "mapDetails = " and the first ";"
data = json.loads(re.search(r"mapDetails = (.+?);", script_text).group(1))
print(json.dumps(data, indent=4))

# Access the keys
print(data['point']['latitude'])
print(data['point']['longitude'])
```
Output:
```
{
    "point": {
        "latitude": -34.023418,
        "longitude": 18.331407,
        "title": "Sandy Bay",
        "location": null,
        "subject": "P",
        "link": "/Explore/South-Africa/Western-Cape/Sandy-Bay"
    },
    "bounds": null,
    "moveMarkerCallback": null,
    "changeBoundsCallback": null
}
-34.023418
18.331407
```
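The pattern `mapDetails = (.+?);` stops at the first semicolon, so it would cut the object short if any string value contained one. A possible alternative, sketched here with the standard library's json.JSONDecoder.raw_decode, is to locate the opening brace and let the JSON parser decide where the object ends. The title "Sandy; Bay" is a made-up value chosen to show the difference, and the names marker, start and end are illustrative.

```python
import json

# Script text with a hypothetical title that contains a semicolon.
script_text = """
var loadPoints = '/Map/Points';
var mapDetails = {"point":{"latitude":-34.023418,"longitude":18.331407,"title":"Sandy; Bay"},"bounds":null};
requireExploreMap(loadPoints, mapDetails);
"""

marker = "mapDetails ="
# Position of the first "{" after the assignment.
start = script_text.index("{", script_text.index(marker))

# raw_decode parses one JSON value starting at `start` and reports where it
# ended, so a ";" inside a string value cannot cut the match short.
data, end = json.JSONDecoder().raw_decode(script_text, start)
print(data["point"]["latitude"], data["point"]["longitude"])
```

This assumes the assigned value is valid JSON, which holds for the sample in the question but is not guaranteed for arbitrary JavaScript object literals.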
Answer 2
Score: 1
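```python
import json
import re

# soup is the BeautifulSoup object built in the question.
# Pick the <script> whose text mentions mapDetails; a bare soup.find("script")
# would return only the first <script> tag on the page.
stag = soup.find("script", string=re.compile(r"mapDetails"))
obj = json.loads(re.search(r"mapDetails\s*=\s*({.*});", str(stag)).group(1))
lat, lon = obj["point"]["latitude"], obj["point"]["longitude"]

# Output:
print("Latitude:", lat)    # Latitude: -34.023418
print("Longitude:", lon)   # Longitude: 18.331407
```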
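The snippet above calls .group(1) directly on the result of re.search, which raises AttributeError when the pattern is not found. In a loop over many URLs it may be worth returning None for such pages instead. The helper below is a minimal sketch of that idea; extract_lat_lon and MAP_DETAILS_RE are hypothetical names, and the test strings are invented.

```python
import json
import re

MAP_DETAILS_RE = re.compile(r"mapDetails\s*=\s*(\{.*\});")

def extract_lat_lon(html):
    """Return (latitude, longitude) from page source, or None if unavailable."""
    match = MAP_DETAILS_RE.search(html)
    if match is None:               # page has no mapDetails assignment
        return None
    try:
        point = json.loads(match.group(1)).get("point")
    except json.JSONDecodeError:    # captured text was not valid JSON
        return None
    if not point:                   # "point" is missing or null
        return None
    return point.get("latitude"), point.get("longitude")

good = 'var mapDetails = {"point":{"latitude":-34.023418,"longitude":18.331407},"bounds":null};'
print(extract_lat_lon(good))
print(extract_lat_lon("<script>var other = 1;</script>"))
print(extract_lat_lon('var mapDetails = {"point":null};'))
```

The caller can then skip a URL when the result is None rather than letting one unusual page stop the whole run.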