April 11, 2023, 15:35:31

Beautiful Soup: extract data from a script tag

Question

I have this markup and need to extract the latitude and longitude from the script tag.

```JavaScript
<script>
    var loadPoints = '/Map/Points';
    var mapDetails = {"point":{"latitude":-34.023418,"longitude":18.331407,"title":"Sandy Bay","location":null,"subject":"P","link":"/Explore/South-Africa/Western-Cape/Sandy-Bay"},"bounds":null,"moveMarkerCallback":null,"changeBoundsCallback":null};
    requireExploreMap(loadPoints, mapDetails);
</script>
```

I can see all of the HTML content in soup, but when I try it this way:

def get_textchunk(word1, word2, text):
    if not (word1 in text and word2 in text):
        return ''
    return text.split(word1)[-1].split(word2)[0]

lat = get_textchunk('latitude":', ',"longitude', soup.get_text(' '))

it doesn't return anything.

How can I fix it?

UPDATE

This is my code:

import json
import re
import time

import requests
from bs4 import BeautifulSoup

with open('urls.txt', 'r', encoding="utf-8") as inf:
    with open('data2.csv', 'w', encoding="utf-8") as outf:
        outf.write('Titre,add,art,club,tel,\n')

    for row in inf:
        url = row.strip()
        response = requests.get(url)

        if response.ok:
            print("ok")
            soup = BeautifulSoup(response.text, 'html.parser')
            print(soup)
            # find() returns only the first <script> tag on the page
            stag = soup.find("script")
            obj = json.loads(re.search(r"mapDetails\s*= \s*({.*});", str(stag)).group(1))
            lat, lon = obj["point"]["latitude"], obj["point"]["longitude"]

            # Pause between requests
            time.sleep(2)

The problem is that BS finds only the first script tag, and the information I need is not in that tag.
Thanks a lot for your help.

The page I am trying to scrape:
https://worldbeachlist.com/Explore/Australia/Victoria/Bells-Beach

Answer 1

Score: 1

Try this:

import json
import re

from bs4 import BeautifulSoup

sample_script = """
<script>
    var loadPoints = '/Map/Points';
    var mapDetails = {"point":{"latitude":-34.023418,"longitude":18.331407,"title":"Sandy Bay","location":null,"subject":"P","link":"/Explore/South-Africa/Western-Cape/Sandy-Bay"},"bounds":null,"moveMarkerCallback":null,"changeBoundsCallback":null};
    requireExploreMap(loadPoints, mapDetails);
</script>
"""

# .string gives the raw JavaScript inside the <script> tag
script_text = BeautifulSoup(sample_script, 'html.parser').find('script').string
# Capture everything between "mapDetails = " and the first semicolon
data = json.loads(re.search(r"mapDetails = (.+?);", script_text).group(1))
print(json.dumps(data, indent=4))

# Access the keys
print(data['point']['latitude'])
print(data['point']['longitude'])

Output:

{
    "point": {
        "latitude": -34.023418,
        "longitude": 18.331407,
        "title": "Sandy Bay",
        "location": null,
        "subject": "P",
        "link": "/Explore/South-Africa/Western-Cape/Sandy-Bay"
    },
    "bounds": null,
    "moveMarkerCallback": null,
    "changeBoundsCallback": null
}
-34.023418
18.331407
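
The non-greedy pattern in this answer stops at the first semicolon, so it would cut the object short if any string value ever contained a `;`. One possible alternative, sketched here under that assumption, is to locate the assignment with a regular expression and let `json.JSONDecoder.raw_decode` parse exactly one JSON value from that position. The helper name `extract_js_object` is hypothetical and not part of any library, and the approach only works when the assigned value is valid JSON, as it is in the sample from the question.

```python
import json
import re

# Script body copied from the question. In practice this string would be
# the text of a <script> tag obtained from Beautiful Soup.
script_text = """
    var loadPoints = '/Map/Points';
    var mapDetails = {"point":{"latitude":-34.023418,"longitude":18.331407,"title":"Sandy Bay","location":null,"subject":"P","link":"/Explore/South-Africa/Western-Cape/Sandy-Bay"},"bounds":null,"moveMarkerCallback":null,"changeBoundsCallback":null};
    requireExploreMap(loadPoints, mapDetails);
"""


def extract_js_object(script_text, var_name):
    """Return the JSON value assigned to var_name, or None if it is absent.

    Hypothetical helper for illustration only.
    """
    match = re.search(r"\b" + re.escape(var_name) + r"\s*=\s*", script_text)
    if match is None:
        return None
    # raw_decode parses one JSON value starting at the given index and
    # ignores whatever follows it (here the ";" and the rest of the script).
    obj, _end = json.JSONDecoder().raw_decode(script_text, match.end())
    return obj


details = extract_js_object(script_text, "mapDetails")
lat = details["point"]["latitude"]
lon = details["point"]["longitude"]
print(lat, lon)
```

If the value after the `=` is not valid JSON (for example a JavaScript object with unquoted keys), `raw_decode` raises `json.JSONDecodeError`, so a caller may want to catch that.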

Answer 2

Score: 1

I would use json.loads with re.search:

import json
import re

stag = soup.find("script")
obj = json.loads(re.search(r"mapDetails\s*=\s*({.*});", str(stag)).group(1))
lat, lon = obj["point"]["latitude"], obj["point"]["longitude"]

Output:

print("Latitude:", lat)   # Latitude: -34.023418
print("Longitude:", lon)  # Longitude: 18.331407
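
The update in the question notes that `soup.find("script")` returns the first script tag, which on the real page does not hold `mapDetails`. A minimal sketch of one way around that is to apply the same pattern to every script body and stop at the first match. The helper name `find_map_details` is hypothetical, and the script bodies below are stand-ins: the first entry is invented for illustration and the last is an abbreviated copy of the sample from the question.

```python
import json
import re

# Same pattern as in the answer above.
MAP_DETAILS_RE = re.compile(r"mapDetails\s*=\s*(\{.*\});")


def find_map_details(script_texts):
    """Return the parsed mapDetails object from the first script that has one.

    Hypothetical helper for illustration only. Returns None if no script
    body contains a mapDetails assignment.
    """
    for text in script_texts:
        # A script tag without an inline body yields None, so guard for it.
        match = MAP_DETAILS_RE.search(text or "")
        if match:
            return json.loads(match.group(1))
    return None


# Stand-ins for the bodies of several <script> tags on one page. The first
# one lacks mapDetails, mirroring the situation described in the question.
script_texts = [
    "window.dataLayer = window.dataLayer || [];",  # invented example
    None,  # e.g. a <script src="..."></script> tag with no inline body
    """
    var loadPoints = '/Map/Points';
    var mapDetails = {"point":{"latitude":-34.023418,"longitude":18.331407,"title":"Sandy Bay"},"bounds":null};
    requireExploreMap(loadPoints, mapDetails);
    """,
]

obj = find_map_details(script_texts)
lat, lon = obj["point"]["latitude"], obj["point"]["longitude"]
print("Latitude:", lat)
print("Longitude:", lon)
```

With Beautiful Soup, the list could be built as `[tag.string for tag in soup.find_all("script")]`. In the scraping loop it would also be worth checking for a `None` result before indexing, since a page without the map would otherwise raise a `TypeError`.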
