What's an efficient way of finding the correct oEmbed provider for a URL?

Question

I stumbled upon the oEmbed spec, and found they also have a providers.json file where you can find all their known oEmbed providers. It's basically one big array, with objects like these:

    {
        "provider_name": "Vimeo",
        "provider_url": "https://vimeo.com/",
        "endpoints": [
            {
                "schemes": [
                    "https://vimeo.com/*",
                    "https://vimeo.com/album/*/video/*",
                    "https://vimeo.com/channels/*/*",
                    "https://vimeo.com/groups/*/videos/*",
                    "https://vimeo.com/ondemand/*/*",
                    "https://player.vimeo.com/video/*"
                ],
                "url": "https://vimeo.com/api/oembed.{format}",
                "discovery": true
            }
        ]
    },
    {
        "provider_name": "YouTube",
        "provider_url": "https://www.youtube.com/",
        "endpoints": [
            {
                "schemes": [
                    "https://*.youtube.com/watch*",
                    "https://*.youtube.com/v/*",
                    "https://youtu.be/*",
                    "https://*.youtube.com/playlist?list=*",
                    "https://youtube.com/playlist?list=*",
                    "https://*.youtube.com/shorts*"
                ],
                "url": "https://www.youtube.com/oembed",
                "discovery": true
            }
        ]
    },

I'd like to use this in my JavaScript project, but I'm unsure how to do so efficiently. Say you have a function that is given a URL and needs to find which provider, if any, that URL matches. How would you do that?

A brute-force approach would be to loop through each provider, convert each entry of schemes into a regex, and test it until you either find a match or reach the end of the list. That feels like it's going to be very slow, though. Are there ways to speed it up? For example, are there more efficient ways of matching those wildcard schemes than creating regex instances and testing with those?
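
For reference, the brute-force approach described above might look roughly like the sketch below. The `providers` array is a hypothetical two-entry excerpt in the shape of providers.json, and `schemeToRegExp` and `findProviderBruteForce` are illustrative names, not part of any oEmbed library. It assumes that `*` matches any run of characters, and it compiles each regex once up front rather than on every lookup.

```javascript
// Hypothetical two-entry excerpt in the shape of providers.json.
const providers = [
  {
    provider_name: "Vimeo",
    endpoints: [
      {
        schemes: ["https://vimeo.com/*", "https://player.vimeo.com/video/*"],
        url: "https://vimeo.com/api/oembed.{format}",
      },
    ],
  },
  {
    provider_name: "YouTube",
    endpoints: [
      {
        schemes: ["https://*.youtube.com/watch*", "https://youtu.be/*"],
        url: "https://www.youtube.com/oembed",
      },
    ],
  },
];

// Convert a wildcard scheme into an anchored, case-insensitive RegExp.
// Assumption: "*" matches any run of characters, including "/".
function schemeToRegExp(scheme) {
  const pattern = scheme
    .split("*")
    .map((part) => part.replace(/[.*+?^${}()|[\]\\]/g, "\\$&")) // escape literals
    .join(".*");
  return new RegExp(`^${pattern}$`, "i");
}

// Flatten every scheme into one list of precompiled entries.
const compiled = providers.flatMap((provider) =>
  provider.endpoints.flatMap((endpoint) =>
    (endpoint.schemes ?? []).map((scheme) => ({
      provider,
      endpoint,
      regex: schemeToRegExp(scheme),
    }))
  )
);

// Linear scan: test each precompiled regex until one matches.
function findProviderBruteForce(url) {
  const hit = compiled.find((entry) => entry.regex.test(url));
  return hit ? { provider: hit.provider, endpoint: hit.endpoint } : null;
}
```

Calling `findProviderBruteForce("https://vimeo.com/12345")` would return the Vimeo entry from the sample data, and a URL that matches no scheme yields `null`.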

Answer 1

Score: 1

It looks like '*' cannot match '/', and in your examples a leading '*' in the host always stands for a subdomain. You could therefore extract the domain from every scheme and use the domains as keys in a hash map, which reduces the check for a specific URL to only a few tests.
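
A minimal sketch of that idea, under some assumptions: the `providers` array is a hypothetical excerpt in the shape of providers.json, all function names are illustrative, and `*` is compiled to `.*` (swap in `[^/]*` for the stricter reading where `*` cannot cross a `/`). Schemes are bucketed in a `Map` keyed by host, with a leading `*.` stripped, so a lookup only runs the regexes whose host is a suffix of the URL's hostname.

```javascript
// Hypothetical two-entry excerpt in the shape of providers.json.
const providers = [
  {
    provider_name: "Vimeo",
    endpoints: [
      { schemes: ["https://vimeo.com/*", "https://player.vimeo.com/video/*"] },
    ],
  },
  {
    provider_name: "YouTube",
    endpoints: [
      { schemes: ["https://*.youtube.com/watch*", "https://youtu.be/*"] },
    ],
  },
];

// Assumption: "*" matches any run of characters.
function schemeToRegExp(scheme) {
  const pattern = scheme
    .split("*")
    .map((part) => part.replace(/[.*+?^${}()|[\]\\]/g, "\\$&"))
    .join(".*");
  return new RegExp(`^${pattern}$`, "i");
}

// Host key for a scheme: "https://*.youtube.com/watch*" -> "youtube.com".
// Returns null when the host cannot serve as an exact key.
function schemeHostKey(scheme) {
  const match = /^[a-z][a-z0-9+.-]*:\/\/([^/]+)/i.exec(scheme);
  if (!match) return null;
  let host = match[1].toLowerCase();
  if (host.startsWith("*.")) host = host.slice(2);
  return host.includes("*") ? null : host;
}

const index = new Map(); // host -> candidate entries
const unindexed = []; // schemes whose host has no usable key

for (const provider of providers) {
  for (const endpoint of provider.endpoints) {
    for (const scheme of endpoint.schemes ?? []) {
      const entry = { provider, endpoint, regex: schemeToRegExp(scheme) };
      const key = schemeHostKey(scheme);
      if (key === null) {
        unindexed.push(entry);
      } else {
        if (!index.has(key)) index.set(key, []);
        index.get(key).push(entry);
      }
    }
  }
}

function findProvider(url) {
  let hostname;
  try {
    hostname = new URL(url).hostname.toLowerCase();
  } catch {
    return null; // not a parseable absolute URL
  }
  // Try "www.youtube.com", then "youtube.com", then "com".
  const labels = hostname.split(".");
  for (let i = 0; i < labels.length; i++) {
    const bucket = index.get(labels.slice(i).join("."));
    const hit = bucket?.find((entry) => entry.regex.test(url));
    if (hit) return { provider: hit.provider, endpoint: hit.endpoint };
  }
  // Fall back to the schemes that could not be indexed by host.
  const hit = unindexed.find((entry) => entry.regex.test(url));
  return hit ? { provider: hit.provider, endpoint: hit.endpoint } : null;
}
```

The regex still makes the final decision; the map only narrows which regexes are tried, so the number of tests per lookup depends on how many schemes share a host rather than on the size of the whole list.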

huangapple
  • Posted on July 7, 2023, 03:27:11