Match pages corresponding to the Wikipedia contents from the Wikipedia SQL dumps


Question

Referring to the description:

Wikipedia:Contents categorises the types of articles in Wikipedia.

https://en.wikipedia.org/wiki/Wikipedia:Contents

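I want to extract all of the articles that belong to content types such as Outlines and Lists. These articles are in the same namespace as normal articles, so filtering pages by namespace does not work.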

I looked at the information at:

https://meta.wikimedia.org/wiki/Data_dumps/What%27s_available_for_download

and at the content and content_models tables in:

https://www.mediawiki.org/wiki/Manual:Database_layout

https://www.mediawiki.org/wiki/Manual:Content_models_table

but I could not find a way to solve the problem.

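  • How can I extract the page IDs or titles of pages that belong to content types such as Outlines and Lists, as listed at https://en.wikipedia.org/wiki/Wikipedia:Contents?
  • Which data dumps contain this information?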

Answer 1

Score: 1

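I've used this query at Quarry, which I think does what you're asking:

SELECT page_title, page_id FROM page
WHERE (page_title LIKE 'List!_%' ESCAPE '!'
    OR page_title LIKE 'Outline!_%' ESCAPE '!')
    AND page_is_redirect = 0
    AND page_namespace = 0;

MediaWiki stores spaces in page_title as underscores. In LIKE, however, _ is a single-character wildcard, so it is escaped here with ! to match only a literal underscore. An unescaped 'List_%' would also match titles such as Listeria.

You can see the results here: https://quarry.wmcloud.org/query/71439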
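The following is a minimal, self-contained sketch showing why the underscore in the LIKE patterns matters. It uses Python's built-in sqlite3 as a stand-in for the MariaDB replica behind Quarry. The table mimics only four columns of MediaWiki's page table, and its rows are hypothetical sample data, not dump content. One difference to keep in mind: SQLite's LIKE is case-insensitive for ASCII, whereas page_title on the replicas is a binary column, so matching there is case-sensitive.

```python
import sqlite3

# Hypothetical toy rows shaped like four columns of MediaWiki's `page` table:
# (page_id, page_namespace, page_title, page_is_redirect). Not real dump data.
ROWS = [
    (1, 0, "List_of_sovereign_states", 0),
    (2, 0, "Outline_of_physics", 0),
    (3, 0, "Listeria", 0),           # starts with "List" but is not a list
    (4, 0, "Outliner", 0),           # starts with "Outline" but is not an outline
    (5, 0, "List_of_old_names", 1),  # a redirect, filtered out
    (6, 4, "List_of_policies", 0),   # not in the article namespace (0)
]

# Here "_" acts as a wildcard for any single character.
UNESCAPED = """
SELECT page_id FROM page
WHERE (page_title LIKE 'List_%' OR page_title LIKE 'Outline_%')
  AND page_is_redirect = 0 AND page_namespace = 0
ORDER BY page_id
"""

# "!" is declared as the escape character, so "!_" matches a literal underscore.
ESCAPED = """
SELECT page_id FROM page
WHERE (page_title LIKE 'List!_%' ESCAPE '!'
       OR page_title LIKE 'Outline!_%' ESCAPE '!')
  AND page_is_redirect = 0 AND page_namespace = 0
ORDER BY page_id
"""


def run(query: str) -> list[int]:
    """Load the toy rows into an in-memory table and return the matching page_ids."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE page (page_id INTEGER, page_namespace INTEGER,"
            " page_title TEXT, page_is_redirect INTEGER)"
        )
        conn.executemany("INSERT INTO page VALUES (?, ?, ?, ?)", ROWS)
        return [row[0] for row in conn.execute(query)]
    finally:
        conn.close()


print(run(UNESCAPED))  # [1, 2, 3, 4]
print(run(ESCAPED))    # [1, 2]
```

With the unescaped patterns, Listeria and Outliner slip through because their fifth and eighth characters satisfy the _ wildcard. The escaped patterns keep only titles that really begin with "List_" or "Outline_".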


huangapple
  • Published on 2023-02-16 19:03:26
  • Please keep the link to this article when reposting: https://go.coder-hub.com/75471331.html