Match pages corresponding to the Wikipedia content types from the Wikipedia SQL dumps
Question
Wikipedia:Contents categorises the types of articles in Wikipedia:
https://en.wikipedia.org/wiki/Wikipedia:Contents
I want to extract all of the articles that belong to particular content types, such as Outlines and Lists. These articles are in the same namespace as normal articles, so filtering pages by namespace does not work.
I looked at the information at:
https://meta.wikimedia.org/wiki/Data_dumps/What%27s_available_for_download
and at the content and content models tables in:
https://www.mediawiki.org/wiki/Manual:Database_layout
https://www.mediawiki.org/wiki/Manual:Content_models_table
but I could not find a way to solve the problem.
- How can I extract the page IDs, or titles, of the pages that belong to content types such as Outline and List, as mentioned in https://en.wikipedia.org/wiki/Wikipedia:Contents?
- Which dumps contain this information?
Answer 1
Score: 1
I used this query on Quarry, which I think does what you're asking:
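-- Note: in LIKE, '_' matches any single character, so 'List_%' also matches
-- titles such as 'Lists_of_...'; write 'List\_%' to require a literal underscore.
SELECT page_title, page_id
FROM page
WHERE (page_title LIKE 'List_%'
       OR page_title LIKE 'Outline_%')
  AND page_is_redirect = 0
  AND page_namespace = 0;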
You can see the results here: https://quarry.wmcloud.org/query/71439
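For working from the dumps rather than Quarry, the same title-prefix idea can be sketched offline in Python. This is only a rough sketch under several assumptions: that the titles-only dump (named something like enwiki-latest-all-titles-in-ns0.gz) contains one main-namespace title per line with underscores for spaces; that the prefixes "List_of_", "Lists_of_" and "Outline_of_" are a good enough proxy for the content types; and the helper names filter_titles and titles_from_dump are made up for illustration. That file has no page IDs and does not mark redirects, so for IDs or the page_is_redirect filter the page table dump (page.sql.gz) would be needed instead.

```python
import gzip
import sys
from typing import Iterable, Iterator, Tuple

# Assumed title prefixes (underscores stand in for spaces in dump titles).
PREFIXES: Tuple[str, ...] = ("List_of_", "Lists_of_", "Outline_of_")


def filter_titles(titles: Iterable[str],
                  prefixes: Tuple[str, ...] = PREFIXES) -> Iterator[str]:
    """Yield the titles that start with one of the given prefixes."""
    for raw in titles:
        title = raw.rstrip("\n")
        # str.startswith accepts a tuple and matches any of its entries.
        if title.startswith(prefixes):
            yield title


def titles_from_dump(path: str) -> Iterator[str]:
    """Stream matching titles from a gzip-compressed, one-title-per-line file."""
    with gzip.open(path, "rt", encoding="utf-8") as fh:
        yield from filter_titles(fh)


if __name__ == "__main__":
    # Hypothetical usage: python list_outline_titles.py <titles-dump>.gz
    for matched in titles_from_dump(sys.argv[1]):
        print(matched)
```

Unlike the LIKE patterns in the query, startswith treats the underscore literally, so a title such as Listeria is not matched.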