text - Downloading Wikipedia text

Reposted. Author: 行者123. Updated: 2023-12-03 11:13:14

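I need to download the full text of Wikipedia for a university project. Do I have to write my own spider to download it, or is there a public Wikipedia dataset available online?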

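To give you an overview of my project: I want to find the interesting words in a few articles I care about. To find them, I plan to apply tf-idf, computing a score for each word and picking the words with the highest scores. But the idf part needs statistics from the whole corpus, so I need to know how often each word occurs across all of Wikipedia.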
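A minimal sketch of the tf-idf scoring described above, in Python. The function names (`tokenize`, `document_frequencies`, `tfidf`, `interesting_words`) are hypothetical, the tokenizer is deliberately naive, and the idf uses one common smoothed variant, log(N / (1 + df)); other weightings exist. In the project, `corpus` would be the Wikipedia articles, and the document frequencies would be computed once over the whole dump and reused.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Naive tokenizer: lowercase, keep runs of ASCII letters only.
    return re.findall(r"[a-z]+", text.lower())


def document_frequencies(corpus):
    # df[w] = number of documents that contain w at least once.
    df = Counter()
    for doc in corpus:
        df.update(set(tokenize(doc)))
    return df


def tfidf(doc, df, n_docs):
    counts = Counter(tokenize(doc))
    total = sum(counts.values())
    scores = {}
    for word, count in counts.items():
        tf = count / total  # relative frequency within this document
        # Smoothed idf: the +1 avoids division by zero for unseen words.
        idf = math.log(n_docs / (1 + df.get(word, 0)))
        scores[word] = tf * idf
    return scores


def interesting_words(doc, corpus, k=10):
    df = document_frequencies(corpus)
    scores = tfidf(doc, df, len(corpus))
    # Highest tf-idf first; ties keep their order of first appearance.
    return sorted(scores, key=scores.get, reverse=True)[:k]


corpus = [
    "the cat sat on the mat",
    "the dog ate the bone",
    "the quantum entanglement experiment",
]
print(interesting_words("the quantum experiment", corpus, k=2))
# → ['quantum', 'experiment']
```

A word like "the" appears in every document, so its idf is small (here even negative) and it drops to the bottom of the ranking regardless of how often it occurs in the article.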

How can I do this?

Best answer

From Wikipedia: http://en.wikipedia.org/wiki/Wikipedia_database

Wikipedia offers free copies of all available content to interested users. These databases can be used for mirroring, personal use, informal backups, offline use or database queries (such as for Wikipedia:Maintenance). All text content is multi-licensed under the Creative Commons Attribution-ShareAlike 3.0 License (CC-BY-SA) and the GNU Free Documentation License (GFDL). Images and other files are available under different terms, as detailed on their description pages. For our advice about complying with these licenses, see Wikipedia:Copyrights.


It looks like you're in luck, too. From the dumps section:

As of 12 March 2010, the latest complete dump of the English-language Wikipedia can be found at http://download.wikimedia.org/enwiki/20100130/ This is the first complete dump of the English-language Wikipedia to have been created since 2008. Please note that more recent dumps (such as the 20100312 dump) are incomplete.


So the data is only 9 days old :)
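Since the dump is one large compressed XML file, there is no need for a spider: you can stream it directly. Below is a minimal sketch using only the Python standard library, assuming a `pages-articles` XML dump (file names look like `enwiki-YYYYMMDD-pages-articles.xml.bz2`; current dumps are listed at https://dumps.wikimedia.org/enwiki/). The helper names `iter_pages` and `local_name` are hypothetical. The yielded text is still raw wikitext, so templates, links and other markup would need to be stripped before counting words.

```python
import bz2
import xml.etree.ElementTree as ET


def local_name(tag):
    # ElementTree reports tags as "{namespace}name"; keep only "name" so the
    # code works regardless of the export schema version used by the dump.
    return tag.rsplit("}", 1)[-1]


def iter_pages(path):
    """Yield (title, wikitext) for every <page> in a MediaWiki XML dump."""
    # Decompress .bz2 on the fly; otherwise read the XML file as-is.
    opener = bz2.open if path.endswith(".bz2") else open
    with opener(path, "rb") as f:
        context = ET.iterparse(f, events=("start", "end"))
        _, root = next(context)  # first event: start of the root element
        title, text = None, None
        for event, elem in context:
            if event != "end":
                continue
            name = local_name(elem.tag)
            if name == "title":
                title = elem.text
            elif name == "text":
                text = elem.text or ""
            elif name == "page":
                yield title, text
                title, text = None, None
                # Drop finished pages so memory stays flat on a multi-GB file.
                root.clear()
```

Usage: `for title, text in iter_pages(path): ...`, with `path` pointing at the downloaded dump file. Feeding each `text` into a word counter gives the corpus-wide statistics the question asks for.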

Original question on Stack Overflow: https://stackoverflow.com/questions/2683506/
