gpt4 book ai didi

google-bigquery - 了解 Google BigQuery GDELT GKG 2.0 中的主题

转载 作者:行者123 更新时间:2023-12-02 20:52:28 26 4
gpt4 key购买 nike

我正在使用 Google bigquery 来分析 GDELT GKG 2.0 dataset并且想更好地了解如何基于主题(或V2Themes)进行查询。 docs提到“类别列表”电子表格,但到目前为止我还没有成功找到该列表。

以下asesome blog提到您可以使用世界银行分类法等来缩小搜索范围。我的目标是找到所有提到“干旱/水太少”的项目、所有提到“洪水/水太多”的项目以及所有提到“质量差/水太脏”的项目,这些项目在子项目上有地理匹配国家级别。

到目前为止,我已经能够获得不同主题的列表,但这并不广泛,而且我不知道它的层次结构/结构。

SELECT
DISTINCT theme
FROM (
SELECT
GKGRECORDID,
locations,
REGEXP_EXTRACT(themes,r'(^.[^,]+)') AS theme,
CAST(REGEXP_EXTRACT(locations,r'^(?:[^#]*#){0}([^#]*)') AS NUMERIC) AS location_type,
REGEXP_EXTRACT(locations,r'^(?:[^#]*#){1}([^#]*)') AS location_fullname,
REGEXP_EXTRACT(locations,r'^(?:[^#]*#){2}([^#]*)') AS location_countrycode,
REGEXP_EXTRACT(locations,r'^(?:[^#]*#){3}([^#]*)') AS location_adm1code,
REGEXP_EXTRACT(locations,r'^(?:[^#]*#){4}([^#]*)') AS location_adm2code,
REGEXP_EXTRACT(locations,r'^(?:[^#]*#){5}([^#]*)') AS location_latitude,
REGEXP_EXTRACT(locations,r'^(?:[^#]*#){6}([^#]*)') AS location_longitude,
REGEXP_EXTRACT(locations,r'^(?:[^#]*#){7}([^#]*)') AS location_featureid,
REGEXP_EXTRACT(locations,r'^(?:[^#]*#){8}([^#]*)') AS location_characteroffset,
DocumentIdentifier
FROM
`gdelt-bq.gdeltv2.gkg_partitioned`,
UNNEST(SPLIT(V2Locations,';')) AS locations,
UNNEST(SPLIT(V2Themes,';')) AS themes
WHERE
_PARTITIONTIME >= "2018-08-20 00:00:00"
AND _PARTITIONTIME < "2018-08-21 00:00:00" )
WHERE
(location_type = 5
OR location_type = 4
OR location_type = 2) --WorldState, WorldCity or US State
ORDER BY
theme

以及迄今为止我能找到的与水相关的主题列表(示例,并非详尽):

CRISISLEX_C06_WATER_SANITATION
ENV_WATERWAYS
HUMAN_RIGHTS_ABUSES_WATERBOARD
HUMAN_RIGHTS_ABUSES_WATERBOARDED
HUMAN_RIGHTS_ABUSES_WATERBOARDING
NATURAL_DISASTER_FLOODWATER
NATURAL_DISASTER_FLOODWATERS
NATURAL_DISASTER_FLOOD_WATER
NATURAL_DISASTER_FLOOD_WATERS
NATURAL_DISASTER_HIGH_WATER
NATURAL_DISASTER_HIGH_WATERS
NATURAL_DISASTER_WATER_LEVEL
TAX_AIDGROUPS_WATERAID
TAX_DISEASE_WATERBORNE_DISEASE
TAX_DISEASE_WATERBORNE_DISEASES
TAX_FNCACT_WATERBOY
TAX_FNCACT_WATERMAN
TAX_FNCACT_WATERMEN
TAX_FNCACT_WATER_BOY
TAX_WEAPONS_WATER_CANNON
TAX_WEAPONS_WATER_CANNONS
TAX_WORLDBIRDS_WATERFOWL
TAX_WORLDMAMMALS_WATER_BUFFALO
UNGP_CLEAN_WATER_SANITATION
WATER_SECURITY
WB_1000_WATER_MANAGEMENT_STRUCTURES
WB_1021_WATER_LAW
WB_1063_WATER_ALLOCATION_AND_WATER_SUPPLY
WB_1064_WATER_DEMAND_MANAGEMENT
WB_1199_WATER_SUPPLY_AND_SANITATION
WB_1215_WATER_QUALITY_STANDARDS
WB_137_WATER
WB_138_WATER_SUPPLY
WB_139_SANITATION_AND_WASTEWATER
WB_140_AGRICULTURAL_WATER_MANAGEMENT
WB_141_WATER_RESOURCES_MANAGEMENT
WB_143_RURAL_WATER
WB_144_URBAN_WATER
WB_1462_WATER_SANITATION_AND_HYGIENE
WB_149_WASTEWATER_TREATMENT_AND_DISPOSAL
WB_150_WASTEWATER_REUSE
WB_155_WATERSHED_MANAGEMENT
WB_156_GROUNDWATER_MANAGEMENT
WB_159_TRANSBOUNDARY_WATER
WB_1729_URBAN_WATER_FINANCIAL_SUSTAINABILITY
WB_1731_NON_REVENUE_WATER
WB_1778_FRESHWATER_ECOSYSTEMS
WB_1790_INTERNATIONAL_WATERWAYS
WB_1798_WATER_POLLUTION
WB_1805_WATERWAYS
WB_1998_WATER_ECONOMICS
WB_2008_WATER_TREATMENT
WB_2009_WATER_QUALITY_MONITORING
WB_2971_WATER_PRICING
WB_2981_DRINKING_WATER_QUALITY_STANDARDS
WB_2992_FRESHWATER_FISHERIES
WB_427_WATER_ALLOCATION_AND_WATER_ECONOMICS

最佳答案

虽然此链接作为主题列表提供:

http://data.gdeltproject.org/documentation/GDELT-Global_Knowledge_Graph_CategoryList.xlsx

...它还远未完成(也许只是原始主题列表?)。我刚刚提取了一天的 GKG,还有大量主题不在该电子表格的 283 个主题列表中。

GKG 文档位于 https://blog.gdeltproject.org/world-bank-group-topical-taxonomy-now-in-gkg/指向位于 http://pubdocs.worldbank.org/en/275841490966525495/Theme-Taxonomy-and-definitions.pdf 的世界银行分类法。 GKG 帖子暗示该世界银行分类法已纳入 GKG 主题列表中。

这是世界银行分类主题的完整列表。不幸的是,我在 GKG 中发现了许多世界银行主题,但本出版物中没有这些主题。这两个列表的联合代表了 GKG 主题的一部分,但它绝对不是全部。

关于google-bigquery - 了解 Google BigQuery GDELT GKG 2.0 中的主题,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/51967429/

26 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com