Publication Details

Category Text Publication
Reference Category Journals
DOI 10.5194/nhess-26-2609-2026
Licence Creative Commons licence
Title (Primary) Wikimpacts 1.0: a new global climate impact database based on automated information extraction from Wikipedia
Author Li, N.; Thiery, W.; Zahra, S.; de Brito, M.M.; Worou, K.; Kurfalı, M.; Lampe, S.; Muñoz, P.; Flynn, C.; Trigoso, C.; Nivre, J.; Zscheischler, J.; Messori, G.
Source Title Natural Hazards and Earth System Sciences
Year 2026
Department SUSOZ; CER
Volume 26
Issue 6
Page From 2609
Page To 2636
Language English
Topic T5 Future Landscapes
Data and Software links https://doi.org/10.17043/li-2025-wikimpacts-1.0.final
https://doi.org/10.5281/zenodo.19428787
Supplements Supplement 1
Abstract Climate extremes like storms, heatwaves, wildfires, droughts and floods significantly threaten society and ecosystems. However, comprehensive data on the socio-economic impacts of climate extremes remains limited. Here we present Wikimpacts 1.0, a global climate impact database built by extracting information from Wikipedia using natural language processing. Our method identifies relevant articles, extracts the information using GPT-4o, post-processes it, and consolidates the database. Impact data is stored at the event, national, and sub-national levels, covering 2726 events from 1034 to 2024, with 17 912 national and 32 343 sub-national entries. When evaluated against manually annotated data from 156 events, the database shows low error scores (on a scale from 0 to 1) for event-level information such as timing (0.05), deaths (0.03), and economic damage (0.12), and slightly higher error scores for injuries (0.21), homelessness (0.25), displacement (0.29), and damaged buildings (0.28). Wikimpacts 1.0 provides a different event coverage than EM-DAT, notably providing broader coverage of storm impacts but more limited coverage of flood impacts. We match 179 events between the two databases to compare impact values, and find that 32 out of 179 matched events have identical data for deaths, and 7 out of 77 for injuries. However, there are notable discrepancies in information on homelessness and damage. We view the publicly available Wikimpacts 1.0 database as a complementary resource to existing impact databases, one that facilitates sub-national climate impact assessments and highlights the potential of natural language processing to enhance existing impact datasets and provide robust information on climate impacts.
Lastly, we provide a static version of the database as used in this paper at https://bolin.su.se/data/li-2025-wikimpacts-1.0.final (last access: 20 May 2026), and the Wikimpacts website (https://www.wikimpacts.eu/, last access: 20 May 2026) for visualization and for access to future updates of the database.
Li, N., Thiery, W., Zahra, S., de Brito, M.M., Worou, K., Kurfalı, M., Lampe, S., Muñoz, P., Flynn, C., Trigoso, C., Nivre, J., Zscheischler, J., Messori, G. (2026):
Wikimpacts 1.0: a new global climate impact database based on automated information extraction from Wikipedia
Nat. Hazards Earth Syst. Sci. 26 (6), 2609 - 2636
10.5194/nhess-26-2609-2026