CKAN Harvester
This guide explains how to configure CKAN to harvest metadata from other open data platforms, such as Socrata and ArcGIS.
Plugins Used
- Harvest Basket - Provides a unified interface for managing multiple harvesters.
- ArcGIS Harvester - Allows CKAN to harvest datasets from ArcGIS-based platforms.
- Socrata Harvester - Enables CKAN to import metadata from Socrata data portals.
- CKAN Harvester - Supports harvesting metadata from other CKAN instances.
Harvesting
Harvesting is the process of automatically collecting metadata or datasets from external data platforms and importing them into CKAN. This helps keep your CKAN catalog up to date with data published on other portals.
Harvesting in CKAN
CKAN supports harvesting through plugins, and harvest sources can be managed from the CKAN web interface. The harvesting view is accessible from the Sysadmin dashboard.

To add a new harvest source, click the Add Harvest Source button. This will open a configuration window where you can set various harvesting options.

The configuration options include:
- URL – Enter the address of the external data source to be harvested.
- Title of Harvest Source – A name to identify the harvest source in CKAN.
- Description – A short description in Markdown format.
- Source Type – Select the type of source (options depend on installed plugins, such as DCAT, ArcGIS, Socrata, or CKAN).
- Update Frequency – How often CKAN should check for new or updated datasets.
- Configuration – Additional settings provided by the plugin’s transmute schema (advanced use; usually plugin-specific).
Saving the configuration creates a new Harvesting Job.
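As a rough illustration of how the form fields listed above fit together, the sketch below collects them into a single Python dictionary. The key names (name, notes, source_type, frequency, config), the set of frequency values, and the example_option setting are assumptions made for illustration and are not confirmed by this guide; check them against the harvest form and the plugin documentation of your installation.

```python
import json
import re

# Assumed frequency values; compare with the choices offered in your harvest form.
ALLOWED_FREQUENCIES = {"MANUAL", "DAILY", "WEEKLY", "BIWEEKLY", "MONTHLY", "ALWAYS"}


def build_harvest_source(url, title, source_type, frequency="MANUAL",
                         description="", config=None):
    """Return a dict mirroring the harvest source form fields.

    The key names are hypothetical and may differ in your installation.
    """
    if not url.startswith(("http://", "https://")):
        raise ValueError("url must start with http:// or https://")
    if frequency not in ALLOWED_FREQUENCIES:
        raise ValueError(f"unknown frequency: {frequency}")
    # Derive a URL-friendly identifier from the title.
    name = re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")
    return {
        "url": url,                          # URL
        "title": title,                      # Title of Harvest Source
        "name": name,                        # slug derived from the title
        "notes": description,                # Description (Markdown)
        "source_type": source_type,          # Source Type, e.g. "ckan"
        "frequency": frequency,              # Update Frequency
        "config": json.dumps(config or {}),  # Configuration, stored as a JSON string
    }


if __name__ == "__main__":
    source = build_harvest_source(
        "https://data.example.org",          # placeholder portal address
        "Example Portal",
        "ckan",
        frequency="WEEKLY",
        description="Datasets from an example portal.",
        config={"example_option": True},     # hypothetical plugin setting
    )
    print(json.dumps(source, indent=2))
```

The sketch only assembles and validates the values; it does not contact CKAN. Serializing the plugin settings as a JSON string reflects the assumption that the Configuration field holds JSON text.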
A supervisord service runs inside the Podman container hosting the CKAN application. It periodically executes the following commands to process harvesting jobs:
ckan harvester fetch-consumer
ckan harvester gather-consumer
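The supervisord setup described above could be expressed as one program section per consumer command. The following Python sketch renders such sections with the standard configparser module. The program names, the /etc/ckan/ckan.ini path, and the autostart and autorestart settings are assumptions for illustration; the actual configuration inside the container may differ.

```python
import configparser
import io

# Consumer subcommands named in this guide.
CONSUMERS = ("gather-consumer", "fetch-consumer")


def render_supervisor_config(ckan_ini="/etc/ckan/ckan.ini"):
    """Render one supervisord [program:...] section per harvest consumer.

    ckan_ini is a hypothetical path to the CKAN configuration file.
    """
    parser = configparser.ConfigParser()
    for consumer in CONSUMERS:
        section = f"program:ckan-harvest-{consumer}"  # hypothetical program name
        parser[section] = {
            "command": f"ckan -c {ckan_ini} harvester {consumer}",
            "autostart": "true",     # start together with supervisord
            "autorestart": "true",   # restart the consumer if it exits
        }
    buffer = io.StringIO()
    parser.write(buffer, space_around_delimiters=False)
    return buffer.getvalue()


if __name__ == "__main__":
    print(render_supervisor_config())
```

Running the script prints the two sections as INI text, which can be compared with the supervisord configuration shipped in the container.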