Didomi can export daily batches of user consent data to provide you with a dump of the users, events, and proofs stored on the platform.
The batch export is not yet compatible with a multi-regulation setup: for now, user consent in the historical data that you have access to is associated with the GDPR.
Destinations
Didomi can send exported data to various destinations.
Didomi offers the ability to retrieve information and monitor the status of your batch exports. Read more on logging.
Format
Data exported at the destination follows the Apache Hive naming conventions for partitioning and exposes two partitions:

Partition name | Description
-------------- | -----------
export-id      | Unique ID of the configured batch export. It allows you to distinguish files exported by multiple configured exports.
date           | The date when the export happened (YYYY-MM-DD format).
Example of a path to the exported data: /data/export-id={batch export ID}/date={YYYY-MM-DD}/
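Given the path layout above, the partition values can be recovered from a path with a small helper. The sketch below is in Python; the function name and the example export ID are illustrative, not part of the Didomi API.

```python
import re

def parse_partition_path(path):
    """Extract the export ID and date from a Hive-style partition path."""
    match = re.search(r"export-id=([^/]+)/date=(\d{4}-\d{2}-\d{2})", path)
    if match is None:
        raise ValueError("Not a batch-export partition path: " + path)
    return {"export_id": match.group(1), "date": match.group(2)}

# "abc123" is a made-up export ID for the sake of the example
parts = parse_partition_path("/data/export-id=abc123/date=2023-05-01/")
print(parts["export_id"], parts["date"])  # → abc123 2023-05-01
```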
Users & Events
Structure
Didomi exports user-level data in a users/ folder and in files formatted as newline-delimited JSON (one JSON object per line), and compressed as GZIP. We use the file extension .json.gz.
For every partition, the exported data can be split into multiple files. Make sure that you read every .json.gz file in the partition folder to get all the records belonging to that export. File names are subject to change over time, so identify files by their extension (.json.gz) rather than by a specific filename or format.
Additionally, a file named _SUCCESS is created once the export is complete. Do not read files from a partition until the _SUCCESS file is created as you might read partial records otherwise.
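The two rules above (read by extension, wait for the _SUCCESS marker) can be sketched in Python as follows; the function name is illustrative:

```python
from pathlib import Path

def list_export_files(partition_dir):
    """Return the data files of a partition, only once the export is complete.

    Selects files by their .json.gz extension rather than by name, and
    refuses to read anything until the _SUCCESS marker file exists.
    """
    partition = Path(partition_dir)
    if not (partition / "_SUCCESS").exists():
        raise RuntimeError("Export not complete yet: _SUCCESS marker is missing")
    return sorted(partition.glob("*.json.gz"))
```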
Content
The files exported by Didomi contain users and their events for the period covered by the export. Every line of the files contains a self-sufficient JSON record (newline-delimited JSON) that includes both the user and its associated events.
You should first split the files on newlines and then parse every line as an individual JSON record. This is a standard encoding for data-processing tools like Hive, Spark, and Presto, and should be straightforward to import into your existing tools.
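As a minimal sketch of that parsing step in Python, the standard library is enough to decompress a .json.gz file and yield one record per line (the function name is illustrative):

```python
import gzip
import json

def read_ndjson_gz(path):
    """Yield one parsed record per line of a gzipped newline-delimited JSON file."""
    with gzip.open(path, "rt", encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:  # tolerate trailing blank lines
                yield json.loads(line)
```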
The schema of the objects exported is as follows:
{
    /**
     * User object following the User schema
     * https://developers.didomi.io/api/consents/users#user-schema
     */
    "user": { ... },

    /**
     * Array of event objects following the Event schema
     * https://developers.didomi.io/api/consents/events#event-schema
     */
    "events": [
        { ... },
        { ... },
        ...
    ]
}
Proofs
Structure
For every export, the proofs file is split up into files of up to 4GB with the naming convention proofs.tar.gz.XY where XY is a number indicating the split number.
For instance, if the total size of the proofs is 12GB, three files will be generated (proofs.tar.gz.00, proofs.tar.gz.01, proofs.tar.gz.02).
To be able to extract the proofs, you need to join the different files together to generate the original proofs.tar.gz archive:
# Join all parts of the proofs archive together
cat proofs.tar.gz.* > proofs.tar.gz

# Extract the proofs
tar -xvf proofs.tar.gz
Additionally, a file named _SUCCESS is created once the export is complete. Do not read files from a partition until that file is created as you might get partial records.
Content
The files exported by Didomi contain proofs for the period covered by the export. Every file is named after the proof ID that is present in the consent events.
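Since each proof file is named after its proof ID, looking up the proof behind a consent event reduces to a filename match once the archive is extracted. A Python sketch, assuming proofs.tar.gz has already been extracted into a directory (the function name is illustrative, and the exact field carrying the proof ID in an event is defined by the Event schema linked above):

```python
from pathlib import Path

def find_proof_file(proofs_dir, proof_id):
    """Return the extracted proof file whose name matches a proof ID.

    Assumes the proofs.tar.gz archive has already been extracted into
    proofs_dir and that each file is named after its proof ID.
    """
    matches = sorted(Path(proofs_dir).glob(proof_id + "*"))
    if not matches:
        raise FileNotFoundError("No proof file for ID " + proof_id)
    return matches[0]
```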