Introduction
The mc3d-source package hosts the code for the MC3D source pipeline: the workflow that turns raw CIF dumps from external crystal structure databases (COD, ICSD, MPDS) into the unique, curated set of structures that seed all MC3D calculations.
For the scientific context, see the MC3D website and the accompanying paper in Digital Discovery (Huber et al., 2026).
Pipeline
The following flowchart from our MC3D Figma board summarises the steps:

The package exposes a single CLI, mc3d-source, whose subcommands implement the stages in order:
- import: fetch raw CIFs from a source database into AiiDA.
- (Cleaning step.) The raw CifData are processed by CifCleanWorkChain runs, submitted in batch from the separate runner in pipeline/cif_clean/ (not part of the installed package).
- curate: from a group of completed CifCleanWorkChain runs, attach source/spacegroup/formula extras to the parsed StructureData and collect the clean ones into a curated group.
- update: when re-importing an updated version of a source database, reconcile the new curated set against the previous one.
- analyse: produce the per-source deprecation report (id_removed, structure_updated, incorrect_formula).
- uniq: deduplicate structures across sources via pymatgen's StructureMatcher and emit the unique families as JSON.
- select: pick the final MC3D structures from the unique families, taking the previous MC3D set and the deprecation report into account.
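To picture what the uniq stage produces, here is a small, stdlib-only sketch. It is not the package's implementation: the real stage compares full structures with pymatgen's StructureMatcher, whereas here a crude key (reduced formula plus space-group number) stands in for that comparison. The record layout, the function names and the ids are hypothetical.

```python
import json
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical record: (structure id, reduced formula, space-group number).
# Stand-in only: the real uniq stage uses pymatgen's StructureMatcher instead
# of this (formula, space group) key.
Record = Tuple[str, str, int]


def group_into_families(records: List[Record]) -> List[List[str]]:
    """Group structure ids whose (formula, space group) keys coincide."""
    families: Dict[Tuple[str, int], List[str]] = defaultdict(list)
    for struct_id, formula, spacegroup in records:
        families[(formula, spacegroup)].append(struct_id)
    # Sort ids inside each family, then families by their first id,
    # so the JSON output is deterministic.
    return sorted((sorted(ids) for ids in families.values()), key=lambda fam: fam[0])


def families_to_json(families: List[List[str]]) -> str:
    """Serialise the families as a JSON list of id lists."""
    return json.dumps(families, indent=2)
```

For example, `group_into_families([("cod-1", "NaCl", 225), ("mpds-7", "NaCl", 225), ("icsd-3", "SiO2", 154)])` (made-up ids) gives two families, `["cod-1", "mpds-7"]` and `["icsd-3"]`. A key this coarse would wrongly merge distinct structures that share formula and space group, which is why a geometric matcher is used for the actual deduplication.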
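The three categories of the analyse stage's deprecation report can be illustrated with a minimal sketch for a single source, comparing a previous and a new curated set keyed by source id. The Entry fields, the use of a fingerprint string as a proxy for "same structure", and especially the criterion for incorrect_formula (the formula reported by the source differs from the one computed from the parsed structure) are assumptions made for illustration, not the package's definitions.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Entry:
    """Hypothetical view of one curated structure from a single source."""
    fingerprint: str     # stand-in for a structure comparison (e.g. a hash)
    source_formula: str  # formula as reported by the source database
    parsed_formula: str  # formula computed from the parsed structure


def deprecation_report(old: Dict[str, Entry], new: Dict[str, Entry]) -> Dict[str, List[str]]:
    """Classify source ids into the three report categories (sketch)."""
    report: Dict[str, List[str]] = {
        "id_removed": [],
        "structure_updated": [],
        "incorrect_formula": [],
    }
    for source_id in sorted(old):
        if source_id not in new:
            # Present in the previous curated set, gone from the new one.
            report["id_removed"].append(source_id)
        elif new[source_id].fingerprint != old[source_id].fingerprint:
            # Same id, but the structure itself changed between versions.
            report["structure_updated"].append(source_id)
    for source_id in sorted(new):
        entry = new[source_id]
        # ASSUMED criterion: reported and parsed formulas disagree.
        if entry.source_formula != entry.parsed_formula:
            report["incorrect_formula"].append(source_id)
    return report
```

Plain string comparison of formulas is deliberately naive here; a real check would compare normalised compositions (so that, say, "ClNa" and "NaCl" count as equal).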
Where to go next
- Usage — worked examples of the stages above.
- Topics — algorithmic notes and the data model (extras schema, source strings, deprecation lifecycle).
- Developer guide — setup, pre-commit, tests, docs build.