Introduction

The mc3d-source package hosts the code for the MC3D source pipeline: the workflow that turns raw CIF dumps from external crystal structure databases (COD, ICSD, MPDS) into the unique, curated set of structures that seed all MC3D calculations.

For the scientific context, see the MC3D website and the accompanying paper in Digital Discovery (Huber et al., 2026).

Pipeline

The following flowchart from our MC3D Figma board summarises the steps:

[Figure: The MC3D-source pipeline]

The package exposes a single CLI, mc3d-source, whose subcommands implement the stages in order:

  1. import — fetch raw CIFs from a source database into AiiDA.
  2. (Cleaning step, run outside the CLI.) The raw CifData nodes are processed by CifCleanWorkChain runs, which are submitted in batches by the separate runner in pipeline/cif_clean/ (not part of the installed package).
  3. curate — starting from a group of completed CifCleanWorkChain runs, attach source/spacegroup/formula extras to the parsed StructureData nodes and collect the clean ones into a curated group.
  4. update — when re-importing an updated version of a source database, reconcile the new curated set against the previous one.
  5. analyse — produce the per-source deprecation report (id_removed, structure_updated, incorrect_formula).
  6. uniq — deduplicate structures across sources via pymatgen's StructureMatcher and emit the unique families as JSON.
  7. select — pick the final MC3D structures from the unique families, taking the previous MC3D set and the deprecation report into account.
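
To make the idea behind the uniq stage concrete, here is a minimal sketch of grouping records into "unique families" with a pairwise matcher and serialising the result as JSON. It is not the package's implementation: the function group_into_families, the record layout, the source identifiers and the toy fingerprint comparison are all assumptions made for illustration, and the toy matcher merely stands in for a structural comparison such as pymatgen's StructureMatcher.fit.

```python
import json
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def group_into_families(
    items: Sequence[T], matches: Callable[[T, T], bool]
) -> List[List[T]]:
    """Greedy grouping: each item joins the first family whose
    representative (its first member) it matches, else starts a new family."""
    families: List[List[T]] = []
    for item in items:
        for family in families:
            if matches(family[0], item):
                family.append(item)
                break
        else:
            # No existing family matched, so this item founds a new one.
            families.append([item])
    return families


# Hypothetical records: the identifiers and the "fingerprint" field are
# placeholders, not the real MC3D source strings or extras schema.
records = [
    {"source": "db_a:001", "fingerprint": "NaCl/Fm-3m"},
    {"source": "db_b:042", "fingerprint": "NaCl/Fm-3m"},
    {"source": "db_c:007", "fingerprint": "Si/Fd-3m"},
]


def toy_match(a: dict, b: dict) -> bool:
    # Stand-in for a real structural comparison between two crystals.
    return a["fingerprint"] == b["fingerprint"]


families = group_into_families(records, toy_match)

# Emit each family as the list of source identifiers it contains.
families_json = json.dumps(
    [[record["source"] for record in family] for family in families],
    indent=2,
)
print(families_json)
```

With real crystal structures, the matcher passed in could wrap StructureMatcher().fit(s1, s2); pymatgen also offers StructureMatcher.group_structures, which performs this kind of grouping directly on a list of structures.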

Where to go next

  • Usage — worked examples of the stages above.
  • Topics — algorithmic notes and the data model (extras schema, source strings, deprecation lifecycle).
  • Developer guide — setup, pre-commit, tests, docs build.