ilustrado package

Submodules

ilustrado.adapt module

This file contains a wrapper for mutation and crossover.

ilustrado.adapt.adapt(possible_parents, mutation_rate, crossover_rate, mutations=None, max_num_mutations=3, max_num_atoms=40, structure_filter=None, minsep_dict=None, debug=False)

Take a list of possible parents and randomly create a new structure by mutation or crossover, according to the given mutation and crossover rates.

Parameters
  • possible_parents (list(dict)) – list of all breeding stock,

  • mutation_rate (float) – rate of mutations relative to crossover,

  • crossover_rate (float) – rate of crossover relative to mutations.

Keyword Arguments
  • mutations (list(str)) – list of desired mutations to choose from (as strings),

  • max_num_mutations (int) – a random number of mutations between 1 and this value will be performed,

  • max_num_atoms (int) – any structures with more than this many atoms will be filtered out.

  • structure_filter (callable(dict)) – custom filter to pass to check_feasible.

  • minsep_dict (dict) – dictionary containing element-specific minimum separations, e.g. {('K', 'K'): 2.5, ('K', 'P'): 2.0}.

Returns

the mutated/newborn structure.

Return type

dict
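The docstring does not spell out how the two rates are combined. The sketch below assumes they act as relative weights (as the description of mutation_rate suggests) and that the number of mutations is drawn uniformly from 1 to max_num_mutations; choose_operation is a hypothetical helper, not part of ilustrado.

```python
import random


def choose_operation(mutation_rate, crossover_rate, max_num_mutations=3, rng=random):
    """Hypothetical helper: choose between mutation and crossover.

    Assumes the two rates are relative weights, and that the number of
    mutations is drawn uniformly from 1..max_num_mutations.
    """
    total = mutation_rate + crossover_rate
    if total <= 0:
        raise ValueError("mutation_rate + crossover_rate must be positive")
    if rng.random() < mutation_rate / total:
        # random.randint includes both end points
        return "mutate", rng.randint(1, max_num_mutations)
    return "crossover", 0


rng = random.Random(42)
print([choose_operation(0.5, 0.5, rng=rng) for _ in range(3)])
```

With equal rates, roughly half of the calls choose crossover; setting crossover_rate to 0 always selects mutation.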

ilustrado.adapt.check_feasible(mutant, parents, max_num_atoms, structure_filter=None, minsep_dict=None, debug=False)

Check if a mutated/newly-born cell is “feasible”. Here, feasible means:

  • number density within 25% of pre-mutation/birth level,

  • no overlapping atoms, parameterised by minsep_dict,

  • cell angles between 50 and 130 degrees,

  • no more than max_num_atoms atoms in the cell,

  • number of atomic species is maintained,

  • any custom filter is obeyed.

Parameters
  • mutant (dict) – matador doc containing new structure.

  • parents (list(dict)) – list of doc(s) containing parent structures.

  • max_num_atoms (int) – any structures with more than this many atoms will be filtered out.

Keyword Arguments
  • structure_filter (callable) – any function that takes a matador document and returns True or False.

  • minsep_dict (dict) – dictionary containing element-specific minimum separations, e.g. {('K', 'K'): 2.5, ('K', 'P'): 2.0}.

Returns

True if structure is feasible, else False.

Return type

bool
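The feasibility criteria listed above can be sketched as a single function. This is a simplified illustration, not ilustrado's implementation: it omits the minimum-separation check, compares the mutant's number density with the mean of its parents, rejects structures with more than max_num_atoms atoms (following the parameter description), and assumes matador-style keys 'num_atoms', 'cell_volume', 'lattice_abc' (lengths, then angles in degrees) and 'atom_types'.

```python
def is_feasible_sketch(mutant, parents, max_num_atoms, structure_filter=None):
    """Simplified feasibility check; the minimum-separation test is omitted."""
    # Number density within 25% of the mean parent number density.
    density = mutant["num_atoms"] / mutant["cell_volume"]
    parent_density = sum(p["num_atoms"] / p["cell_volume"] for p in parents) / len(parents)
    if abs(density - parent_density) > 0.25 * parent_density:
        return False
    # All cell angles (degrees) between 50 and 130.
    if not all(50 <= angle <= 130 for angle in mutant["lattice_abc"][1]):
        return False
    # Atom-count limit.
    if mutant["num_atoms"] > max_num_atoms:
        return False
    # Number of species must match that of the parents.
    parent_species = {s for p in parents for s in p["atom_types"]}
    if len(set(mutant["atom_types"])) != len(parent_species):
        return False
    # Optional user-supplied filter.
    return structure_filter is None or bool(structure_filter(mutant))


parent = {"num_atoms": 4, "cell_volume": 80.0,
          "lattice_abc": [[4.3, 4.3, 4.3], [90.0, 90.0, 90.0]],
          "atom_types": ["K", "K", "P", "P"]}
mutant = dict(parent, cell_volume=90.0)  # ~11% lower density: still feasible
print(is_feasible_sketch(mutant, [parent], max_num_atoms=40))
```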

ilustrado.adapt.minseps_feasible(mutant, minsep_dict=None, debug=False)

Check whether the minimum separations between each pair of species are satisfied by the mutant.

Parameters
  • mutant (dict) – trial mutated structure

  • minsep_dict (dict) – dictionary containing element-specific minimum separations, e.g. {('K', 'K'): 2.5, ('K', 'P'): 2.0}.

Returns

True if all interatomic separations are greater than the desired minimum values, else False.

Return type

bool
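A minimal sketch of a minimum-separation check, assuming Cartesian positions in Angstrom and ignoring periodic images (a real check on a crystal must also consider atoms in neighbouring cells). Pairs missing from minsep_dict are not checked, and each key is treated as an unordered pair, so ('K', 'P') also covers ('P', 'K'). The function name minseps_ok is hypothetical.

```python
import itertools
import math


def minseps_ok(positions_abs, atom_types, minsep_dict=None):
    """Return True if no pair of atoms is closer than its minsep_dict entry.

    Assumes Cartesian positions (Angstrom) and ignores periodic images.
    """
    if not minsep_dict:
        return True
    atoms = list(zip(atom_types, positions_abs))
    for (a, pos_a), (b, pos_b) in itertools.combinations(atoms, 2):
        # Keys are unordered pairs, so look up both orderings.
        minsep = minsep_dict.get((a, b), minsep_dict.get((b, a)))
        if minsep is not None and math.dist(pos_a, pos_b) < minsep:
            return False
    return True


minsep_dict = {("K", "K"): 2.5, ("K", "P"): 2.0}
print(minseps_ok([[0, 0, 0], [3, 0, 0], [0, 2.1, 0]], ["K", "K", "P"], minsep_dict))
```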

ilustrado.analysis module

Some assorted analysis functions.

ilustrado.analysis.fitness_swarm_plot(generations, ax=None, save=False)

Make a swarm plot of the fitness of all generations.

ilustrado.analysis.plot_new_2d_hull(generations, hull, points=True, label_hull=True, save=True)

Add new structures to old ConvexHull plot.

ilustrado.crossover module

This file implements crossover functionality.

ilustrado.crossover.crossover(parents, method='random_slice', debug=False)

Attempt to create a child structure from two parent structures.

Parameters
  • parents (list(dict)) – list of two parent structures

  • method (str) – currently only ‘random_slice’

Returns

newborn structure from parents.

Return type

dict

ilustrado.crossover.random_slice(parent_seeds, standardize=True, supercell=True, shift=True, debug=False)

Simple cut-and-splice crossover of two parents.

The overall size of the child can vary between 0.5 and 1.5 times the size of the parent structures. Both parent structures are cut and spliced along the same crystallographic axis.

Parameters
  • parent_seeds (list(dict)) – parent structures to crossover,

  • standardize (bool) – use spglib to standardize parents pre-crossover,

  • supercell (bool) – make a random supercell to rescale parents,

  • shift (bool) – randomly shift atoms in parents to remove any bias from the choice of origin.

Returns

newborn structure from parents.

Return type

dict
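The cut-and-splice step at the heart of random_slice can be illustrated on fractional coordinates. This sketch is hypothetical and deliberately simpler than the documented function: it skips standardisation, supercell rescaling and shifting, keeps the first parent's lattice for the child, and assumes matador-style keys 'lattice_cart', 'atom_types' and 'positions_frac'.

```python
import random


def cut_and_splice(parent_a, parent_b, axis=None, cut=None, rng=random):
    """Hypothetical cut-and-splice crossover on fractional coordinates.

    Atoms of parent_a below the cut plane and atoms of parent_b at or above it,
    along the same crystallographic axis, are combined into one child.
    """
    axis = rng.randrange(3) if axis is None else axis
    cut = rng.random() if cut is None else cut
    child_types, child_positions = [], []
    for parent, take_below in ((parent_a, True), (parent_b, False)):
        for species, pos in zip(parent["atom_types"], parent["positions_frac"]):
            if (pos[axis] % 1.0 < cut) == take_below:
                child_types.append(species)
                child_positions.append(list(pos))
    return {
        "lattice_cart": [list(vec) for vec in parent_a["lattice_cart"]],  # assumption
        "atom_types": child_types,
        "positions_frac": child_positions,
        "num_atoms": len(child_types),
    }


cubic = [[5.0, 0.0, 0.0], [0.0, 5.0, 0.0], [0.0, 0.0, 5.0]]
a = {"lattice_cart": cubic, "atom_types": ["K", "K"], "positions_frac": [[0.1, 0, 0], [0.6, 0, 0]]}
b = {"lattice_cart": cubic, "atom_types": ["P", "P"], "positions_frac": [[0.2, 0, 0], [0.7, 0, 0]]}
child = cut_and_splice(a, b, axis=0, cut=0.5)
print(child["atom_types"], child["positions_frac"])
```

The child's atom count depends on where the cut falls, so a feasibility check on the result is useful before it is relaxed.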

ilustrado.fitness module

This file implements all notions of fitness.

class ilustrado.fitness.FitnessCalculator(fitness_metric='dummy', fitness_function=None, hull=None, sandbagging=False, debug=False)

Bases: object

This class calculates the fitnesses of generations according to some global, generation-agnostic definition of fitness.

Parameters
  • fitness_metric (str) – either ‘dummy’, ‘hull’ or ‘hull_test’.

  • fitness_function (callable) – function to operate on a numpy array of raw fitness values,

  • hull (QueryConvexHull) – matador hull from which to calculate metastability,

  • sandbagging (bool) – whether or not to “sandbag” particular compositions, i.e. lower a structure’s fitness based on the number of nearby phases

evaluate(generation)

Assign normalised fitnesses to an entire generation. Normalisation uses the logistic function such that

fitness = 1 - tanh(2 * distance_from_hull).

Parameters

generation (Generation/list) – list/iterator over optimised structures,
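The normalisation formula above can be evaluated directly. The sketch also checks numerically that 1 - tanh(2d) equals 2 / (1 + exp(4d)), i.e. twice a logistic function of -4d, which is the sense in which the normalisation is logistic. The units of the hull distance are not stated in the docstring, and hull_fitness is a hypothetical name.

```python
import math


def hull_fitness(distances_from_hull):
    """fitness = 1 - tanh(2 * distance_from_hull), as given in the docstring."""
    return [1.0 - math.tanh(2.0 * d) for d in distances_from_hull]


def logistic_form(d):
    """Equivalent logistic expression: 2 / (1 + exp(4 * d))."""
    return 2.0 / (1.0 + math.exp(4.0 * d))


distances = [0.0, 0.05, 0.2, 1.0]
for d, f in zip(distances, hull_fitness(distances)):
    print(f"{d:.2f} -> {f:.4f}")
```

Structures on the hull get a fitness of exactly 1, and fitness decreases monotonically with distance.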

update_sandbag_multipliers(generation, modifier=0.95)

Assign a composition penalty based on the number of nearby structures. Updates fitness.sandbag_multipliers to a dictionary with chemical concentrations as keys and fitness penalties as values.

Parameters

generation (Generation) – list of optimised structures.

apply_sandbag_multipliers(generation, locality=0.05)

Scale the generation’s fitness by the sandbag modifier. This updates the ‘fitness’ key and the ‘modifier’ key (total scaling) of each document in the generation.

Parameters

generation (Generation) – list of optimised structures.

Keyword Arguments

locality (float) – tolerance by which two structures are “nearby”
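The docstrings of update_sandbag_multipliers and apply_sandbag_multipliers leave the exact penalty open. The sketch below assumes a binary system in which each structure's composition is a single concentration, and a multiplicative penalty of modifier raised to the number of other structures within locality of that concentration. Both functions are hypothetical, and for brevity the first one takes both keyword arguments (modifier and locality).

```python
def sandbag_multipliers(concentrations, modifier=0.95, locality=0.05):
    """Map each concentration to modifier ** (number of other structures nearby)."""
    multipliers = {}
    for c in set(concentrations):
        # Count structures within `locality`, minus one for the structure itself.
        nearby = sum(1 for other in concentrations if abs(other - c) <= locality) - 1
        multipliers[c] = modifier ** max(nearby, 0)
    return multipliers


def apply_sandbag(fitnesses, concentrations, multipliers):
    """Scale each structure's fitness by the multiplier of its concentration."""
    return [f * multipliers[c] for f, c in zip(fitnesses, concentrations)]


concs = [0.0, 0.5, 0.52, 1.0]
mult = sandbag_multipliers(concs)
print(apply_sandbag([1.0, 1.0, 0.8, 0.5], concs, mult))
```

Isolated compositions keep their fitness, while crowded regions of composition space are gradually disfavoured.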

ilustrado.fitness.default_fitness_function(raw, c=50, offset=0.075)

Default fitness function: logistic function.

Parameters

raw (ndarray) – 1D array of raw fitness values.

Returns

1D array of rescaled fitnesses.

Return type

ndarray
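The docstring names a logistic function with parameters c and offset but does not give the formula. Assuming the common form 1 / (1 + exp(c * (raw - offset))), so that raw values below offset map towards 1 and the midpoint 0.5 falls at raw == offset, a numerically stable sketch looks like this. Unlike the real function, which takes a numpy array, this hypothetical version works on a plain list.

```python
import math


def logistic_fitness(raw, c=50, offset=0.075):
    """Assumed form: 1 / (1 + exp(c * (x - offset))) for each raw value x."""
    fitnesses = []
    for x in raw:
        z = c * (x - offset)
        # Branch on the sign of z so math.exp never overflows.
        if z >= 0:
            e = math.exp(-z)
            fitnesses.append(e / (1.0 + e))
        else:
            fitnesses.append(1.0 / (1.0 + math.exp(z)))
    return fitnesses


print(logistic_fitness([0.0, 0.075, 0.2, 10.0]))
```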

ilustrado.generation module

This file implements the Generation class, which is used to store each generation of structures and to evaluate their fitness.

class ilustrado.generation.Generation(run_hash: str, generation_idx: int, num_survivors: int, num_accepted: int, populace=None, dumpfile=None, fitness_calculator=None)

Bases: object

Stores each generation of structures.

Parameters
  • run_hash (str) – hash for this GA run,

  • generation_idx (int) – index of this generation,

  • num_survivors (int) – number of structures to aim for per generation,

  • num_accepted (int) – number to accept from this generation, i.e. excluding elites,

Keyword Arguments
  • populace (list(dict)) – initial structures to populate generation with (optional)

  • dumpfile (str) – dumpfile name for this generation (optional)

  • fitness_calculator (str) – fitness metric to use, e.g. ‘hull’.

dump(gen_suffix)

Dump the current generation to a JSON file.

Parameters

gen_suffix (str) – typically gen<gen_number>.

dump_bourgeoisie(gen_suffix)

Dump the current generation’s bourgeoisie to a JSON file.

Parameters

gen_suffix (str) – typically gen<gen_number>.

load(gen_fname)

Load populace of the generation from a JSON dump.

Parameters

gen_fname (str) – filename to load.

load_bourgeoisie(bourge_fname)

Load bourgeoisie of the generation from a JSON dump.

Parameters

bourge_fname (str) – filename to load.

birth(populum: dict)

Add a structure to the populace.

Parameters

populum (dict) – structure to add.

rank()

Evaluate the fitness of all structures in the generation.

clean()

Remove structures with pathological formation enthalpies.

Returns

number of pathological structures removed.

Return type

int

set_bourgeoisie(elites=None, best_from_stoich=True)

Set the structures that will continue to the next generation, i.e. the bourgeoisie.

Keyword Arguments
  • elites (list) – list of elite structures to include from the previous generation,

  • best_from_stoich (bool) – whether to include the best structure from each stoichiometry.
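One way to read set_bourgeoisie: keep the fittest structures, optionally top up with the best structure at each stoichiometry, then add any elites. The sketch below is a hypothetical illustration, not the method's code; it assumes each document carries a 'fitness' value and, for simplicity, a hashable stoichiometry label such as 'KP' (an assumption about the data layout).

```python
def select_bourgeoisie(populace, num_accepted, elites=None, best_from_stoich=True):
    """Hypothetical selection of the structures that survive to the next generation."""
    ranked = sorted(populace, key=lambda doc: doc["fitness"], reverse=True)
    chosen = ranked[:num_accepted]
    if best_from_stoich:
        best = {}
        for doc in ranked:
            # ranked is sorted by fitness, so the first doc seen per label is the best
            best.setdefault(doc["stoichiometry"], doc)
        chosen += [doc for doc in best.values() if all(doc is not c for c in chosen)]
    chosen += list(elites or [])
    return chosen


populace = [
    {"id": 1, "fitness": 0.9, "stoichiometry": "KP"},
    {"id": 2, "fitness": 0.8, "stoichiometry": "KP"},
    {"id": 3, "fitness": 0.1, "stoichiometry": "K3P"},
]
print([doc["id"] for doc in select_bourgeoisie(populace, num_accepted=1)])
```

Keeping the best structure per stoichiometry preserves compositional diversity even when one composition dominates the fitness ranking.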

calc_pdfs()

Compute PDFs for each structure in the generation.

is_dupe(doc, sim_tol=0.05, extra_pdfs=None)

Compare doc with all other structures at the same stoichiometry via PDF overlap.

Parameters

doc (dict) – structure to compare.

Keyword Arguments
  • sim_tol (float) – similarity tolerance for the PDF overlap comparison,

  • extra_pdfs (list(dict)) – list of structures with extra pdfs to compare against
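The duplicate test can be sketched with a simple stand-in for the PDF overlap: the integrated absolute difference between two pair distribution functions sampled on the same radial grid. This metric, the 'pdf' and 'stoichiometry' keys, the grid spacing dr, and the rule that values below sim_tol mean "duplicate" are all assumptions for illustration; matador's own PDF overlap calculation may normalise differently.

```python
def pdf_overlap(g1, g2, dr):
    """Integrated absolute difference of two PDFs on a shared, uniform r grid."""
    return sum(abs(a - b) for a, b in zip(g1, g2)) * dr


def is_dupe_sketch(doc, others, sim_tol=0.05, dr=0.01):
    """Hypothetical check: duplicate if same stoichiometry and overlap < sim_tol."""
    for other in others:
        if other["stoichiometry"] != doc["stoichiometry"]:
            continue
        if pdf_overlap(doc["pdf"], other["pdf"], dr) < sim_tol:
            return True
    return False


doc = {"stoichiometry": "KP", "pdf": [0.0, 1.0, 2.0, 1.0, 0.0]}
near = {"stoichiometry": "KP", "pdf": [0.0, 1.5, 2.0, 1.5, 0.0]}
far = {"stoichiometry": "KP", "pdf": [0.0, 3.0, 0.0, 3.0, 0.0]}
print(is_dupe_sketch(doc, [near]), is_dupe_sketch(doc, [far]))
```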

property pdfs

Returns list of PDFs for generation, calculating if necessary.

property fitnesses

Return list of normalised fitnesses for population.

property raw_fitnesses

Return list of raw fitnesses for population.

property average_pleb_fitness

Return the average normalised fitness of the generation.

property average_bourgeois_fitness

Return the average normalised fitness of the bourgeoisie.

ilustrado.ilustrado module

This file implements the genetic algorithm (GA) and acts as main().

class ilustrado.ilustrado.ArtificialSelector(**kwargs)

Bases: object

ArtificialSelector takes an initial gene pool and applies a genetic algorithm to optimise some fitness function.

Keyword Arguments
  • gene_pool (list(dict)) – initial cursor to use as “Generation 0”,

  • seed (str) – seed name of cell and param files for CASTEP,

  • seed_prefix (str) – if not specifying a seed, this name will prefix all runs

  • fitness_metric (str) – currently either ‘hull’ or ‘test’,

  • hull (QueryConvexHull) – matador QueryConvexHull object to calculate distances,

  • res_path (str) – path to folder of res files to create hull, if no hull object passed

  • mutation_rate (float) – rate at which to perform single-parent mutations (DEFAULT: 0.5)

  • crossover_rate (float) – rate at which to perform crossovers (DEFAULT: 0.5)

  • num_generations (int) – number of generations to breed before quitting (DEFAULT: 5)

  • num_survivors (int) – number of structures to survive to next generation for breeding (DEFAULT: 10)

  • population (int) – number of structures to breed in any given generation (DEFAULT: 25)

  • failure_ratio (int) – maximum number of attempts per success (DEFAULT: 5)

  • elitism (float) – fraction of next generation to be comprised of elite structures from previous generation (DEFAULT: 0.2)

  • best_from_stoich (bool) – whether to always include the best structure from a stoichiometry in the next generation,

  • mutations (list(str)) – list of mutation names to use,

  • structure_filter (fn(doc)) – any function that takes a matador doc and returns True or False,

  • check_dupes (bool) – if True, filter relaxed structures for uniqueness on-the-fly (DEFAULT: True)

  • check_dupes_hull (bool) – compare PDFs against all hull structures (DEFAULT: True)

  • sandbagging (bool) – whether or not to disfavour nearby compositions (DEFAULT: False)

  • minsep_dict (dict) – dictionary containing element-specific minimum separations, e.g. {('K', 'K'): 2.5, ('K', 'P'): 2.0}. These should only be set such that atoms do not overlap; let the DFT deal with bond lengths. No effort is made to push apart atoms that are too close; the trial will simply be discarded. (DEFAULT: None)

  • max_num_mutations (int) – maximum number of mutations to perform on a single structure,

  • max_num_atoms (int) – maximum number of atoms allowed in a structure post-mutation/crossover,

  • nodes (list(str)) – list of node names to run on,

  • ncores (int or list(int)) – specifies the number of cores used by listed nodes per thread,

  • nprocs (int) – total number of processes,

  • recover_from (str) – run_hash of a previous run to recover from; by default, ilustrado will recover if it finds only one run hash in the folder

  • load_only (bool) – only load structures, do not continue breeding (DEFAULT: False)

  • executable (str) – path to DFT binary (DEFAULT: castep)

  • compute_mode (str) – either direct, slurm, manual (DEFAULT: direct)

  • max_num_nodes (int) – number of array jobs to run per generation in slurm mode,

  • walltime_hrs (int) – maximum walltime for a SLURM array job,

  • slurm_template (str) – path to template slurm script that includes module loads etc,

  • entrypoint (str) – path to script that initialised this object, such that it can be called by SLURM

  • debug (bool) – maximum printing level

  • testing (bool) – run test code only if true

  • verbosity (int) – extra printing level,

  • loglevel (str) – follows std library logging levels.

start()

Start running the GA.

breed_generation()

Build the next generation by mutation/crossover of the current one, and perform relaxations if necessary.

write_unrelaxed_generation()

Perform mutations and write res files for the resulting structures. Additionally, dump an unrelaxed json file.

batch_birth()

Assess whether a generation has been relaxed already. This is done by checking for the existence of a file called <run_hash>-genunrelaxed.json.

If so, match the relaxations up with the cached unrelaxed structures and rank them ready for the next generation.

If not, create a new generation of structures, dump the unrelaxed structures to file, create the jobscripts to relax them, submit them and the job to check up on the relaxations, then exit.

slurm_submit_relaxations_and_monitor()

Prepare and submit the appropriate slurm files.

continuous_birth()

Create new generation and relax “as they come”, filling the compute resources allocated.

enforce_elitism()

Add elite structures from previous generations to bourgeoisie of current generation, through the merit of their ancestors alone.

reset_and_dump()

Add now complete generation to generation list, reset the next_gen variable and write dump files.

birth_new_structure()

Generate a new structure from current settings.

Returns

newborn structure to be optimised

Return type

dict

scrape_result(result, proc=None, newborns=None)

Check the process for a result and scrape it into self.next_gen if successful, with duplicate detection if desired. If the optional arguments are provided, extra logging information is produced when running in direct mode.

Parameters

result (dict) – containing output from process

Keyword Arguments
  • proc (tuple) – standard process tuple from above,

  • newborns (list) – list of new structures to append the result to.

kill_all(procs)

Loop over processes and kill them all.

Parameters

procs (list) – list of NewbornProcess objects (see ilustrado.util.NewbornProcess).

recover()

Attempt to recover previous generations from files in cwd named '<run_hash>_gen{}.json'.format(gen_idx).
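Locating the files to recover from can be sketched with a filename pattern. This hypothetical helper assumes the '<run_hash>_gen{}.json' naming given in the recover docstring, and mirrors the recover_from default of only recovering automatically when a single run hash is present.

```python
import re
from collections import defaultdict

# Assumed naming scheme: '<run_hash>_gen<idx>.json'
GEN_FILE = re.compile(r"^(?P<run_hash>.+)_gen(?P<idx>\d+)\.json$")


def find_recoverable_run(filenames):
    """Return (run_hash, sorted generation indices) if exactly one run hash
    is found among the filenames, otherwise None."""
    runs = defaultdict(list)
    for name in filenames:
        match = GEN_FILE.match(name)
        if match:
            runs[match["run_hash"]].append(int(match["idx"]))
    if len(runs) != 1:
        return None
    run_hash, indices = next(iter(runs.items()))
    return run_hash, sorted(indices)


print(find_recoverable_run(["abc123_gen1.json", "abc123_gen0.json", "notes.txt"]))
```

In practice the filenames would come from something like os.listdir("."), since the docstring says the files are looked up in the current working directory.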

seed_generation_0(gene_pool)

Set up first generation from gene pool.

Parameters

gene_pool (list(dict)) – list of structures with which to seed the generation.

is_newborn_dupe(newborn, extra_pdfs=None)

Check each generation for a duplicate of the current newborn, using the PDF calculator from matador.

Parameters

newborn (dict) – new structure to screen against the existing,

Keyword Arguments

extra_pdfs (list(dict)) – any extra PDFs to compare to, e.g. other hull structures not used to seed any generation

Returns

True if duplicate, else False.

Return type

bool

finalise_files_for_export()

Move unique structures from gen1 onwards to folder “<run_hash>-results”.

ilustrado.mutate module

This file implements all possible single mutant mutations.

ilustrado.mutate.mutate(parent, mutations=None, max_num_mutations=2, debug=False)

Wrap _mutate to check for null/invalid mutations.

Parameters

parent (dict) – parent structure to mutate,

Keyword Arguments
  • mutations (list(fn)) – list of possible mutation functions to apply,

  • max_num_mutations (int) – maximum number of mutations to apply.

ilustrado.mutate.permute_atoms(mutant, debug=False)

Swap the positions of random pairs of atoms.

Parameters

mutant (dict) – structure to mutate in-place.

Raises

RuntimeError – if only one type of atom is present.
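A minimal sketch of the swap, assuming matador-style 'atom_types' and 'positions_frac' keys and that each swap pairs atoms of different species (swapping two identical atoms would leave the structure unchanged). The num_swaps parameter is hypothetical; the docstring does not say how many pairs are swapped.

```python
import random


def permute_atoms_sketch(mutant, num_swaps=1, rng=random):
    """Swap the positions of random pairs of unlike atoms, in place."""
    types = mutant["atom_types"]
    if len(set(types)) < 2:
        raise RuntimeError("Cannot permute a structure with only one species")
    positions = mutant["positions_frac"]
    for _ in range(num_swaps):
        i = rng.randrange(len(types))
        # choose a partner of a different species so the swap changes the structure
        j = rng.choice([k for k, species in enumerate(types) if species != types[i]])
        positions[i], positions[j] = positions[j], positions[i]


mutant = {"atom_types": ["K", "P"], "positions_frac": [[0.0, 0.0, 0.0], [0.5, 0.5, 0.5]]}
permute_atoms_sketch(mutant)
print(mutant["positions_frac"])
```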

ilustrado.mutate.transmute_atoms(mutant, debug=False)

Transmute one atom in the cell into another of the species present.

Parameters

mutant (dict) – structure to mutate in-place.

Raises

RuntimeError – if only one type of atom is present.
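Transmutation can be sketched the same way: pick one atom and relabel it as a different species already present in the cell, which is consistent with the RuntimeError for single-species cells. The function name and the 'atom_types' key are assumptions, and a full implementation would also need to update any derived composition data stored in the document.

```python
import random


def transmute_atoms_sketch(mutant, rng=random):
    """Change one randomly chosen atom into another species present in the cell, in place."""
    types = mutant["atom_types"]
    species = sorted(set(types))
    if len(species) < 2:
        raise RuntimeError("Cannot transmute a structure with only one species")
    i = rng.randrange(len(types))
    types[i] = rng.choice([s for s in species if s != types[i]])


mutant = {"atom_types": ["K", "K", "P"]}
transmute_atoms_sketch(mutant, rng=random.Random(0))
print(mutant["atom_types"])
```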

ilustrado.mutate.vacancy(mutant, debug=False)

Remove a random atom from the structure.

Parameters

mutant (dict) – structure to mutate in-place.

ilustrado.mutate.voronoi_shuffle(mutant, element_to_remove=None, preserve_stoich=False, debug=False, testing=False)

Remove all atoms of the chosen element (element_to_remove), then perform Voronoi analysis on the remaining sublattice. Cluster the Voronoi nodes with KMeans, then repopulate the clustered nodes with atoms of the removed element.

Parameters

mutant (dict) – structure to mutate in-place.

Keyword Arguments
  • element_to_remove (str) – symbol of element to remove,

  • preserve_stoich (bool) – whether to always reinsert the same number of atoms.

  • testing (bool) – write a cell at each step, with H atoms indicating Voronoi nodes.

Raises

RuntimeError – if unable to perform Voronoi shuffle.

ilustrado.mutate.random_strain(mutant, debug=False)

Apply a random strain tensor, built from 6 components epsilon_i with values between -1 and 1, to the unit cell. The cell is then rescaled to the parent’s volume.

Parameters

mutant (dict) – structure to mutate in-place.
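The strain step can be sketched in plain Python. Assumptions, since the docstring does not fix these conventions: lattice vectors are stored as the rows of a 3x3 matrix, the six components fill a symmetric strain matrix eps in Voigt order (xx, yy, zz, yz, xz, xy) without halving the shear terms, and the deformation applied is I + eps. Because components as large as 1 can make I + eps singular or inverting, the sketch redraws until its determinant is safely positive.

```python
import random


def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def random_strain_sketch(lattice_cart, rng=random, min_det=1e-3):
    """Return lattice vectors (rows) deformed by I + eps and rescaled to the original volume."""
    volume = abs(det3(lattice_cart))
    while True:
        e = [rng.uniform(-1.0, 1.0) for _ in range(6)]
        # Symmetric strain from Voigt components: xx, yy, zz, yz, xz, xy
        eps = [[e[0], e[5], e[4]],
               [e[5], e[1], e[3]],
               [e[4], e[3], e[2]]]
        deform = [[(1.0 if i == j else 0.0) + eps[i][j] for j in range(3)] for i in range(3)]
        if det3(deform) > min_det:  # redraw if singular or handedness-inverting
            break
    strained = [[sum(vec[k] * deform[k][j] for k in range(3)) for j in range(3)]
                for vec in lattice_cart]
    # Uniform rescaling by s changes the volume by s**3
    scale = (volume / abs(det3(strained))) ** (1.0 / 3.0)
    return [[x * scale for x in vec] for vec in strained]


cubic = [[4.0, 0.0, 0.0], [0.0, 4.0, 0.0], [0.0, 0.0, 4.0]]
new_lattice = random_strain_sketch(cubic, rng=random.Random(1))
print(abs(det3(new_lattice)))  # close to 64, the original volume
```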

ilustrado.mutate.nudge_positions(mutant, amplitude=0.5, debug=False)

Apply Gaussian noise to all atomic positions.

Parameters

mutant (dict) – structure to mutate in-place.

Keyword Arguments

amplitude (float) – amplitude of random noise in Angstroms.
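A sketch of the noise step, assuming Cartesian positions in Angstrom and reading amplitude as the standard deviation of independent Gaussian noise on each coordinate. The docstring does not say how amplitude maps onto the distribution, and the function name is hypothetical.

```python
import random


def nudge_positions_sketch(positions_abs, amplitude=0.5, rng=random):
    """Return new Cartesian positions with Gaussian noise (std = amplitude) on each coordinate."""
    return [[x + rng.gauss(0.0, amplitude) for x in pos] for pos in positions_abs]


positions = [[0.0, 0.0, 0.0], [1.5, 1.5, 1.5]]
print(nudge_positions_sketch(positions, amplitude=0.1, rng=random.Random(0)))
```

A very small amplitude gives the kind of minimal perturbation that null_nudge_positions describes.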

ilustrado.mutate.null_nudge_positions(mutant, debug=False)

Apply minimal Gaussian noise to all atomic positions, mostly for testing purposes.

Parameters

mutant (dict) – structure to mutate in-place.

ilustrado.util module

Catch-all file for utility functions.

ilustrado.util.strip_useless(doc, to_run=False)

Strip useless information from a matador doc.

Parameters
  • doc (dict) – structure to strip information from.

  • to_run (bool) – whether the structure needs to be rerun, i.e. whether to delete data from previous run.

Returns

matador document stripped of useless keys

Return type

dict

class ilustrado.util.FakeComputeTask(*args, **kwargs)

Bases: matador.compute.compute.ComputeTask

Fake relaxer for testing, with the same parameters as the real one from matador.compute.

relax()

Alias for backwards-compatibility.

class ilustrado.util.NewbornProcess(newborn_id, node, process, ncores=None)

Bases: object

Simple container of process data.

class ilustrado.util.AseRelaxation(doc, queue, calculator=None)

Bases: object

Perform a variable cell relaxation with ASE, using a predefined calculator.

relax()