Methods reference

Methods reference

GenomicBreeding.jl

GenomicBreeding.GBInput
Base.:==
Base.hash
GenomicBreeding.checkinputs
GenomicBreeding.cv
GenomicBreeding.fit
GenomicBreeding.gwas
GenomicBreeding.loadcvs
GenomicBreeding.loadfits
GenomicBreeding.loadgenomesphenomes
GenomicBreeding.plot
GenomicBreeding.predict
GenomicBreeding.prepareinputs
GenomicBreeding.prepareoutprefixandoutdir
GenomicBreeding.submitslurmarrayjobs
GenomicBreedingCore.checkdims
GenomicBreedingCore.clone

GenomicBreeding.GBInput — Type

mutable struct GBInput

Main input struct for genomic breeding analysis (implements GenomicBreedingCore.AbstractGB)

Fields

Required

fname_geno: Path to genotype file (see file format guide for details)

Optional Data Files

fname_pheno: Path to phenotype file (Default: "" - see file format guide for details)
fname_allele_effects_jld2s: Vector of paths to JLD2 files containing Fit structs (Default: [""])

Analysis Settings

analysis: Analysis function to perform (Default: cv)
- cv: Replicated k-fold cross-validation
- fit: Fit genomic prediction models to extract allele effects
- predict: Compute GEBVs using existing model fits
- gwas: Genome-wide association study
bulk_cv: Perform cross-validation across all populations (Default: false)
populations: Vector of populations to include (Default: all)
traits: Vector of traits to analyze (Default: all)
models: Vector of genomic prediction model functions (Default: [ridge, bayesa])
gwas_models: Vector of GWAS model functions (Default: [gwasols, gwaslmm])

Cross-Validation Parameters

n_folds: Number of CV folds (Default: 5)
n_replications: Number of CV replications (Default: 5)

Filtering Parameters

keep_all: Keep all entries when merging data (Default: false)
maf: Minimum allele frequency (Default: 0.05)
mtv: Minimum trait variance (Default: 1e-7)

Model Parameters

n_iter: MCMC iterations (Default: 1_500)
n_burnin: MCMC burn-in iterations (Default: 500)

Output Settings

fname_out_prefix: Output file prefix & directory (Default: GBOutput/output-<timestamp>-)
verbose: Show progress messages (Default: true)

SLURM Settings

SLURM_job_name: Job array name (Default: GBJob-<timestamp>)
SLURM_account_name: Account name (Default: "")
SLURM_partition_name: Partition to use (Default: "")
SLURM_nodes_per_array_job: Nodes per job (Default: 1)
SLURM_tasks_per_node: Tasks per node (Default: 1)
SLURM_cpus_per_task: CPUs per task (Default: 1)
SLURM_mem_G: Memory in GB (Default: 1)
SLURM_time_limit_dd_hhmmss: Time limit as "dd-hh:mm:ss" (Default: "00-01:00:00")
SLURM_max_array_jobs_running: Max concurrent array jobs (Default: 20)
SLURM_module_load_Conda_version_name: Conda module name (Default: "Miniconda3")
SLURM_module_load_R_version_name: R module name (Default: "conda" which will use the R installed in the conda environment - see installation instructions for details)
SLURM_module_load_Julia_version_name: Julia module name (Default: "" which will use the Julia installed via JuliaUp - see installation instructions for details)

source

Base.:== — Method

Base.:(==)(x::GBInput, y::GBInput)::Bool

Compare two GBInput structs for equality by comparing their hash values.

This method overloads the == operator for GBInput structs, allowing direct comparison using the == operator. Two GBInput structs are considered equal if they have identical hash values, which implies they have the same values for all relevant fields.

Arguments

x::GBInput: First GBInput struct to compare
y::GBInput: Second GBInput struct to compare

Returns

Bool: true if the hash values of both structs are equal, false otherwise

Examples

julia> input_1 = input = GBInput(fname_geno="geno1.jld2", fname_pheno="pheno1.jld2", fname_out_prefix="test1-", SLURM_job_name="slurmjob1");

julia> input_2 = input = GBInput(fname_geno="geno1.jld2", fname_pheno="pheno1.jld2", fname_out_prefix="test1-", SLURM_job_name="slurmjob1");

julia> input_3 = input = GBInput(fname_geno="geno2.jld2", fname_pheno="pheno2.jld2");

julia> input_1 == input_2
true

julia> input_1 == input_3
false

source

Base.hash — Method

Base.hash(x::GBInput, h::UInt)::UInt

Compute a hash value for a GBInput struct by combining the hash values of all its fields.

Arguments

x::GBInput: The input structure to be hashed
h::UInt: The hash value seed

Returns

UInt: A hash value that uniquely identifies the content of the GBInput struct

Details

This method implements hash computation for the GBInput type by iterating through all fields and combining their hash values.

Examples

julia> input = GBInput(fname_geno="", fname_pheno="");

julia> typeof(hash(input))
UInt64

source

GenomicBreeding.checkinputs — Method

checkinputs(input::GBInput)::Vector{String}

Check the compatibility and validity of inputs for genomic analysis.

Returns a vector of error messages. An empty vector indicates all inputs are valid.

Arguments

input::GBInput: A struct containing analysis parameters including:
- analysis: Analysis type (cv, fit, predict, or gwas)
- fname_geno: Path to genotype file
- fname_pheno: Path to phenotype file
- models: Vector of selected models
- fname_allele_effects_jld2s: Paths to allele effects files (for predict)

Returns

Vector{String}: Collection of error messages, empty if all inputs are valid

Validation Rules

Analysis type must be one of: cv, fit, predict, or gwas
For cv, fit, and gwas:
- Genotype file must exist
- Phenotype file must exist
- At least one model must be specified
For predict:
- Genotype file must exist
- All specified allele effects files must exist

Examples

julia> genomes = simulategenomes(n=300, n_populations=3, verbose=false);

julia> trials, _ = simulatetrials(genomes=genomes, n_years=1, n_seasons=1, n_harvests=1, n_sites=1, n_replications=1, verbose=false);

julia> phenomes = extractphenomes(trials);

julia> fname_geno = try writedelimited(genomes, fname="test-geno.tsv"); catch; rm("test-geno.tsv"); writedelimited(genomes, fname="test-geno.tsv"); end;
    
julia> fname_pheno = try writedelimited(phenomes, fname="test-pheno.tsv"); catch; rm("test-pheno.tsv"); writedelimited(phenomes, fname="test-pheno.tsv"); end;

julia> input = GBInput(fname_geno=fname_geno, fname_pheno=fname_pheno, analysis=cv);

julia> length(checkinputs(input)) == 0
true

julia> input.fname_pheno = ""; length(checkinputs(input)) == 0
false

source

GenomicBreeding.cv — Method

cv(input::GBInput)::Tuple{Vector{String},Vector{String}}

Assess genomic prediction accuracy via replicated k-fold cross-validation.

Arguments

input::GBInput: A GBInput struct containing configuration parameters including:
- bulk_cv: Boolean flag for bulk cross-validation
- populations: Vector of population names to analyze
- models: Statistical models to use for prediction
- n_folds: Number of folds for cross-validation
- n_replications: Number of replications for cross-validation
- fname_out_prefix: Prefix for output filenames
- verbose: Boolean flag for detailed output

Returns

Tuple{Vector{String},Vector{String}}: A tuple containing:
- First element: Vector of paths to JLD2 files containing CV results
- Second element: Vector of paths to text files containing error notes

Details

The function supports three types of cross-validation:

Single population CV when one population is specified
Pairwise population CV when two populations are specified
Both pairwise and leave-one-population-out CV when more than two populations are specified

Results are saved as JLD2 files (one per fold, replication, and trait) and optional text files containing notes about failed jobs.

Example

julia> genomes = GenomicBreedingCore.simulategenomes(n=300, l=1_000, verbose=false); genomes.populations = StatsBase.sample(string.("pop_", 1:3), length(genomes.entries), replace=true);

julia> proportion_of_variance = fill(0.0, 9, 3); proportion_of_variance[1, :] .= 1.00; # 100% variance on the additive genetic effects

julia> trials, _ = GenomicBreedingCore.simulatetrials(genomes=genomes, n_years=1, n_seasons=1, n_harvests=1, n_sites=1, n_replications=1, proportion_of_variance=proportion_of_variance, verbose=false);

julia> phenomes = extractphenomes(trials);

julia> fname_geno = try writedelimited(genomes, fname="test-geno.tsv"); catch; rm("test-geno.tsv"); writedelimited(genomes, fname="test-geno.tsv"); end;

julia> fname_pheno = try writedelimited(phenomes, fname="test-pheno.tsv"); catch; rm("test-pheno.tsv"); writedelimited(phenomes, fname="test-pheno.tsv"); end;

julia> input = GBInput(fname_geno=fname_geno, fname_pheno=fname_pheno, analysis=cv, bulk_cv=false, fname_out_prefix="GBOutput_cv_3/output-3-", populations=["pop_1", "pop_3"], traits=["trait_1"], n_replications=2, n_folds=3, verbose=false);

julia> fnames_cvs, fnames_notes = cv(input);

julia> length(fnames_cvs) == 4, length(fnames_notes) == 0
(true, true)

source

GenomicBreeding.fit — Method

fit(input::GBInput)::Vector{String}

Extract allele effects by fitting genomic prediction models without cross-validation.

Arguments

input::GBInput: A GBInput struct containing:
- fname_geno: Path to genotype data file
- fname_pheno: Path to phenotype data file
- models: Vector of model functions to fit (e.g., [bayesa, bayesb])
- traits: Optional vector of trait names to analyze
- populations: Optional vector of population names to analyze
- n_iter: Number of iterations for Bayesian models
- n_burnin: Number of burn-in iterations for Bayesian models
- verbose: Boolean for detailed output

Returns

Vector{String}: Paths to JLD2 files containing fitted model results, one file per model-trait-population combination

Details

The function fits specified genomic prediction models to the full dataset without cross-validation. For each combination of model, trait, and population, it:

Loads and processes genotype and phenotype data
Fits the specified model
Saves results to a JLD2 file with naming pattern: {prefix}_model_{name}-trait_{name}-population_{name}.jld2

Notes

If populations is not specified, all unique populations in phenotype data are used
If traits is not specified, all unique traits in phenotype data are used
For Bayesian models (names containing "bayes"), uses specified iterations and burn-in
Will throw an error if output files already exist

Example

julia> genomes = GenomicBreedingCore.simulategenomes(n=300, l=1_000, verbose=false); genomes.populations = StatsBase.sample(string.("pop_", 1:3), length(genomes.entries), replace=true);

julia> proportion_of_variance = fill(0.0, 9, 3); proportion_of_variance[1, :] .= 1.00; # 100% variance on the additive genetic effects

julia> trials, _ = GenomicBreedingCore.simulatetrials(genomes=genomes, n_years=1, n_seasons=1, n_harvests=1, n_sites=1, n_replications=1, proportion_of_variance=proportion_of_variance, verbose=false);

julia> phenomes = extractphenomes(trials);

julia> fname_geno = try writedelimited(genomes, fname="test-geno.tsv"); catch; rm("test-geno.tsv"); writedelimited(genomes, fname="test-geno.tsv"); end;

julia> fname_pheno = try writedelimited(phenomes, fname="test-pheno.tsv"); catch; rm("test-pheno.tsv"); writedelimited(phenomes, fname="test-pheno.tsv"); end;

julia> input = GBInput(fname_geno=fname_geno, fname_pheno=fname_pheno, analysis=GenomicBreeding.fit, fname_out_prefix="GBOutput_fit_4/output-4-", populations=["pop_1", "pop_3"], traits=["trait_1"], models=[bayesa, bayesb], verbose=false);

julia> fname_allele_effects_jld2s = GenomicBreeding.fit(input);

julia> length(fname_allele_effects_jld2s) == 4
true

source

GenomicBreeding.gwas — Method

gwas(input::GBInput)::Vector{String}

Perform genome-wide association study (GWAS) analysis on genomic and phenotypic data.

Arguments

input::GBInput: A GBInput struct containing:
- fname_geno: Path to genotype data file
- fname_pheno: Path to phenotype data file
- gwas_models: Vector of GWAS models to apply
- traits: Optional vector of trait names to analyze
- populations: Optional vector of population names to analyze
- verbose: Boolean flag for detailed output

Returns

Vector{String}: Paths to generated JLD2 files containing Fit structs for each model-trait-population combination

Details

The function performs GWAS analysis for each combination of:

GWAS models specified in input
Traits found in phenotype data
Populations specified (if none specified, analyzes all data together)

For each combination, it:

Filters data for the specific population if specified
Fits the GWAS model
Saves results to a JLD2 file containing a Fit struct

The Fit struct's b_hat field contains:

t-statistics for gwasols model
z-statistics for other GWAS models

Notes

Output files are named as: <prefix>_model_<model>-trait_<trait>-population_<pop>.jld2
Will throw an error if output files already exist
When verbose=true, displays a progress bar and correlation heatmap of estimated allele effects

Example

julia> genomes = GenomicBreedingCore.simulategenomes(n=300, l=1_000, verbose=false); genomes.populations = StatsBase.sample(string.("pop_", 1:3), length(genomes.entries), replace=true);

julia> proportion_of_variance = fill(0.0, 9, 3); proportion_of_variance[1, :] .= 1.00; # 100% variance on the additive genetic effects

julia> trials, _ = GenomicBreedingCore.simulatetrials(genomes=genomes, n_years=1, n_seasons=1, n_harvests=1, n_sites=1, n_replications=1, proportion_of_variance=proportion_of_variance, verbose=false);

julia> phenomes = extractphenomes(trials);

julia> fname_geno = try writedelimited(genomes, fname="test-geno.tsv"); catch; rm("test-geno.tsv"); writedelimited(genomes, fname="test-geno.tsv"); end;

julia> fname_pheno = try writedelimited(phenomes, fname="test-pheno.tsv"); catch; rm("test-pheno.tsv"); writedelimited(phenomes, fname="test-pheno.tsv"); end;

julia> input = GBInput(fname_geno=fname_geno, fname_pheno=fname_pheno, gwas_models=[gwasols], verbose=false);

julia> fname_test_statistics_jld2s = GenomicBreeding.gwas(input);

julia> length(fname_test_statistics_jld2s) == 3
true

source

GenomicBreeding.loadcvs — Method

loadcvs(input::GBInput; min_train_size::Int64=10, verbose::Bool=true)::Vector{CV}

Load and filter cross-validation (CV) results from files generated by GenomicBreeding.cv().

Arguments

input::GBInput: Input configuration containing the output directory path in fname_out_prefix
min_train_size::Int64=10: Minimum required size for training sets (default: 10)
verbose::Bool=true: Whether to display a progress meter during loading (default: true)

Returns

Vector{CV}: Vector of valid CV objects that meet the following criteria:
- Contains complete metrics (length of fit.metrics == 9)
- Has valid correlation metric (not missing, NaN, or Inf)
- Validation set size ≥ mintrainsize

Throws

ArgumentError: If the output directory doesn't exist or contains no CV results

Details

The function searches for files with pattern "-cv-" and extension ".jld2" in the output directory. Invalid or failed CV results are automatically filtered out during loading. A progress meter is displayed during loading when verbose is true.

Example

julia> genomes = GenomicBreedingCore.simulategenomes(n=300, l=1_000, verbose=false); genomes.populations = StatsBase.sample(string.("pop_", 1:3), length(genomes.entries), replace=true);

julia> proportion_of_variance = fill(0.0, 9, 3); proportion_of_variance[1, :] .= 1.00; # 100% variance on the additive genetic effects

julia> trials, _ = GenomicBreedingCore.simulatetrials(genomes=genomes, n_years=1, n_seasons=1, n_harvests=1, n_sites=1, n_replications=1, proportion_of_variance=proportion_of_variance, verbose=false);

julia> phenomes = extractphenomes(trials);

julia> fname_geno = try writedelimited(genomes, fname="test-geno.tsv"); catch; rm("test-geno.tsv"); writedelimited(genomes, fname="test-geno.tsv"); end;

julia> fname_pheno = try writedelimited(phenomes, fname="test-pheno.tsv"); catch; rm("test-pheno.tsv"); writedelimited(phenomes, fname="test-pheno.tsv"); end;

julia> input = GBInput(fname_geno=fname_geno, fname_pheno=fname_pheno, fname_out_prefix="GBOutput_cv_1/output-1-", populations=["pop_1", "pop_3"], traits=["trait_1"], n_replications=2, n_folds=3, verbose=false);

julia> fnames_cvs, fnames_notes = cv(input);

julia> cvs = loadcvs(input, verbose=false);

julia> length(cvs) == length(fnames_cvs)
true

source

GenomicBreeding.loadfits — Method

loadfits(input::GBInput)::Vector{Fit}

Load fitted allele frequency effects from genomic prediction models stored in JLD2 files.

Arguments

input::GBInput: Input configuration containing paths to JLD2 files with fitted allele effects.

Returns

Vector{Fit}: Array of Fit objects containing the loaded allele frequency effects.

Details

The function attempts to load all JLD2 files specified in input.fname_allele_effects_jld2s. If a file cannot be loaded, that entry will be skipped and remain undef in the output vector.

Example

julia> genomes = GenomicBreedingCore.simulategenomes(n=300, l=1_000, verbose=false); genomes.populations = StatsBase.sample(string.("pop_", 1:3), length(genomes.entries), replace=true);

julia> proportion_of_variance = fill(0.0, 9, 3); proportion_of_variance[1, :] .= 1.00; # 100% variance on the additive genetic effects

julia> trials, _ = GenomicBreedingCore.simulatetrials(genomes=genomes, n_years=1, n_seasons=1, n_harvests=1, n_sites=1, n_replications=1, proportion_of_variance=proportion_of_variance, verbose=false);

julia> phenomes = extractphenomes(trials);

julia> fname_geno = try writedelimited(genomes, fname="test-geno.tsv"); catch; rm("test-geno.tsv"); writedelimited(genomes, fname="test-geno.tsv"); end;

julia> fname_pheno = try writedelimited(phenomes, fname="test-pheno.tsv"); catch; rm("test-pheno.tsv"); writedelimited(phenomes, fname="test-pheno.tsv"); end;

julia> input = GBInput(fname_geno=fname_geno, fname_pheno=fname_pheno, fname_out_prefix="GBOutput_fit_2/output-2-", populations=["pop_1", "pop_3"], traits=["trait_1"], models=[bayesa, bayesb], verbose=false);

julia> input.fname_allele_effects_jld2s = GenomicBreeding.fit(input);

julia> fits = loadfits(input);

julia> length(fits) == length(input.fname_allele_effects_jld2s)
true

source

GenomicBreeding.loadgenomesphenomes — Method

loadgenomesphenomes(input::GBInput)::Tuple{Genomes, Phenomes, Vector{String}, Vector{String}}

Load, merge, and filter genotype and phenotype data from specified input files.

Arguments

input::GBInput: A struct containing input parameters including:
- fname_geno: Path to genotype data file
- fname_pheno: Path to phenotype data file
- bulk_cv: Boolean for bulk cross-validation
- populations: Vector of population names to include (optional)
- traits: Vector of trait names to include (optional)
- n_folds: Number of cross-validation folds
- keep_all: Boolean to keep all entries during merging
- maf: Minimum allele frequency threshold
- mtv: Minimum trait variance threshold
- verbose: Boolean for detailed output

Returns

A tuple containing:

Genomes: Filtered genomic data
Phenomes: Filtered phenotypic data
Vector{String}: Traits to skip due to insufficient data for cross-validation
Vector{String}: Populations to skip due to insufficient data for cross-validation

Notes

Supports multiple file formats (string-delimited, JLD2, VCF)
Performs data validation and compatibility checks
Filters traits with variance below minimum trait variance (mtv) threshold
Ensures sufficient sample size for cross-validation
Filters markers based on minimum allele frequency (maf)

Examples

julia> genomes = simulategenomes(n=300, n_populations=3, verbose=false);

julia> trials, _ = simulatetrials(genomes=genomes, n_years=1, n_seasons=1, n_harvests=1, n_sites=1, n_replications=1, verbose=false);

julia> phenomes = extractphenomes(trials);

julia> fname_geno = try writedelimited(genomes, fname="test-geno.tsv"); catch; rm("test-geno.tsv"); writedelimited(genomes, fname="test-geno.tsv"); end;
    
julia> fname_pheno = try writedelimited(phenomes, fname="test-pheno.tsv"); catch; rm("test-pheno.tsv"); writedelimited(phenomes, fname="test-pheno.tsv"); end;

julia> input = GBInput(fname_geno=fname_geno, fname_pheno=fname_pheno, populations=["pop_1", "pop_3"], traits=["trait_1"], verbose=false);

julia> genomes, phenomes, traits_to_skip, populations_to_skip = loadgenomesphenomes(input);

julia> length(unique(genomes.populations)) == length(unique(phenomes.populations)) == 2
true

julia> length(phenomes.traits) == 1
true

julia> rm.([fname_geno, fname_pheno]);

source

GenomicBreeding.plot — Method

plot(;
    input::GBInput,
    skip_genomes::Bool = false,
    skip_phenomes::Bool = false,
    skip_cvs::Bool = false,
    format::String = "svg",
    plot_size::Tuple{Int64,Int64} = (600, 450),
    overwrite::Bool = false
)::String

Generate and save visualization plots for genomic, phenomic, and cross-validation data.

Arguments

input::GBInput: Input configuration containing file paths and settings
skip_genomes::Bool: If true, skip generating genome-related plots
skip_phenomes::Bool: If true, skip generating phenome-related plots
skip_cvs::Bool: If true, skip generating cross-validation plots
format::String: Output file format for plots (e.g., "svg", "png")
plot_size::Tuple{Int64,Int64}: Dimensions of output plots in pixels (width, height)
overwrite::Bool: If true, overwrite existing plot files

Returns

String: Path to the output directory containing generated plots

Plot Types

Genomes and Phenomes

Distribution plots
Violin plots
Correlation heatmaps
Tree plots
PCA biplots

Cross-validation

Bar plots
Box plots

Output Structure

Creates a directory structure under input.fname_out_prefix/plots/ with subdirectories:

genomes/: Genome-related visualizations
phenomes/: Phenome-related visualizations
cvs/: Cross-validation visualizations

Example

julia> genomes = GenomicBreedingCore.simulategenomes(n=300, l=1_000, verbose=false); genomes.populations = StatsBase.sample(string.("pop_", 1:3), length(genomes.entries), replace=true);

julia> trials, _ = GenomicBreedingCore.simulatetrials(genomes=genomes, n_years=1, n_seasons=1, n_harvests=1, n_sites=1, n_replications=1, verbose=false);

julia> phenomes = extractphenomes(trials);

julia> fname_geno = try writedelimited(genomes, fname="test-geno.tsv"); catch; rm("test-geno.tsv"); writedelimited(genomes, fname="test-geno.tsv"); end;

julia> fname_pheno = try writedelimited(phenomes, fname="test-pheno.tsv"); catch; rm("test-pheno.tsv"); writedelimited(phenomes, fname="test-pheno.tsv"); end;
    
julia> input = GBInput(fname_geno=fname_geno, fname_pheno=fname_pheno, SLURM_cpus_per_task=6, SLURM_mem_G=5, fname_out_prefix="GBOutput/test-", verbose=false);

julia> GenomicBreeding.plot(input=input, format="png", plot_size = (700, 525))
"GBOutput/plots"

julia> GenomicBreeding.plot(input=input, format="png", plot_size = (700, 525), overwrite=true, skip_genomes=true)
"GBOutput/plots"

source

GenomicBreeding.predict — Method

predict(input::GBInput)::String

Predict trait values (GEBVs) for a set of genotypes using pre-trained models from GenomicBreeding.fit().

Arguments

input::GBInput: Input configuration containing:
- fname_geno: Path to genotype data file
- fname_allele_effects_jld2s: Vector of paths to saved model files from previous fit() calls
- fname_out_prefix: Prefix for output files
- analysis: Set to GenomicBreeding.predict

Returns

String: Path to the JLD2 file containing predicted phenotypes

Details

Takes a GBInput object with genotype data and pre-trained models to predict trait values for new individuals. The function:

Loads genotype data and model parameters
Predicts trait values using the loaded models
Saves predictions in a Phenomes struct
Returns the path to the saved predictions file

Output File Format

For single trait prediction: {prefix}{model_name}-predicted_phenomes.jld2
For multi-trait prediction: {prefix}{hash}-predicted_phenomes.jld2

Example

julia> genomes = GenomicBreedingCore.simulategenomes(n=300, l=1_000, verbose=false); genomes.populations = StatsBase.sample(string.("pop_", 1:3), length(genomes.entries), replace=true);

julia> proportion_of_variance = fill(0.0, 9, 3); proportion_of_variance[1, :] .= 1.00; # 100% variance on the additive genetic effects

julia> trials, _ = GenomicBreedingCore.simulatetrials(genomes=genomes, n_years=1, n_seasons=1, n_harvests=1, n_sites=1, n_replications=1, proportion_of_variance=proportion_of_variance, verbose=false);

julia> phenomes = extractphenomes(trials);

julia> fname_geno = try writedelimited(genomes, fname="test-geno.tsv"); catch; rm("test-geno.tsv"); writedelimited(genomes, fname="test-geno.tsv"); end;

julia> fname_pheno = try writedelimited(phenomes, fname="test-pheno.tsv"); catch; rm("test-pheno.tsv"); writedelimited(phenomes, fname="test-pheno.tsv"); end;

julia> input = GBInput(fname_geno=fname_geno, fname_pheno=fname_pheno, fname_out_prefix="GBOutput_predict_5/output-5-", analysis=GenomicBreeding.fit, verbose=false);

julia> input.fname_allele_effects_jld2s = GenomicBreeding.fit(input);

julia> input.analysis = GenomicBreeding.predict;

julia> fname_phenomes_predicted = GenomicBreeding.predict(input);

julia> phenomes_predicted = readjld2(Phenomes, fname=fname_phenomes_predicted);

julia> dimensions(phenomes_predicted)
Dict{String, Int64} with 8 entries:
  "n_total"       => 5400
  "n_zeroes"      => 0
  "n_nan"         => 0
  "n_entries"     => 300
  "n_traits"      => 18
  "n_inf"         => 0
  "n_populations" => 3
  "n_missing"     => 0

source

GenomicBreeding.prepareinputs — Method

prepareinputs(input::GBInput)::Vector{GBInput}

Create a vector of GBInput objects for parallel processing based on the input configuration.

Arguments

input::GBInput: Initial input configuration containing analysis parameters.

Returns

Vector{GBInput}: Array of GBInput objects configured for different combinations of:
- Models (RR-BLUP, BayesB, etc.)
- Traits from phenotype data
- Population groups (including bulk and across-population analyses)

Details

For different analysis types, the function generates the following combinations:

Cross-validation (cv): 2 models × traits × (populations + bulk + across-pop)
Model fitting (fit): 2 models × traits × (populations + bulk)
Prediction (predict): 1 GBInput per allele effects file
GWAS (gwas): 1 model × traits × (populations + bulk)

The function handles special cases:

Skips trait-population combinations with insufficient data
Adjusts settings for bulk and across-population analyses
Configures specific parameters for prediction tasks

Examples

julia> genomes = GenomicBreedingCore.simulategenomes(n=300, l=1_000, verbose=false); genomes.populations = StatsBase.sample(string.("pop_", 1:3), length(genomes.entries), replace=true);

julia> trials, _ = GenomicBreedingCore.simulatetrials(genomes=genomes, n_years=1, n_seasons=1, n_harvests=1, n_sites=1, n_replications=1, verbose=false);

julia> phenomes = extractphenomes(trials);

julia> fname_geno = try writedelimited(genomes, fname="test-geno.tsv"); catch; rm("test-geno.tsv"); writedelimited(genomes, fname="test-geno.tsv"); end;
    
julia> fname_pheno = try writedelimited(phenomes, fname="test-pheno.tsv"); catch; rm("test-pheno.tsv"); writedelimited(phenomes, fname="test-pheno.tsv"); end;

julia> input_cv = GBInput(fname_geno=fname_geno, fname_pheno=fname_pheno, analysis=GenomicBreeding.cv, verbose=false);

julia> input_fit = GBInput(fname_geno=fname_geno, fname_pheno=fname_pheno, analysis=GenomicBreeding.fit, verbose=false);

julia> input_predict = GBInput(fname_geno=fname_geno, fname_pheno=fname_pheno, fname_allele_effects_jld2s=["dummy.jld2"], analysis=GenomicBreeding.predict, verbose=false); writejld2(Fit(n=1, l=1), fname="dummy.jld2");

julia> input_gwas = GBInput(fname_geno=fname_geno, fname_pheno=fname_pheno, analysis=GenomicBreeding.gwas, verbose=false);

julia> inputs_cv = prepareinputs(input_cv); # expect 30 GBInputs = 2 models x 3 traits x (3 populations + 1 bulk + 1 across pops)

julia> inputs_fit = prepareinputs(input_fit); # expect 24 GBInputs = 2 models x 3 traits x (3 populations + 1 bulk)

julia> inputs_predict = prepareinputs(input_predict); # expect 1 GBInput = 1 dummy Fit struct

julia> inputs_gwas = prepareinputs(input_gwas); # expect 24 GBInputs = 1 models x 3 traits x (3 populations + 1 bulk)

julia> length(inputs_cv) == 30
true

julia> length(inputs_fit) == 24
true

julia> length(inputs_predict) == 1
true

julia> length(inputs_gwas) == 24
true

julia> rm.([fname_geno, fname_pheno, "dummy.jld2"]);

source

GenomicBreeding.prepareoutprefixandoutdir — Method

prepareoutprefixandoutdir(input::GBInput)::String

Prepare the output directory and sanitize the output filename prefix for genomic breeding analysis results.

This function performs two main tasks:

Creates the output directory if it doesn't exist
Sanitizes the output filename prefix by replacing problematic characters with underscores

Arguments

input::GBInput: Input configuration containing the output file prefix and analysis type

Returns

String: Sanitized output file prefix path

Details

Problematic characters that are replaced include spaces, newlines, tabs, parentheses, and special characters (&|:=+*%@!). The function also ensures the prefix ends with the analysis type and a hyphen.

Throws

ArgumentError: If unable to create the output directory

Example

julia> input = GBInput(fname_geno="some_dir/fname_geno.jld2", fname_pheno="some_dir/fname_pheno.jld2", fname_out_prefix="GBOutput/some@!_%&prefix", verbose=false);

julia> fname_out_prefix = prepareoutprefixandoutdir(input)
"GBOutput/some_____prefix-cv-"

julia> rm(dirname(fname_out_prefix), recursive=true);

source

GenomicBreeding.submitslurmarrayjobs — Method

submitslurmarrayjobs(; input::GBInput)::String

Submit an array of Slurm jobs for genomic prediction analysis.

Arguments

input::GBInput: A GBInput struct containing all necessary parameters for job submission and analysis.

Returns

String: Path to the output directory where results will be stored.

Details

This function handles the submission of parallel genomic prediction jobs to a Slurm cluster. It performs the following steps:

Validates input parameters and checks for required R packages
Creates necessary output directories
Prepares individual job inputs
Generates Julia and Slurm scripts
Submits the array job to the Slurm scheduler

The function supports various genomic analyses including:

Cross-validation (cv)
Model fitting (fit)
Prediction (predict)
GWAS analysis (gwas)

Job Configuration

Uses Slurm array jobs for parallel execution
Configurable CPU, memory, and time limit parameters
Supports both module-based and conda environments
Interactive confirmation before job submission

Notes

Requires a working Slurm environment
BGLR R package must be installed
User will be prompted to enter "YES" to confirm job submission
Job array size is controlled by SLURM_max_array_jobs_running

Example

using GenomicBreeding, StatsBase;
using GenomicBreeding: cv, fit, predict, gwas, ols, rigde, lasso, bayesa, bayesb, bayesc, gwasols, gwaslmm, gwasreml;
genomes = GenomicBreeding.GenomicBreedingCore.simulategenomes(n=300, l=1_000, n_populations=3, verbose=false);
trials, _ = GenomicBreeding.GenomicBreedingCore.simulatetrials(genomes=genomes, n_years=1, n_seasons=1, n_harvests=1, n_sites=1, n_replications=1, verbose=false);
phenomes = extractphenomes(trials);
fname_geno = try writedelimited(genomes, fname="test-geno.tsv"); catch; rm("test-geno.tsv"); writedelimited(genomes, fname="test-geno.tsv"); end;
fname_pheno = try writedelimited(phenomes, fname="test-pheno.tsv"); catch; rm("test-pheno.tsv"); writedelimited(phenomes, fname="test-pheno.tsv"); end;

# Repeated k-fold cross-validation
input_cv = GBInput(fname_geno=fname_geno, fname_pheno=fname_pheno, analysis=cv, SLURM_account_name="dbiof1", SLURM_cpus_per_task=5, SLURM_mem_G=5, fname_out_prefix="GBOutput/test-", verbose=false);
outdir = submitslurmarrayjobs(input_cv); ### You will be asked to enter "YES" to proceed with job submission.
run(`sh -c 'squeue -u "$USER"'`)
run(`sh -c 'tail slurm-*_*.out'`)
run(`sh -c 'grep -i "err" slurm-*_*.out'`)
cvs = loadcvs(input_cv)
df_across_entries, df_per_entry = tabularise(cvs)
sum(df_across_entries.model .== "bayesa") / nrow(df_across_entries)
sum(df_across_entries.model .== "ridge") / nrow(df_across_entries)
sort(combine(groupby(df_across_entries, [:validation_population, :model]), [:cor => mean, :cor => length]), :cor_mean, rev=true)

# Genomic prediction equation full data fit
input_fit = clone(input_cv)
input_fit.analysis = fit
outdir = submitslurmarrayjobs(input_fit);
input_fit.fname_allele_effects_jld2s = begin
    files = readdir(outdir)
    idx = findall(.!isnothing.(match.(Regex("-fit-"), files)) .&& .!isnothing.(match.(Regex("jld2$"), files)))
    joinpath.(outdir, files[idx])
end
fits = loadfits(input_fit)
length(fits)

input_predict = clone(input_fit)
input_predict.analysis = predict
outdir = submitslurmarrayjobs(input_predict)
run(`squeue`)

source

GenomicBreedingCore.checkdims — Method

checkdims(input::GBInput)::Bool

Check dimension compatibility of the fields of the GBInput struct. Returns true if both models and gwas_models fields in the input are not nothing, false otherwise.

Arguments

input::GBInput: Input structure containing genomic data and models

Returns

Bool: true if dimensions are compatible, false otherwise

Examples

julia> input = GBInput(fname_geno="geno1.jld2", fname_pheno="pheno1.jld2");

julia> checkdims(input)
true

julia> input.models = nothing

julia> checkdims(input)
false

source

GenomicBreedingCore.clone — Method

clone(x::GBInput)::GBInput

Create a deep copy of a GBInput object, duplicating all field values into a new instance.

This function allows you to create an independent copy of a GBInput object where modifications to the clone won't affect the original object.

Arguments

x::GBInput: The source GBInput object to be cloned

Returns

::GBInput: A new GBInput instance with identical field values

Example

julia> input = GBInput(fname_geno="geno1.jld2", fname_pheno="pheno1.jld2");

julia> copy_input = clone(input);

julia> input == copy_input
true

source

Methods reference

Contents

GenomicBreeding.jl

GenomicBreedingCore.jl

GenomicBreedingIO.jl

GenomicBreedingModels.jl

GenomicBreedingPlots.jl