Methods reference

Contents

GenomicBreeding.jl

GenomicBreeding.GBInputType
mutable struct GBInput

Main input struct for genomic breeding analysis (implements GenomicBreedingCore.AbstractGB)

Fields

Required

  • fname_geno: Path to genotype file (see file format guide for details)

Optional Data Files

  • fname_pheno: Path to phenotype file (Default: "" - see file format guide for details)
  • fname_allele_effects_jld2s: Vector of paths to JLD2 files containing Fit structs (Default: [""])

Analysis Settings

  • analysis: Analysis function to perform (Default: cv)
    • cv: Replicated k-fold cross-validation
    • fit: Fit genomic prediction models to extract allele effects
    • predict: Compute GEBVs using existing model fits
    • gwas: Genome-wide association study
  • bulk_cv: Perform cross-validation across all populations (Default: false)
  • populations: Vector of populations to include (Default: all)
  • traits: Vector of traits to analyze (Default: all)
  • models: Vector of genomic prediction model functions (Default: [ridge, bayesa])
  • gwas_models: Vector of GWAS model functions (Default: [gwasols, gwaslmm])

Cross-Validation Parameters

  • n_folds: Number of CV folds (Default: 5)
  • n_replications: Number of CV replications (Default: 5)

Filtering Parameters

  • keep_all: Keep all entries when merging data (Default: false)
  • maf: Minimum allele frequency (Default: 0.05)
  • mtv: Minimum trait variance (Default: 1e-7)

Model Parameters

  • n_iter: MCMC iterations (Default: 1_500)
  • n_burnin: MCMC burn-in iterations (Default: 500)

Output Settings

  • fname_out_prefix: Output file prefix & directory (Default: GBOutput/output-<timestamp>-)
  • verbose: Show progress messages (Default: true)

SLURM Settings

  • SLURM_job_name: Job array name (Default: GBJob-<timestamp>)
  • SLURM_account_name: Account name (Default: "")
  • SLURM_partition_name: Partition to use (Default: "")
  • SLURM_nodes_per_array_job: Nodes per job (Default: 1)
  • SLURM_tasks_per_node: Tasks per node (Default: 1)
  • SLURM_cpus_per_task: CPUs per task (Default: 1)
  • SLURM_mem_G: Memory in GB (Default: 1)
  • SLURM_time_limit_dd_hhmmss: Time limit as "dd-hh:mm:ss" (Default: "00-01:00:00")
  • SLURM_max_array_jobs_running: Max concurrent array jobs (Default: 20)
  • SLURM_module_load_Conda_version_name: Conda module name (Default: "Miniconda3")
  • SLURM_module_load_R_version_name: R module name (Default: "conda" which will use the R installed in the conda environment - see installation instructions for details)
  • SLURM_module_load_Julia_version_name: Julia module name (Default: "" which will use the Julia installed via JuliaUp - see installation instructions for details)
source
Base.:==Method
Base.:(==)(x::GBInput, y::GBInput)::Bool

Compare two GBInput structs for equality by comparing their hash values.

This method overloads the == operator for GBInput structs, allowing direct comparison using the == operator. Two GBInput structs are considered equal if they have identical hash values, which implies they have the same values for all relevant fields.

Arguments

  • x::GBInput: First GBInput struct to compare
  • y::GBInput: Second GBInput struct to compare

Returns

  • Bool: true if the hash values of both structs are equal, false otherwise

Examples

julia> input_1 = input = GBInput(fname_geno="geno1.jld2", fname_pheno="pheno1.jld2", fname_out_prefix="test1-", SLURM_job_name="slurmjob1");

julia> input_2 = input = GBInput(fname_geno="geno1.jld2", fname_pheno="pheno1.jld2", fname_out_prefix="test1-", SLURM_job_name="slurmjob1");

julia> input_3 = input = GBInput(fname_geno="geno2.jld2", fname_pheno="pheno2.jld2");

julia> input_1 == input_2
true

julia> input_1 == input_3
false
source
Base.hashMethod
Base.hash(x::GBInput, h::UInt)::UInt

Compute a hash value for a GBInput struct by combining the hash values of all its fields.

Arguments

  • x::GBInput: The input structure to be hashed
  • h::UInt: The hash value seed

Returns

  • UInt: A hash value that uniquely identifies the content of the GBInput struct

Details

This method implements hash computation for the GBInput type by iterating through all fields and combining their hash values.

Examples

julia> input = GBInput(fname_geno="", fname_pheno="");

julia> typeof(hash(input))
UInt64
source
GenomicBreeding.checkinputsMethod
checkinputs(input::GBInput)::Vector{String}

Check the compatibility and validity of inputs for genomic analysis.

Returns a vector of error messages. An empty vector indicates all inputs are valid.

Arguments

  • input::GBInput: A struct containing analysis parameters including:
    • analysis: Analysis type (cv, fit, predict, or gwas)
    • fname_geno: Path to genotype file
    • fname_pheno: Path to phenotype file
    • models: Vector of selected models
    • fname_allele_effects_jld2s: Paths to allele effects files (for predict)

Returns

  • Vector{String}: Collection of error messages, empty if all inputs are valid

Validation Rules

  • Analysis type must be one of: cv, fit, predict, or gwas
  • For cv, fit, and gwas:
    • Genotype file must exist
    • Phenotype file must exist
    • At least one model must be specified
  • For predict:
    • Genotype file must exist
    • All specified allele effects files must exist

Examples

julia> genomes = simulategenomes(n=300, n_populations=3, verbose=false);

julia> trials, _ = simulatetrials(genomes=genomes, n_years=1, n_seasons=1, n_harvests=1, n_sites=1, n_replications=1, verbose=false);

julia> phenomes = extractphenomes(trials);

julia> fname_geno = try writedelimited(genomes, fname="test-geno.tsv"); catch; rm("test-geno.tsv"); writedelimited(genomes, fname="test-geno.tsv"); end;
    
julia> fname_pheno = try writedelimited(phenomes, fname="test-pheno.tsv"); catch; rm("test-pheno.tsv"); writedelimited(phenomes, fname="test-pheno.tsv"); end;

julia> input = GBInput(fname_geno=fname_geno, fname_pheno=fname_pheno, analysis=cv);

julia> length(checkinputs(input)) == 0
true

julia> input.fname_pheno = ""; length(checkinputs(input)) == 0
false
source
GenomicBreeding.cvMethod
cv(input::GBInput)::Tuple{Vector{String},Vector{String}}

Assess genomic prediction accuracy via replicated k-fold cross-validation.

Arguments

  • input::GBInput: A GBInput struct containing configuration parameters including:
    • bulk_cv: Boolean flag for bulk cross-validation
    • populations: Vector of population names to analyze
    • models: Statistical models to use for prediction
    • n_folds: Number of folds for cross-validation
    • n_replications: Number of replications for cross-validation
    • fname_out_prefix: Prefix for output filenames
    • verbose: Boolean flag for detailed output

Returns

  • Tuple{Vector{String},Vector{String}}: A tuple containing:
    • First element: Vector of paths to JLD2 files containing CV results
    • Second element: Vector of paths to text files containing error notes

Details

The function supports three types of cross-validation:

  • Single population CV when one population is specified
  • Pairwise population CV when two populations are specified
  • Both pairwise and leave-one-population-out CV when more than two populations are specified

Results are saved as JLD2 files (one per fold, replication, and trait) and optional text files containing notes about failed jobs.

Example

julia> genomes = GenomicBreedingCore.simulategenomes(n=300, l=1_000, verbose=false); genomes.populations = StatsBase.sample(string.("pop_", 1:3), length(genomes.entries), replace=true);

julia> proportion_of_variance = fill(0.0, 9, 3); proportion_of_variance[1, :] .= 1.00; # 100% variance on the additive genetic effects

julia> trials, _ = GenomicBreedingCore.simulatetrials(genomes=genomes, n_years=1, n_seasons=1, n_harvests=1, n_sites=1, n_replications=1, proportion_of_variance=proportion_of_variance, verbose=false);

julia> phenomes = extractphenomes(trials);

julia> fname_geno = try writedelimited(genomes, fname="test-geno.tsv"); catch; rm("test-geno.tsv"); writedelimited(genomes, fname="test-geno.tsv"); end;

julia> fname_pheno = try writedelimited(phenomes, fname="test-pheno.tsv"); catch; rm("test-pheno.tsv"); writedelimited(phenomes, fname="test-pheno.tsv"); end;

julia> input = GBInput(fname_geno=fname_geno, fname_pheno=fname_pheno, analysis=cv, bulk_cv=false, fname_out_prefix="GBOutput_cv_3/output-3-", populations=["pop_1", "pop_3"], traits=["trait_1"], n_replications=2, n_folds=3, verbose=false);

julia> fnames_cvs, fnames_notes = cv(input);

julia> length(fnames_cvs) == 4, length(fnames_notes) == 0
(true, true)
source
GenomicBreeding.fitMethod
fit(input::GBInput)::Vector{String}

Extract allele effects by fitting genomic prediction models without cross-validation.

Arguments

  • input::GBInput: A GBInput struct containing:
    • fname_geno: Path to genotype data file
    • fname_pheno: Path to phenotype data file
    • models: Vector of model functions to fit (e.g., [bayesa, bayesb])
    • traits: Optional vector of trait names to analyze
    • populations: Optional vector of population names to analyze
    • n_iter: Number of iterations for Bayesian models
    • n_burnin: Number of burn-in iterations for Bayesian models
    • verbose: Boolean for detailed output

Returns

  • Vector{String}: Paths to JLD2 files containing fitted model results, one file per model-trait-population combination

Details

The function fits specified genomic prediction models to the full dataset without cross-validation. For each combination of model, trait, and population, it:

  1. Loads and processes genotype and phenotype data
  2. Fits the specified model
  3. Saves results to a JLD2 file with naming pattern: {prefix}_model_{name}-trait_{name}-population_{name}.jld2

Notes

  • If populations is not specified, all unique populations in phenotype data are used
  • If traits is not specified, all unique traits in phenotype data are used
  • For Bayesian models (names containing "bayes"), uses specified iterations and burn-in
  • Will throw an error if output files already exist

Example

julia> genomes = GenomicBreedingCore.simulategenomes(n=300, l=1_000, verbose=false); genomes.populations = StatsBase.sample(string.("pop_", 1:3), length(genomes.entries), replace=true);

julia> proportion_of_variance = fill(0.0, 9, 3); proportion_of_variance[1, :] .= 1.00; # 100% variance on the additive genetic effects

julia> trials, _ = GenomicBreedingCore.simulatetrials(genomes=genomes, n_years=1, n_seasons=1, n_harvests=1, n_sites=1, n_replications=1, proportion_of_variance=proportion_of_variance, verbose=false);

julia> phenomes = extractphenomes(trials);

julia> fname_geno = try writedelimited(genomes, fname="test-geno.tsv"); catch; rm("test-geno.tsv"); writedelimited(genomes, fname="test-geno.tsv"); end;

julia> fname_pheno = try writedelimited(phenomes, fname="test-pheno.tsv"); catch; rm("test-pheno.tsv"); writedelimited(phenomes, fname="test-pheno.tsv"); end;

julia> input = GBInput(fname_geno=fname_geno, fname_pheno=fname_pheno, analysis=GenomicBreeding.fit, fname_out_prefix="GBOutput_fit_4/output-4-", populations=["pop_1", "pop_3"], traits=["trait_1"], models=[bayesa, bayesb], verbose=false);

julia> fname_allele_effects_jld2s = GenomicBreeding.fit(input);

julia> length(fname_allele_effects_jld2s) == 4
true
source
GenomicBreeding.gwasMethod
gwas(input::GBInput)::Vector{String}

Perform genome-wide association study (GWAS) analysis on genomic and phenotypic data.

Arguments

  • input::GBInput: A GBInput struct containing:
    • fname_geno: Path to genotype data file
    • fname_pheno: Path to phenotype data file
    • gwas_models: Vector of GWAS models to apply
    • traits: Optional vector of trait names to analyze
    • populations: Optional vector of population names to analyze
    • verbose: Boolean flag for detailed output

Returns

  • Vector{String}: Paths to generated JLD2 files containing Fit structs for each model-trait-population combination

Details

The function performs GWAS analysis for each combination of:

  • GWAS models specified in input
  • Traits found in phenotype data
  • Populations specified (if none specified, analyzes all data together)

For each combination, it:

  1. Filters data for the specific population if specified
  2. Fits the GWAS model
  3. Saves results to a JLD2 file containing a Fit struct

The Fit struct's b_hat field contains:

  • t-statistics for gwasols model
  • z-statistics for other GWAS models

Notes

  • Output files are named as: <prefix>_model_<model>-trait_<trait>-population_<pop>.jld2
  • Will throw an error if output files already exist
  • When verbose=true, displays a progress bar and correlation heatmap of estimated allele effects

Example

julia> genomes = GenomicBreedingCore.simulategenomes(n=300, l=1_000, verbose=false); genomes.populations = StatsBase.sample(string.("pop_", 1:3), length(genomes.entries), replace=true);

julia> proportion_of_variance = fill(0.0, 9, 3); proportion_of_variance[1, :] .= 1.00; # 100% variance on the additive genetic effects

julia> trials, _ = GenomicBreedingCore.simulatetrials(genomes=genomes, n_years=1, n_seasons=1, n_harvests=1, n_sites=1, n_replications=1, proportion_of_variance=proportion_of_variance, verbose=false);

julia> phenomes = extractphenomes(trials);

julia> fname_geno = try writedelimited(genomes, fname="test-geno.tsv"); catch; rm("test-geno.tsv"); writedelimited(genomes, fname="test-geno.tsv"); end;

julia> fname_pheno = try writedelimited(phenomes, fname="test-pheno.tsv"); catch; rm("test-pheno.tsv"); writedelimited(phenomes, fname="test-pheno.tsv"); end;

julia> input = GBInput(fname_geno=fname_geno, fname_pheno=fname_pheno, gwas_models=[gwasols], verbose=false);

julia> fname_test_statistics_jld2s = GenomicBreeding.gwas(input);

julia> length(fname_test_statistics_jld2s) == 3
true
source
GenomicBreeding.loadcvsMethod
loadcvs(input::GBInput; min_train_size::Int64=10, verbose::Bool=true)::Vector{CV}

Load and filter cross-validation (CV) results from files generated by GenomicBreeding.cv().

Arguments

  • input::GBInput: Input configuration containing the output directory path in fname_out_prefix
  • min_train_size::Int64=10: Minimum required size for training sets (default: 10)
  • verbose::Bool=true: Whether to display a progress meter during loading (default: true)

Returns

  • Vector{CV}: Vector of valid CV objects that meet the following criteria:
    • Contains complete metrics (length of fit.metrics == 9)
    • Has valid correlation metric (not missing, NaN, or Inf)
    • Validation set size ≥ mintrainsize

Throws

  • ArgumentError: If the output directory doesn't exist or contains no CV results

Details

The function searches for files with pattern "-cv-" and extension ".jld2" in the output directory. Invalid or failed CV results are automatically filtered out during loading. A progress meter is displayed during loading when verbose is true.

Example

julia> genomes = GenomicBreedingCore.simulategenomes(n=300, l=1_000, verbose=false); genomes.populations = StatsBase.sample(string.("pop_", 1:3), length(genomes.entries), replace=true);

julia> proportion_of_variance = fill(0.0, 9, 3); proportion_of_variance[1, :] .= 1.00; # 100% variance on the additive genetic effects

julia> trials, _ = GenomicBreedingCore.simulatetrials(genomes=genomes, n_years=1, n_seasons=1, n_harvests=1, n_sites=1, n_replications=1, proportion_of_variance=proportion_of_variance, verbose=false);

julia> phenomes = extractphenomes(trials);

julia> fname_geno = try writedelimited(genomes, fname="test-geno.tsv"); catch; rm("test-geno.tsv"); writedelimited(genomes, fname="test-geno.tsv"); end;

julia> fname_pheno = try writedelimited(phenomes, fname="test-pheno.tsv"); catch; rm("test-pheno.tsv"); writedelimited(phenomes, fname="test-pheno.tsv"); end;

julia> input = GBInput(fname_geno=fname_geno, fname_pheno=fname_pheno, fname_out_prefix="GBOutput_cv_1/output-1-", populations=["pop_1", "pop_3"], traits=["trait_1"], n_replications=2, n_folds=3, verbose=false);

julia> fnames_cvs, fnames_notes = cv(input);

julia> cvs = loadcvs(input, verbose=false);

julia> length(cvs) == length(fnames_cvs)
true
source
GenomicBreeding.loadfitsMethod
loadfits(input::GBInput)::Vector{Fit}

Load fitted allele frequency effects from genomic prediction models stored in JLD2 files.

Arguments

  • input::GBInput: Input configuration containing paths to JLD2 files with fitted allele effects.

Returns

  • Vector{Fit}: Array of Fit objects containing the loaded allele frequency effects.

Details

The function attempts to load all JLD2 files specified in input.fname_allele_effects_jld2s. If a file cannot be loaded, that entry will be skipped and remain undef in the output vector.

Example

julia> genomes = GenomicBreedingCore.simulategenomes(n=300, l=1_000, verbose=false); genomes.populations = StatsBase.sample(string.("pop_", 1:3), length(genomes.entries), replace=true);

julia> proportion_of_variance = fill(0.0, 9, 3); proportion_of_variance[1, :] .= 1.00; # 100% variance on the additive genetic effects

julia> trials, _ = GenomicBreedingCore.simulatetrials(genomes=genomes, n_years=1, n_seasons=1, n_harvests=1, n_sites=1, n_replications=1, proportion_of_variance=proportion_of_variance, verbose=false);

julia> phenomes = extractphenomes(trials);

julia> fname_geno = try writedelimited(genomes, fname="test-geno.tsv"); catch; rm("test-geno.tsv"); writedelimited(genomes, fname="test-geno.tsv"); end;

julia> fname_pheno = try writedelimited(phenomes, fname="test-pheno.tsv"); catch; rm("test-pheno.tsv"); writedelimited(phenomes, fname="test-pheno.tsv"); end;

julia> input = GBInput(fname_geno=fname_geno, fname_pheno=fname_pheno, fname_out_prefix="GBOutput_fit_2/output-2-", populations=["pop_1", "pop_3"], traits=["trait_1"], models=[bayesa, bayesb], verbose=false);

julia> input.fname_allele_effects_jld2s = GenomicBreeding.fit(input);

julia> fits = loadfits(input);

julia> length(fits) == length(input.fname_allele_effects_jld2s)
true
source
GenomicBreeding.loadgenomesphenomesMethod
loadgenomesphenomes(input::GBInput)::Tuple{Genomes, Phenomes, Vector{String}, Vector{String}}

Load, merge, and filter genotype and phenotype data from specified input files.

Arguments

  • input::GBInput: A struct containing input parameters including:
    • fname_geno: Path to genotype data file
    • fname_pheno: Path to phenotype data file
    • bulk_cv: Boolean for bulk cross-validation
    • populations: Vector of population names to include (optional)
    • traits: Vector of trait names to include (optional)
    • n_folds: Number of cross-validation folds
    • keep_all: Boolean to keep all entries during merging
    • maf: Minimum allele frequency threshold
    • mtv: Minimum trait variance threshold
    • verbose: Boolean for detailed output

Returns

A tuple containing:

  1. Genomes: Filtered genomic data
  2. Phenomes: Filtered phenotypic data
  3. Vector{String}: Traits to skip due to insufficient data for cross-validation
  4. Vector{String}: Populations to skip due to insufficient data for cross-validation

Notes

  • Supports multiple file formats (string-delimited, JLD2, VCF)
  • Performs data validation and compatibility checks
  • Filters traits with variance below minimum trait variance (mtv) threshold
  • Ensures sufficient sample size for cross-validation
  • Filters markers based on minimum allele frequency (maf)

Examples

julia> genomes = simulategenomes(n=300, n_populations=3, verbose=false);

julia> trials, _ = simulatetrials(genomes=genomes, n_years=1, n_seasons=1, n_harvests=1, n_sites=1, n_replications=1, verbose=false);

julia> phenomes = extractphenomes(trials);

julia> fname_geno = try writedelimited(genomes, fname="test-geno.tsv"); catch; rm("test-geno.tsv"); writedelimited(genomes, fname="test-geno.tsv"); end;
    
julia> fname_pheno = try writedelimited(phenomes, fname="test-pheno.tsv"); catch; rm("test-pheno.tsv"); writedelimited(phenomes, fname="test-pheno.tsv"); end;

julia> input = GBInput(fname_geno=fname_geno, fname_pheno=fname_pheno, populations=["pop_1", "pop_3"], traits=["trait_1"], verbose=false);

julia> genomes, phenomes, traits_to_skip, populations_to_skip = loadgenomesphenomes(input);

julia> length(unique(genomes.populations)) == length(unique(phenomes.populations)) == 2
true

julia> length(phenomes.traits) == 1
true

julia> rm.([fname_geno, fname_pheno]);
source
GenomicBreeding.plotMethod
plot(;
    input::GBInput,
    skip_genomes::Bool = false,
    skip_phenomes::Bool = false,
    skip_cvs::Bool = false,
    format::String = "svg",
    plot_size::Tuple{Int64,Int64} = (600, 450),
    overwrite::Bool = false
)::String

Generate and save visualization plots for genomic, phenomic, and cross-validation data.

Arguments

  • input::GBInput: Input configuration containing file paths and settings
  • skip_genomes::Bool: If true, skip generating genome-related plots
  • skip_phenomes::Bool: If true, skip generating phenome-related plots
  • skip_cvs::Bool: If true, skip generating cross-validation plots
  • format::String: Output file format for plots (e.g., "svg", "png")
  • plot_size::Tuple{Int64,Int64}: Dimensions of output plots in pixels (width, height)
  • overwrite::Bool: If true, overwrite existing plot files

Returns

  • String: Path to the output directory containing generated plots

Plot Types

Genomes and Phenomes

  • Distribution plots
  • Violin plots
  • Correlation heatmaps
  • Tree plots
  • PCA biplots

Cross-validation

  • Bar plots
  • Box plots

Output Structure

Creates a directory structure under input.fname_out_prefix/plots/ with subdirectories:

  • genomes/: Genome-related visualizations
  • phenomes/: Phenome-related visualizations
  • cvs/: Cross-validation visualizations

Example

julia> genomes = GenomicBreedingCore.simulategenomes(n=300, l=1_000, verbose=false); genomes.populations = StatsBase.sample(string.("pop_", 1:3), length(genomes.entries), replace=true);

julia> trials, _ = GenomicBreedingCore.simulatetrials(genomes=genomes, n_years=1, n_seasons=1, n_harvests=1, n_sites=1, n_replications=1, verbose=false);

julia> phenomes = extractphenomes(trials);

julia> fname_geno = try writedelimited(genomes, fname="test-geno.tsv"); catch; rm("test-geno.tsv"); writedelimited(genomes, fname="test-geno.tsv"); end;

julia> fname_pheno = try writedelimited(phenomes, fname="test-pheno.tsv"); catch; rm("test-pheno.tsv"); writedelimited(phenomes, fname="test-pheno.tsv"); end;
    
julia> input = GBInput(fname_geno=fname_geno, fname_pheno=fname_pheno, SLURM_cpus_per_task=6, SLURM_mem_G=5, fname_out_prefix="GBOutput/test-", verbose=false);

julia> GenomicBreeding.plot(input=input, format="png", plot_size = (700, 525))
"GBOutput/plots"

julia> GenomicBreeding.plot(input=input, format="png", plot_size = (700, 525), overwrite=true, skip_genomes=true)
"GBOutput/plots"
source
GenomicBreeding.predictMethod
predict(input::GBInput)::String

Predict trait values (GEBVs) for a set of genotypes using pre-trained models from GenomicBreeding.fit().

Arguments

  • input::GBInput: Input configuration containing:
    • fname_geno: Path to genotype data file
    • fname_allele_effects_jld2s: Vector of paths to saved model files from previous fit() calls
    • fname_out_prefix: Prefix for output files
    • analysis: Set to GenomicBreeding.predict

Returns

  • String: Path to the JLD2 file containing predicted phenotypes

Details

Takes a GBInput object with genotype data and pre-trained models to predict trait values for new individuals. The function:

  1. Loads genotype data and model parameters
  2. Predicts trait values using the loaded models
  3. Saves predictions in a Phenomes struct
  4. Returns the path to the saved predictions file

Output File Format

  • For single trait prediction: {prefix}{model_name}-predicted_phenomes.jld2
  • For multi-trait prediction: {prefix}{hash}-predicted_phenomes.jld2

Example

julia> genomes = GenomicBreedingCore.simulategenomes(n=300, l=1_000, verbose=false); genomes.populations = StatsBase.sample(string.("pop_", 1:3), length(genomes.entries), replace=true);

julia> proportion_of_variance = fill(0.0, 9, 3); proportion_of_variance[1, :] .= 1.00; # 100% variance on the additive genetic effects

julia> trials, _ = GenomicBreedingCore.simulatetrials(genomes=genomes, n_years=1, n_seasons=1, n_harvests=1, n_sites=1, n_replications=1, proportion_of_variance=proportion_of_variance, verbose=false);

julia> phenomes = extractphenomes(trials);

julia> fname_geno = try writedelimited(genomes, fname="test-geno.tsv"); catch; rm("test-geno.tsv"); writedelimited(genomes, fname="test-geno.tsv"); end;

julia> fname_pheno = try writedelimited(phenomes, fname="test-pheno.tsv"); catch; rm("test-pheno.tsv"); writedelimited(phenomes, fname="test-pheno.tsv"); end;

julia> input = GBInput(fname_geno=fname_geno, fname_pheno=fname_pheno, fname_out_prefix="GBOutput_predict_5/output-5-", analysis=GenomicBreeding.fit, verbose=false);

julia> input.fname_allele_effects_jld2s = GenomicBreeding.fit(input);

julia> input.analysis = GenomicBreeding.predict;

julia> fname_phenomes_predicted = GenomicBreeding.predict(input);

julia> phenomes_predicted = readjld2(Phenomes, fname=fname_phenomes_predicted);

julia> dimensions(phenomes_predicted)
Dict{String, Int64} with 8 entries:
  "n_total"       => 5400
  "n_zeroes"      => 0
  "n_nan"         => 0
  "n_entries"     => 300
  "n_traits"      => 18
  "n_inf"         => 0
  "n_populations" => 3
  "n_missing"     => 0
source
GenomicBreeding.prepareinputsMethod
prepareinputs(input::GBInput)::Vector{GBInput}

Create a vector of GBInput objects for parallel processing based on the input configuration.

Arguments

  • input::GBInput: Initial input configuration containing analysis parameters.

Returns

  • Vector{GBInput}: Array of GBInput objects configured for different combinations of:
    • Models (RR-BLUP, BayesB, etc.)
    • Traits from phenotype data
    • Population groups (including bulk and across-population analyses)

Details

For different analysis types, the function generates the following combinations:

  • Cross-validation (cv): 2 models × traits × (populations + bulk + across-pop)
  • Model fitting (fit): 2 models × traits × (populations + bulk)
  • Prediction (predict): 1 GBInput per allele effects file
  • GWAS (gwas): 1 model × traits × (populations + bulk)

The function handles special cases:

  • Skips trait-population combinations with insufficient data
  • Adjusts settings for bulk and across-population analyses
  • Configures specific parameters for prediction tasks

Examples

julia> genomes = GenomicBreedingCore.simulategenomes(n=300, l=1_000, verbose=false); genomes.populations = StatsBase.sample(string.("pop_", 1:3), length(genomes.entries), replace=true);

julia> trials, _ = GenomicBreedingCore.simulatetrials(genomes=genomes, n_years=1, n_seasons=1, n_harvests=1, n_sites=1, n_replications=1, verbose=false);

julia> phenomes = extractphenomes(trials);

julia> fname_geno = try writedelimited(genomes, fname="test-geno.tsv"); catch; rm("test-geno.tsv"); writedelimited(genomes, fname="test-geno.tsv"); end;
    
julia> fname_pheno = try writedelimited(phenomes, fname="test-pheno.tsv"); catch; rm("test-pheno.tsv"); writedelimited(phenomes, fname="test-pheno.tsv"); end;

julia> input_cv = GBInput(fname_geno=fname_geno, fname_pheno=fname_pheno, analysis=GenomicBreeding.cv, verbose=false);

julia> input_fit = GBInput(fname_geno=fname_geno, fname_pheno=fname_pheno, analysis=GenomicBreeding.fit, verbose=false);

julia> input_predict = GBInput(fname_geno=fname_geno, fname_pheno=fname_pheno, fname_allele_effects_jld2s=["dummy.jld2"], analysis=GenomicBreeding.predict, verbose=false); writejld2(Fit(n=1, l=1), fname="dummy.jld2");

julia> input_gwas = GBInput(fname_geno=fname_geno, fname_pheno=fname_pheno, analysis=GenomicBreeding.gwas, verbose=false);

julia> inputs_cv = prepareinputs(input_cv); # expect 30 GBInputs = 2 models x 3 traits x (3 populations + 1 bulk + 1 across pops)

julia> inputs_fit = prepareinputs(input_fit); # expect 24 GBInputs = 2 models x 3 traits x (3 populations + 1 bulk)

julia> inputs_predict = prepareinputs(input_predict); # expect 1 GBInput = 1 dummy Fit struct

julia> inputs_gwas = prepareinputs(input_gwas); # expect 24 GBInputs = 1 models x 3 traits x (3 populations + 1 bulk)

julia> length(inputs_cv) == 30
true

julia> length(inputs_fit) == 24
true

julia> length(inputs_predict) == 1
true

julia> length(inputs_gwas) == 24
true

julia> rm.([fname_geno, fname_pheno, "dummy.jld2"]);
source
GenomicBreeding.prepareoutprefixandoutdirMethod
prepareoutprefixandoutdir(input::GBInput)::String

Prepare the output directory and sanitize the output filename prefix for genomic breeding analysis results.

This function performs two main tasks:

  1. Creates the output directory if it doesn't exist
  2. Sanitizes the output filename prefix by replacing problematic characters with underscores

Arguments

  • input::GBInput: Input configuration containing the output file prefix and analysis type

Returns

  • String: Sanitized output file prefix path

Details

Problematic characters that are replaced include spaces, newlines, tabs, parentheses, and special characters (&|:=+*%@!). The function also ensures the prefix ends with the analysis type and a hyphen.

Throws

  • ArgumentError: If unable to create the output directory

Example

julia> input = GBInput(fname_geno="some_dir/fname_geno.jld2", fname_pheno="some_dir/fname_pheno.jld2", fname_out_prefix="GBOutput/some@!_%&prefix", verbose=false);

julia> fname_out_prefix = prepareoutprefixandoutdir(input)
"GBOutput/some_____prefix-cv-"

julia> rm(dirname(fname_out_prefix), recursive=true);
source
GenomicBreeding.submitslurmarrayjobsMethod
submitslurmarrayjobs(; input::GBInput)::String

Submit an array of Slurm jobs for genomic prediction analysis.

Arguments

  • input::GBInput: A GBInput struct containing all necessary parameters for job submission and analysis.

Returns

  • String: Path to the output directory where results will be stored.

Details

This function handles the submission of parallel genomic prediction jobs to a Slurm cluster. It performs the following steps:

  1. Validates input parameters and checks for required R packages
  2. Creates necessary output directories
  3. Prepares individual job inputs
  4. Generates Julia and Slurm scripts
  5. Submits the array job to the Slurm scheduler

The function supports various genomic analyses including:

  • Cross-validation (cv)
  • Model fitting (fit)
  • Prediction (predict)
  • GWAS analysis (gwas)

Job Configuration

  • Uses Slurm array jobs for parallel execution
  • Configurable CPU, memory, and time limit parameters
  • Supports both module-based and conda environments
  • Interactive confirmation before job submission

Notes

  • Requires a working Slurm environment
  • BGLR R package must be installed
  • User will be prompted to enter "YES" to confirm job submission
  • Job array size is controlled by SLURM_max_array_jobs_running

Example

using GenomicBreeding, StatsBase;
using GenomicBreeding: cv, fit, predict, gwas, ols, rigde, lasso, bayesa, bayesb, bayesc, gwasols, gwaslmm, gwasreml;
genomes = GenomicBreeding.GenomicBreedingCore.simulategenomes(n=300, l=1_000, n_populations=3, verbose=false);
trials, _ = GenomicBreeding.GenomicBreedingCore.simulatetrials(genomes=genomes, n_years=1, n_seasons=1, n_harvests=1, n_sites=1, n_replications=1, verbose=false);
phenomes = extractphenomes(trials);
fname_geno = try writedelimited(genomes, fname="test-geno.tsv"); catch; rm("test-geno.tsv"); writedelimited(genomes, fname="test-geno.tsv"); end;
fname_pheno = try writedelimited(phenomes, fname="test-pheno.tsv"); catch; rm("test-pheno.tsv"); writedelimited(phenomes, fname="test-pheno.tsv"); end;

# Repeated k-fold cross-validation
input_cv = GBInput(fname_geno=fname_geno, fname_pheno=fname_pheno, analysis=cv, SLURM_account_name="dbiof1", SLURM_cpus_per_task=5, SLURM_mem_G=5, fname_out_prefix="GBOutput/test-", verbose=false);
outdir = submitslurmarrayjobs(input_cv); ### You will be asked to enter "YES" to proceed with job submission.
run(`sh -c 'squeue -u "$USER"'`)
run(`sh -c 'tail slurm-*_*.out'`)
run(`sh -c 'grep -i "err" slurm-*_*.out'`)
cvs = loadcvs(input_cv)
df_across_entries, df_per_entry = tabularise(cvs)
sum(df_across_entries.model .== "bayesa") / nrow(df_across_entries)
sum(df_across_entries.model .== "ridge") / nrow(df_across_entries)
sort(combine(groupby(df_across_entries, [:validation_population, :model]), [:cor => mean, :cor => length]), :cor_mean, rev=true)

# Genomic prediction equation full data fit
input_fit = clone(input_cv)
input_fit.analysis = fit
outdir = submitslurmarrayjobs(input_fit);
input_fit.fname_allele_effects_jld2s = begin
    files = readdir(outdir)
    idx = findall(.!isnothing.(match.(Regex("-fit-"), files)) .&& .!isnothing.(match.(Regex("jld2$"), files)))
    joinpath.(outdir, files[idx])
end
fits = loadfits(input_fit)
length(fits)

input_predict = clone(input_fit)
input_predict.analysis = predict
outdir = submitslurmarrayjobs(input_predict)
run(`squeue`)
source
GenomicBreedingCore.checkdimsMethod
checkdims(input::GBInput)::Bool

Check dimension compatibility of the fields of the GBInput struct. Returns true if both models and gwas_models fields in the input are not nothing, false otherwise.

Arguments

  • input::GBInput: Input structure containing genomic data and models

Returns

  • Bool: true if dimensions are compatible, false otherwise

Examples

julia> input = GBInput(fname_geno="geno1.jld2", fname_pheno="pheno1.jld2");

julia> checkdims(input)
true

julia> input.models = nothing

julia> checkdims(input)
false
source
GenomicBreedingCore.cloneMethod
clone(x::GBInput)::GBInput

Create a deep copy of a GBInput object, duplicating all field values into a new instance.

This function allows you to create an independent copy of a GBInput object where modifications to the clone won't affect the original object.

Arguments

  • x::GBInput: The source GBInput object to be cloned

Returns

  • ::GBInput: A new GBInput instance with identical field values

Example

julia> input = GBInput(fname_geno="geno1.jld2", fname_pheno="pheno1.jld2");

julia> copy_input = clone(input);

julia> input == copy_input
true
source

GenomicBreedingCore.jl

GenomicBreedingIO.jl

GenomicBreedingModels.jl

GenomicBreedingPlots.jl