Methods reference
Contents
GenomicBreeding.jl
GenomicBreeding.GBInput
Base.:==
Base.hash
GenomicBreeding.checkinputs
GenomicBreeding.cv
GenomicBreeding.fit
GenomicBreeding.gwas
GenomicBreeding.loadcvs
GenomicBreeding.loadfits
GenomicBreeding.loadgenomesphenomes
GenomicBreeding.plot
GenomicBreeding.predict
GenomicBreeding.prepareinputs
GenomicBreeding.prepareoutprefixandoutdir
GenomicBreeding.submitslurmarrayjobs
GenomicBreedingCore.checkdims
GenomicBreedingCore.clone
GenomicBreeding.GBInput
— Typemutable struct GBInput
Main input struct for genomic breeding analysis (implements GenomicBreedingCore.AbstractGB)
Fields
Required
fname_geno
: Path to genotype file (see file format guide for details)
Optional Data Files
fname_pheno
: Path to phenotype file (Default: "" - see file format guide for details)fname_allele_effects_jld2s
: Vector of paths to JLD2 files containing Fit structs (Default: [""])
Analysis Settings
analysis
: Analysis function to perform (Default: cv)cv
: Replicated k-fold cross-validationfit
: Fit genomic prediction models to extract allele effectspredict
: Compute GEBVs using existing model fitsgwas
: Genome-wide association study
bulk_cv
: Perform cross-validation across all populations (Default: false)populations
: Vector of populations to include (Default: all)traits
: Vector of traits to analyze (Default: all)models
: Vector of genomic prediction model functions (Default: [ridge, bayesa])gwas_models
: Vector of GWAS model functions (Default: [gwasols, gwaslmm])
Cross-Validation Parameters
n_folds
: Number of CV folds (Default: 5)n_replications
: Number of CV replications (Default: 5)
Filtering Parameters
keep_all
: Keep all entries when merging data (Default: false)maf
: Minimum allele frequency (Default: 0.05)mtv
: Minimum trait variance (Default: 1e-7)
Model Parameters
n_iter
: MCMC iterations (Default: 1_500)n_burnin
: MCMC burn-in iterations (Default: 500)
Output Settings
fname_out_prefix
: Output file prefix & directory (Default: GBOutput/output-<timestamp>-)verbose
: Show progress messages (Default: true)
SLURM Settings
SLURM_job_name
: Job array name (Default: GBJob-<timestamp>)SLURM_account_name
: Account name (Default: "")SLURM_partition_name
: Partition to use (Default: "")SLURM_nodes_per_array_job
: Nodes per job (Default: 1)SLURM_tasks_per_node
: Tasks per node (Default: 1)SLURM_cpus_per_task
: CPUs per task (Default: 1)SLURM_mem_G
: Memory in GB (Default: 1)SLURM_time_limit_dd_hhmmss
: Time limit as "dd-hh:mm:ss" (Default: "00-01:00:00")SLURM_max_array_jobs_running
: Max concurrent array jobs (Default: 20)SLURM_module_load_Conda_version_name
: Conda module name (Default: "Miniconda3")SLURM_module_load_R_version_name
: R module name (Default: "conda" which will use the R installed in the conda environment - see installation instructions for details)SLURM_module_load_Julia_version_name
: Julia module name (Default: "" which will use the Julia installed via JuliaUp - see installation instructions for details)
Base.:==
— MethodBase.:(==)(x::GBInput, y::GBInput)::Bool
Compare two GBInput
structs for equality by comparing their hash values.
This method overloads the ==
operator for GBInput
structs, allowing direct comparison using the ==
operator. Two GBInput
structs are considered equal if they have identical hash values, which implies they have the same values for all relevant fields.
Arguments
x::GBInput
: First GBInput struct to comparey::GBInput
: Second GBInput struct to compare
Returns
Bool
:true
if the hash values of both structs are equal,false
otherwise
Examples
julia> input_1 = input = GBInput(fname_geno="geno1.jld2", fname_pheno="pheno1.jld2", fname_out_prefix="test1-", SLURM_job_name="slurmjob1");
julia> input_2 = input = GBInput(fname_geno="geno1.jld2", fname_pheno="pheno1.jld2", fname_out_prefix="test1-", SLURM_job_name="slurmjob1");
julia> input_3 = input = GBInput(fname_geno="geno2.jld2", fname_pheno="pheno2.jld2");
julia> input_1 == input_2
true
julia> input_1 == input_3
false
Base.hash
— MethodBase.hash(x::GBInput, h::UInt)::UInt
Compute a hash value for a GBInput
struct by combining the hash values of all its fields.
Arguments
x::GBInput
: The input structure to be hashedh::UInt
: The hash value seed
Returns
UInt
: A hash value that uniquely identifies the content of theGBInput
struct
Details
This method implements hash computation for the GBInput
type by iterating through all fields and combining their hash values.
Examples
julia> input = GBInput(fname_geno="", fname_pheno="");
julia> typeof(hash(input))
UInt64
GenomicBreeding.checkinputs
— Methodcheckinputs(input::GBInput)::Vector{String}
Check the compatibility and validity of inputs for genomic analysis.
Returns a vector of error messages. An empty vector indicates all inputs are valid.
Arguments
input::GBInput
: A struct containing analysis parameters including:analysis
: Analysis type (cv
,fit
,predict
, orgwas
)fname_geno
: Path to genotype filefname_pheno
: Path to phenotype filemodels
: Vector of selected modelsfname_allele_effects_jld2s
: Paths to allele effects files (for predict)
Returns
Vector{String}
: Collection of error messages, empty if all inputs are valid
Validation Rules
- Analysis type must be one of:
cv
,fit
,predict
, orgwas
- For
cv
,fit
, andgwas
:- Genotype file must exist
- Phenotype file must exist
- At least one model must be specified
- For
predict
:- Genotype file must exist
- All specified allele effects files must exist
Examples
julia> genomes = simulategenomes(n=300, n_populations=3, verbose=false);
julia> trials, _ = simulatetrials(genomes=genomes, n_years=1, n_seasons=1, n_harvests=1, n_sites=1, n_replications=1, verbose=false);
julia> phenomes = extractphenomes(trials);
julia> fname_geno = try writedelimited(genomes, fname="test-geno.tsv"); catch; rm("test-geno.tsv"); writedelimited(genomes, fname="test-geno.tsv"); end;
julia> fname_pheno = try writedelimited(phenomes, fname="test-pheno.tsv"); catch; rm("test-pheno.tsv"); writedelimited(phenomes, fname="test-pheno.tsv"); end;
julia> input = GBInput(fname_geno=fname_geno, fname_pheno=fname_pheno, analysis=cv);
julia> length(checkinputs(input)) == 0
true
julia> input.fname_pheno = ""; length(checkinputs(input)) == 0
false
GenomicBreeding.cv
— Methodcv(input::GBInput)::Tuple{Vector{String},Vector{String}}
Assess genomic prediction accuracy via replicated k-fold cross-validation.
Arguments
input::GBInput
: A GBInput struct containing configuration parameters including:bulk_cv
: Boolean flag for bulk cross-validationpopulations
: Vector of population names to analyzemodels
: Statistical models to use for predictionn_folds
: Number of folds for cross-validationn_replications
: Number of replications for cross-validationfname_out_prefix
: Prefix for output filenamesverbose
: Boolean flag for detailed output
Returns
Tuple{Vector{String},Vector{String}}
: A tuple containing:- First element: Vector of paths to JLD2 files containing CV results
- Second element: Vector of paths to text files containing error notes
Details
The function supports three types of cross-validation:
- Single population CV when one population is specified
- Pairwise population CV when two populations are specified
- Both pairwise and leave-one-population-out CV when more than two populations are specified
Results are saved as JLD2 files (one per fold, replication, and trait) and optional text files containing notes about failed jobs.
Example
julia> genomes = GenomicBreedingCore.simulategenomes(n=300, l=1_000, verbose=false); genomes.populations = StatsBase.sample(string.("pop_", 1:3), length(genomes.entries), replace=true);
julia> proportion_of_variance = fill(0.0, 9, 3); proportion_of_variance[1, :] .= 1.00; # 100% variance on the additive genetic effects
julia> trials, _ = GenomicBreedingCore.simulatetrials(genomes=genomes, n_years=1, n_seasons=1, n_harvests=1, n_sites=1, n_replications=1, proportion_of_variance=proportion_of_variance, verbose=false);
julia> phenomes = extractphenomes(trials);
julia> fname_geno = try writedelimited(genomes, fname="test-geno.tsv"); catch; rm("test-geno.tsv"); writedelimited(genomes, fname="test-geno.tsv"); end;
julia> fname_pheno = try writedelimited(phenomes, fname="test-pheno.tsv"); catch; rm("test-pheno.tsv"); writedelimited(phenomes, fname="test-pheno.tsv"); end;
julia> input = GBInput(fname_geno=fname_geno, fname_pheno=fname_pheno, analysis=cv, bulk_cv=false, fname_out_prefix="GBOutput_cv_3/output-3-", populations=["pop_1", "pop_3"], traits=["trait_1"], n_replications=2, n_folds=3, verbose=false);
julia> fnames_cvs, fnames_notes = cv(input);
julia> length(fnames_cvs) == 4, length(fnames_notes) == 0
(true, true)
GenomicBreeding.fit
— Methodfit(input::GBInput)::Vector{String}
Extract allele effects by fitting genomic prediction models without cross-validation.
Arguments
input::GBInput
: A GBInput struct containing:fname_geno
: Path to genotype data filefname_pheno
: Path to phenotype data filemodels
: Vector of model functions to fit (e.g., [bayesa, bayesb])traits
: Optional vector of trait names to analyzepopulations
: Optional vector of population names to analyzen_iter
: Number of iterations for Bayesian modelsn_burnin
: Number of burn-in iterations for Bayesian modelsverbose
: Boolean for detailed output
Returns
Vector{String}
: Paths to JLD2 files containing fitted model results, one file per model-trait-population combination
Details
The function fits specified genomic prediction models to the full dataset without cross-validation. For each combination of model, trait, and population, it:
- Loads and processes genotype and phenotype data
- Fits the specified model
- Saves results to a JLD2 file with naming pattern:
{prefix}_model_{name}-trait_{name}-population_{name}.jld2
Notes
- If populations is not specified, all unique populations in phenotype data are used
- If traits is not specified, all unique traits in phenotype data are used
- For Bayesian models (names containing "bayes"), uses specified iterations and burn-in
- Will throw an error if output files already exist
Example
julia> genomes = GenomicBreedingCore.simulategenomes(n=300, l=1_000, verbose=false); genomes.populations = StatsBase.sample(string.("pop_", 1:3), length(genomes.entries), replace=true);
julia> proportion_of_variance = fill(0.0, 9, 3); proportion_of_variance[1, :] .= 1.00; # 100% variance on the additive genetic effects
julia> trials, _ = GenomicBreedingCore.simulatetrials(genomes=genomes, n_years=1, n_seasons=1, n_harvests=1, n_sites=1, n_replications=1, proportion_of_variance=proportion_of_variance, verbose=false);
julia> phenomes = extractphenomes(trials);
julia> fname_geno = try writedelimited(genomes, fname="test-geno.tsv"); catch; rm("test-geno.tsv"); writedelimited(genomes, fname="test-geno.tsv"); end;
julia> fname_pheno = try writedelimited(phenomes, fname="test-pheno.tsv"); catch; rm("test-pheno.tsv"); writedelimited(phenomes, fname="test-pheno.tsv"); end;
julia> input = GBInput(fname_geno=fname_geno, fname_pheno=fname_pheno, analysis=GenomicBreeding.fit, fname_out_prefix="GBOutput_fit_4/output-4-", populations=["pop_1", "pop_3"], traits=["trait_1"], models=[bayesa, bayesb], verbose=false);
julia> fname_allele_effects_jld2s = GenomicBreeding.fit(input);
julia> length(fname_allele_effects_jld2s) == 4
true
GenomicBreeding.gwas
— Methodgwas(input::GBInput)::Vector{String}
Perform genome-wide association study (GWAS) analysis on genomic and phenotypic data.
Arguments
input::GBInput
: A GBInput struct containing:fname_geno
: Path to genotype data filefname_pheno
: Path to phenotype data filegwas_models
: Vector of GWAS models to applytraits
: Optional vector of trait names to analyzepopulations
: Optional vector of population names to analyzeverbose
: Boolean flag for detailed output
Returns
Vector{String}
: Paths to generated JLD2 files containing Fit structs for each model-trait-population combination
Details
The function performs GWAS analysis for each combination of:
- GWAS models specified in input
- Traits found in phenotype data
- Populations specified (if none specified, analyzes all data together)
For each combination, it:
- Filters data for the specific population if specified
- Fits the GWAS model
- Saves results to a JLD2 file containing a Fit struct
The Fit struct's b_hat
field contains:
- t-statistics for
gwasols
model - z-statistics for other GWAS models
Notes
- Output files are named as:
<prefix>_model_<model>-trait_<trait>-population_<pop>.jld2
- Will throw an error if output files already exist
- When verbose=true, displays a progress bar and correlation heatmap of estimated allele effects
Example
julia> genomes = GenomicBreedingCore.simulategenomes(n=300, l=1_000, verbose=false); genomes.populations = StatsBase.sample(string.("pop_", 1:3), length(genomes.entries), replace=true);
julia> proportion_of_variance = fill(0.0, 9, 3); proportion_of_variance[1, :] .= 1.00; # 100% variance on the additive genetic effects
julia> trials, _ = GenomicBreedingCore.simulatetrials(genomes=genomes, n_years=1, n_seasons=1, n_harvests=1, n_sites=1, n_replications=1, proportion_of_variance=proportion_of_variance, verbose=false);
julia> phenomes = extractphenomes(trials);
julia> fname_geno = try writedelimited(genomes, fname="test-geno.tsv"); catch; rm("test-geno.tsv"); writedelimited(genomes, fname="test-geno.tsv"); end;
julia> fname_pheno = try writedelimited(phenomes, fname="test-pheno.tsv"); catch; rm("test-pheno.tsv"); writedelimited(phenomes, fname="test-pheno.tsv"); end;
julia> input = GBInput(fname_geno=fname_geno, fname_pheno=fname_pheno, gwas_models=[gwasols], verbose=false);
julia> fname_test_statistics_jld2s = GenomicBreeding.gwas(input);
julia> length(fname_test_statistics_jld2s) == 3
true
GenomicBreeding.loadcvs
— Methodloadcvs(input::GBInput; min_train_size::Int64=10, verbose::Bool=true)::Vector{CV}
Load and filter cross-validation (CV) results from files generated by GenomicBreeding.cv()
.
Arguments
input::GBInput
: Input configuration containing the output directory path infname_out_prefix
min_train_size::Int64=10
: Minimum required size for training sets (default: 10)verbose::Bool=true
: Whether to display a progress meter during loading (default: true)
Returns
Vector{CV}
: Vector of valid CV objects that meet the following criteria:- Contains complete metrics (length of fit.metrics == 9)
- Has valid correlation metric (not missing, NaN, or Inf)
- Validation set size ≥ mintrainsize
Throws
ArgumentError
: If the output directory doesn't exist or contains no CV results
Details
The function searches for files with pattern "-cv-" and extension ".jld2" in the output directory. Invalid or failed CV results are automatically filtered out during loading. A progress meter is displayed during loading when verbose is true.
Example
julia> genomes = GenomicBreedingCore.simulategenomes(n=300, l=1_000, verbose=false); genomes.populations = StatsBase.sample(string.("pop_", 1:3), length(genomes.entries), replace=true);
julia> proportion_of_variance = fill(0.0, 9, 3); proportion_of_variance[1, :] .= 1.00; # 100% variance on the additive genetic effects
julia> trials, _ = GenomicBreedingCore.simulatetrials(genomes=genomes, n_years=1, n_seasons=1, n_harvests=1, n_sites=1, n_replications=1, proportion_of_variance=proportion_of_variance, verbose=false);
julia> phenomes = extractphenomes(trials);
julia> fname_geno = try writedelimited(genomes, fname="test-geno.tsv"); catch; rm("test-geno.tsv"); writedelimited(genomes, fname="test-geno.tsv"); end;
julia> fname_pheno = try writedelimited(phenomes, fname="test-pheno.tsv"); catch; rm("test-pheno.tsv"); writedelimited(phenomes, fname="test-pheno.tsv"); end;
julia> input = GBInput(fname_geno=fname_geno, fname_pheno=fname_pheno, fname_out_prefix="GBOutput_cv_1/output-1-", populations=["pop_1", "pop_3"], traits=["trait_1"], n_replications=2, n_folds=3, verbose=false);
julia> fnames_cvs, fnames_notes = cv(input);
julia> cvs = loadcvs(input, verbose=false);
julia> length(cvs) == length(fnames_cvs)
true
GenomicBreeding.loadfits
— Methodloadfits(input::GBInput)::Vector{Fit}
Load fitted allele frequency effects from genomic prediction models stored in JLD2 files.
Arguments
input::GBInput
: Input configuration containing paths to JLD2 files with fitted allele effects.
Returns
Vector{Fit}
: Array ofFit
objects containing the loaded allele frequency effects.
Details
The function attempts to load all JLD2 files specified in input.fname_allele_effects_jld2s
. If a file cannot be loaded, that entry will be skipped and remain undef
in the output vector.
Example
julia> genomes = GenomicBreedingCore.simulategenomes(n=300, l=1_000, verbose=false); genomes.populations = StatsBase.sample(string.("pop_", 1:3), length(genomes.entries), replace=true);
julia> proportion_of_variance = fill(0.0, 9, 3); proportion_of_variance[1, :] .= 1.00; # 100% variance on the additive genetic effects
julia> trials, _ = GenomicBreedingCore.simulatetrials(genomes=genomes, n_years=1, n_seasons=1, n_harvests=1, n_sites=1, n_replications=1, proportion_of_variance=proportion_of_variance, verbose=false);
julia> phenomes = extractphenomes(trials);
julia> fname_geno = try writedelimited(genomes, fname="test-geno.tsv"); catch; rm("test-geno.tsv"); writedelimited(genomes, fname="test-geno.tsv"); end;
julia> fname_pheno = try writedelimited(phenomes, fname="test-pheno.tsv"); catch; rm("test-pheno.tsv"); writedelimited(phenomes, fname="test-pheno.tsv"); end;
julia> input = GBInput(fname_geno=fname_geno, fname_pheno=fname_pheno, fname_out_prefix="GBOutput_fit_2/output-2-", populations=["pop_1", "pop_3"], traits=["trait_1"], models=[bayesa, bayesb], verbose=false);
julia> input.fname_allele_effects_jld2s = GenomicBreeding.fit(input);
julia> fits = loadfits(input);
julia> length(fits) == length(input.fname_allele_effects_jld2s)
true
GenomicBreeding.loadgenomesphenomes
— Methodloadgenomesphenomes(input::GBInput)::Tuple{Genomes, Phenomes, Vector{String}, Vector{String}}
Load, merge, and filter genotype and phenotype data from specified input files.
Arguments
input::GBInput
: A struct containing input parameters including:fname_geno
: Path to genotype data filefname_pheno
: Path to phenotype data filebulk_cv
: Boolean for bulk cross-validationpopulations
: Vector of population names to include (optional)traits
: Vector of trait names to include (optional)n_folds
: Number of cross-validation foldskeep_all
: Boolean to keep all entries during mergingmaf
: Minimum allele frequency thresholdmtv
: Minimum trait variance thresholdverbose
: Boolean for detailed output
Returns
A tuple containing:
Genomes
: Filtered genomic dataPhenomes
: Filtered phenotypic dataVector{String}
: Traits to skip due to insufficient data for cross-validationVector{String}
: Populations to skip due to insufficient data for cross-validation
Notes
- Supports multiple file formats (string-delimited, JLD2, VCF)
- Performs data validation and compatibility checks
- Filters traits with variance below minimum trait variance (mtv) threshold
- Ensures sufficient sample size for cross-validation
- Filters markers based on minimum allele frequency (maf)
Examples
julia> genomes = simulategenomes(n=300, n_populations=3, verbose=false);
julia> trials, _ = simulatetrials(genomes=genomes, n_years=1, n_seasons=1, n_harvests=1, n_sites=1, n_replications=1, verbose=false);
julia> phenomes = extractphenomes(trials);
julia> fname_geno = try writedelimited(genomes, fname="test-geno.tsv"); catch; rm("test-geno.tsv"); writedelimited(genomes, fname="test-geno.tsv"); end;
julia> fname_pheno = try writedelimited(phenomes, fname="test-pheno.tsv"); catch; rm("test-pheno.tsv"); writedelimited(phenomes, fname="test-pheno.tsv"); end;
julia> input = GBInput(fname_geno=fname_geno, fname_pheno=fname_pheno, populations=["pop_1", "pop_3"], traits=["trait_1"], verbose=false);
julia> genomes, phenomes, traits_to_skip, populations_to_skip = loadgenomesphenomes(input);
julia> length(unique(genomes.populations)) == length(unique(phenomes.populations)) == 2
true
julia> length(phenomes.traits) == 1
true
julia> rm.([fname_geno, fname_pheno]);
GenomicBreeding.plot
— Methodplot(;
input::GBInput,
skip_genomes::Bool = false,
skip_phenomes::Bool = false,
skip_cvs::Bool = false,
format::String = "svg",
plot_size::Tuple{Int64,Int64} = (600, 450),
overwrite::Bool = false
)::String
Generate and save visualization plots for genomic, phenomic, and cross-validation data.
Arguments
input::GBInput
: Input configuration containing file paths and settingsskip_genomes::Bool
: If true, skip generating genome-related plotsskip_phenomes::Bool
: If true, skip generating phenome-related plotsskip_cvs::Bool
: If true, skip generating cross-validation plotsformat::String
: Output file format for plots (e.g., "svg", "png")plot_size::Tuple{Int64,Int64}
: Dimensions of output plots in pixels (width, height)overwrite::Bool
: If true, overwrite existing plot files
Returns
String
: Path to the output directory containing generated plots
Plot Types
Genomes and Phenomes
- Distribution plots
- Violin plots
- Correlation heatmaps
- Tree plots
- PCA biplots
Cross-validation
- Bar plots
- Box plots
Output Structure
Creates a directory structure under input.fname_out_prefix/plots/
with subdirectories:
genomes/
: Genome-related visualizationsphenomes/
: Phenome-related visualizationscvs/
: Cross-validation visualizations
Example
julia> genomes = GenomicBreedingCore.simulategenomes(n=300, l=1_000, verbose=false); genomes.populations = StatsBase.sample(string.("pop_", 1:3), length(genomes.entries), replace=true);
julia> trials, _ = GenomicBreedingCore.simulatetrials(genomes=genomes, n_years=1, n_seasons=1, n_harvests=1, n_sites=1, n_replications=1, verbose=false);
julia> phenomes = extractphenomes(trials);
julia> fname_geno = try writedelimited(genomes, fname="test-geno.tsv"); catch; rm("test-geno.tsv"); writedelimited(genomes, fname="test-geno.tsv"); end;
julia> fname_pheno = try writedelimited(phenomes, fname="test-pheno.tsv"); catch; rm("test-pheno.tsv"); writedelimited(phenomes, fname="test-pheno.tsv"); end;
julia> input = GBInput(fname_geno=fname_geno, fname_pheno=fname_pheno, SLURM_cpus_per_task=6, SLURM_mem_G=5, fname_out_prefix="GBOutput/test-", verbose=false);
julia> GenomicBreeding.plot(input=input, format="png", plot_size = (700, 525))
"GBOutput/plots"
julia> GenomicBreeding.plot(input=input, format="png", plot_size = (700, 525), overwrite=true, skip_genomes=true)
"GBOutput/plots"
GenomicBreeding.predict
— Methodpredict(input::GBInput)::String
Predict trait values (GEBVs) for a set of genotypes using pre-trained models from GenomicBreeding.fit()
.
Arguments
input::GBInput
: Input configuration containing:fname_geno
: Path to genotype data filefname_allele_effects_jld2s
: Vector of paths to saved model files from previousfit()
callsfname_out_prefix
: Prefix for output filesanalysis
: Set toGenomicBreeding.predict
Returns
String
: Path to the JLD2 file containing predicted phenotypes
Details
Takes a GBInput
object with genotype data and pre-trained models to predict trait values for new individuals. The function:
- Loads genotype data and model parameters
- Predicts trait values using the loaded models
- Saves predictions in a Phenomes struct
- Returns the path to the saved predictions file
Output File Format
- For single trait prediction:
{prefix}{model_name}-predicted_phenomes.jld2
- For multi-trait prediction:
{prefix}{hash}-predicted_phenomes.jld2
Example
julia> genomes = GenomicBreedingCore.simulategenomes(n=300, l=1_000, verbose=false); genomes.populations = StatsBase.sample(string.("pop_", 1:3), length(genomes.entries), replace=true);
julia> proportion_of_variance = fill(0.0, 9, 3); proportion_of_variance[1, :] .= 1.00; # 100% variance on the additive genetic effects
julia> trials, _ = GenomicBreedingCore.simulatetrials(genomes=genomes, n_years=1, n_seasons=1, n_harvests=1, n_sites=1, n_replications=1, proportion_of_variance=proportion_of_variance, verbose=false);
julia> phenomes = extractphenomes(trials);
julia> fname_geno = try writedelimited(genomes, fname="test-geno.tsv"); catch; rm("test-geno.tsv"); writedelimited(genomes, fname="test-geno.tsv"); end;
julia> fname_pheno = try writedelimited(phenomes, fname="test-pheno.tsv"); catch; rm("test-pheno.tsv"); writedelimited(phenomes, fname="test-pheno.tsv"); end;
julia> input = GBInput(fname_geno=fname_geno, fname_pheno=fname_pheno, fname_out_prefix="GBOutput_predict_5/output-5-", analysis=GenomicBreeding.fit, verbose=false);
julia> input.fname_allele_effects_jld2s = GenomicBreeding.fit(input);
julia> input.analysis = GenomicBreeding.predict;
julia> fname_phenomes_predicted = GenomicBreeding.predict(input);
julia> phenomes_predicted = readjld2(Phenomes, fname=fname_phenomes_predicted);
julia> dimensions(phenomes_predicted)
Dict{String, Int64} with 8 entries:
"n_total" => 5400
"n_zeroes" => 0
"n_nan" => 0
"n_entries" => 300
"n_traits" => 18
"n_inf" => 0
"n_populations" => 3
"n_missing" => 0
GenomicBreeding.prepareinputs
— Methodprepareinputs(input::GBInput)::Vector{GBInput}
Create a vector of GBInput
objects for parallel processing based on the input configuration.
Arguments
input::GBInput
: Initial input configuration containing analysis parameters.
Returns
Vector{GBInput}
: Array ofGBInput
objects configured for different combinations of:- Models (RR-BLUP, BayesB, etc.)
- Traits from phenotype data
- Population groups (including bulk and across-population analyses)
Details
For different analysis types, the function generates the following combinations:
- Cross-validation (
cv
): 2 models × traits × (populations + bulk + across-pop) - Model fitting (
fit
): 2 models × traits × (populations + bulk) - Prediction (
predict
): 1 GBInput per allele effects file - GWAS (
gwas
): 1 model × traits × (populations + bulk)
The function handles special cases:
- Skips trait-population combinations with insufficient data
- Adjusts settings for bulk and across-population analyses
- Configures specific parameters for prediction tasks
Examples
julia> genomes = GenomicBreedingCore.simulategenomes(n=300, l=1_000, verbose=false); genomes.populations = StatsBase.sample(string.("pop_", 1:3), length(genomes.entries), replace=true);
julia> trials, _ = GenomicBreedingCore.simulatetrials(genomes=genomes, n_years=1, n_seasons=1, n_harvests=1, n_sites=1, n_replications=1, verbose=false);
julia> phenomes = extractphenomes(trials);
julia> fname_geno = try writedelimited(genomes, fname="test-geno.tsv"); catch; rm("test-geno.tsv"); writedelimited(genomes, fname="test-geno.tsv"); end;
julia> fname_pheno = try writedelimited(phenomes, fname="test-pheno.tsv"); catch; rm("test-pheno.tsv"); writedelimited(phenomes, fname="test-pheno.tsv"); end;
julia> input_cv = GBInput(fname_geno=fname_geno, fname_pheno=fname_pheno, analysis=GenomicBreeding.cv, verbose=false);
julia> input_fit = GBInput(fname_geno=fname_geno, fname_pheno=fname_pheno, analysis=GenomicBreeding.fit, verbose=false);
julia> input_predict = GBInput(fname_geno=fname_geno, fname_pheno=fname_pheno, fname_allele_effects_jld2s=["dummy.jld2"], analysis=GenomicBreeding.predict, verbose=false); writejld2(Fit(n=1, l=1), fname="dummy.jld2");
julia> input_gwas = GBInput(fname_geno=fname_geno, fname_pheno=fname_pheno, analysis=GenomicBreeding.gwas, verbose=false);
julia> inputs_cv = prepareinputs(input_cv); # expect 30 GBInputs = 2 models x 3 traits x (3 populations + 1 bulk + 1 across pops)
julia> inputs_fit = prepareinputs(input_fit); # expect 24 GBInputs = 2 models x 3 traits x (3 populations + 1 bulk)
julia> inputs_predict = prepareinputs(input_predict); # expect 1 GBInput = 1 dummy Fit struct
julia> inputs_gwas = prepareinputs(input_gwas); # expect 24 GBInputs = 1 models x 3 traits x (3 populations + 1 bulk)
julia> length(inputs_cv) == 30
true
julia> length(inputs_fit) == 24
true
julia> length(inputs_predict) == 1
true
julia> length(inputs_gwas) == 24
true
julia> rm.([fname_geno, fname_pheno, "dummy.jld2"]);
GenomicBreeding.prepareoutprefixandoutdir
— Methodprepareoutprefixandoutdir(input::GBInput)::String
Prepare the output directory and sanitize the output filename prefix for genomic breeding analysis results.
This function performs two main tasks:
- Creates the output directory if it doesn't exist
- Sanitizes the output filename prefix by replacing problematic characters with underscores
Arguments
input::GBInput
: Input configuration containing the output file prefix and analysis type
Returns
String
: Sanitized output file prefix path
Details
Problematic characters that are replaced include spaces, newlines, tabs, parentheses, and special characters (&|:=+*%@!
). The function also ensures the prefix ends with the analysis type and a hyphen.
Throws
ArgumentError
: If unable to create the output directory
Example
julia> input = GBInput(fname_geno="some_dir/fname_geno.jld2", fname_pheno="some_dir/fname_pheno.jld2", fname_out_prefix="GBOutput/some@!_%&prefix", verbose=false);
julia> fname_out_prefix = prepareoutprefixandoutdir(input)
"GBOutput/some_____prefix-cv-"
julia> rm(dirname(fname_out_prefix), recursive=true);
GenomicBreeding.submitslurmarrayjobs
— Methodsubmitslurmarrayjobs(; input::GBInput)::String
Submit an array of Slurm jobs for genomic prediction analysis.
Arguments
input::GBInput
: A GBInput struct containing all necessary parameters for job submission and analysis.
Returns
String
: Path to the output directory where results will be stored.
Details
This function handles the submission of parallel genomic prediction jobs to a Slurm cluster. It performs the following steps:
- Validates input parameters and checks for required R packages
- Creates necessary output directories
- Prepares individual job inputs
- Generates Julia and Slurm scripts
- Submits the array job to the Slurm scheduler
The function supports various genomic analyses including:
- Cross-validation (cv)
- Model fitting (fit)
- Prediction (predict)
- GWAS analysis (gwas)
Job Configuration
- Uses Slurm array jobs for parallel execution
- Configurable CPU, memory, and time limit parameters
- Supports both module-based and conda environments
- Interactive confirmation before job submission
Notes
- Requires a working Slurm environment
- BGLR R package must be installed
- User will be prompted to enter "YES" to confirm job submission
- Job array size is controlled by
SLURM_max_array_jobs_running
Example
using GenomicBreeding, StatsBase;
using GenomicBreeding: cv, fit, predict, gwas, ols, rigde, lasso, bayesa, bayesb, bayesc, gwasols, gwaslmm, gwasreml;
genomes = GenomicBreeding.GenomicBreedingCore.simulategenomes(n=300, l=1_000, n_populations=3, verbose=false);
trials, _ = GenomicBreeding.GenomicBreedingCore.simulatetrials(genomes=genomes, n_years=1, n_seasons=1, n_harvests=1, n_sites=1, n_replications=1, verbose=false);
phenomes = extractphenomes(trials);
fname_geno = try writedelimited(genomes, fname="test-geno.tsv"); catch; rm("test-geno.tsv"); writedelimited(genomes, fname="test-geno.tsv"); end;
fname_pheno = try writedelimited(phenomes, fname="test-pheno.tsv"); catch; rm("test-pheno.tsv"); writedelimited(phenomes, fname="test-pheno.tsv"); end;
# Repeated k-fold cross-validation
input_cv = GBInput(fname_geno=fname_geno, fname_pheno=fname_pheno, analysis=cv, SLURM_account_name="dbiof1", SLURM_cpus_per_task=5, SLURM_mem_G=5, fname_out_prefix="GBOutput/test-", verbose=false);
outdir = submitslurmarrayjobs(input_cv); ### You will be asked to enter "YES" to proceed with job submission.
run(`sh -c 'squeue -u "$USER"'`)
run(`sh -c 'tail slurm-*_*.out'`)
run(`sh -c 'grep -i "err" slurm-*_*.out'`)
cvs = loadcvs(input_cv)
df_across_entries, df_per_entry = tabularise(cvs)
sum(df_across_entries.model .== "bayesa") / nrow(df_across_entries)
sum(df_across_entries.model .== "ridge") / nrow(df_across_entries)
sort(combine(groupby(df_across_entries, [:validation_population, :model]), [:cor => mean, :cor => length]), :cor_mean, rev=true)
# Genomic prediction equation full data fit
input_fit = clone(input_cv)
input_fit.analysis = fit
outdir = submitslurmarrayjobs(input_fit);
input_fit.fname_allele_effects_jld2s = begin
files = readdir(outdir)
idx = findall(.!isnothing.(match.(Regex("-fit-"), files)) .&& .!isnothing.(match.(Regex("jld2$"), files)))
joinpath.(outdir, files[idx])
end
fits = loadfits(input_fit)
length(fits)
input_predict = clone(input_fit)
input_predict.analysis = predict
outdir = submitslurmarrayjobs(input_predict)
run(`squeue`)
GenomicBreedingCore.checkdims
— Methodcheckdims(input::GBInput)::Bool
Check dimension compatibility of the fields of the GBInput struct. Returns true
if both models
and gwas_models
fields in the input are not nothing
, false
otherwise.
Arguments
input::GBInput
: Input structure containing genomic data and models
Returns
Bool
:true
if dimensions are compatible,false
otherwise
Examples
julia> input = GBInput(fname_geno="geno1.jld2", fname_pheno="pheno1.jld2");
julia> checkdims(input)
true
julia> input.models = nothing
julia> checkdims(input)
false
GenomicBreedingCore.clone
— Methodclone(x::GBInput)::GBInput
Create a deep copy of a GBInput object, duplicating all field values into a new instance.
This function allows you to create an independent copy of a GBInput object where modifications to the clone won't affect the original object.
Arguments
x::GBInput
: The source GBInput object to be cloned
Returns
::GBInput
: A new GBInput instance with identical field values
Example
julia> input = GBInput(fname_geno="geno1.jld2", fname_pheno="pheno1.jld2");
julia> copy_input = clone(input);
julia> input == copy_input
true
GenomicBreedingCore.jl
GenomicBreedingIO.jl
GenomicBreedingModels.jl
GenomicBreedingPlots.jl