GenomicBreedingIO

Documentation for GenomicBreedingIO.

GenomicBreedingIO.isfuzzymatchMethod
isfuzzymatch(a::String, b::String; threshold::Float64=0.3)::Bool

Determines if two strings approximately match each other using Levenshtein distance.

The function compares two strings and returns true if they are considered similar enough based on the Levenshtein edit distance and a threshold value. The threshold is applied as a fraction of the length of the shorter string. Additionally, the function normalizes specific string inputs (e.g., "#chr" is replaced with "chrom") before comparison.

Arguments

  • a::String: First string to compare
  • b::String: Second string to compare
  • threshold::Float64=0.3: Maximum allowed edit distance as a fraction of the shorter string length

Returns

  • Bool: true if the strings match within the threshold, false otherwise

Examples

julia> isfuzzymatch("populations", "populations")
true

julia> isfuzzymatch("populations", "poplation")
true

julia> isfuzzymatch("populations", "entry")
false
source
GenomicBreedingIO.levenshteindistanceMethod
levenshteindistance(a::String, b::String)::Int64

Calculate the Levenshtein distance (edit distance) between two strings.

The Levenshtein distance is a measure of the minimum number of single-character edits (insertions, deletions, or substitutions) required to change one string into another.

Arguments

  • a::String: First input string
  • b::String: Second input string

Returns

  • Int64: The minimum number of edits needed to transform string a into string b

Examples

julia> levenshteindistance("populations", "populations")
0

julia> levenshteindistance("populations", "poplation")
2

julia> levenshteindistance("populations", "entry")
3
source
GenomicBreedingIO.readdelimitedMethod
readdelimited(
    type::Type{Genomes};
    fname::String,
    sep::String = "\t",
    parse_populations_from_entries::Union{Nothing,Function} = nothing,
    all_alleles_column::Bool = true,
    verbose::Bool = false
)::Genomes

Load genotype data from a delimited text file into a Genomes struct.

Arguments

  • type::Type{Genomes}: Type parameter (always Genomes)
  • fname::String: Path to the input file
  • sep::String: Delimiter character (default: tab)
  • parse_populations_from_entries::Union{Nothing,Function}: Optional function to extract population names from entry names
  • all_alleles_column::Bool: Whether input file contains all alleles column (default: true)
  • verbose::Bool: Whether to show progress bar during loading

File Format

The input file should be structured as follows:

  • Supported extensions: .tsv, .csv, or .txt
  • Comments and headers start with '#'
  • Header format (2 lines where the second line is optional):
    1. Column names:
      • With allalleles: "chrom,pos,allalleles,allele,entry1,entry2,..."
      • Without allalleles: "chrom,pos,allele,entry1,entry_2,..."
    2. Population names (optional): Same format as line 1 but with population names
  • Data columns:
    1. chromosome identifier
    2. position (integer)
    3. all alleles at locus (if all_alleles=true)
    4. specific allele
    5+. allele frequencies for each entry (0.0-1.0 or missing/NA)

Returns

  • Genomes: A populated Genomes struct containing the loaded data

Throws

  • ErrorException: If file doesn't exist or has invalid format
  • ArgumentError: If column names don't match expected format
  • OverflowError: If allele frequencies are outside [0,1] range

Examples

julia> genomes = GenomicBreedingCore.simulategenomes(n=10, verbose=false);

julia> genomes.entries = [string(genomes.populations[i], "-", genomes.entries[i]) for i in eachindex(genomes.populations)];

julia> fname = writedelimited(genomes);

julia> genomes_reloaded = readdelimited(Genomes, fname=fname);

julia> genomes == genomes_reloaded
true

julia> fname = writedelimited(genomes, include_population_header=false);

julia> genomes_reloaded = readdelimited(Genomes, fname=fname);

julia> unique(genomes_reloaded.populations) == ["Unknown_population"]
true

julia> genomes_reloaded = readdelimited(Genomes, fname=fname, parse_populations_from_entries=x -> split(x, "-")[1]);

julia> genomes == genomes_reloaded
true
source
GenomicBreedingIO.readdelimitedMethod
readdelimited(type::Type{Phenomes}; fname::String, sep::String = "\t", verbose::Bool = false)::Phenomes

Load phenotypic data from a delimited text file into a Phenomes struct.

Arguments

  • type::Type{Phenomes}: Type parameter (must be Phenomes)
  • fname::String: Path to the input file
  • sep::String: Delimiter character (default: tab "\t")
  • verbose::Bool: Whether to show progress bar during loading (default: false)

File Format

The file should be a delimited text file with:

  • Header row containing column names
  • First column: Entry identifiers
  • Second column: Population identifiers
  • Remaining columns: Phenotypic trait values (numeric or missing)

Missing values can be specified as "missing", "NA", "na", "N/A", "n/a" or empty string.

Returns

  • Phenomes: A Phenomes struct containing the loaded phenotypic data

Throws

  • ErrorException: If file doesn't exist or has invalid format
  • ArgumentError: If required columns are missing or misnamed
  • ErrorException: If duplicate entries or traits are found
  • ErrorException: If numeric values cannot be parsed

Notes

  • Comments starting with '#' are ignored
  • Empty lines are skipped
  • Mathematical operators (+,-,*,/,%) in trait names are replaced with underscores
  • Performs dimension checks on the loaded data

Examples

julia> phenomes = Phenomes(n=10, t=3); phenomes.entries = string.("entry_", 1:10); phenomes.populations .= "pop1"; phenomes.traits = ["A", "B", "C"]; phenomes.phenotypes = rand(10,3); phenomes.phenotypes[1,1] = missing; phenomes.mask .= true;

julia> fname = writedelimited(phenomes);

julia> phenomes_reloaded = readdelimited(Phenomes, fname=fname);

julia> phenomes == phenomes_reloaded
true
source
GenomicBreedingIO.readdelimitedMethod
readdelimited(type::Type{Trials}; fname::String, sep::String = "\t", verbose::Bool = false)::Trials

Load a Trials struct from a string-delimited file.

Arguments

  • type::Type{Trials}: Type parameter (must be Trials)
  • fname::String: Path to the input file
  • sep::String = "\t": Delimiter character (default is tab)
  • verbose::Bool = false: Whether to display progress information

Required File Structure

The input file must contain the following 10 identifier columns:

  • years: Year identifiers
  • seasons: Season identifiers
  • harvests: Harvest identifiers
  • sites: Site identifiers
  • entries: Entry identifiers
  • populations: Population identifiers
  • replications: Replication identifiers
  • blocks: Block identifiers
  • rows: Row identifiers
  • cols: Column identifiers

All remaining columns are treated as numeric phenotype measurements. Column names are fuzzy-matched to accommodate slight spelling variations.

Returns

  • Trials: A populated Trials struct containing the loaded data

Notes

  • Missing values can be represented as "missing", "NA", "na", "N/A", "n/a", or empty strings
  • Trait names containing mathematical operators (+, -, *, /, %) are converted to underscores
  • Duplicate trait names are not allowed

Throws

  • ErrorException: If the input file doesn't exist or has invalid format
  • ArgumentError: If required columns are missing or ambiguous

Examples

julia> genomes = GenomicBreedingCore.simulategenomes(n=10, verbose=false);

julia> trials, _ = GenomicBreedingCore.simulatetrials(genomes=genomes, sparsity=0.1, verbose=false);

julia> fname = writedelimited(trials);

julia> trials_reloaded = readdelimited(Trials, fname=fname);

julia> trials == trials_reloaded
true
source
GenomicBreedingIO.readjld2Method
readjld2(type::Type; fname::String)::Type

Load a core (Genomes, Phenomes, and Trials), simulation (SimulatedEffects), or model (TEBV) struct from a JLD2 file.

Arguments

  • type::Type: The type of struct to load (Genomes, Phenomes, Trials, SimulatedEffects, or TEBV)
  • fname::String: Path to the JLD2 file to read from

Returns

  • The loaded struct of the specified type

Throws

  • ArgumentError: If the specified file does not exist
  • DimensionMismatch: If the loaded struct is corrupted

Examples

julia> genomes = GenomicBreedingCore.simulategenomes(n=2, verbose=false);

julia> fname = writejld2(genomes);

julia> readjld2(Genomes, fname=fname) == genomes
true

julia> phenomes = Phenomes(n=2, t=2); phenomes.entries = ["entry_1", "entry_2"]; phenomes.traits = ["trait_1", "trait_2"];

julia> fname = writejld2(phenomes);

julia> readjld2(Phenomes, fname=fname) == phenomes
true

julia> trials, _ = simulatetrials(genomes=genomes, verbose=false);

julia> fname = writejld2(trials);

julia> readjld2(Trials, fname=fname) == trials
true
source
GenomicBreedingIO.readvcfMethod
readvcf(; fname::String, field::String = "any", min_depth::Int64 = 5, max_depth::Int64 = 100, verbose::Bool = false)::Genomes

Read genetic data from a VCF (Variant Call Format) file into a Genomes struct.

Arguments

  • fname::String: Path to the VCF file. Can be gzipped (.vcf.gz or .vcf.bgz) or uncompressed (.vcf)
  • field::String="any": Which FORMAT field to extract from VCF. Default "any" tries to automatically detect genotype field
  • min_depth::Int64=5: Minimum read depth threshold for AD (Allele Depth) field
  • max_depth::Int64=100: Maximum read depth threshold for AD field
  • verbose::Bool=false: Whether to print progress and debug information

Returns

  • Genomes: A Genomes struct containing the loaded genetic data with fields:
    • allele_frequencies: Matrix of allele frequencies
    • loci_alleles: Vector of locus-allele combination strings
    • mask: Boolean matrix indicating missing data
    • samples: Vector of sample names

Details

Reads VCF files in parallel using multiple threads. Handles multi-allelic variants and different ploidies. Field priority (when field="any"):

  1. AF (Allele Frequency)
  2. AD (Allele Depth)
  3. GT (Genotype)

Performs various checks on the input data including:

  • File existence
  • No duplicate loci-allele combinations
  • Consistent dimensions in output struct

Throws

  • ErrorException: If file doesn't exist, has duplicates, or output dimensions are invalid

Examples

julia> genomes = GenomicBreedingCore.simulategenomes(n=10, verbose=false);

julia> fname = writevcf(genomes);

julia> fname_gz = writevcf(genomes, gzip=true);

julia> genomes_reloaded = readvcf(fname=fname);

julia> genomes_reloaded_gz = readvcf(fname=fname_gz);

julia> genomes.entries == genomes_reloaded.entries == genomes_reloaded_gz.entries
true

julia> dimensions(genomes) == dimensions(genomes_reloaded) == dimensions(genomes_reloaded_gz)
true

julia> ismissing.(genomes.allele_frequencies) == ismissing.(genomes_reloaded.allele_frequencies) == ismissing.(genomes_reloaded_gz.allele_frequencies)
true
source
GenomicBreedingIO.vcfchunkifyMethod
vcfchunkify(fname::String; n_loci::Int64, verbose::Bool = false)::Tuple{Vector{Int64},Vector{Int64},Vector{Int64},Vector{Int64}}

Divide a VCF file into chunks for parallel processing.

Arguments

  • fname::String: Path to the VCF file (can be .vcf, .vcf.gz, or .vcf.bgz)
  • n_loci::Int64: Total number of loci in the VCF file
  • verbose::Bool=false: If true, prints progress information

Returns

A tuple containing four Vector{Int64} arrays:

  1. Starting loci indices for each thread
  2. Ending loci indices for each thread
  3. Starting file positions for each thread
  4. Ending file positions for each thread

Details

  • Automatically detects if the input file is gzipped
  • Divides the workload evenly across available threads
  • Skips header lines (starting with '#')
  • Handles both regular and gzipped VCF files

Examples

julia> genomes = GenomicBreedingCore.simulategenomes(n=10, verbose=false);

julia> fname = writevcf(genomes);

julia> _, n_loci, n_alt_alleles = vcfcountlocialleles(fname);

julia> idx_loci_per_thread_ini, idx_loci_per_thread_fin, file_pos_per_thread_ini, file_pos_per_thread_fin = vcfchunkify(fname, n_loci=n_loci);

julia> length(idx_loci_per_thread_ini) == length(idx_loci_per_thread_fin) == length(file_pos_per_thread_ini) == length(file_pos_per_thread_fin)
true

julia> (idx_loci_per_thread_ini[1] == 0) && (sum(idx_loci_per_thread_ini .== 0) == 1)
true

julia> (idx_loci_per_thread_fin[end] == n_loci) && (sum(idx_loci_per_thread_fin .== 0) == 0)
true

julia> (sum(file_pos_per_thread_ini .== 0) == 0) && (sum(file_pos_per_thread_fin .== 0) == 0)
true

julia> rm(fname);
source
GenomicBreedingIO.vcfcountlociallelesMethod
vcfcountlocialleles(fname::String; verbose::Bool = false)::Tuple{Int64,Int64}

Count the number of loci and total lines in a VCF file.

Arguments

  • fname::String: Path to the VCF file. Can be either a plain text VCF file or a gzipped VCF file (with extensions .vcf.gz or .vcf.bgz)
  • verbose::Bool: If true, prints progress messages and results to stdout. Defaults to false.

Returns

  • Tuple{Int64,Int64}: A tuple containing:
    • First element: Total number of lines in the file (including headers)
    • Second element: Number of data lines (variants/loci) excluding header lines

Description

Reads through a VCF (Variant Call Format) file and counts:

  1. Total lines in the file (including headers)
  2. Number of data lines (variants/loci) that don't start with '#'

The function automatically detects and handles different file formats:

  • Plain text VCF files (.vcf)
  • Gzipped VCF files (.vcf.gz)
  • BGZipped VCF files (.vcf.bgz)

Examples

julia> genomes = GenomicBreedingCore.simulategenomes(n=10, verbose=false);

julia> fname = writevcf(genomes);

julia> fname_gz = writevcf(genomes, gzip=true);

julia> n_1, p_1, l_1 = vcfcountlocialleles(fname);

julia> n_2, p_2, l_2 = vcfcountlocialleles(fname_gz);

julia> n_1 == n_2 == 10_009
true

julia> p_1 == p_2 == 10_000
true

julia> l_1 == l_2 == 10_000
true

julia> rm.([fname, fname_gz]);
source
GenomicBreedingIO.vcfextractallelefreqs!Method
vcfextractallelefreqs!(genomes::Genomes, pb::Union{Nothing,Progress}, i::Vector{Int64}; 
                      fname::String, line::Vector{String}, line_counter::Int64, 
                      field::String, min_depth::Int64=10, max_depth::Int64=100, 
                      verbose::Bool=false)

Extract allele frequencies from VCF file data and update a Genomes object.

Arguments

  • genomes::Genomes: Object to store genomic data
  • pb::Union{Nothing,Progress}: Progress bar object or nothing
  • i::Vector{Int64}: Single-element vector containing current locus-allele index
  • fname::String: Name of VCF file being processed
  • line::Vector{String}: Current line from VCF file split into fields
  • line_counter::Int64: Current line number in VCF file
  • field::String: Type of field to extract ("AF", "AD", or "GT")
  • min_depth::Int64=10: Minimum read depth threshold for AD field
  • max_depth::Int64=100: Maximum read depth threshold for AD field
  • verbose::Bool=false: Whether to display progress updates

Returns

Nothing; Updates the input parameters in place:

  • genomes: Updated with new allele frequencies and loci information
  • pb: Advanced if verbose=true
  • i: Index incremented based on processed alleles

Description

Processes VCF data to extract allele frequencies of the alternative allele/s using one of three methods:

  • AF field: Direct frequency values from VCF
  • AD field: Calculated from read depths (filtered by mindepth and maxdepth)
  • GT field: Calculated from genotype calls

Updates Genomes object with:

  • Loci-allele identifiers (chromosome, position, alleles)
  • Allele frequencies for each sample

Throws

  • ArgumentError: If field parameter is not "AF", "AD", or "GT"
  • ErrorException: If unable to parse AF or AD fields from VCF

Examples

julia> genomes = GenomicBreedingCore.simulategenomes(n=10, verbose=false);

julia> fname = writevcf(genomes);

julia> _, _, n_alt_alleles = vcfcountlocialleles(fname);

julia> entries, format_lines = vcfextractentriesandformats(fname);

julia> field, n_alleles, _ = vcfextractinfo(fname, format_lines=format_lines);

julia> genomes_instantiated = vcfinstantiateoutput(fname, entries=entries, n_alt_alleles=n_alt_alleles);

julia> sum(ismissing.(genomes_instantiated.allele_frequencies[:, 1])) == length(entries)
true

julia> file = open(fname, "r"); line::Vector{String} = split([readline(file) for i in 1:10][end], "	"); close(file);

julia> vcfextractallelefreqs!(genomes_instantiated, nothing, [0], fname=fname, line=line, line_counter=10, field=field);

julia> sum(ismissing.(genomes_instantiated.allele_frequencies[:, 1])) == 0
true

julia> rm(fname);
source
GenomicBreedingIO.vcfextractentriesandformatsMethod
vcfextractentriesandformats(fname::String; verbose::Bool = false)::Tuple{Vector{String},Vector{String}}

Extract sample entries and format definitions from a VCF file.

Arguments

  • fname::String: Path to the VCF file (can be gzipped with extensions .vcf.gz or .vcf.bgz)
  • verbose::Bool=false: If true, prints progress information to stdout

Returns

A tuple containing:

  1. Vector{String}: List of sample names from the VCF header
  2. Vector{String}: List of FORMAT field definitions from the VCF metadata

Description

Reads a VCF file and extracts two key pieces of information:

  1. Sample names from the header line (columns after FORMAT field)
  2. FORMAT field definitions from metadata lines starting with "##FORMAT"

The function validates the presence and correct order of standard VCF columns: CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO, and FORMAT

Throws

  • ArgumentError: If VCF has fewer than expected columns or column names don't match VCF format

Examples

julia> genomes = GenomicBreedingCore.simulategenomes(n=10, verbose=false);

julia> fname = writevcf(genomes);

julia> entries, format_lines = vcfextractentriesandformats(fname);

julia> entries
10-element Vector{String}:
 "entry_01"
 "entry_02"
 "entry_03"
 "entry_04"
 "entry_05"
 "entry_06"
 "entry_07"
 "entry_08"
 "entry_09"
 "entry_10"

julia> format_lines
3-element Vector{String}:
 "##FORMAT=<ID=GT,Number=1,Type=String,Description=\"Genotype\">"
 "##FORMAT=<ID=AD,Number=2,Type=Float,Description=\"Allele Depth\">"
 "##FORMAT=<ID=AF,Number=2,Type=Float,Description=\"Allele Frequency\">"

julia> rm(fname);
source
GenomicBreedingIO.vcfextractinfoMethod
vcfextractinfo(fname::String; format_lines::Vector{String}, field::String="any", verbose::Bool=false)::Tuple{String,Int64,Int64}

Extract information about genotype fields from a VCF file.

Arguments

  • fname::String: Path to the VCF file (can be gzipped)
  • format_lines::Vector{String}: Vector containing FORMAT lines from the VCF header
  • field::String="any": Specific field to extract ("GT", "AD", "AF", or "any")
  • verbose::Bool=false: If true, prints progress information

Returns

A tuple containing:

  1. field::String: The identified genotype field
  2. n_alleles::Int64: Maximum number of alleles per locus
  3. ploidy::Int64: Ploidy level (only meaningful for GT field; set to typemax(Int64) for AD and AF fields)

Details

  • If field is "any", searches for fields in priority order: AF > AD > GT
  • For GT field, scans entire file to determine maximum number of alleles and ploidy
  • For AF and AD fields, extracts allele count from format header
  • Supports both gzipped (.gz, .bgz) and uncompressed VCF files

Throws

  • ArgumentError: If specified field is not found in the VCF file
  • ErrorException: If unable to parse number of alleles from format header

Examples

julia> genomes = GenomicBreedingCore.simulategenomes(n=10, verbose=false);

julia> fname = writevcf(genomes);

julia> _, format_lines = vcfextractentriesandformats(fname);

julia> field, n_alleles, ploidy = vcfextractinfo(fname, format_lines=format_lines);

julia> (field == "AF") && (n_alleles == 2) && (ploidy == typemax(Int64))
true

julia> rm(fname);
source
GenomicBreedingIO.vcfinstantiateoutputMethod
vcfinstantiateoutput(fname::String; entries::Vector{String}, n_alt_alleles::Int64, verbose::Bool = false)::Genomes

Create and initialize a Genomes struct from VCF file parameters.

Arguments

  • fname::String: Name of the VCF file being processed
  • entries::Vector{String}: Vector containing entry identifiers
  • n_alt_alleles::Int64: Total number of alternative alleles across all loci
  • verbose::Bool=false: If true, prints progress information

Returns

  • Genomes: An initialized Genomes struct with:
    • dimensions n × p where n is number of entries and p = naltalleles
    • entry names assigned from input entries
    • populations set to "unknown"
    • mask set to true

Throws

  • ErrorException: If duplicate entries are found in the VCF file

Examples

julia> genomes = GenomicBreedingCore.simulategenomes(n=10, verbose=false);

julia> fname = writevcf(genomes);

julia> entries, format_lines = vcfextractentriesandformats(fname);

julia> _, _, n_alt_alleles = vcfcountlocialleles(fname);

julia> genomes_instantiated = vcfinstantiateoutput(fname, entries=entries, n_alt_alleles=n_alt_alleles);

julia> size(genomes_instantiated.allele_frequencies)
(10, 10000)

julia> genomes_instantiated.entries == entries
true

julia> rm(fname);
source
GenomicBreedingIO.vcfparsecoordinatesMethod
vcfparsecoordinates(; line::Vector{String}, line_counter::Int64, field::String)::Union{Nothing,Tuple{Int64,String,Int64,Vector{String}}}

Parse coordinates and allele information from a VCF file line.

Arguments

  • line::Vector{String}: A vector containing the split line from VCF file
  • line_counter::Int64: Current line number being processed in the VCF file
  • field::String: The field name to extract allele frequencies from

Returns

  • Nothing: If the specified field is not found in the line
  • Tuple{Int64,String,Int64,Vector{String}}: A tuple containing:
    • Field index
    • Chromosome name
    • Position
    • Combined reference and alternative alleles

Throws

  • ErrorException: If the position field cannot be parsed as an integer

Note

The function validates the line format and extracts genomic coordinates and allele information from a VCF file line. It handles missing alternative alleles (denoted by ".") and performs necessary type conversions.

Examples

julia> genomes = GenomicBreedingCore.simulategenomes(n=10, verbose=false);

julia> fname = writevcf(genomes);

julia> entries, format_lines = vcfextractentriesandformats(fname);

julia> field, _, _ = vcfextractinfo(fname, format_lines=format_lines);

julia> file = open(fname, "r"); line::Vector{String} = split([readline(file) for i in 1:10][end], "	"); close(file);

julia> idx_field, chrom, pos, refalts = vcfparsecoordinates(line=line, line_counter=10, field=field);

julia> (idx_field == 3) && (chrom == line[1]) && (pos == parse(Int64, line[2])) && (refalts == line[4:5])
true

julia> rm(fname);
source
GenomicBreedingIO.writedelimitedMethod
writedelimited(
    genomes::Genomes;
    fname::Union{Missing,String} = missing,
    sep::String = "\t",
    include_population_header::Bool = true
)::String

Write genomic data to a delimited text file.

Arguments

  • genomes::Genomes: A Genomes struct containing the genomic data to be written
  • fname::Union{Missing,String}: Output filename. If missing, generates an automatic filename with timestamp
  • sep::String: Delimiter character for the output file (default: tab)
  • include_population_header::Bool: Whether to include population information in the header (default: true)

Returns

  • String: Path to the created output file

File Format

The output file contains:

  1. Header lines (prefixed with '#'):
    • First line: chromosome, position, alleles, and entry information
    • Second line (optional): population information
  2. Data rows with the following columns:
    • Column 1: Chromosome identifier
    • Column 2: Position
    • Column 3: All alleles at the locus (pipe-separated)
    • Column 4: Specific allele
    • Remaining columns: Frequency data for each entry

Supported File Extensions

  • '.tsv' (tab-separated, default)
  • '.csv' (comma-separated)
  • '.txt' (custom delimiter)

Throws

  • DimensionMismatch: If the input Genomes struct is corrupted
  • ErrorException: If the output file already exists
  • ArgumentError: If the file extension is invalid or the output directory doesn't exist

Examples

julia> genomes = GenomicBreedingCore.simulategenomes(n=2, verbose=false);

julia> writedelimited(genomes, fname="test_genomes.tsv")
"test_genomes.tsv"
source
GenomicBreedingIO.writedelimitedMethod
writedelimited(phenomes::Phenomes; fname::Union{Missing,String} = missing, sep::String = "	")::String

Write phenotypic data from a Phenomes struct to a delimited text file.

Arguments

  • phenomes::Phenomes: A Phenomes struct containing phenotypic data
  • fname::Union{Missing,String} = missing: Output filename. If missing, generates an automatic filename with timestamp
  • sep::String = " ": Delimiter character for the output file

Returns

  • String: The name of the created file

File Format

  • Header line starts with '#' containing column names
  • First column: Entry names
  • Second column: Population names
  • Remaining columns: Trait values
  • Missing values are represented as "NA"

File Extensions

Supported file extensions:

  • .tsv for tab-separated files (default)
  • .csv for comma-separated files
  • .txt for other delimiters

Throws

  • DimensionMismatch: If the Phenomes struct dimensions are inconsistent
  • ErrorException: If the output file already exists
  • ArgumentError: If the file extension is invalid or the directory doesn't exist

Examples

julia> phenomes = Phenomes(n=2, t=2); phenomes.entries = ["entry_1", "entry_2"]; phenomes.traits = ["trait_1", "trait_2"];

julia> writedelimited(phenomes, fname="test_phenomes.tsv")
"test_phenomes.tsv"
source
GenomicBreedingIO.writedelimitedMethod
writedelimited(trials::Trials; fname::Union{Missing,String} = missing, sep::String = "	", 
              overwrite::Bool = false, verbose::Bool = false)::String

Write a Trials struct to a delimited text file, returning the filename.

Arguments

  • trials::Trials: The trials data structure to be written
  • fname::Union{Missing,String} = missing: Output filename. If missing, generates automatic filename with timestamp
  • sep::String = " ": Delimiter character between fields
  • overwrite::Bool = false: Whether to overwrite existing output file if it exists
  • verbose::Bool = false: Whether to show progress bar during writing

Returns

  • String: The name of the file that was written

File Format

The output file contains one header line and one line per trial entry. Header line is prefixed with '#' and contains column names.

Fixed Columns (1-10)

  1. years
  2. seasons
  3. harvests
  4. sites
  5. entries
  6. populations
  7. replications
  8. blocks
  9. rows
  10. cols

Variable Columns (11+)

  • Additional columns contain phenotype traits values
  • Missing values are written as "NA"

Notes

  • Supported file extensions: .tsv, .csv, or .txt
  • File extension is automatically determined based on separator if filename is missing:
    • \t.tsv
    • , or ;.csv
    • other → .txt
  • Will throw error if file exists and overwrite=false
  • Directory must exist if path is specified in filename

Examples

julia> trials = Trials(n=1, t=2); trials.years = ["year_1"]; trials.seasons = ["season_1"]; trials.harvests = ["harvest_1"]; trials.sites = ["site_1"]; trials.entries = ["entry_1"]; trials.populations = ["population_1"]; trials.replications = ["replication_1"]; trials.blocks = ["block_1"]; trials.rows = ["row_1"]; trials.cols = ["col_1"]; trials.traits = ["trait_1", "trait_2"];

julia> writedelimited(trials, fname="test_trials.tsv")
"test_trials.tsv"
source
GenomicBreedingIO.writejld2Method
writejld2(A::Union{Genomes,Phenomes,Trials,SimulatedEffects,TEBV}; fname::Union{Missing,String} = missing, overwrite::Bool=false)::String

Save genomic breeding core data structures to a JLD2 file (HDF5-compatible format).

Arguments

  • A: A genomic breeding data structure (Genomes, Phenomes, Trials, SimulatedEffects, or TEBV)
  • fname: Optional. Output filename. If missing, generates an automatic name with timestamp
  • overwrite: Optional. If true, overwrites existing file with same name. Default is false

Returns

  • String: Path to the saved JLD2 file

File Naming

  • If fname is not provided, generates name: "output-[Type]-[Timestamp].jld2"
  • If fname is provided, must have ".jld2" extension

Throws

  • DimensionMismatch: If input structure has invalid dimensions
  • ErrorException: If output file already exists and overwrite=false
  • ArgumentError: If invalid file extension or directory path

Notes

  • Files are saved with compression enabled
  • Data is stored as a Dictionary with single key-value pair
  • Key is the string representation of the input type

Examples

julia> genomes = GenomicBreedingCore.simulategenomes(n=2, verbose=false);

julia> writejld2(genomes, fname="test_genomes.jld2")
"test_genomes.jld2"

julia> genomes_reloaded = load("test_genomes.jld2");

julia> genomes_reloaded[collect(keys(genomes_reloaded))[1]] == genomes
true

julia> phenomes = Phenomes(n=2, t=2); phenomes.entries = ["entry_1", "entry_2"]; phenomes.traits = ["trait_1", "trait_2"];

julia> writejld2(phenomes, fname="test_phenomes.jld2")
"test_phenomes.jld2"

julia> phenomes_reloaded = load("test_phenomes.jld2");

julia> phenomes_reloaded[collect(keys(phenomes_reloaded))[1]] == phenomes
true

julia> trials, _ = simulatetrials(genomes=genomes, verbose=false);

julia> writejld2(trials, fname="test_trials.jld2")
"test_trials.jld2"

julia> trials_reloaded = load("test_trials.jld2");

julia> trials_reloaded[collect(keys(trials_reloaded))[1]] == trials
true

julia> simulated_effects = SimulatedEffects();

julia> writejld2(simulated_effects, fname="test_simulated_effects.jld2")
"test_simulated_effects.jld2"

julia> simulated_effects_reloaded = load("test_simulated_effects.jld2");

julia> simulated_effects_reloaded[collect(keys(simulated_effects_reloaded))[1]] == simulated_effects
true

julia> trials, _simulated_effects = GenomicBreedingCore.simulatetrials(genomes = GenomicBreedingCore.simulategenomes(n=10, verbose=false), n_years=1, n_seasons=1, n_harvests=1, n_sites=1, n_replications=10, verbose=false);

julia> tebv = analyse(trials, max_levels=50, verbose=false);

julia> writejld2(tebv, fname="test_tebv.jld2")
"test_tebv.jld2"

julia> tebv_reloaded = load("test_tebv.jld2");

julia> tebv_reloaded[collect(keys(tebv_reloaded))[1]] == tebv
true
source
GenomicBreedingIO.writevcfMethod
writevcf(genomes::Genomes; fname::Union{Missing,String} = missing, ploidy::Int64 = 0, 
         max_depth::Int64 = 100, n_decimal_places::Int64 = 4, gzip::Bool = false)::String

Write genomic data to a Variant Call Format (VCF) file.

Arguments

  • genomes::Genomes: A Genomes object containing the genetic data to be written.
  • fname::Union{Missing,String} = missing: Output filename. If missing, generates a default name with timestamp.
  • ploidy::Int64 = 0: The ploidy level of the organisms (e.g., 2 for diploid).
  • max_depth::Int64 = 100: Maximum depth for allele depth calculation.
  • n_decimal_places::Int64 = 4: Number of decimal places for rounding allele frequencies.
  • gzip::Bool = false: Whether to compress the output file using gzip.

Returns

  • String: The name of the created VCF file.

Description

Creates a VCF v4.2 format file containing genomic variants data. The function processes allele frequencies and depths, calculates genotypes based on ploidy, and formats the data according to VCF specifications. The output includes:

  • Standard VCF header information
  • Sample information with FORMAT fields:
    • GT (Genotype)
    • AD (Allele Depth)
    • AF (Allele Frequency)

Throws

  • DimensionMismatch: If the input Genomes object has inconsistent dimensions
  • ErrorException: If the output file already exists
  • ArgumentError: If the file extension is not '.vcf' or if the specified directory doesn't exist

Examples

julia> genomes_1 = GenomicBreedingCore.simulategenomes(n=2, verbose=false);

julia> writevcf(genomes_1, fname="test_genomes_1.vcf")
"test_genomes_1.vcf"

julia> genomes_2 = GenomicBreedingCore.simulategenomes(n=2, n_alleles=3, verbose=false);

julia> genomes_2.allele_frequencies = round.(genomes_2.allele_frequencies .* 4) ./ 4;

julia> writevcf(genomes_2, fname="test_genomes_2.vcf", ploidy=4)
"test_genomes_2.vcf"

julia> genomes_3 = GenomicBreedingCore.simulategenomes(n=3, verbose=false);

julia> writevcf(genomes_3, fname="test_genomes_3.vcf", gzip=true)
"test_genomes_3.vcf.gz"
source