GenomicBreedingIO

Documentation for GenomicBreedingIO.

GenomicBreedingIO.isfuzzymatch
GenomicBreedingIO.levenshteindistance
GenomicBreedingIO.readdelimited
GenomicBreedingIO.readdelimited
GenomicBreedingIO.readdelimited
GenomicBreedingIO.readjld2
GenomicBreedingIO.readvcf
GenomicBreedingIO.vcfchunkify
GenomicBreedingIO.vcfcountlocialleles
GenomicBreedingIO.vcfextractallelefreqs!
GenomicBreedingIO.vcfextractentriesandformats
GenomicBreedingIO.vcfextractinfo
GenomicBreedingIO.vcfinstantiateoutput
GenomicBreedingIO.vcfparsecoordinates
GenomicBreedingIO.writedelimited
GenomicBreedingIO.writedelimited
GenomicBreedingIO.writedelimited
GenomicBreedingIO.writejld2
GenomicBreedingIO.writevcf

GenomicBreedingIO.isfuzzymatch — Method

isfuzzymatch(a::String, b::String; threshold::Float64=0.3)::Bool

Determines if two strings approximately match each other using Levenshtein distance.

The function compares two strings and returns true if they are considered similar enough based on the Levenshtein edit distance and a threshold value. The threshold is applied as a fraction of the length of the shorter string. Additionally, the function normalizes specific string inputs (e.g., "#chr" is replaced with "chrom") before comparison.

Arguments

a::String: First string to compare
b::String: Second string to compare
threshold::Float64=0.3: Maximum allowed edit distance as a fraction of the shorter string length

Returns

Bool: true if the strings match within the threshold, false otherwise

Examples

julia> isfuzzymatch("populations", "populations")
true

julia> isfuzzymatch("populations", "poplation")
true

julia> isfuzzymatch("populations", "entry")
false

GenomicBreedingIO.levenshteindistance — Method

levenshteindistance(a::String, b::String)::Int64

Calculate the Levenshtein distance (edit distance) between two strings.

The Levenshtein distance is a measure of the minimum number of single-character edits (insertions, deletions, or substitutions) required to change one string into another.

Arguments

a::String: First input string
b::String: Second input string

Returns

Int64: The minimum number of edits needed to transform string a into string b

Examples

julia> levenshteindistance("populations", "populations")
0

julia> levenshteindistance("populations", "poplation")
2

julia> levenshteindistance("populations", "entry")
3

GenomicBreedingIO.readdelimited — Method

readdelimited(
    type::Type{Genomes};
    fname::String,
    sep::String = "\t",
    parse_populations_from_entries::Union{Nothing,Function} = nothing,
    all_alleles_column::Bool = true,
    verbose::Bool = false
)::Genomes

Load genotype data from a delimited text file into a Genomes struct.

Arguments

type::Type{Genomes}: Type parameter (always Genomes)
fname::String: Path to the input file
sep::String: Delimiter character (default: tab)
parse_populations_from_entries::Union{Nothing,Function}: Optional function to extract population names from entry names
all_alleles_column::Bool: Whether input file contains all alleles column (default: true)
verbose::Bool: Whether to show progress bar during loading

File Format

The input file should be structured as follows:

Supported extensions: .tsv, .csv, or .txt
Comments and headers start with '#'
Header format (2 lines where the second line is optional):
1. Column names:
  - With allalleles: "chrom,pos,allalleles,allele,entry1,entry2,..."
  - Without allalleles: "chrom,pos,allele,entry1,entry_2,..."
2. Population names (optional): Same format as line 1 but with population names
Data columns:
1. chromosome identifier
2. position (integer)
3. all alleles at locus (if all_alleles=true)
4. specific allele
5+. allele frequencies for each entry (0.0-1.0 or missing/NA)

Returns

Genomes: A populated Genomes struct containing the loaded data

Throws

ErrorException: If file doesn't exist or has invalid format
ArgumentError: If column names don't match expected format
OverflowError: If allele frequencies are outside [0,1] range

Examples

julia> genomes = GenomicBreedingCore.simulategenomes(n=10, verbose=false);

julia> genomes.entries = [string(genomes.populations[i], "-", genomes.entries[i]) for i in eachindex(genomes.populations)];

julia> fname = writedelimited(genomes);

julia> genomes_reloaded = readdelimited(Genomes, fname=fname);

julia> genomes == genomes_reloaded
true

julia> fname = writedelimited(genomes, include_population_header=false);

julia> genomes_reloaded = readdelimited(Genomes, fname=fname);

julia> unique(genomes_reloaded.populations) == ["Unknown_population"]
true

julia> genomes_reloaded = readdelimited(Genomes, fname=fname, parse_populations_from_entries=x -> split(x, "-")[1]);

julia> genomes == genomes_reloaded
true

GenomicBreedingIO.readdelimited — Method

readdelimited(type::Type{Phenomes}; fname::String, sep::String = "\t", verbose::Bool = false)::Phenomes

Load phenotypic data from a delimited text file into a Phenomes struct.

Arguments

type::Type{Phenomes}: Type parameter (must be Phenomes)
fname::String: Path to the input file
sep::String: Delimiter character (default: tab "\t")
verbose::Bool: Whether to show progress bar during loading (default: false)

File Format

The file should be a delimited text file with:

Header row containing column names
First column: Entry identifiers
Second column: Population identifiers
Remaining columns: Phenotypic trait values (numeric or missing)

Missing values can be specified as "missing", "NA", "na", "N/A", "n/a" or empty string.

Returns

Phenomes: A Phenomes struct containing the loaded phenotypic data

Throws

ErrorException: If file doesn't exist or has invalid format
ArgumentError: If required columns are missing or misnamed
ErrorException: If duplicate entries or traits are found
ErrorException: If numeric values cannot be parsed

Notes

Comments starting with '#' are ignored
Empty lines are skipped
Mathematical operators (+,-,*,/,%) in trait names are replaced with underscores
Performs dimension checks on the loaded data

Examples

julia> phenomes = Phenomes(n=10, t=3); phenomes.entries = string.("entry_", 1:10); phenomes.populations .= "pop1"; phenomes.traits = ["A", "B", "C"]; phenomes.phenotypes = rand(10,3); phenomes.phenotypes[1,1] = missing; phenomes.mask .= true;

julia> fname = writedelimited(phenomes);

julia> phenomes_reloaded = readdelimited(Phenomes, fname=fname);

julia> phenomes == phenomes_reloaded
true

GenomicBreedingIO.readdelimited — Method

readdelimited(type::Type{Trials}; fname::String, sep::String = "\t", verbose::Bool = false)::Trials

Load a Trials struct from a string-delimited file.

Arguments

type::Type{Trials}: Type parameter (must be Trials)
fname::String: Path to the input file
sep::String = "\t": Delimiter character (default is tab)
verbose::Bool = false: Whether to display progress information

Required File Structure

The input file must contain the following 10 identifier columns:

years: Year identifiers
seasons: Season identifiers
harvests: Harvest identifiers
sites: Site identifiers
entries: Entry identifiers
populations: Population identifiers
replications: Replication identifiers
blocks: Block identifiers
rows: Row identifiers
cols: Column identifiers

All remaining columns are treated as numeric phenotype measurements. Column names are fuzzy-matched to accommodate slight spelling variations.

Returns

Trials: A populated Trials struct containing the loaded data

Notes

Missing values can be represented as "missing", "NA", "na", "N/A", "n/a", or empty strings
Trait names containing mathematical operators (+, -, *, /, %) are converted to underscores
Duplicate trait names are not allowed

Throws

ErrorException: If the input file doesn't exist or has invalid format
ArgumentError: If required columns are missing or ambiguous

Examples

julia> genomes = GenomicBreedingCore.simulategenomes(n=10, verbose=false);

julia> trials, _ = GenomicBreedingCore.simulatetrials(genomes=genomes, sparsity=0.1, verbose=false);

julia> fname = writedelimited(trials);

julia> trials_reloaded = readdelimited(Trials, fname=fname);

julia> trials == trials_reloaded
true

GenomicBreedingIO.readjld2 — Method

readjld2(type::Type; fname::String)::Type

Load a core (Genomes, Phenomes, and Trials), simulation (SimulatedEffects), or model (TEBV) struct from a JLD2 file.

Arguments

type::Type: The type of struct to load (Genomes, Phenomes, Trials, SimulatedEffects, or TEBV)
fname::String: Path to the JLD2 file to read from

Returns

The loaded struct of the specified type

Throws

ArgumentError: If the specified file does not exist
DimensionMismatch: If the loaded struct is corrupted

Examples

julia> genomes = GenomicBreedingCore.simulategenomes(n=2, verbose=false);

julia> fname = writejld2(genomes);

julia> readjld2(Genomes, fname=fname) == genomes
true

julia> phenomes = Phenomes(n=2, t=2); phenomes.entries = ["entry_1", "entry_2"]; phenomes.traits = ["trait_1", "trait_2"];

julia> fname = writejld2(phenomes);

julia> readjld2(Phenomes, fname=fname) == phenomes
true

julia> trials, _ = simulatetrials(genomes=genomes, verbose=false);

julia> fname = writejld2(trials);

julia> readjld2(Trials, fname=fname) == trials
true

GenomicBreedingIO.readvcf — Method

readvcf(; fname::String, field::String = "any", min_depth::Int64 = 5, max_depth::Int64 = 100, verbose::Bool = false)::Genomes

Read genetic data from a VCF (Variant Call Format) file into a Genomes struct.

Arguments

fname::String: Path to the VCF file. Can be gzipped (.vcf.gz or .vcf.bgz) or uncompressed (.vcf)
field::String="any": Which FORMAT field to extract from VCF. Default "any" tries to automatically detect genotype field
min_depth::Int64=5: Minimum read depth threshold for AD (Allele Depth) field
max_depth::Int64=100: Maximum read depth threshold for AD field
verbose::Bool=false: Whether to print progress and debug information

Returns

Genomes: A Genomes struct containing the loaded genetic data with fields:
- allele_frequencies: Matrix of allele frequencies
- loci_alleles: Vector of locus-allele combination strings
- mask: Boolean matrix indicating missing data
- samples: Vector of sample names

Details

Reads VCF files in parallel using multiple threads. Handles multi-allelic variants and different ploidies. Field priority (when field="any"):

AF (Allele Frequency)
AD (Allele Depth)
GT (Genotype)

Performs various checks on the input data including:

File existence
No duplicate loci-allele combinations
Consistent dimensions in output struct

Throws

ErrorException: If file doesn't exist, has duplicates, or output dimensions are invalid

Examples

julia> genomes = GenomicBreedingCore.simulategenomes(n=10, verbose=false);

julia> fname = writevcf(genomes);

julia> fname_gz = writevcf(genomes, gzip=true);

julia> genomes_reloaded = readvcf(fname=fname);

julia> genomes_reloaded_gz = readvcf(fname=fname_gz);

julia> genomes.entries == genomes_reloaded.entries == genomes_reloaded_gz.entries
true

julia> dimensions(genomes) == dimensions(genomes_reloaded) == dimensions(genomes_reloaded_gz)
true

julia> ismissing.(genomes.allele_frequencies) == ismissing.(genomes_reloaded.allele_frequencies) == ismissing.(genomes_reloaded_gz.allele_frequencies)
true

GenomicBreedingIO.vcfchunkify — Method

vcfchunkify(fname::String; n_loci::Int64, verbose::Bool = false)::Tuple{Vector{Int64},Vector{Int64},Vector{Int64},Vector{Int64}}

Divide a VCF file into chunks for parallel processing.

Arguments

fname::String: Path to the VCF file (can be .vcf, .vcf.gz, or .vcf.bgz)
n_loci::Int64: Total number of loci in the VCF file
verbose::Bool=false: If true, prints progress information

Returns

A tuple containing four Vector{Int64} arrays:

Starting loci indices for each thread
Ending loci indices for each thread
Starting file positions for each thread
Ending file positions for each thread

Details

Automatically detects if the input file is gzipped
Divides the workload evenly across available threads
Skips header lines (starting with '#')
Handles both regular and gzipped VCF files

Examples

julia> genomes = GenomicBreedingCore.simulategenomes(n=10, verbose=false);

julia> fname = writevcf(genomes);

julia> _, n_loci, n_alt_alleles = vcfcountlocialleles(fname);

julia> idx_loci_per_thread_ini, idx_loci_per_thread_fin, file_pos_per_thread_ini, file_pos_per_thread_fin = vcfchunkify(fname, n_loci=n_loci);

julia> length(idx_loci_per_thread_ini) == length(idx_loci_per_thread_fin) == length(file_pos_per_thread_ini) == length(file_pos_per_thread_fin)
true

julia> (idx_loci_per_thread_ini[1] == 0) && (sum(idx_loci_per_thread_ini .== 0) == 1)
true

julia> (idx_loci_per_thread_fin[end] == n_loci) && (sum(idx_loci_per_thread_fin .== 0) == 0)
true

julia> (sum(file_pos_per_thread_ini .== 0) == 0) && (sum(file_pos_per_thread_fin .== 0) == 0)
true

julia> rm(fname);

GenomicBreedingIO.vcfcountlocialleles — Method

vcfcountlocialleles(fname::String; verbose::Bool = false)::Tuple{Int64,Int64}

Count the number of loci and total lines in a VCF file.

Arguments

fname::String: Path to the VCF file. Can be either a plain text VCF file or a gzipped VCF file (with extensions .vcf.gz or .vcf.bgz)
verbose::Bool: If true, prints progress messages and results to stdout. Defaults to false.

Returns

Tuple{Int64,Int64}: A tuple containing:
- First element: Total number of lines in the file (including headers)
- Second element: Number of data lines (variants/loci) excluding header lines

Description

Reads through a VCF (Variant Call Format) file and counts:

Total lines in the file (including headers)
Number of data lines (variants/loci) that don't start with '#'

The function automatically detects and handles different file formats:

Plain text VCF files (.vcf)
Gzipped VCF files (.vcf.gz)
BGZipped VCF files (.vcf.bgz)

Examples

julia> genomes = GenomicBreedingCore.simulategenomes(n=10, verbose=false);

julia> fname = writevcf(genomes);

julia> fname_gz = writevcf(genomes, gzip=true);

julia> n_1, p_1, l_1 = vcfcountlocialleles(fname);

julia> n_2, p_2, l_2 = vcfcountlocialleles(fname_gz);

julia> n_1 == n_2 == 10_009
true

julia> p_1 == p_2 == 10_000
true

julia> l_1 == l_2 == 10_000
true

julia> rm.([fname, fname_gz]);

GenomicBreedingIO.vcfextractallelefreqs! — Method

vcfextractallelefreqs!(genomes::Genomes, pb::Union{Nothing,Progress}, i::Vector{Int64}; 
                      fname::String, line::Vector{String}, line_counter::Int64, 
                      field::String, min_depth::Int64=10, max_depth::Int64=100, 
                      verbose::Bool=false)

Extract allele frequencies from VCF file data and update a Genomes object.

Arguments

genomes::Genomes: Object to store genomic data
pb::Union{Nothing,Progress}: Progress bar object or nothing
i::Vector{Int64}: Single-element vector containing current locus-allele index
fname::String: Name of VCF file being processed
line::Vector{String}: Current line from VCF file split into fields
line_counter::Int64: Current line number in VCF file
field::String: Type of field to extract ("AF", "AD", or "GT")
min_depth::Int64=10: Minimum read depth threshold for AD field
max_depth::Int64=100: Maximum read depth threshold for AD field
verbose::Bool=false: Whether to display progress updates

Returns

Nothing; Updates the input parameters in place:

genomes: Updated with new allele frequencies and loci information
pb: Advanced if verbose=true
i: Index incremented based on processed alleles

Description

Processes VCF data to extract allele frequencies of the alternative allele/s using one of three methods:

AF field: Direct frequency values from VCF
AD field: Calculated from read depths (filtered by mindepth and maxdepth)
GT field: Calculated from genotype calls

Updates Genomes object with:

Loci-allele identifiers (chromosome, position, alleles)
Allele frequencies for each sample

Throws

ArgumentError: If field parameter is not "AF", "AD", or "GT"
ErrorException: If unable to parse AF or AD fields from VCF

Examples

julia> genomes = GenomicBreedingCore.simulategenomes(n=10, verbose=false);

julia> fname = writevcf(genomes);

julia> _, _, n_alt_alleles = vcfcountlocialleles(fname);

julia> entries, format_lines = vcfextractentriesandformats(fname);

julia> field, n_alleles, _ = vcfextractinfo(fname, format_lines=format_lines);

julia> genomes_instantiated = vcfinstantiateoutput(fname, entries=entries, n_alt_alleles=n_alt_alleles);

julia> sum(ismissing.(genomes_instantiated.allele_frequencies[:, 1])) == length(entries)
true

julia> file = open(fname, "r"); line::Vector{String} = split([readline(file) for i in 1:10][end], "	"); close(file);

julia> vcfextractallelefreqs!(genomes_instantiated, nothing, [0], fname=fname, line=line, line_counter=10, field=field);

julia> sum(ismissing.(genomes_instantiated.allele_frequencies[:, 1])) == 0
true

julia> rm(fname);

GenomicBreedingIO.vcfextractentriesandformats — Method

vcfextractentriesandformats(fname::String; verbose::Bool = false)::Tuple{Vector{String},Vector{String}}

Extract sample entries and format definitions from a VCF file.

Arguments

fname::String: Path to the VCF file (can be gzipped with extensions .vcf.gz or .vcf.bgz)
verbose::Bool=false: If true, prints progress information to stdout

Returns

A tuple containing:

Vector{String}: List of sample names from the VCF header
Vector{String}: List of FORMAT field definitions from the VCF metadata

Description

Reads a VCF file and extracts two key pieces of information:

Sample names from the header line (columns after FORMAT field)
FORMAT field definitions from metadata lines starting with "##FORMAT"

The function validates the presence and correct order of standard VCF columns: CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO, and FORMAT

Throws

ArgumentError: If VCF has fewer than expected columns or column names don't match VCF format

Examples

julia> genomes = GenomicBreedingCore.simulategenomes(n=10, verbose=false);

julia> fname = writevcf(genomes);

julia> entries, format_lines = vcfextractentriesandformats(fname);

julia> entries
10-element Vector{String}:
 "entry_01"
 "entry_02"
 "entry_03"
 "entry_04"
 "entry_05"
 "entry_06"
 "entry_07"
 "entry_08"
 "entry_09"
 "entry_10"

julia> format_lines
3-element Vector{String}:
 "##FORMAT=<ID=GT,Number=1,Type=String,Description=\"Genotype\">"
 "##FORMAT=<ID=AD,Number=2,Type=Float,Description=\"Allele Depth\">"
 "##FORMAT=<ID=AF,Number=2,Type=Float,Description=\"Allele Frequency\">"

julia> rm(fname);

GenomicBreedingIO.vcfextractinfo — Method

vcfextractinfo(fname::String; format_lines::Vector{String}, field::String="any", verbose::Bool=false)::Tuple{String,Int64,Int64}

Extract information about genotype fields from a VCF file.

Arguments

fname::String: Path to the VCF file (can be gzipped)
format_lines::Vector{String}: Vector containing FORMAT lines from the VCF header
field::String="any": Specific field to extract ("GT", "AD", "AF", or "any")
verbose::Bool=false: If true, prints progress information

Returns

A tuple containing:

field::String: The identified genotype field
n_alleles::Int64: Maximum number of alleles per locus
ploidy::Int64: Ploidy level (only meaningful for GT field; set to typemax(Int64) for AD and AF fields)

Details

If field is "any", searches for fields in priority order: AF > AD > GT
For GT field, scans entire file to determine maximum number of alleles and ploidy
For AF and AD fields, extracts allele count from format header
Supports both gzipped (.gz, .bgz) and uncompressed VCF files

Throws

ArgumentError: If specified field is not found in the VCF file
ErrorException: If unable to parse number of alleles from format header

Examples

julia> genomes = GenomicBreedingCore.simulategenomes(n=10, verbose=false);

julia> fname = writevcf(genomes);

julia> _, format_lines = vcfextractentriesandformats(fname);

julia> field, n_alleles, ploidy = vcfextractinfo(fname, format_lines=format_lines);

julia> (field == "AF") && (n_alleles == 2) && (ploidy == typemax(Int64))
true

julia> rm(fname);

GenomicBreedingIO.vcfinstantiateoutput — Method

vcfinstantiateoutput(fname::String; entries::Vector{String}, n_alt_alleles::Int64, verbose::Bool = false)::Genomes

Create and initialize a Genomes struct from VCF file parameters.

Arguments

fname::String: Name of the VCF file being processed
entries::Vector{String}: Vector containing entry identifiers
n_alt_alleles::Int64: Total number of alternative alleles across all loci
verbose::Bool=false: If true, prints progress information

Returns

Genomes: An initialized Genomes struct with:
- dimensions n × p where n is number of entries and p = naltalleles
- entry names assigned from input entries
- populations set to "unknown"
- mask set to true

Throws

ErrorException: If duplicate entries are found in the VCF file

Examples

julia> genomes = GenomicBreedingCore.simulategenomes(n=10, verbose=false);

julia> fname = writevcf(genomes);

julia> entries, format_lines = vcfextractentriesandformats(fname);

julia> _, _, n_alt_alleles = vcfcountlocialleles(fname);

julia> genomes_instantiated = vcfinstantiateoutput(fname, entries=entries, n_alt_alleles=n_alt_alleles);

julia> size(genomes_instantiated.allele_frequencies)
(10, 10000)

julia> genomes_instantiated.entries == entries
true

julia> rm(fname);

GenomicBreedingIO.vcfparsecoordinates — Method

vcfparsecoordinates(; line::Vector{String}, line_counter::Int64, field::String)::Union{Nothing,Tuple{Int64,String,Int64,Vector{String}}}

Parse coordinates and allele information from a VCF file line.

Arguments

line::Vector{String}: A vector containing the split line from VCF file
line_counter::Int64: Current line number being processed in the VCF file
field::String: The field name to extract allele frequencies from

Returns

Nothing: If the specified field is not found in the line
Tuple{Int64,String,Int64,Vector{String}}: A tuple containing:
- Field index
- Chromosome name
- Position
- Combined reference and alternative alleles

Throws

ErrorException: If the position field cannot be parsed as an integer

Note

The function validates the line format and extracts genomic coordinates and allele information from a VCF file line. It handles missing alternative alleles (denoted by ".") and performs necessary type conversions.

Examples

julia> genomes = GenomicBreedingCore.simulategenomes(n=10, verbose=false);

julia> fname = writevcf(genomes);

julia> entries, format_lines = vcfextractentriesandformats(fname);

julia> field, _, _ = vcfextractinfo(fname, format_lines=format_lines);

julia> file = open(fname, "r"); line::Vector{String} = split([readline(file) for i in 1:10][end], "	"); close(file);

julia> idx_field, chrom, pos, refalts = vcfparsecoordinates(line=line, line_counter=10, field=field);

julia> (idx_field == 3) && (chrom == line[1]) && (pos == parse(Int64, line[2])) && (refalts == line[4:5])
true

julia> rm(fname);

GenomicBreedingIO.writedelimited — Method

writedelimited(
    genomes::Genomes;
    fname::Union{Missing,String} = missing,
    sep::String = "\t",
    include_population_header::Bool = true
)::String

Write genomic data to a delimited text file.

Arguments

genomes::Genomes: A Genomes struct containing the genomic data to be written
fname::Union{Missing,String}: Output filename. If missing, generates an automatic filename with timestamp
sep::String: Delimiter character for the output file (default: tab)
include_population_header::Bool: Whether to include population information in the header (default: true)

Returns

String: Path to the created output file

File Format

The output file contains:

Header lines (prefixed with '#'):
- First line: chromosome, position, alleles, and entry information
- Second line (optional): population information
Data rows with the following columns:
- Column 1: Chromosome identifier
- Column 2: Position
- Column 3: All alleles at the locus (pipe-separated)
- Column 4: Specific allele
- Remaining columns: Frequency data for each entry

Supported File Extensions

'.tsv' (tab-separated, default)
'.csv' (comma-separated)
'.txt' (custom delimiter)

Throws

DimensionMismatch: If the input Genomes struct is corrupted
ErrorException: If the output file already exists
ArgumentError: If the file extension is invalid or the output directory doesn't exist

Examples

julia> genomes = GenomicBreedingCore.simulategenomes(n=2, verbose=false);

julia> writedelimited(genomes, fname="test_genomes.tsv")
"test_genomes.tsv"

GenomicBreedingIO.writedelimited — Method

writedelimited(phenomes::Phenomes; fname::Union{Missing,String} = missing, sep::String = "	")::String

Write phenotypic data from a Phenomes struct to a delimited text file.

Arguments

phenomes::Phenomes: A Phenomes struct containing phenotypic data
fname::Union{Missing,String} = missing: Output filename. If missing, generates an automatic filename with timestamp
sep::String = " ": Delimiter character for the output file

Returns

String: The name of the created file

File Format

Header line starts with '#' containing column names
First column: Entry names
Second column: Population names
Remaining columns: Trait values
Missing values are represented as "NA"

File Extensions

Supported file extensions:

.tsv for tab-separated files (default)
.csv for comma-separated files
.txt for other delimiters

Throws

DimensionMismatch: If the Phenomes struct dimensions are inconsistent
ErrorException: If the output file already exists
ArgumentError: If the file extension is invalid or the directory doesn't exist

Examples

julia> phenomes = Phenomes(n=2, t=2); phenomes.entries = ["entry_1", "entry_2"]; phenomes.traits = ["trait_1", "trait_2"];

julia> writedelimited(phenomes, fname="test_phenomes.tsv")
"test_phenomes.tsv"

GenomicBreedingIO.writedelimited — Method

writedelimited(trials::Trials; fname::Union{Missing,String} = missing, sep::String = "	", 
              overwrite::Bool = false, verbose::Bool = false)::String

Write a Trials struct to a delimited text file, returning the filename.

Arguments

trials::Trials: The trials data structure to be written
fname::Union{Missing,String} = missing: Output filename. If missing, generates automatic filename with timestamp
sep::String = " ": Delimiter character between fields
overwrite::Bool = false: Whether to overwrite existing output file if it exists
verbose::Bool = false: Whether to show progress bar during writing

Returns

String: The name of the file that was written

File Format

The output file contains one header line and one line per trial entry. Header line is prefixed with '#' and contains column names.

Fixed Columns (1-10)

years
seasons
harvests
sites
entries
populations
replications
blocks
rows
cols

Variable Columns (11+)

Additional columns contain phenotype traits values
Missing values are written as "NA"

Notes

Supported file extensions: .tsv, .csv, or .txt
File extension is automatically determined based on separator if filename is missing:
- \t → .tsv
- , or ; → .csv
- other → .txt
Will throw error if file exists and overwrite=false
Directory must exist if path is specified in filename

Examples

julia> trials = Trials(n=1, t=2); trials.years = ["year_1"]; trials.seasons = ["season_1"]; trials.harvests = ["harvest_1"]; trials.sites = ["site_1"]; trials.entries = ["entry_1"]; trials.populations = ["population_1"]; trials.replications = ["replication_1"]; trials.blocks = ["block_1"]; trials.rows = ["row_1"]; trials.cols = ["col_1"]; trials.traits = ["trait_1", "trait_2"];

julia> writedelimited(trials, fname="test_trials.tsv")
"test_trials.tsv"

GenomicBreedingIO.writejld2 — Method

writejld2(A::Union{Genomes,Phenomes,Trials,SimulatedEffects,TEBV}; fname::Union{Missing,String} = missing, overwrite::Bool=false)::String

Save genomic breeding core data structures to a JLD2 file (HDF5-compatible format).

Arguments

A: A genomic breeding data structure (Genomes, Phenomes, Trials, SimulatedEffects, or TEBV)
fname: Optional. Output filename. If missing, generates an automatic name with timestamp
overwrite: Optional. If true, overwrites existing file with same name. Default is false

Returns

String: Path to the saved JLD2 file

File Naming

If fname is not provided, generates name: "output-[Type]-[Timestamp].jld2"
If fname is provided, must have ".jld2" extension

Throws

DimensionMismatch: If input structure has invalid dimensions
ErrorException: If output file already exists and overwrite=false
ArgumentError: If invalid file extension or directory path

Notes

Files are saved with compression enabled
Data is stored as a Dictionary with single key-value pair
Key is the string representation of the input type

Examples

julia> genomes = GenomicBreedingCore.simulategenomes(n=2, verbose=false);

julia> writejld2(genomes, fname="test_genomes.jld2")
"test_genomes.jld2"

julia> genomes_reloaded = load("test_genomes.jld2");

julia> genomes_reloaded[collect(keys(genomes_reloaded))[1]] == genomes
true

julia> phenomes = Phenomes(n=2, t=2); phenomes.entries = ["entry_1", "entry_2"]; phenomes.traits = ["trait_1", "trait_2"];

julia> writejld2(phenomes, fname="test_phenomes.jld2")
"test_phenomes.jld2"

julia> phenomes_reloaded = load("test_phenomes.jld2");

julia> phenomes_reloaded[collect(keys(phenomes_reloaded))[1]] == phenomes
true

julia> trials, _ = simulatetrials(genomes=genomes, verbose=false);

julia> writejld2(trials, fname="test_trials.jld2")
"test_trials.jld2"

julia> trials_reloaded = load("test_trials.jld2");

julia> trials_reloaded[collect(keys(trials_reloaded))[1]] == trials
true

julia> simulated_effects = SimulatedEffects();

julia> writejld2(simulated_effects, fname="test_simulated_effects.jld2")
"test_simulated_effects.jld2"

julia> simulated_effects_reloaded = load("test_simulated_effects.jld2");

julia> simulated_effects_reloaded[collect(keys(simulated_effects_reloaded))[1]] == simulated_effects
true

julia> trials, _simulated_effects = GenomicBreedingCore.simulatetrials(genomes = GenomicBreedingCore.simulategenomes(n=10, verbose=false), n_years=1, n_seasons=1, n_harvests=1, n_sites=1, n_replications=10, verbose=false);

julia> tebv = analyse(trials, max_levels=50, verbose=false);

julia> writejld2(tebv, fname="test_tebv.jld2")
"test_tebv.jld2"

julia> tebv_reloaded = load("test_tebv.jld2");

julia> tebv_reloaded[collect(keys(tebv_reloaded))[1]] == tebv
true

GenomicBreedingIO.writevcf — Method

writevcf(genomes::Genomes; fname::Union{Missing,String} = missing, ploidy::Int64 = 0, 
         max_depth::Int64 = 100, n_decimal_places::Int64 = 4, gzip::Bool = false)::String

Write genomic data to a Variant Call Format (VCF) file.

Arguments

genomes::Genomes: A Genomes object containing the genetic data to be written.
fname::Union{Missing,String} = missing: Output filename. If missing, generates a default name with timestamp.
ploidy::Int64 = 0: The ploidy level of the organisms (e.g., 2 for diploid).
max_depth::Int64 = 100: Maximum depth for allele depth calculation.
n_decimal_places::Int64 = 4: Number of decimal places for rounding allele frequencies.
gzip::Bool = false: Whether to compress the output file using gzip.

Returns

String: The name of the created VCF file.

Description

Creates a VCF v4.2 format file containing genomic variants data. The function processes allele frequencies and depths, calculates genotypes based on ploidy, and formats the data according to VCF specifications. The output includes:

Standard VCF header information
Sample information with FORMAT fields:
- GT (Genotype)
- AD (Allele Depth)
- AF (Allele Frequency)

Throws

DimensionMismatch: If the input Genomes object has inconsistent dimensions
ErrorException: If the output file already exists
ArgumentError: If the file extension is not '.vcf' or if the specified directory doesn't exist

Examples

julia> genomes_1 = GenomicBreedingCore.simulategenomes(n=2, verbose=false);

julia> writevcf(genomes_1, fname="test_genomes_1.vcf")
"test_genomes_1.vcf"

julia> genomes_2 = GenomicBreedingCore.simulategenomes(n=2, n_alleles=3, verbose=false);

julia> genomes_2.allele_frequencies = round.(genomes_2.allele_frequencies .* 4) ./ 4;

julia> writevcf(genomes_2, fname="test_genomes_2.vcf", ploidy=4)
"test_genomes_2.vcf"

julia> genomes_3 = GenomicBreedingCore.simulategenomes(n=3, verbose=false);

julia> writevcf(genomes_3, fname="test_genomes_3.vcf", gzip=true)
"test_genomes_3.vcf.gz"