Package 'Mega2R' reference manual

Title:	Accessing and Processing a 'Mega2' Genetic Database
Description:	Uses as input genetic data that have been reformatted and stored in a 'SQLite' database; this database is initially created by the standalone 'mega2' C++ program (available freely from <https://watson.hgen.pitt.edu/register/>). Loads and manipulates data frames containing genotype, phenotype, and family information from the input 'SQLite' database, and decompresses needed subsets of the genotype data, on the fly, in a memory efficient manner. We have also created several more functions that illustrate how to use the data frames as well as perform useful tasks: these permit one to run the 'pedgene' package to carry out gene-based association tests on family data using selected marker subsets, to run the 'SKAT' package to carry out gene-based association tests using selected marker subsets, to run the 'famSKATRC' package to carry out gene-based association tests on families (optionally) and with rare or common variants using selected marker subsets, to output the 'Mega2R' data as a VCF file and related files (for phenotype and family data), and to convert the data frames into CoreArray Genomic Data Structure (GDS) format.
Authors:	Robert V. Baron [aut], Daniel E. Weeks [aut, cre], University of Pittsburgh [cph]
Maintainer:	Daniel E. Weeks <[email protected]>
License:	GPL-2
Version:	1.1.0
Built:	2025-04-01 05:35:58 UTC
Source:	https://github.com/cran/Mega2R

apply a function to the genotypes (markers) in each gene transcript and/or base pair range

Description

This function generates base pair ranges from its input arguments. Each range specifies a chromosome, a start base pair and end base pair. Typically, a range could be a gene transcript, though it could be a whole chromosome, or a run of base pairs on a chromosome. Once the ranges are generated, applyFnToRanges is called to find all the rows (i.e. markers) from the markers data frame that fall in each range. For these markers, a matrix of the genotypes is generated. Finally, the op function is called for each range with the arguments: markers, range, and 'environment'.

Usage

applyFnToGenes(op           = function (markers, range, envir) {},
               genes_arg    = NULL,
               ranges_arg   = matrix(ncol = 3, nrow = 0),
               chrs_arg     = vector("integer", 0),
               markers_arg  = vector("character", 0),
               type_arg     = "TX",
               fuzz_arg     = 0,
               envir        = ENV)
applyFnToGenes(op           = function (markers, range, envir) {},
               genes_arg    = NULL,
               ranges_arg   = matrix(ncol = 3, nrow = 0),
               chrs_arg     = vector("integer", 0),
               markers_arg  = vector("character", 0),
               type_arg     = "TX",
               fuzz_arg     = 0,
               envir        = ENV)

Arguments

`op`	Is a function of three arguments. It will be called repeatedly by `applyFnToGenes` in a try/catch context. The arguments are: markers Marker data for each marker selected. A marker is a data frame with the following 5 observations: locus_link is the ordinal ranking of this marker among all loci locus_link_fill is the position of corresponding marker genotype data in the unified_genotype_table MarkerName is the text name of the marker chromosome is the integer chromosome number position is the integer base pair position of marker range An indicator of which range argument these markers correspond to. envir An 'environment' holding Mega2R data frames and state data.
`genes_arg`	a character vector of gene names. All the transcripts identified with the specified gene in BioConductor Annotation, TxDb.Hsapiens.UCSC.hg19.knownGene, are selected. This produces multiple "range" elements containing chromosome, start base pair, end base pair. (If the gene name is "", all the transcript will be selected.) Note: BioCoductor Annotation org.Hs.eg.db* is used to convert from gene name to ENTREZ gene id.
`ranges_arg`	an integer matrix of three columns. The columns define a range: a chromosome number, a start base pair value, and an end base pair value.
`chrs_arg`	an integer vector of chromosome numbers. All of the base pairs on each chromosomes will be selected as a single range.
`markers_arg`	a data frame with the following 5 observations: locus_link is the ordinal ranking of this marker among all loci locus_link_fill is the position of corresponding marker genotype data in the unified_genotype_table MarkerName is the text name of the marker chromosome is the integer chromosome number position is the integer base pair position of marker
`type_arg`	a character vector of length 1 that contains "TX" or does not. If it is "TX", which is the default, the TX fields of BioConductor Annotation, TxDb.Hsapiens.UCSC.hg19.knownGene are used to define the base pair ranges and chromosome. Otherwise, the CDS fields are used.
`fuzz_arg`	is an integer vector of length one or two. The first argument is used to reduce the start base pair selected from each transcript and the second to increase the end base pair position. (If only one value is present, it is used for both adjustments.) Note: The values can be positive or negative.
`envir`	an 'environment' that contains all the data frames created from the SQLite database.

Value

None

Note

If you want subsequent calls to op to share information, data can be placed in a data frame that is added to the 'environment'.

Examples

  db = system.file("exdata", "seqsimm.db", package="Mega2R")
  ENV = read.Mega2DB(db)

  show = function(m, r, e) {
      print(r)
      print(m)
      print(head(getgenotypes(m, envir = e)))
  }

   # apply function "show" to all transcripts on genes ELL2 and CARD15

    # donttestcheck: time
    applyFnToGenes(show, genes_arg = c("CEP104"))


   # apply function "show" to all genotypes on chromosomes 11 for two base
   # pair ranges
   applyFnToGenes(show, ranges_arg = matrix(c(1, 5000000, 10000000,
                  1, 10000000, 15000000), ncol = 3, nrow = 2, byrow = TRUE))

   # apply function "show" to all genotypes for first marker in each chromosome
   applyFnToGenes(show, markers_arg = ENV$markers[! duplicated(ENV$markers$chromosome), 3])

   # apply function "show" to all genotypes on chromosomes 24 and 26
   applyFnToGenes(show, chrs_arg=c(24, 26))


db = system.file("exdata", "seqsimm.db", package="Mega2R")
  ENV = read.Mega2DB(db)

  show = function(m, r, e) {
      print(r)
      print(m)
      print(head(getgenotypes(m, envir = e)))
  }

   # apply function "show" to all transcripts on genes ELL2 and CARD15

    # donttestcheck: time
    applyFnToGenes(show, genes_arg = c("CEP104"))


   # apply function "show" to all genotypes on chromosomes 11 for two base
   # pair ranges
   applyFnToGenes(show, ranges_arg = matrix(c(1, 5000000, 10000000,
                  1, 10000000, 15000000), ncol = 3, nrow = 2, byrow = TRUE))

   # apply function "show" to all genotypes for first marker in each chromosome
   applyFnToGenes(show, markers_arg = ENV$markers[! duplicated(ENV$markers$chromosome), 3])

   # apply function "show" to all genotypes on chromosomes 24 and 26
   applyFnToGenes(show, chrs_arg=c(24, 26))

apply a function to the genotypes from a set of markers

Description

A matrix of the genotypes for all the specified markers is generated. Then, the call back function, op, is called with the markers, NULL (for the range), and the 'environment'.

Usage

applyFnToMarkers(op      = function (markers, range, envir) {},
                markers_arg,
                envir = ENV)
applyFnToMarkers(op      = function (markers, range, envir) {},
                markers_arg,
                envir = ENV)

Arguments

op

Is a function of three arguments. It will be called once by applyFnToMarkers in a try/catch context. The arguments are:

markers

Marker data for each marker in geno. A marker is a data frame with the following 5 observations:

locus_link: is the ordinal ranking of this marker among all loci
locus_link_fill: is the position of corresponding marker genotype data in the unified_genotype_table
MarkerName: is the text name of the marker
chromosome: is the integer chromosome number
position: is the integer base pair position of marker

range

NULL: to indicate no explicit range was specified.

envir

An 'environment' holding Mega2R data frames and state data.

markers_arg

a data frame with the following 5 observations:

locus_link: is the ordinal ranking of this marker among all loci
locus_link_fill: is the position of corresponding marker genotype data in the unified_genotype_table
MarkerName: is the text name of the marker
chromosome: is the integer chromosome number
position: is the integer base pair position of marker

envir

an 'environment' that contains all the data frames created from the SQLite database.

Value

None

Examples

  db = system.file("exdata", "seqsimm.db", package="Mega2R")
  ENV = read.Mega2DB(db)
  show = function(m, r, e) {
      print(r)
      print(m)
      print(head(getgenotypes(m, envir = e)))
  }

   # apply function "show" to all genotypes > 5,000,000 bp
   applyFnToMarkers(show, ENV$markers[ENV$markers$position > 5000000,])


db = system.file("exdata", "seqsimm.db", package="Mega2R")
  ENV = read.Mega2DB(db)
  show = function(m, r, e) {
      print(r)
      print(m)
      print(head(getgenotypes(m, envir = e)))
  }

   # apply function "show" to all genotypes > 5,000,000 bp
   applyFnToMarkers(show, ENV$markers[ENV$markers$position > 5000000,])

apply a function to all the genotypes for markers found in several specified ranges

Description

First, for each range, determine the markers that fall between the start and end base pair of the range. Then, for each set of markers generate a matrix of the genotypes of those markers. Finally, the op function is called for each range with the arguments: markers, range, and 'environment'.

Usage

applyFnToRanges(op          = function (markers, range, envir) {},
                ranges_arg  = NULL,
                indices_arg = NULL,
                fuzz_arg    = 0,
                envir       = ENV)
applyFnToRanges(op          = function (markers, range, envir) {},
                ranges_arg  = NULL,
                indices_arg = NULL,
                fuzz_arg    = 0,
                envir       = ENV)

Arguments

`op`	Is a function of three arguments. It will be called repeatedly by `applyFnToRanges` in a try/catch context. The arguments are: markers Marker data for each marker in geno. A marker is a data frame with the following 5 observations: locus_link is the ordinal ranking of this marker among all loci locus_link_fill is the position of corresponding marker genotype data in the unified_genotype_table MarkerName is the text name of the marker chromosome is the integer chromosome number position is the integer base pair position of marker range An indicator of which range argument of `applyFnToRanges` these markers correspond to. envir An 'environment' holding Mega2R data frames and state data.
`ranges_arg`	is a data frame that contains at least 4 observations: a name, a chromosome, a start base pair position and an end base pair position.
`indices_arg`	is a vector of 3 integers that specify the location of chromosome, start base pair column and end base pair column of the ranges_arg data frame. An optional fourth integer indicates the column containing the name of the ranges.
`fuzz_arg`	is an integer vector of length one or two. The first argument is used to reduce the start base pair selected from each range and the second to increase the end base pair position. (If only one value is present, it is used for both changes.) Note: The values can be positive or negative.
`envir`	an 'environment' that contains all the data frames created from the SQLite database.

Value

None

Note

If the ranges_arg and indices_arg are NULL or missing, then the default ranges that have been set by setRanges are used. If setRanges has not been called, a default set of the ranges is used.

Examples

  db = system.file("exdata", "seqsimm.db", package="Mega2R")
  ENV = read.Mega2DB(db)

  show = function(m, r, e) {
      print(r)
      print(m)
      print(head(getgenotypesraw(m, envir = e)))
  }

   # apply function "show" to all genotypes on chromosomes 1 for two base pair
   # ranges
   applyFnToRanges(show,
                   ranges_arg =
                   matrix(c(1, 2244000, 2245000,
                            1, 3762500, 3765000),
                           ncol = 3, nrow = 2, byrow = TRUE),
                   indices_arg = 1:3)

   # apply function "show" to all genotypes on chromosomes 1 for two base pair
   # ranges
   applyFnToRanges(show,
                   ranges_arg =
                   matrix(c(1, 2240000, 2245000, "range1",
                            1, 3760000, 3765000, "range2"),
                           ncol = 4, nrow = 2, byrow = TRUE),
                   indices_arg = 1:4)

db = system.file("exdata", "seqsimm.db", package="Mega2R")
  ENV = read.Mega2DB(db)

  show = function(m, r, e) {
      print(r)
      print(m)
      print(head(getgenotypesraw(m, envir = e)))
  }

   # apply function "show" to all genotypes on chromosomes 1 for two base pair
   # ranges
   applyFnToRanges(show,
                   ranges_arg =
                   matrix(c(1, 2244000, 2245000,
                            1, 3762500, 3765000),
                           ncol = 3, nrow = 2, byrow = TRUE),
                   indices_arg = 1:3)

   # apply function "show" to all genotypes on chromosomes 1 for two base pair
   # ranges
   applyFnToRanges(show,
                   ranges_arg =
                   matrix(c(1, 2240000, 2245000, "range1",
                            1, 3760000, 3765000, "range2"),
                           ncol = 4, nrow = 2, byrow = TRUE),
                   indices_arg = 1:4)

remove tutorial data

Description

This function removes the Mega2R tutorial (inst/exdata) data that was copied to the specified directory.

Usage

clean_mega2rtutorial_data(dir = file.path(tempdir(), "Mega2Rtutorial"))
clean_mega2rtutorial_data(dir = file.path(tempdir(), "Mega2Rtutorial"))

Arguments

dir

The directory to remove the tutorial data to. By default, this is tempdir()/Mega2Rtutorial

Value

None

Examples

clean_mega2rtutorial_data()

clean_mega2rtutorial_data()

computeDosage function

Description

Convert the genotypesraw() allele patterns of 0x10001, 0x10002 (or 0x20001), 0x20002, 0 to the numbers 0, 1, 2, 9 for each marker. (Reverse, the order iff allele "1" has the minor allele frequency.)

Usage

computeDosage(markers_arg, range_arg, envir)
computeDosage(markers_arg, range_arg, envir)

Arguments

markers_arg

a data.frame with the following 5 observations:

locus_link: is the ordinal ranking of this marker among all loci
locus_link_fill: is the position of corresponding genotype data in the unified_genotype_table
MarkerName: is the text name of the marker
chromosome: is the integer chromosome number
position: is the integer base pair position of marker

range_arg

one row of a ranges_arg. The latter is a data frame of at least three integer columns. The columns indicate a range: a chromosome number, a start base pair value, and an end base pair value.

envir

'environment' containing SQLite database and other globals especially the phenotype_table, phe.

Value

a matrix of samples X markers for all the markers that have nonzero changes.

Examples

db = system.file("exdata", "seqsimm.db", package="Mega2R")
ENV = init_famSKATRC(db, verbose = TRUE)
dimDosage = function(m, r, e) {print(dim(computeDosage(m, r, e)))}
applyFnToRanges(dimDosage, ENV$refRanges[50:60, ], ENV$refIndices, envir=ENV)
# This will use return dosage matrices for the markers in the ranges 50 - 60,
# but is basically ignores the results.

db = system.file("exdata", "seqsimm.db", package="Mega2R")
ENV = init_famSKATRC(db, verbose = TRUE)
dimDosage = function(m, r, e) {print(dim(computeDosage(m, r, e)))}
applyFnToRanges(dimDosage, ENV$refRanges[50:60, ], ENV$refIndices, envir=ENV)
# This will use return dosage matrices for the markers in the ranges 50 - 60,
# but is basically ignores the results.

read Mega2 SQLite database into R

Description

Read the fields of SQLite data base tables that are required for Mega2R into data frames. These data frames are stored in an 'environment' which is returned. This function also adds some state data, extra data frames, and computed data frames to the 'environment'.

Usage

dbmega2_import(dbname,
               bpPosMap = NULL,
               verbose = FALSE)
dbmega2_import(dbname,
               bpPosMap = NULL,
               verbose = FALSE)

Arguments

`dbname`	file path to SQLite database.
`bpPosMap`	index that specifies which map in the map_table should be used for marker chromosome/position. If it is NULL, the internal variable base_pair_position_index is used instead. `showMapNames()` shows the association between map name and map number.
`verbose`	print out statistics on the name/size of each table read and show column headers. Also, save the verbose value for use by other Mega2R functions.

Value

envir an environment that contains all the data frames made from the SQLite database.

Examples

db = system.file("exdata", "seqsimm.db", package="Mega2R")
ENV = dbmega2_import(db, verbose = TRUE)

ENV = dbmega2_import(db)
db = system.file("exdata", "seqsimm.db", package="Mega2R")
ENV = dbmega2_import(db, verbose = TRUE)

ENV = dbmega2_import(db)

DofamSKATRC call back function

Description

Convert the genotypesraw() allele patterns of 0x10001, 0x10002 (or 0x20001), 0x20002, 0 to the numbers 0, 1, 2, 9 for each marker. (Reverse, the order iff allele "1" has the minor allele frequency.) Ignore markers that have no variants. Finally, invoke famSKAT_RC with the converted genotype matrix. Save information about the range and the p.value calculated by famSKAT_RC in envir$famSKATRC_results. If you want to change the argument values to this function they should be changed instead when calling the Mega2famSKATRC function.

Usage

DOfamSKATRC(
  markers_arg,
  range_arg,
  envir,
  pheno = 3,
  id = NULL,
  covariates = NULL,
  sqrtweights_c = NULL,
  sqrtweights_r = NULL,
  binomialimpute = TRUE,
  acc = 1e-06,
  maf = 0.05,
  phi = c(0, 0.2, 0.5, 0.9)
)
DOfamSKATRC(
  markers_arg,
  range_arg,
  envir,
  pheno = 3,
  id = NULL,
  covariates = NULL,
  sqrtweights_c = NULL,
  sqrtweights_r = NULL,
  binomialimpute = TRUE,
  acc = 1e-06,
  maf = 0.05,
  phi = c(0, 0.2, 0.5, 0.9)
)

Arguments

`markers_arg`	a data.frame with the following 5 observations: locus_link is the ordinal ranking of this marker among all loci locus_link_fill is the position of corresponding genotype data in the unified_genotype_table MarkerName is the text name of the marker chromosome is the integer chromosome number position is the integer base pair position of marker
`range_arg`	one row of a ranges_arg. The latter is a data frame of at least three integer columns. The columns indicate a range: a chromosome number, a start base pair value, and an end base pair value.
`envir`	'environment' containing SQLite database and other globals especially the phenotype_table, `phe`.
`pheno`	is an index into the phenotypes_table to select the phenotype. Missing phenotypes are represented by NA.
`id`	a vector of individuals to be included in the test, a subset of the family members. If NULL is given, all members will be used.
`covariates`	a matrix of covariates for the phenotype.
`sqrtweights_c`	weight function for common variants, if NULL use weight set in init_famSKAT
`sqrtweights_r`	weight function for rare variants, if NULL use weight set in init_famSKAT.
`binomialimpute`	if TRUE, impute missing genotypes using a binomial distribution.
`acc`	accuracy used in Davies approximation.
`maf`	threshold used to separate rare from common variants.
`phi`	a vector of ratios ratios; each indicates the contribution of rare variants.

Value

None

Note

This function accumulates output in the data frame, envir$famSKATRC_results. It will print out the lines as they are generated if envir$verbose is TRUE. It does not write the data frame to a file. You must save the data frame. You also must initialize the data frame when necessary.

Examples

db = system.file("exdata", "seqsimm.db", package="Mega2R")
ENV = init_famSKATRC(db, verbose = TRUE)
ENV$famSKATRC_results = ENV$famSKATRC_results[0, ]
Mega2famSKATRC(gs=1:1, envir=ENV, pheno=3)
# this sets one of the many arguments for DOfamSKATRC
# but basically prepares the ENV for the direct use of DOfamSKATRC (below).


# donttestcheck: try this below instead if there is time
 Mega2famSKATRC(genes=c("CEP104"), envir=ENV, pheno=3 )


# DOfamSKATRC is called within Mega2famSKATRC. init_famSKATRC and Mega2famSKATRC need to be
# called to set up the environment for famSKAT_RC to run.  BUT, you should ignore DOfamSKATRC
# and use Mega2famSKATRC instead.
#
applyFnToRanges(DOfamSKATRC, ENV$refRanges[50:60, ], ENV$refIndices, envir=ENV)
# this will use all the default argument values for DOfamSKATRC

db = system.file("exdata", "seqsimm.db", package="Mega2R")
ENV = init_famSKATRC(db, verbose = TRUE)
ENV$famSKATRC_results = ENV$famSKATRC_results[0, ]
Mega2famSKATRC(gs=1:1, envir=ENV, pheno=3)
# this sets one of the many arguments for DOfamSKATRC
# but basically prepares the ENV for the direct use of DOfamSKATRC (below).


# donttestcheck: try this below instead if there is time
 Mega2famSKATRC(genes=c("CEP104"), envir=ENV, pheno=3 )


# DOfamSKATRC is called within Mega2famSKATRC. init_famSKATRC and Mega2famSKATRC need to be
# called to set up the environment for famSKAT_RC to run.  BUT, you should ignore DOfamSKATRC
# and use Mega2famSKATRC instead.
#
applyFnToRanges(DOfamSKATRC, ENV$refRanges[50:60, ], ENV$refIndices, envir=ENV)
# this will use all the default argument values for DOfamSKATRC

pedgene call back function

Description

First, ignore call backs that have less than two markers. Second, convert the genotypesraw() patterns of 0x10001, 0x10002 (or 0x20001), 0x20002, 0 from the genotype matrix to the numbers 0, 1, 2, 0 for each marker. (Reverse, the order iff allele "1" has the minor allele frequency.) Next, prepend the pedigree and person columns of the family data to this modified genotype matrix. Finally, invoke pedgene with the family data and genotype matrix for several different weights. Save the kernel and burden, value and p-value for each measurement in envir$pedgene_results.

Usage

DOpedgene(markers_arg, range_arg, envir = ENV)
DOpedgene(markers_arg, range_arg, envir = ENV)

Arguments

markers_arg

a data.frame with the following 5 observations:

locus_link: is the ordinal ranking of this marker among all loci
locus_link_fill: is the position of corresponding genotype data in the unified_genotype_table
MarkerName: is the text name of the marker
chromosome: is the integer chromosome number
position: is the integer base pair position of marker

range_arg

one row of a ranges_arg. The latter is a data frame of at least three integer columns. The columns indicate a range: a chromosome number, a start base pair value, and an end base pair value.

envir

'environment' containing SQLite database and other globals

Value

None

Note

This function appends output to the data frame, envir$pedgene_results. It will print out the lines as they are generated if envir$verbose is TRUE. The data frame envir$pedgene_results is initialized by init_pedgene, and is appended to each time DOpedgene is run.

Examples

db = system.file("exdata", "seqsimm.db", package="Mega2R")
ENV = init_pedgene(db)
ENV$verbose = TRUE
applyFnToRanges(DOpedgene, ENV$refRanges[50:60,], ENV$refIndices)


# donttestcheck: try this below if there is time
applyFnToGenes(DOpedgene, genes_arg = c("CEP104"))


db = system.file("exdata", "seqsimm.db", package="Mega2R")
ENV = init_pedgene(db)
ENV$verbose = TRUE
applyFnToRanges(DOpedgene, ENV$refRanges[50:60,], ENV$refIndices)


# donttestcheck: try this below if there is time
applyFnToGenes(DOpedgene, genes_arg = c("CEP104"))

SKAT call back function

Description

Convert the genotypesraw() allele patterns of 0x10001, 0x10002 (or 0x20001), 0x20002, 0 to the numbers 0, 1, 2, 9 for each marker. (Reverse, the order iff allele "1" has the minor allele frequency.) Ignore markers that have no variants (unless allMarkers is TRUE). Finally, invoke SKAT with the converted genotype matrix, Null model saved in envir$obj, and any additionally supplied arguments. Save information about the range and the p.value calculated by SKAT in envir$SKAT_results.

Usage

DOSKAT(markers_arg, range_arg, envir, ...)
DOSKAT(markers_arg, range_arg, envir, ...)

Arguments

`markers_arg`	a data.frame with the following 5 observations: locus_link is the ordinal ranking of this marker among all loci locus_link_fill is the position of corresponding genotype data in the unified_genotype_table MarkerName is the text name of the marker chromosome is the integer chromosome number position is the integer base pair position of marker
`range_arg`	one row of a ranges_arg. The latter is a data frame of at least three integer columns. The columns indicate a range: a chromosome number, a start base pair value, and an end base pair value.
`envir`	'environment' containing SQLite database and other globals
`...`	extra arguments for SKAT

Value

None

Note

This function accumulates output in the data frame, envir$SKAT_results. It will print out the lines as they are generated if envir$verbose is TRUE. It does not write the data frame to a file. You must save the data frame. You also must initialize the data frame when necessary.

Examples

db = system.file("exdata", "seqsimm.db", package="Mega2R")
ENV = init_SKAT(db, verbose = TRUE, allMarkers = FALSE)
Mega2SKAT(ENV$phe[, 3] - 1 ~ 1, "D", gs=1:1)


# donttestcheck: try this below instead if there is time
Mega2SKAT(ENV$phe[, 3] - 1 ~ 1, "D", kernel = "linear.weighted", 
          weights.beta = c(0.5, 0.5), genes=c("CEP104") )


# DOSKAT is called internally to Mega2SKAT. init_SKAT and Mega2SKAT need to be
# called to set up the environment for DOSKAT to run.  You should ignore DOSKAT
# and use Mega2SKAT instead
#
applyFnToRanges(DOSKAT, ENV$refRanges[50:60, ], ENV$refIndices)

db = system.file("exdata", "seqsimm.db", package="Mega2R")
ENV = init_SKAT(db, verbose = TRUE, allMarkers = FALSE)
Mega2SKAT(ENV$phe[, 3] - 1 ~ 1, "D", gs=1:1)


# donttestcheck: try this below instead if there is time
Mega2SKAT(ENV$phe[, 3] - 1 ~ 1, "D", kernel = "linear.weighted", 
          weights.beta = c(0.5, 0.5), genes=c("CEP104") )


# DOSKAT is called internally to Mega2SKAT. init_SKAT and Mega2SKAT need to be
# called to set up the environment for DOSKAT to run.  You should ignore DOSKAT
# and use Mega2SKAT instead
#
applyFnToRanges(DOSKAT, ENV$refRanges[50:60, ], ENV$refIndices)

dump tutorial data

Description

This function retrieves data stored in the Mega2rtutorial (inst/exdata). It dumps them in the specified directory.

Usage

dump_mega2rtutorial_data(dir = file.path(tempdir(), "Mega2Rtutorial"))
dump_mega2rtutorial_data(dir = file.path(tempdir(), "Mega2Rtutorial"))

Arguments

dir

The directory to store the tutorial data to. By default, this is tempdir()/Mega2Rtutorial

Value

None

Examples

dump_mega2rtutorial_data()

dump_mega2rtutorial_data()

fetch genotype character matrix for specified markers

Description

This function calls a C++ function that does all the heavy lifting. It passes the arguments necessary for the C++ function: some from the caller's arguments and some from data frames that are in the "global" environment, envir. From the markers_arg argument, it fetches the locus_index and the index in the unified_genotype_table. It also passes the allele nucleotide separator argument. From the "global" environment, envir, it gets a bit vector of compressed genotype information, the alleles for each marker, and some bookkeeping related data. Note: This function also contains a dispatch/switch on the type of compression in the genotype vector. A different C++ function is called when there is compression versus when there is no compression.

Usage

getgenotypes(markers_arg, sepstr = "", envir = ENV)
getgenotypes(markers_arg, sepstr = "", envir = ENV)

Arguments

markers_arg

a data.frame with the following 5 observations:

locus_link: is the ordinal ranking of this marker among all loci
locus_link_fill: is the position of corresponding genotype data in the unified_genotype_table
MarkerName: is the text name of the marker
chromosome: is the integer chromosome number
position: is the integer base pair position of marker

sepstr

separator string inserted between the alleles (default is none). When present, this is typically a space, a tab or "/".

envir

an environment that contains all the data frames created from the SQLite database.

Details

The unified_genotype_table contains one raw vector for each person. In the vector there are two bits for each genotype. This function creates an output matrix by fixing the marker and collecting genotype information for each person and then repeating for all the needed markers. (Currently, this appears slightly faster than a scan which is fixes the person and iterates over markers.)

Value

a matrix of genotypes represented as two allele pairs. The matrix has one column for each marker in markers_arg argument. There is one row for each person in the family (fam) table.

Examples

db = system.file("exdata", "seqsimm.db", package="Mega2R")
ENV = read.Mega2DB(db)

getgenotypes(ENV$markers)

db = system.file("exdata", "seqsimm.db", package="Mega2R")
ENV = read.Mega2DB(db)

getgenotypes(ENV$markers)

fetch dosage integer matrix for specified markers

Description

This function calls a C++ function that does all the heavy lifting. It passes the arguments necessary for the C++ function: some from the caller's arguments and some from data frames that are in the "global" environment, envir. From its markers_arg argument, it gets the locus_index and the index in the unified_genotype_table. From the "global" environment, envir, it gets a bit vector of compressed genotype information, and some bookkeeping related data. Note: This function also contains a dispatch/switch on the type of compression in the genotype vector. A different C++ function is called when there is compression versus when there is no compression.

Usage

getgenotypesdos(markers_arg, envir = ENV)
getgenotypesdos(markers_arg, envir = ENV)

Arguments

markers_arg

a data.frame with the following 5 observations:

locus_link: is the ordinal ranking of this marker among all loci
locus_link_fill: is the position of corresponding genotype data in the unified_genotype_table
MarkerName: is the text name of the marker
chromosome: is the integer chromosome number
position: is the integer base pair position of marker

envir

an environment that contains all the data frames created from the SQLite database.

Details

The unified_genotype_table contains one raw vector for each person. In the vector, there are two bits for each genotype. This function creates an output matrix by fixing the marker and collecting genotype information for each person and then repeating for all the specified markers.

Value

a list of 3 values, named "ncol", "zero", "geno".

geno: is a matrix of dosages as integers. The value 0 is given to the Major allele value, 1 is given to the heterozygote value, and 2 is given to the Minor allele. In the matrix, there is usually one column for each marker in the markers_arg argument. But if there would be only the one allele 0 or 2 in the column, the column is ignorednot present. There is one row for each person in the family (fam) table.
ncol: Is the count of the actual number of columns in the geno matrix.
zero: Is a vector with one entry per marker. The value will be 0 if the marker is not in the geno matrix. Otherwise the value is the column number in the geno matrix where the marker data appears.

Examples

db = system.file("exdata", "seqsimm.db", package="Mega2R")
ENV = read.Mega2DB(db)

getgenotypesdos(ENV$markers[ENV$markers$chromosome == 1,])

db = system.file("exdata", "seqsimm.db", package="Mega2R")
ENV = read.Mega2DB(db)

getgenotypesdos(ENV$markers[ENV$markers$chromosome == 1,])

process the genotype matrix for specified markers and return the corresponding GenABEL genotype matrix

Description

This function calls a C++ function that does all the heavy lifting. It passes the arguments necessary for the C++ function: some from the caller's arguments and some from data frames that are in the "global" environment, envir. From its markers_arg argument, it gets the locus_index and the index in the unified_genotype_table. From the "global" environment, envir, it gets a bit vector of compressed genotype information, allele information, and some bookkeeping related data. Note: This function also contains a dispatch/switch on the type of compression in the genotype vector. A different C++ function is called when there is compression versus when there is no compression.

Usage

getgenotypesgenabel(markers_arg, envir = ENV)
getgenotypesgenabel(markers_arg, envir = ENV)

Arguments

markers_arg

a data.frame with the following 5 observations:

locus_link: is the ordinal ranking of this marker among all loci
locus_link_fill: is the position of corresponding genotype data in the unified_genotype_table
MarkerName: is the text name of the marker
chromosome: is the integer chromosome number
position: is the integer base pair position of marker

envir

an environment that contains all the data frames created from the SQLite database.

Details

This function reads the genotype data in Mega2 compressed format and converts it to the GenABEL compressed format. The unified_genotype_table contains one raw vector for each person. In the vector, there are two bits for each genotype; each byte has the data for 4 markers. In GenABEL, there is one raw vector per marker, and each byte has the data for 4 persons. The C++ function does the conversion as well as adjusts the bits' contents. For example, in GenABEL the genotype represented by bits == 0, is what Mega2 represents with 2. Doing the conversion in C++ is 10 - 20 times faster than converting the Mega2 data to PLINK .tped files and then having GenABEL read in and process/convert those files.

Value

the GenABEL gwaa.data-class object component that contains the genotype data.

Note

This function is called from Mega2ENVGenABEL; it is not intended to be called by the programmer.

Examples

db = system.file("exdata", "seqsimm.db", package="Mega2R")
ENV = read.Mega2DB(db)

aa = getgenotypesgenabel(ENV$markers[ENV$markers$chromosome == 1,])

aa

db = system.file("exdata", "seqsimm.db", package="Mega2R")
ENV = read.Mega2DB(db)

aa = getgenotypesgenabel(ENV$markers[ENV$markers$chromosome == 1,])

aa

fetch genotype integer matrix for specified markers

Description

This function calls a C++ function that does all the heavy lifting. It passes the arguments necessary for the C++ function: some from the caller's arguments and some from data frames that are in the "global" environment, envir. From its markers_arg argument, it gets the locus_index and the index in the unified_genotype_table. From the "global" environment, envir, it gets a bit vector of compressed genotype information, and some bookkeeping related data. Note: This function also contains a dispatch/switch on the type of compression in the genotype vector. A different C++ function is called when there is compression versus when there is no compression.

Usage

getgenotypesraw(markers_arg, envir = ENV)
getgenotypesraw(markers_arg, envir = ENV)

Arguments

markers_arg

a data.frame with the following 5 observations:

locus_link: is the ordinal ranking of this marker among all loci
locus_link_fill: is the position of corresponding genotype data in the unified_genotype_table
MarkerName: is the text name of the marker
chromosome: is the integer chromosome number
position: is the integer base pair position of marker

envir

an environment that contains all the data frames created from the SQLite database.

Details

Value

a matrix of genotypes represented as integers. Each 32 bit integer represents contains two allele values: the high 16 bits contains the index of allele1 and the low 16 bits contains the index of allele2. In the matrix, there is one column for each marker in the markers_arg argument. There is one row for each person in the family (fam) table.

Examples

db = system.file("exdata", "seqsimm.db", package="Mega2R")
ENV = read.Mega2DB(db)

# two ints in upper/lower half integer representing allele # for all persons in chromosome 1
getgenotypesraw(ENV$markers[ENV$markers$chromosome == 1,])

db = system.file("exdata", "seqsimm.db", package="Mega2R")
ENV = read.Mega2DB(db)

# two ints in upper/lower half integer representing allele # for all persons in chromosome 1
getgenotypesraw(ENV$markers[ENV$markers$chromosome == 1,])

load Mega2 SQLite database and perform initialization for famSKATRC usage

Description

This populates the R data frames with the specified Mega2 SQLite database. It initializes the fam(ily) table and makes sure the person entries are unique. Finally, it generates a kinship matrix from the family data. It also stores a weighting for the common and rare variant that may be used later if NULL is specified as a weight in Mega2famSKATRC. The common weighting is the function dbeta(maf, 1, 25). The rare weighting is the function dbeta(maf, 0.5, 0.5).

Usage

init_famSKATRC(db = NULL, verbose = FALSE, ALPHA = FALSE, ...)
init_famSKATRC(db = NULL, verbose = FALSE, ALPHA = FALSE, ...)

Arguments

`db`	specifies the path of a Mega2 SQLite database containing study data.
`verbose`	TRUE indicates that diagnostic printouts should be enabled. This value is saved in the returned environment.
`ALPHA`	TRUE indicates that two runs of famSKAT_RC should be enabled. One with ALPHA numeric ID's and one with numeric IDs ... this is temporary. The default is FALSE.
`...`	fed to dbmega2_import(); should be bpPosMap= to select from the maps of base pairs, if the default is not desired.

Value

"environment" containing data frames from an SQLite database and some computed values.

Note

init_famSKATRC creates a new data frame, envir$phe, containing phenotype observations. In addition, it initializes a matrix to aid in translating a genotype allele matrix to a genotype count matrix.

It also initializes the data frame envir$famSKATRC_results to zero rows.

Examples

db = system.file("exdata", "seqsimm.db", package="Mega2R")
ENV = init_famSKATRC(db, verbose = FALSE)
ls(ENV)

db = system.file("exdata", "seqsimm.db", package="Mega2R")
ENV = init_famSKATRC(db, verbose = FALSE)
ls(ENV)

load Mega2 SQLite database and perform initialization for pedgene usage

Description

This populates the R data frames from the specified Mega2 SQLite database.

Usage

init_pedgene(db = NULL, verbose = FALSE, traitname = "default", ...)
init_pedgene(db = NULL, verbose = FALSE, traitname = "default", ...)

Arguments

`db`	specifies the path of a Mega2 SQLite database containing study data.
`verbose`	TRUE indicates that diagnostic printouts should be enabled. This value is saved in the returned environment.
`traitname`	Name of the affection status trait to use to set the case/control status; default value = "default".
`...`	fed to dbmega2_import(); should be bpPosMap= to select from the maps of base pairs, if the default is not desired.

Value

"environment" containing data frames from an SQLite database and some computed values.

Note

init_pedgene calculates schaidPed and pedPer that are used later in the Dopedgene calculation. In addition, it initializes a matrix to aid in translating a genotype allele matrix to a genotype count matrix.

It also initializes the dataframe envir$pedgene_results to zero rows.

Examples

db = system.file("exdata", "seqsimm.db", package="Mega2R")
ENV = init_pedgene(db, traitname = "default")
ls(ENV)

db = system.file("exdata", "seqsimm.db", package="Mega2R")
ENV = init_pedgene(db, traitname = "default")
ls(ENV)

load Mega2 SQLite database and perform initialization for SKAT usage

Description

This populates the R data frames from the specified Mega2 SQLite database. It then prunes the samples to only include members that have a definite case or control status. Undefined samples are ignored.

Usage

init_SKAT(db = NULL, verbose = FALSE, allMarkers = FALSE, ...)
init_SKAT(db = NULL, verbose = FALSE, allMarkers = FALSE, ...)

Arguments

`db`	specifies the path of a Mega2 SQLite database containing study data.
`verbose`	TRUE indicates that diagnostic printouts should be enabled. This value is saved in the returned environment.
`allMarkers`	TRUE means use all markers in a given transcript even if there is no variation. FALSE means ignore markers that show no variation; this is the default.
`...`	fed to dbmega2_import(); should be bpPosMap= to select from the maps of base pairs, if the default is not desired.

Value

"environment" containing data frames from an SQLite database and some computed values.

Note

init_SKAT creats a data frame, envir$phe, of phenotype observations. In addition, it initializes a matrix to aid in translating a genotype allele matrix to a genotype count matrix.

It also initializes the data frame envir$SKAT_results to zero rows.

Examples

db = system.file("exdata", "seqsimm.db", package="Mega2R")
ENV = init_SKAT(db, verbose = FALSE, allMarkers = FALSE)
ls(ENV)

db = system.file("exdata", "seqsimm.db", package="Mega2R")
ENV = init_SKAT(db, verbose = FALSE, allMarkers = FALSE)
ls(ENV)

generate gwaa.data-class object

Description

create a gwaa.data-class object from the data frames in a Mega2 environment. This function is a front end that eventually calls a C++ Rcpp function that reads the genotype data in Mega2 compressed format and converts it to the GenABEL compressed format. The results of Mega2ENVGenABEL are/should be the same as Mega2GenABEL, but the calculation is much faster, typically a factor of 10 to 20.

Usage

Mega2ENVGenABEL(
  markers = NULL,
  force = TRUE,
  makemap = FALSE,
  sort = TRUE,
  envir = ENV
)
Mega2ENVGenABEL(
  markers = NULL,
  force = TRUE,
  makemap = FALSE,
  sort = TRUE,
  envir = ENV
)

Arguments

`markers`	data frame of markers to be processed
`force`	pass value to gwaa conversion function
`makemap`	pass value to gwaa conversion function
`sort`	pass value to gwaa conversion function
`envir`	'environment' containing SQLite database and other globals

Value

gwaa.data-class object created from Mega2R database

Examples

## Not run: 
require("GenABEL")
db = system.file("exdata", "seqsimm.db", package="Mega2R")
ENV = read.Mega2DB(db)
gwaa = Mega2ENVGenABEL(markers=ENV$markers[1:10,])

str(gwaa)
head(summary(gwaa))

## End(Not run)

## Not run: 
require("GenABEL")
db = system.file("exdata", "seqsimm.db", package="Mega2R")
ENV = read.Mega2DB(db)
gwaa = Mega2ENVGenABEL(markers=ENV$markers[1:10,])

str(gwaa)
head(summary(gwaa))

## End(Not run)

execute the CRAN famSKAT_RC function on a subset of the gene transcripts

Description

If the gene argument is NULL, execute the famSKAT_RC function on the first gs gene transcripts (default is gs = 1:100). Update the envir$famSKATRC_results data frame with the results. Otherwise, gene is a string vector of genes to process. The special value '*' stands for all the known genes.

Usage

Mega2famSKATRC(gs = 1:100, genes = NULL, envir = ENV, ...)
Mega2famSKATRC(gs = 1:100, genes = NULL, envir = ENV, ...)

Arguments

`gs`	a subrange of the default transcripts (refRanges) over which to calculate the DOfamSKATRC function.
`genes`	a list of genes over which to calculate the DOfamSKATRC function. The value, "", means use all the transcripts in the selected Bioconductor database. If genes is NULL, the gs range of the internal refRanges* will be used.
`envir`	'environment' containing SQLite database and other globals.
`...`	extra arguments that are acceptable to famSKAT_RC. These are listed with the `DOfamSKATRC` function.

Value

The data frame with the results is stored in the environment and named famSKATRC_results, viz. envir$famSKATRC_results

Note

A helper function SKAT3arg is defined for the 3 argument callback function which in turn calls DOfamSKATRC with the appropriate arguments (including those specific to the Mega2famSKATRC function).

Examples

db = system.file("exdata", "seqsimm.db", package="Mega2R")
ENV = init_famSKATRC(db, verbose = FALSE)
ENV$verbose = FALSE
ENV$famSKATRC_results = ENV$famSKATRC_results[0, ]
Mega2famSKATRC(gs=50:60, envir=ENV, pheno=3)


# donttestcheck: try this below if there is time
 Mega2famSKATRC(genes=c("CEP104"), envir=ENV, pheno=3 )


ENV$famSKATRC_results

db = system.file("exdata", "seqsimm.db", package="Mega2R")
ENV = init_famSKATRC(db, verbose = FALSE)
ENV$verbose = FALSE
ENV$famSKATRC_results = ENV$famSKATRC_results[0, ]
Mega2famSKATRC(gs=50:60, envir=ENV, pheno=3)


# donttestcheck: try this below if there is time
 Mega2famSKATRC(genes=c("CEP104"), envir=ENV, pheno=3 )


ENV$famSKATRC_results

transcode mega2 to gdsfmt/SNP_ARRAY

Description

Reads the data frames in "envir" and builds a GDSFMT COREARRAY file from them.

Usage

Mega2gdsfmt(
  filename = "test.gds",
  markers = NULL,
  snp.order = FALSE,
  SeqArray = FALSE,
  envir = ENV
)
Mega2gdsfmt(
  filename = "test.gds",
  markers = NULL,
  snp.order = FALSE,
  SeqArray = FALSE,
  envir = ENV
)

Arguments

`filename`	gdsfmt file to create
`markers`	data frame of markers to be processed
`snp.order`	TRUE indicates that the "genotype" data matrix has SNP as the first index which changes more quickly than subsequent indices. FALSE indicates that SAMPLE is the the first index.
`SeqArray`	TRUE uses SeqArray labels for the gdsfmt vector elements. FALSE it uses labels shown in SNPRelate
`envir`	'environment' containing SQLite database and other globals

Value

writes the "filename" file containing the CoreArray data. Then returns an internal pointer, class .gds, to the file data.

Examples

db = system.file("exdata", "seqsimm.db", package="Mega2R")
ENV = read.Mega2DB(db)
gdsfmtfile = file.path(where_mega2rtutorial_data(), "test.gds")
append_genotype_a = TRUE
append_genotype_b = append_genotype_c = FALSE
gn = Mega2gdsfmt(gdsfmtfile, envir=ENV)
gn

db = system.file("exdata", "seqsimm.db", package="Mega2R")
ENV = read.Mega2DB(db)
gdsfmtfile = file.path(where_mega2rtutorial_data(), "test.gds")
append_genotype_a = TRUE
append_genotype_b = append_genotype_c = FALSE
gn = Mega2gdsfmt(gdsfmtfile, envir=ENV)
gn

generate gwaa.data-class object from a Mega2R database

Description

Call the Mega2R functions to: create a .tped file, a .tfam file and a .phe file. Then call the GenABEL functions to process these files: the .tped and the .tfam file are processed by convert.snp.tped to produce a tped.raw file. The latter is combined with a .phe (phenotype) file by load.gwaa.data to create a gwaa.data-class object in memory. All these files are deleted when the exits.

Usage

Mega2GenABEL(markers = NULL, mapno = 0, envir = ENV)
Mega2GenABEL(markers = NULL, mapno = 0, envir = ENV)

Arguments

`markers`	data frame of markers to be processed
`mapno`	specify which map index to use for physical distances
`envir`	'environment' containing SQLite database and other globals

Value

gwaa.data-class object generated from the Mega2R database

Examples

## Not run: 
require("GenABEL")
db = system.file("exdata", "seqsimm.db", package="Mega2R")
ENV = read.Mega2DB(db)
seqsimgwaa = Mega2GenABEL(markers=ENV$markers[1:10,])

str(seqsimgwaa)
head(summary(seqsimgwaa))

## End(Not run)

## Not run: 
require("GenABEL")
db = system.file("exdata", "seqsimm.db", package="Mega2R")
ENV = read.Mega2DB(db)
seqsimgwaa = Mega2GenABEL(markers=ENV$markers[1:10,])

str(seqsimgwaa)
head(summary(seqsimgwaa))

## End(Not run)

compare two gwaa.data-class objects

Description

Verify by fields, all the fields in two gwaa.data-class objects. Show more detailed marker information iff the coding values are different. (When comparing two gwaa.data-class objects, one native and one created via Mega2R sometimes when an allele frequency is .5 for both alleles, the allele order 1/2 vs 2/1 can not be currently be determined.)

Usage

Mega2GenABELtst(mega_ = mega, gwaa_ = srdta, full = TRUE, envir = ENV)
Mega2GenABELtst(mega_ = mega, gwaa_ = srdta, full = TRUE, envir = ENV)

Arguments

`mega_`	name of first gwaa.data-class object
`gwaa_`	name of second gwaa.data-class object
`full`	if TRUE convert genotypes to text as.character(gwaa_@gtdata) and as.character(mega_@gtdata). Then standardize the order for heterozygous alleles and finally compare. This step is optional because it can be rather slow.
`envir`	'environment' containing SQLite database and other globals

Value

None

Examples

## Not run: 
db = system.file("exdata", "seqsimm.db", package="Mega2R")
require("GenABEL")
ENV = read.Mega2DB(db)

y = Mega2ENVGenABEL()
Mega2GenABELtst(y, y, full = FALSE)

## End(Not run)

## Not run: 
# donttestcheck: if you have more time, try ...
x = Mega2GenABEL()
Mega2GenABELtst(x, y, full = FALSE)

## End(Not run)

## Not run: 
db = system.file("exdata", "seqsimm.db", package="Mega2R")
require("GenABEL")
ENV = read.Mega2DB(db)

y = Mega2ENVGenABEL()
Mega2GenABELtst(y, y, full = FALSE)

## End(Not run)

## Not run: 
# donttestcheck: if you have more time, try ...
x = Mega2GenABEL()
Mega2GenABELtst(x, y, full = FALSE)

## End(Not run)

Execute the pedgene function on a transcript ranges

Description

Execute the pedgene function on the first gs default gene transcript ranges (gs = 1:100). Update the envir$pedgene_results data frame with the results.

Usage

Mega2pedgene(gs = 1:100, genes = NULL, envir = ENV)
Mega2pedgene(gs = 1:100, genes = NULL, envir = ENV)

Arguments

`gs`	a subrange of the default transcript ranges over which to calculate the Dopedgene function.
`genes`	a list of genes over which to calculate the DOpedgene function. The value, "", means use all the transcripts in the selected Bioconductor database. If genes is NULL, the gs range of the internal refRanges* will be used.
`envir`	'environment' containing SQLite database and other globals

Value

None the data frame with the results is stored in the environment and named pedgene_results, viz. envir$pedgene_results

Examples

db = system.file("exdata", "seqsimm.db", package="Mega2R")
ENV = init_pedgene(db)
ENV$verbose = TRUE
Mega2pedgene(gs = 50:60)

db = system.file("exdata", "seqsimm.db", package="Mega2R")
ENV = init_pedgene(db)
ENV$verbose = TRUE
Mega2pedgene(gs = 50:60)

Mega2R SQLite3 tables

Description

This character vector indicates the names of the Mega2 SQLite3 database tables to load. (Not all of the existing tables are loaded.)

Usage

TBLS
TBLS

Format

An object of class character of length 15.

Author(s)

Robert V Baron

Mega2R SQLite3 table filter

Description

This list contains named values. The name corresponds to an SQLite database table. The value is a character string of column names from the "named" table that should be fetched. A table is in this list, if not all the database table columns are needed. The columns for each table are separated by commas.

Usage

TBLSFilter
TBLSFilter

Format

An object of class list of length 7.

Note

For the data base tables not in this list, all columns are stored in the corresponding data frame.

Author(s)

Robert V Baron

Mega2R version

Description

This string indicates the current release of Mega2R

Usage

Mega2RVersion
Mega2RVersion

Format

An object of class character of length 1.

Author(s)

Robert V Baron

execute the CRAN SKAT function on a subset of the gene transcripts

Description

Execute the SKAT function on the first gs default gene transcripts (gs = 1:100). Update the envir$SKAT_results data frame with the results.

Usage

Mega2SKAT(f, ty, gs = 1:100, genes = NULL, skat = SKAT::SKAT, envir = ENV, ...)
Mega2SKAT(f, ty, gs = 1:100, genes = NULL, skat = SKAT::SKAT, envir = ENV, ...)

Arguments

`f`	SKAT_Null_Model formula. If this is non NULL, envir$obj is initialized by calling SKAT_Null_Model(f, out_type = ty). If you need to specify additional arguments to the Model viz. (data, Adjustment, n.Resampling, type.Resampling) or need to use a different model viz. SKAT_NULL_emmaX, SKAT_Null_Model_ChrX set the formula to NULL, then before Mega2SKAT is called, build the model you need and assign it to ENV$obj.
`ty`	type of phenotype C/D = Continuous/Binary 5 (internal type 1/2)
`gs`	a subrange of the default transcripts (refRanges) over which to calculate the DOSKAT function.
`genes`	a list of genes over which to calculate the DOSKAT function. The value, "", means use all the transcripts in the selected Bioconductor database. If genes is NULL, the gs range of the internal refRanges* will be used.
`skat`	alternate SKAT function, viz. SKATBinary, SKAT_CommonRare. If it is also necessary is to pass additional arguments to the SKAT function, they may be added to the end of the Mega2SKAT function and will be passed. See examples
`envir`	'environment' containing SQLite database and other globals
`...`	extra arguments for SKAT

Value

None the data frame with the results is stored in the environment and named SKAT_results, viz. envir$SKAT_results

Note

The SKAT_Null_Model is called if the formula, f, is not NULL. A helper function SKAT3arg is defined for the 3 argument callback function which in turn calls DOSKAT with the appropriate arguments (including those additional to the Mega2SKAT function).

Examples

db = system.file("exdata", "seqsimm.db", package="Mega2R")
ENV = init_SKAT(db, verbose = FALSE, allMarkers = FALSE)
ENV$verbose = FALSE
ENV$SKAT_results = ENV$SKAT_results[0, ]
Mega2SKAT(ENV$phe[, 3] - 1 ~ 1, "D", kernel = "linear.weighted", 
          weights.beta = c(0.5, 0.5), gs=50:60 )


# donttestcheck: try this below if there is time
 Mega2SKAT(ENV$phe[, 3] - 1 ~ 1, "D", kernel = "linear.weighted", 
           weights.beta = c(0.5, 0.5), genes=c("CEP104") )


ENV$SKAT_results

db = system.file("exdata", "seqsimm.db", package="Mega2R")
ENV = init_SKAT(db, verbose = FALSE, allMarkers = FALSE)
ENV$verbose = FALSE
ENV$SKAT_results = ENV$SKAT_results[0, ]
Mega2SKAT(ENV$phe[, 3] - 1 ~ 1, "D", kernel = "linear.weighted", 
          weights.beta = c(0.5, 0.5), gs=50:60 )


# donttestcheck: try this below if there is time
 Mega2SKAT(ENV$phe[, 3] - 1 ~ 1, "D", kernel = "linear.weighted", 
           weights.beta = c(0.5, 0.5), genes=c("CEP104") )


ENV$SKAT_results

generate a VCF file set for a collection of markers

Description

Generate a VCF file from the specified Mega2 SQLite database. The file is named "prefix".vcf If the markers argument is.null(), the entire envir$markers set is used, otherwise markers argument MUST be rows of the markers (envir$markers) data frame. In addition, several other files are generated to hold additional database information: "prefix".fam, "prefix".freq, "prefix".map, "prefix".phe, and "prefix".pen, which contain the pedigree, allele frequency, marker genetic and physical map position, member phenotype and phenotype penetrance data.

Usage

Mega2VCF(
  prefix,
  markers = NULL,
  mapno = 0,
  alleleOrder = "default",
  envir = ENV
)
Mega2VCF(
  prefix,
  markers = NULL,
  mapno = 0,
  alleleOrder = "default",
  envir = ENV
)

Arguments

`prefix`	prefix of output files including the VCF file (see Description section above). This prefix can include a path.
`markers`	markers selected to be in the VCF output file
`mapno`	specify which map index to use for genetic distances. The function `showMapNames()` will print out the internal map numbers corresponding to all the maps in the Mega2 database.
`alleleOrder`	how to order alleles in VCF file. 'default' is Mega2order, 'minor' is minor allele freq first, 'major' is major allele freq first, and 'name' is ascending ascii character order of allele name.
`envir`	'environment' containing SQLite database and other globals

Value

None

Note

This code in this package illustrates how to extract the various kinds of data in the Mega2 data frames to use for further processing. Some of the data internal representations are a bit quirky but the code "explains" it all.

Examples

db = system.file("exdata", "seqsimm.db", package="Mega2R")
ENV = read.Mega2DB(db)
vcfdir = file.path(where_mega2rtutorial_data(), "vcfr")
if (!dir.exists(vcfdir)) dir.create(vcfdir)
vcffile = file.path(where_mega2rtutorial_data(), "vcfr", "vcf.01")
Mega2VCF(vcffile, ENV$markers[ENV$markers$chromosome == 1, ][1:10,], envir = ENV)
list.files(vcfdir)

db = system.file("exdata", "seqsimm.db", package="Mega2R")
ENV = read.Mega2DB(db)
vcfdir = file.path(where_mega2rtutorial_data(), "vcfr")
if (!dir.exists(vcfdir)) dir.create(vcfdir)
vcffile = file.path(where_mega2rtutorial_data(), "vcfr", "vcf.01")
Mega2VCF(vcffile, ENV$markers[ENV$markers$chromosome == 1, ][1:10,], envir = ENV)
list.files(vcfdir)

assemble pedigree information into a data frame

Description

Generate a data frame with a row for each person. The observations are:

pedigree	family pedigree name
person	person name
father	father of person
mother	mother of person
sex	sex of person
trait	value of case/control phenotype for person

Usage

mkfam(brkloop = FALSE, traitname = "default", envir = ENV)
mkfam(brkloop = FALSE, traitname = "default", envir = ENV)

Arguments

`brkloop`	I haven't needed to set this TRUE yet. Maybe never will. If loops are broken, a person will be replaced by a dopple ganger in the same family with a different father/mother. The number of persons per family will be different when there are broken loops. Also, the person_link numbers will be different for all the persons after the first loop is broken.
`traitname`	Name of the trait to use as case/control value; by default, "default"
`envir`	An 'environment' that contains all the data frames created from the SQLite database.

Value

data frame that is described above

Note

The columns of this data frame come by selecting the values after merging the data frames: pedigree_table, person_table, and trait_table.

Also, the father and mother columns from person_table are translated from the row index in the person_table to the corresponding name.

This function stores the data frame in the 'environment' and also returns it. The function setfam() stores the data frame into the 'environment' and adjusts the genotype_table and the phenotype_table.

Examples

db = system.file("exdata", "seqsimm.db", package="Mega2R")
ENV = read.Mega2DB(db)

fam = mkfam()

fam

db = system.file("exdata", "seqsimm.db", package="Mega2R")
ENV = read.Mega2DB(db)

fam = mkfam()

fam

create "markers" data frame

Description

Create the markers data frame. It contains 5 observations:

locus_link:: locus offset of this marker
locus_link_fill:: locus offset plus an accumulating fudge factor that jumps with each new chromosome because the count of markers per chromosome is force to be a multiple of 4. (This value corresponds to the offset of the marker in the unified_genotype_table.)
MarkerName:: name of the marker
chromosome:: chromosome number of the marker
position:: base pair position of the marker (selected by bpPosMap[below])

Usage

mkMarkers(bpPosMap = 1, envir = ENV)
mkMarkers(bpPosMap = 1, envir = ENV)

Arguments

`bpPosMap`	An integer that indicates the map (index) to use to merge the chromosome/position fields from the map_table data frame to the marker_table data frame. See showMapNames() for the string name to index mapping.
`envir`	an environment that contains all the data frames created from the SQLite database.

Details

Select a map (index) from the map_table to merge with the select marker_table data frame to make the marker data frame. See showMapNames() for the string name to index mapping.

Value

None

Examples

db = system.file("exdata", "seqsimm.db", package="Mega2R")
ENV = read.Mega2DB(db, verbose = FALSE)

mkMarkers(1)

ENV$markers

db = system.file("exdata", "seqsimm.db", package="Mega2R")
ENV = read.Mega2DB(db, verbose = FALSE)

mkMarkers(1)

ENV$markers

generate a phenotype data frame

Description

Convert data in phenotype_table to a data frame of columns that are phenotypes. The columns may be affection status or quantitative values

Usage

mkphenotype(envir)
mkphenotype(envir)

Arguments

envir

"environment" containing SQLite database and other globals

Value

is a data frame with FID column, then IID column, and then an additional column for each phenotype.

Examples

db = system.file("exdata", "seqsimm.db", package="Mega2R")
ENV = read.Mega2DB(db)
out = mkphenotype()

out
db = system.file("exdata", "seqsimm.db", package="Mega2R")
ENV = read.Mega2DB(db)
out = mkphenotype()

out

load Mega2 database and initialize family data frame and markers data frame

Description

Call dbmega2_import() with the specified database and create an 'environment', with the SQLite table data loaded into data frames. Also run mkfam() to create the pedigree data frame fam and then store it with setfam(). setfam() modifies the unified_genotype_table (and phenotype_table) to match the family members that remain.

Usage

read.Mega2DB(db, ...)
read.Mega2DB(db, ...)

Arguments

`db`	specify SQLite database to load
`...`	additional arguments to pass to `dbmega2_import`

Value

an 'environment' that contains all the data frames created from the SQLite database.

Note

By default, mkfam will remove one of each person that was replicated to break loops in the pedigree, see mkfam for details. If you want to leave loops broken, the code is available, but you will have to write your own version of read.Mega2DB with a different invokation of mkfam().

Examples

db = system.file("exdata", "seqsimm.db", package="Mega2R")
ENV = read.Mega2DB(db, verbose = TRUE)

db = system.file("exdata", "seqsimm.db", package="Mega2R")
ENV = read.Mega2DB(db, verbose = TRUE)

set default name of transcription database and name of database mapping gene name to entrez gene id

Description

This function takes two string parameters: one to specify entrez gene ids to transcripts, the other to map gene names to entrez gene id's.

Usage

setAnnotations(txdb, entrezGene, envir = ENV)
setAnnotations(txdb, entrezGene, envir = ENV)

Arguments

`txdb`	name of Bioconductor transcription database.
`entrezGene`	name of Bioconductor mapping of gene name or gene alias to entrez gene id
`envir`	an 'environment' that contains all the data frames created from the SQLite database.

Value

None

Note

Mega2R will take care to load the necessary databases, but you will have to install them from Bioconductor. This is explained at length in the package Vignette.

Examples

db = system.file("exdata", "seqsimm.db", package="Mega2R")
ENV = read.Mega2DB(db)

setAnnotations("TxDb.Hsapiens.UCSC.hg19.knownGene", "org.Hs.eg.db")

ENV$txdb
ENV$entrezGene

db = system.file("exdata", "seqsimm.db", package="Mega2R")
ENV = read.Mega2DB(db)

setAnnotations("TxDb.Hsapiens.UCSC.hg19.knownGene", "org.Hs.eg.db")

ENV$txdb
ENV$entrezGene

replace the pedigree data frame

Description

You should first modify the fam data frame to filter the members you need to remove. (For example, you might want to delete members that have an unknown case/control status.) This function takes a new data frame of pedigree information and replaces the fam data frame in the 'environment' with it. Additionally, changing fam data frame will filter the genotypes data frame to only contain persons matching those in the fam data frame. setfam also filters for the phenotype data records.

Usage

setfam(fam, envir = ENV)
setfam(fam, envir = ENV)

Arguments

`fam`	data frame of family information filtered from fam data frame (generated by `mkfam`).
`envir`	an 'environment' that contains all the data frames created from the SQLite database.

Value

None

Examples

db = system.file("exdata", "seqsimm.db", package="Mega2R")
ENV = read.Mega2DB(db)

fam = mkfam()
# remove founders
fam = fam[ !( (fam[ , 5] == fam[ , 6]) & (fam[ , 5] == 0)), ]
setfam(fam)

ENV$fam
db = system.file("exdata", "seqsimm.db", package="Mega2R")
ENV = read.Mega2DB(db)

fam = mkfam()
# remove founders
fam = fam[ !( (fam[ , 5] == fam[ , 6]) & (fam[ , 5] == 0)), ]
setfam(fam)

ENV$fam

set default range data: chromosome and start/end base pair

Description

This function sets the default list of ranges used by applyFnToRanges. applyFnToRanges examines each range and the set of markers that fall within the range will be processed.

Usage

setRanges(ranges, indices, envir = ENV)
setRanges(ranges, indices, envir = ENV)

Arguments

`ranges`	a data frame that contains at least 4 observations: a name, a chromosome, a start base pair position and an end base pair position.
`indices`	a vector of 3 or 4 integers that specify the chromosome column, start base pair, column and end base pair column of range data frame and lastly the name column. If the vector only contains 3 integers, a name will be generated from the three range elements and it will be appended to the ranges and the last range column will be added to the indices.
`envir`	an 'environment' that contains all the data frames created from the SQLite database.

Value

None

Examples

db = system.file("exdata", "seqsimm.db", package="Mega2R")
ENV = read.Mega2DB(db)

ranges = matrix(c(1, 2240000, 2245000,
                  1, 2245000, 2250000,
                  1, 3760000, 3761000,
                  1, 3761000, 3762000,
                  1, 3762000, 3763000,
                  1, 3763000, 3764000,
                  1, 3764000, 3765000,
                  1, 3765000, 3763760,
                  1, 3763760, 3767000,
                  1, 3767000, 3768000,
                  1, 3768000, 3769000,
                  1, 3769000, 3770000),
                 ncol = 3, nrow = 12, byrow = TRUE)

setRanges(ranges, 1:3)

ENV$refRanges

ranges = matrix(c(1, 2240000, 2245000,
                  1, 2245000, 2250000,
                  1, 3760000, 3761000,
                  1, 3761000, 3762000,
                  1, 3762000, 3763000,
                  1, 3763000, 3764000,
                  1, 3764000, 3765000,
                  1, 3765000, 3763760,
                  1, 3763760, 3767000,
                  1, 3767000, 3768000,
                  1, 3768000, 3769000,
                  1, 3769000, 3770000),
                 ncol = 3, nrow = 12, byrow = TRUE)
ranges = data.frame(ranges)
ranges$name = LETTERS[1:12]
names(ranges) = c("chr", "start", "end", "name")

setRanges(ranges, 1:4)

ENV$refRanges
db = system.file("exdata", "seqsimm.db", package="Mega2R")
ENV = read.Mega2DB(db)

ranges = matrix(c(1, 2240000, 2245000,
                  1, 2245000, 2250000,
                  1, 3760000, 3761000,
                  1, 3761000, 3762000,
                  1, 3762000, 3763000,
                  1, 3763000, 3764000,
                  1, 3764000, 3765000,
                  1, 3765000, 3763760,
                  1, 3763760, 3767000,
                  1, 3767000, 3768000,
                  1, 3768000, 3769000,
                  1, 3769000, 3770000),
                 ncol = 3, nrow = 12, byrow = TRUE)

setRanges(ranges, 1:3)

ENV$refRanges

ranges = matrix(c(1, 2240000, 2245000,
                  1, 2245000, 2250000,
                  1, 3760000, 3761000,
                  1, 3761000, 3762000,
                  1, 3762000, 3763000,
                  1, 3763000, 3764000,
                  1, 3764000, 3765000,
                  1, 3765000, 3763760,
                  1, 3763760, 3767000,
                  1, 3767000, 3768000,
                  1, 3768000, 3769000,
                  1, 3769000, 3770000),
                 ncol = 3, nrow = 12, byrow = TRUE)
ranges = data.frame(ranges)
ranges$name = LETTERS[1:12]
names(ranges) = c("chr", "start", "end", "name")

setRanges(ranges, 1:4)

ENV$refRanges

show the association between mapno and mapname

Description

Mega2R allows several different physical and genetic maps to be stored and used to select positions. This function shows the association between map number and map name.

Usage

showMapNames(envir = ENV)
showMapNames(envir = ENV)

Arguments

envir

an environment that contains all the data frames created from the SQLite database.

Value

None

Examples

db = system.file("exdata", "seqsimm.db", package="Mega2R")
ENV = read.Mega2DB(db)

showMapNames()

db = system.file("exdata", "seqsimm.db", package="Mega2R")
ENV = read.Mega2DB(db)

showMapNames()

show Mega2R environment, viz. data frames and related info.

Description

Mega2R uses an environment to store the data frames when it reads SQLite database tables. This function shows the data frames and their sizes; it also shows the count of samples and markers in the database. Note: It is not necessary to provide an argument, if the environment is named ENV.

Usage

showMega2ENV(envir = ENV)
showMega2ENV(envir = ENV)

Arguments

envir

an environment that contains all the data frames created from the SQLite database.

Value

None

Examples

db = system.file("exdata", "seqsimm.db", package="Mega2R")
ENV = read.Mega2DB(db)

showMega2ENV()

db = system.file("exdata", "seqsimm.db", package="Mega2R")
ENV = read.Mega2DB(db)

showMega2ENV()

show the association between index no and phenotype

Description

Mega2R stores several phenotypes, both affective and quantitative. This function displays the mapping between phenotype (name), index, and the phenotype type (affection or quantitative).

Usage

showPhenoNames(envir = ENV)
showPhenoNames(envir = ENV)

Arguments

envir

an environment that contains all the data frames created from the SQLite database.

Value

None

Examples

db = system.file("exdata", "seqsimm.db", package="Mega2R")
ENV = read.Mega2DB(db)

showPhenoNames()

db = system.file("exdata", "seqsimm.db", package="Mega2R")
ENV = read.Mega2DB(db)

showPhenoNames()

regenerate fam data frame with unique values in member column

Description

Reads the fam data frame in "envir" and returns a new one with unique entries in the member column

Usage

uniqueFamMember(envir = ENV)
uniqueFamMember(envir = ENV)

Arguments

envir

'environment' containing SQLite database and other globals

Value

a data frame with columns the same as the "fam" data frame but with the member column containing unique entries

Examples

db = system.file("exdata", "seqsimm.db", package="Mega2R")
ENV = read.Mega2DB(db)
setfam(uniqueFamMember(envir = ENV))

db = system.file("exdata", "seqsimm.db", package="Mega2R")
ENV = read.Mega2DB(db)
setfam(uniqueFamMember(envir = ENV))

show directory of tutorial data

Description

This function shows the directory the Mega2Rtutorial (inst/exdata) was copied to.

Usage

where_mega2rtutorial_data(dir = file.path(tempdir(), "Mega2Rtutorial"))
where_mega2rtutorial_data(dir = file.path(tempdir(), "Mega2Rtutorial"))

Arguments

dir

The directory to store the tutorial data to. By default, this is tempdir()/Mega2Rtutorial

Value

dir tutorial to hold vignette

Examples

directory = where_mega2rtutorial_data()

directory = where_mega2rtutorial_data()

Package 'Mega2R'

Help Index

apply a function to the genotypes (markers) in each gene transcript and/or base pair range

Description

Usage

Arguments

Value

Note

Examples

apply a function to the genotypes from a set of markers

Description

Usage

Arguments

Value

Examples

apply a function to all the genotypes for markers found in several specified ranges

Description

Usage

Arguments

Value

Note

Examples

remove tutorial data

Description

Usage

Arguments

Value

Examples

computeDosage function

Description

Usage

Arguments

Value

See Also

Examples

read Mega2 SQLite database into R

Description

Usage

Arguments

Value

Examples

DofamSKATRC call back function

Description

Usage

Arguments

Value

Note

See Also

Examples

pedgene call back function

Description

Usage

Arguments

Value

Note

See Also

Examples

SKAT call back function

Description

Usage

Arguments

Value

Note

See Also

Examples

dump tutorial data

Description

Usage

Arguments

Value

Examples

fetch genotype character matrix for specified markers

Description

Usage

Arguments

Details

Value

Examples

fetch dosage integer matrix for specified markers

Description