| Title: | Omic Analysis uem915 |
|---|---|
| Description: | Omic Analysis uem915. |
| Authors: | Florent Dumont [aut, cre] (ORCID: <https://orcid.org/0000-0002-4439-5070>) |
| Maintainer: | Florent Dumont <[email protected]> |
| License: | GPL-3 |
| Version: | 1.0.2 |
| Built: | 2026-05-11 09:20:38 UTC |
| Source: | https://github.com/fdumbioinfo/uem915 |
Principal Component Analysis
acp(dat, factor = NULL, samplename = NULL, pc1 = 1, pc2 = 2, center = TRUE, scale = TRUE, title = "ACP", legendtitle = "TREATMENT")

| Argument | Description |
|---|---|
| dat | numeric matrix |
| factor | factor |
| samplename | character |
| pc1 | numeric |
| pc2 | numeric |
| center | logical; TRUE by default |
| scale | logical; TRUE by default |
| title | character |
| legendtitle | character |

No value returned.
Florent Dumont [email protected]
# not run
# acp( mat1 , sif1 )
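The internals of acp() are not documented here, so the following is only a minimal base R sketch of what a principal component analysis with the center, scale, pc1 and pc2 options could look like. The toy matrix, the group factor and the use of prcomp() are assumptions made for illustration; samples are assumed to be in columns.

```r
# Hypothetical toy data: 6 features (rows) measured on 8 samples (columns)
set.seed(1)
mat <- matrix(rnorm(48), nrow = 6,
              dimnames = list(paste0("g", 1:6), paste0("s", 1:8)))
grp <- factor(rep(c("A", "B"), each = 4))  # hypothetical treatment factor

# prcomp() expects observations in rows, so samples are transposed into rows
pca <- prcomp(t(mat), center = TRUE, scale. = TRUE)

# Proportion of variance explained by each component
varexp <- pca$sdev^2 / sum(pca$sdev^2)

# Sample coordinates on the two components selected by pc1 and pc2
pc1 <- 1
pc2 <- 2
scores <- pca$x[, c(pc1, pc2)]
```

The scores matrix can then be drawn with plot(scores, col = as.integer(grp)) to colour samples by group.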
Annotate a list of symbols or IDs
annot(symbollist, species = NULL, ortholog = F, dboutput = "ncbi", idtype = NULL)

| Argument | Description |
|---|---|
| symbollist | character; list of IDs or Symbols |
| species | character; species code: hs, mm, rn or dr |
| ortholog | logical; return the Homo sapiens ortholog for the given species |
| dboutput | character; database used for Symbol annotation: ncbi or ebi |
| idtype | character; annotation database ID type among SYMBOL (by default), GENE, ENST, ENSG, ENSP, UNIPROT |

Supported ID types: symbol, NCBI gene, Ensembl gene, transcript and protein, UniProt Swiss-Prot, UniProt TrEMBL.
Supported species: hs (Homo sapiens), mm (Mus musculus), rn (Rattus norvegicus), dr (Danio rerio).
data.frame
Florent Dumont [email protected]
# not run
# annot(SymbolList)
Boxplot
boxplot(dat, factor, outline = FALSE, title = "Boxplot", legendtitle = "TREATMENT", outlier = T, coefiqr = 1.5, ggplot = FALSE)

| Argument | Description |
|---|---|
| dat | numeric matrix |
| factor | factor |
| outline | logical; display outliers, FALSE by default |
| title | character |
| legendtitle | character |
| outlier | logical |
| coefiqr | numeric |
| ggplot | logical; use ggplot instead of the graphics library, FALSE by default |

Makes a boxplot from a matrix.
plot
Florent Dumont [email protected]
# not run
# mat1 %>% boxplot( factor = sif1$F3 )
Boxplot for one variable
boxplot1(dat, ylab = "y", xlab = "TREATMENT", log = T)

| Argument | Description |
|---|---|
| dat | data.frame |
| ylab | character |
| xlab | character |
| log | logical; if TRUE, data are back-transformed from log base 2 |

plot
Florent Dumont [email protected]
# not run
MSigDB enrichment analysis
ena(SymbolList = NULL, geneannot = NULL, species = "hs", bg = 25000, filtergeneset = "all", overlapmin = 2, enaScoremin = 1, top = 80, labsize = 11, dpibarplot = "screen", path = ".", dirname = NULL)

| Argument | Description |
|---|---|
| SymbolList | character; Symbol or NCBI gene ID |
| geneannot | data.frame |
| species | character; hs mm rn dr ss |
| bg | numeric |
| filtergeneset | regexp to filter the geneset database |
| overlapmin | numeric; minimum overlap between geneset and list |
| enaScoremin | numeric; minimum ena ratio |
| top | numeric; top features to plot |
| labsize | numeric; size of function labels in barplot |
| dpibarplot | character; barplot resolution |
| path | character; relative path of output directory |
| dirname | character; name for output |

file with enrichment analysis results
Florent Dumont [email protected]
# not run
# ena( Symbollist , filtergeneset = "reactome")
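The exact statistic used by ena() is not stated in this page; elsewhere the analysis is described as an over-representation analysis against a background of size bg. A common way to run such a test is the hypergeometric distribution, sketched below in base R. All counts are hypothetical, and the observed/expected ratio is only one possible reading of the "ena ratio".

```r
# Hypothetical counts, chosen only for illustration
bg       <- 25000  # background size (default of the bg argument)
setsize  <- 200    # genes in the gene set
listsize <- 150    # genes in the submitted list
overlap  <- 12     # genes shared by the list and the gene set

# Probability of observing at least `overlap` shared genes by chance
pval <- phyper(overlap - 1, setsize, bg - setsize, listsize,
               lower.tail = FALSE)

# Observed overlap divided by the overlap expected by chance
# (an assumed definition of an enrichment ratio)
expected <- listsize * setsize / bg
ratio <- overlap / expected
```

The same p-value can be obtained from a one-sided Fisher exact test on the corresponding 2 x 2 table.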
Load magrittr, dplyr, gplots, ggplot2, foreach, parallel, doParallel
env()
Make a hierarchical clustering classification
hc(dat, factor = NULL, title = "Hierarchical Clustering", plot = TRUE, method = "complete", legendtitle = "TREATMENT", cexlabel = 0.6)

| Argument | Description |
|---|---|
| dat | numeric matrix |
| factor | factor |
| title | character |
| plot | logical |
| method | character; agglomeration method used to build the clustering dendrogram |
| legendtitle | character |
| cexlabel | numeric |

The available agglomeration methods are those of the hclust function; "complete" is the default.
a dendrogram
Florent Dumont [email protected]
# not run
# hc(dat)
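Since hc() relies on the hclust agglomeration methods, the base R sketch below shows the kind of computation involved. The toy matrix and the choice of Euclidean distance between samples are assumptions; the distance actually used by hc() is not documented here.

```r
# Hypothetical toy data: 6 features (rows) measured on 8 samples (columns)
set.seed(1)
mat <- matrix(rnorm(48), nrow = 6,
              dimnames = list(paste0("g", 1:6), paste0("s", 1:8)))

# Distance between samples (assumed Euclidean), then agglomerative clustering
d <- dist(t(mat))
tree <- hclust(d, method = "complete")

# Cut the dendrogram into two groups of samples
groups <- cutree(tree, k = 2)
```

Calling plot(tree, cex = 0.6) draws the dendrogram with labels scaled as with cexlabel.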
To make a heatmap
heatmap(dat, factor, method = "complete", dendrogram = "both", k = NULL, labCol = "", cexCol = 0.85, labRow = "", cexRow = 0.35, cexlegend = 0.65, keysize = 0.9, keycolor = c("darkgreen", "orange", "darkred"), parmar = c(5, 4, 5, 6))

| Argument | Description |
|---|---|
| dat | numeric matrix |
| factor | factor |
| method | character |
| dendrogram | character; dendrograms to display: 'none', 'row', 'column' or 'both' (by default) |
| k | numeric; number of clusters to colorize for rows |
| labCol | character |
| cexCol | numeric |
| labRow | character |
| cexRow | numeric |
| cexlegend | numeric |
| keysize | numeric |
| keycolor | character vector of 3 colors for the low, mid and high values of the key |
| parmar | numeric; 4 values for margin sizes |

To make a heatmap from a matrix or a data.frame
no returned value
Florent Dumont [email protected]
# not run
# library(magrittr)
# data(sif1)
# data(mat1)
# mat1 %>% heatmap(sif1$F3)
Import a tab-delimited file into a data.frame
input(filename, sep = "\t", quote = "")

| Argument | Description |
|---|---|
| filename | character; path to the file to read |
| sep | character; field separator |
| quote | character; field quote |

Wrapper around the read.table function for tab-separated files
data.frame
Florent Dumont [email protected]
# not run
# input( "filename" ) -> dt
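As input() is described as a wrapper around read.table, a comparable helper can be sketched in base R. The function name read_tab and the header = TRUE, stringsAsFactors and check.names settings are assumptions, not the package's actual defaults.

```r
# Hypothetical wrapper mirroring the documented sep and quote arguments
read_tab <- function(filename, sep = "\t", quote = "") {
  read.table(filename, sep = sep, quote = quote, header = TRUE,
             stringsAsFactors = FALSE, check.names = FALSE)
}

# Round trip on a temporary tab-separated file
tmp <- tempfile(fileext = ".tsv")
write.table(data.frame(rowID = c("a", "b"), s1 = c(1.5, 2.5)),
            tmp, sep = "\t", quote = FALSE, row.names = FALSE)
dt <- read_tab(tmp)
```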
Quantile normalization and log2 transformation
norm(dat, method = NULL, log = TRUE)

| Argument | Description |
|---|---|
| dat | data.frame |
| method | character; quantile normalization is applied by default (see details) |
| log | logical; apply log base 2 |

For method, see the method argument of limma normalizeBetweenArrays
data.frame
Florent Dumont [email protected]
# not run
# norm(dt)
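norm() delegates to limma, which is not reproduced here. The base R sketch below only illustrates the idea of quantile normalization on log2 data: each column is forced onto the same reference distribution, the mean of the sorted columns. The helper name, the toy data, the order of the two steps and the tie handling are assumptions and may differ from limma.

```r
# Hypothetical helper: quantile normalization of the columns of a numeric matrix
quantile_normalize <- function(x) {
  ranks  <- apply(x, 2, rank, ties.method = "first")  # rank of each value per column
  sorted <- apply(x, 2, sort)                         # each column sorted increasingly
  ref    <- rowMeans(sorted)                          # reference distribution
  out    <- apply(ranks, 2, function(r) ref[r])       # map ranks back to reference values
  dimnames(out) <- dimnames(x)
  out
}

# Hypothetical raw intensities: 10 features, 4 samples
set.seed(1)
raw <- matrix(2^rnorm(40, mean = 8), nrow = 10)
qn <- quantile_normalize(log2(raw))
```

After normalization every column of qn holds the same set of values, only in a different order.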
Biostatistics analysis: QCs, ANOVA, threshold filtering, Venn diagram, cluster analysis for significant rows, pattern search
Graphics: volcanoplots, heatmaps, lineplots, boxplots (with Kruskal-Wallis test)
Functional analysis: MSigDB enrichment analysis, stringDB protein interaction network, topGO analysis (gene ontology)
See example section to test workflow with internal data.
omic(dat, sif, annot = NULL, species = "hs", model = NULL, paired = NULL, nested = NULL, batch = NULL, addfactor = NULL, qcs = TRUE, threshold = c(c(2, 6, 11), c(2, 6, 11) + 120), padj = "none", pattern = TRUE, venn = TRUE, cluster = TRUE, nc = c(2, 3, 6, 12), heatmap = TRUE, maxheatmap = NULL, volcanoplot = TRUE, lineplot = TRUE, boxplotrow = TRUE, ena = TRUE, enamin = 2, filtergeneset = "all", enaScoremin = 1.1, bg = 25000, sample = NULL, dopar = NULL, path = ".", dirname = NULL, zip = FALSE, remove = FALSE)

| Argument | Description |
|---|---|
| dat | data.frame; normalized data table |
| sif | data.frame; sample information file including model factors |
| annot | data.frame; annotation with a Symbol column for functional analysis |
| species | character; available species: bt ce dr dm gg hs mm pt rn ss xt |
| model | character; ANOVA model factors (see details) |
| paired | character; factor for paired design |
| nested | character; factor for nested design |
| batch | character; factor for batch effect design |
| addfactor | character; additional factors |
| qcs | logical; quality controls |
| threshold | numeric vector from 1 to 160 (see details) |
| padj | character; multiple-testing correction, "fdr" for Benjamini-Hochberg false discovery rate correction (the usage default is "none") |
| pattern | logical; search relevant patterns across comparisons (see details) |
| venn | logical; Venn diagram |
| cluster | logical; row hierarchical clustering using Pearson correlation |
| nc | numeric; number of clusters to cut in the dendrogram |
| heatmap | logical; do heatmaps for all lists |
| maxheatmap | numeric; max rows for heatmap |
| volcanoplot | logical; make a volcano plot for each threshold |
| lineplot | logical; do lineplot for significant features |
| boxplotrow | logical; do boxplot for significant features with Kruskal-Wallis test |
| ena | logical; MSigDB enrichment analysis (over-representation analysis) |
| enamin | numeric; min list size for functional analysis |
| filtergeneset | character; regular expression to filter geneset collections (e.g. "reactome\|tft") |
| enaScoremin | numeric; minimum ena ratio |
| bg | numeric; background used for over-representation test |
| sample | numeric; subset analysis |
| dopar | numeric; number of cores |
| path | character; results directory path |
| dirname | character; results directory name |
| zip | logical; compress results directory if TRUE |
| remove | logical; remove the uncompressed results directory if TRUE |

Use uem915::env() to load required libraries before uem915::omic() (see example)
Accepted values for threshold param (1 to 150): see list -> uem915:::thresholdlist %>% lapply("[",c(1,2)) %>% unlist %>% matrix(ncol=2,byrow = T) %>% data.frame %>% setNames(c("pval","fc"))
Pattern: search relevant profiles among up and down comparison combinations
filtergeneset param: see geneset collections -> moalannotgene::genesetdb %>% lapply(names)
Experimental design examples:
model = "TREATMENT" for 1-way anova
model = "TREATMENT+TIME+TREATMENT*TIME" for 3-way anova with interaction
model = "TREATMENT", paired = "CASE" for 2-way paired anova
model = "TREATMENT", batch = "BATCH" for 2-way anova with batch effect removal (<=> paired anova)
model = "TREATMENT+PHENOTYPE" for 2-way anova
model = "TREATMENT", addfactor = "PHENOTYPE" for 2-way anova, but venn, cluster and pattern are not applied to addfactor
model = "TREATMENT", nested = "TREATMENTinCASE" for 2-way nested design
model = "TREATMENT", paired = "CASE", batch = "BATCH" for 3-way paired anova with batch effect removal
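How omic() fits these models internally is not documented here. As a rough illustration only, the base R sketch below fits one of the model strings above on a single simulated feature with aov() and adjusts the p-values with p.adjust(); the data, the sample layout and the use of aov() are assumptions.

```r
# Hypothetical sample information: 2 treatments x 3 time points, 2 replicates each
sif <- data.frame(
  TREATMENT = factor(rep(c("ctrl", "drug"), each = 6)),
  TIME      = factor(rep(c("t0", "t1", "t2"), times = 4))
)

# One simulated feature (one row of the data table)
set.seed(1)
sif$y <- rnorm(12)

# Build the formula from a model string like the ones listed above
model <- "TREATMENT+TIME+TREATMENT*TIME"
fit <- aov(as.formula(paste("y ~", model)), data = sif)

# One p-value per model term; the last entry (residuals) is NA
pvals <- summary(fit)[[1]][["Pr(>F)"]]

# Benjamini-Hochberg correction, as selected by padj = "fdr"
padj <- p.adjust(pvals[1:3], method = "fdr")
```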
Limitations:
Complete block designs only
Use dopar = 2 to decrease computing resources
The sample param subsets random rows of dat to decrease analysis time.
Annotation updates: 05112023 for gene and ensembl, MSigDB 7.5.1, StringDB 12.0
Input format:
Use uem915::input() to load external data from tsv files
norm data table must contain IDs in the first column
norm data (in columns) and sample information (in rows) must have the same order
norm data and annotation rows must have the same order
Use the uem915::annot() function to annotate IDs (Symbols, Ensembl and gene IDs accepted)
omic results directory
Florent Dumont [email protected]
# # test omic() with internal dataset GSE65055:
# # loading libraries
# library(uem915)
# uem915::env()
# # loading norm data
# moal:::GSE65055normdata -> normdata
# normdata %>% head
# # loading sample information file
# moal:::GSE65055sampledata -> sampledata
# sampledata %>% head
# # ordering factor levels
# sampledata$ANEUPLOIDY %>% ordered( c("Control","T13","T18","T21") ) -> sampledata$ANEUPLOIDY
# sampledata$TISSUE %>% as.factor -> sampledata$TISSUE
# # annotation
# normdata$rowID %>% moal::annot( species = "hs", idtype = "GENE" ) -> annotdata
# # omic analysis
# moal::omic(
#   dat = normdata, sif = sampledata, annot = annotdata, species = "hs",
#   model = "ANEUPLOIDY", batch = "TISSUE", threshold = c(6,126),
#   heatmap = T, lineplot = T, boxplotrow = T,
#   venn = F, cluster = F, pattern = F,
#   ena = T, network = F, topgo = F,
#   sample = NULL, dopar = NULL, zip = F,
#   dirname = "test", path = "." )
Export a data.frame to a tab-delimited file
output(dt, filename)

| Argument | Description |
|---|---|
| dt | data.frame |
| filename | character |

Florent Dumont [email protected]
# not run
# output( dt )
Descriptive statistics: histogram, boxplot, hierarchical clustering and PCA (ACP) on columns
qc(dat, sif = NULL, inputdata = F, histo = TRUE, boxplot = TRUE, hc = TRUE, acp = TRUE, dirname = NULL, path = ".")

| Argument | Description |
|---|---|
| dat | numeric matrix |
| sif | data.frame |
| inputdata | logical; export input data or not |
| histo | logical; do histogram if TRUE (default) |
| boxplot | logical; do boxplot if TRUE (default) |
| hc | logical; do hierarchical clustering if TRUE (default) |
| acp | logical; do principal component analysis if TRUE (default) |
| dirname | character |
| path | character |

Returns the p-value for each factor of the ANOVA model; the function uses doParallel
data.frame
Florent Dumont [email protected]
# not run
# qc(dat, metadata)
Replace value by group median
replacegroupmed(dat, value = 0, factor)

| Argument | Description |
|---|---|
| dat | character vector of mzxml file |
| value | numeric; value to substitute |
| factor | character |

Replaces value by the column median if the group size is one, and by the row group median if the group size is > 1
data.frame
Florent Dumont [email protected]
# not run
# replacegroupmed(dat)
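The rule described above can be sketched in base R as follows. This is not the package's code: the helper name, the numeric matrix input and the choice to exclude the placeholder value itself from each median are assumptions.

```r
# Hypothetical helper: replace `value` by a group median (samples in columns)
replace_group_median <- function(mat, value = 0, grp) {
  grp <- as.factor(grp)
  out <- mat
  for (g in levels(grp)) {
    cols <- which(grp == g)
    for (j in cols) {
      hit <- which(mat[, j] == value)
      if (length(hit) == 0) next
      if (length(cols) == 1) {
        # Group of size one: median of the other values of the column
        out[hit, j] <- median(mat[mat[, j] != value, j])
      } else {
        # Larger group: median of the row within the group, placeholder excluded
        out[hit, j] <- apply(mat[hit, cols, drop = FALSE], 1,
                             function(r) median(r[r != value]))
      }
    }
  }
  out
}

# Hypothetical data: 3 features, 4 samples, groups A, A, A, B
mat <- matrix(c(1, 0, 3,  2, 4, 0,  5, 6, 7,  0, 8, 9), nrow = 3)
res <- replace_group_median(mat, value = 0, grp = c("A", "A", "A", "B"))
```

If every value of a row within a group equals the placeholder, this sketch returns NA for that cell.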
To make a Venn diagram of 2, 3 or 4 lists
venn(list = NULL, listnames = NULL, returnlist = F, title = "Venn Diagram", plot = T, export = F, path = ".", dirname = "venn")

| Argument | Description |
|---|---|
| list | list of 2, 3 or 4 character vectors, or list of two data.frames to compare |
| listnames | character; list names to display on the graph |
| returnlist | logical |
| title | character; title to display on the graph |
| plot | logical; display the plot or not |
| export | logical; export lists to files |
| path | character |
| dirname | character; name of the directory created when export = T |

Up to 4 lists.
Venn plot and new lists generated by venn.
Florent Dumont [email protected]
# not run
# library(magrittr)
# list(
#   c(letters[6:20] , letters[25] ) ,
#   letters[1:15] ,
#   c( letters[2:5] , letters[8:23] ) ) %>% venn
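The regions of a Venn diagram are plain set operations. The base R sketch below computes two of them for the three lists of the example above, without calling venn(); the variable names are chosen for illustration.

```r
# The three lists from the example above
l1 <- c(letters[6:20], letters[25])
l2 <- letters[1:15]
l3 <- c(letters[2:5], letters[8:23])

# Elements shared by all three lists (central region of the diagram)
common <- Reduce(intersect, list(l1, l2, l3))

# Elements found only in the first list
only_l1 <- setdiff(l1, union(l2, l3))
```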