Title: | Consensus Clustering |
---|---|
Description: | Clustering, or cluster analysis, is a widely used technique in bioinformatics to identify groups of similar biological data points. Consensus clustering is an extension of clustering algorithms that aims to construct a robust result from those clustering features that are invariant under different sources of variation. For reference, please cite the following paper: Yousefi, Melograna, et al. (2023) <doi:10.3389/fmicb.2023.1170391>. |
Authors: | Behnam Yousefi [aut, cre, cph] |
Maintainer: | Behnam Yousefi <[email protected]> |
License: | GPL (>= 3) |
Version: | 1.5.0 |
Built: | 2025-02-26 02:59:11 UTC |
Source: | https://github.com/cran/ConsensusClustering |
Convert an adjacency matrix to an affinity matrix
adj_conv(adj.mat, alpha = 1)
adj.mat |
Adjacency matrix. The elements must be within [-1, 1]. |
alpha |
soft threshold value (see details). |
Each element is converted as adj = exp(-(1 - adj)^2 / (2 * alpha^2)). Reference: von Luxburg (2007), "A tutorial on spectral clustering", Statistics and Computing.
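A minimal base-R sketch of the elementwise conversion above, assuming the formula is applied to every entry (including the diagonal). The name adj_conv_sketch is hypothetical; this is an illustration, not the package's source.

```r
# Hypothetical sketch of the soft-threshold conversion used by adj_conv:
# each adjacency value a becomes exp(-(1 - a)^2 / (2 * alpha^2)).
adj_conv_sketch <- function(adj.mat, alpha = 1) {
  exp(-(1 - adj.mat)^2 / (2 * alpha^2))
}

Adj_mat <- rbind(c(0.0, 0.9, 0.0),
                 c(0.9, 0.0, 0.2),
                 c(0.0, 0.2, 0.0))
aff <- adj_conv_sketch(Adj_mat)
print(round(aff, 3))
```

An adjacency of 1 maps to an affinity of 1, and a smaller alpha makes the affinity decay faster as the adjacency drops.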
the matrix of affinity values.
Adj_mat = rbind(c(0.0,0.9,0.0), c(0.9,0.0,0.2), c(0.0,0.2,0.0))
adj_conv(Adj_mat)
Convert data matrix to adjacency matrix
adj_mat(X, method = "euclidian")
X |
a matrix of samples by features. |
method |
method for distance calculation:
|
adjacency matrix calculated from the data matrix using the specified method
X = gaussian_clusters()$X
Adj = adj_mat(X, method = "euclidian")
Count the number of clusters based on stability score.
cc_cluster_count(CM, plot.cdf = TRUE, plot.logit = FALSE)
CM |
list of consensus matrices, each for a specific number of clusters.
It can be the output of |
plot.cdf |
binary value to plot the cumulative distribution functions of |
plot.logit |
binary value to plot the logit model of cumulative distribution functions of |
Counts the number of clusters given a list of consensus matrices, each for a specific number of clusters, using four methods: "LogitScore", "PAC", "deltaA" and "CMavg".
results as a list containing the scores "LogitScore", "PAC", "deltaA" and "CMavg", and the number of clusters selected by each method: "Kopt_LogitScore", "Kopt_PAC", "Kopt_deltaA" and "Kopt_CMavg".
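Of the four scores, PAC is commonly defined as the proportion of ambiguous clustering: the fraction of sample pairs whose consensus value falls strictly between a lower and an upper threshold, with lower values indicating a more stable k. The base-R sketch below assumes that definition with thresholds 0.1 and 0.9; the package's exact thresholds and its other scores are not reproduced, and pac_score is a hypothetical name.

```r
# Hypothetical sketch: proportion of ambiguous clustering (PAC) for one
# consensus matrix. Thresholds u1 and u2 are assumed defaults.
pac_score <- function(cm, u1 = 0.1, u2 = 0.9) {
  vals <- cm[upper.tri(cm)]          # each unordered pair counted once
  mean(vals > u1 & vals < u2)        # share of "ambiguous" pairs
}

# Two toy consensus matrices for 4 samples: a clean 2-block structure
# and a maximally ambiguous one.
clean <- matrix(c(1, 1, 0, 0,
                  1, 1, 0, 0,
                  0, 0, 1, 1,
                  0, 0, 1, 1), nrow = 4, byrow = TRUE)
fuzzy <- matrix(0.5, 4, 4)
diag(fuzzy) <- 1

pac <- sapply(list(k2 = clean, k3 = fuzzy), pac_score)
print(pac)
names(which.min(pac))                # candidate with the lowest PAC
```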
X = gaussian_clusters()$X
Adj = adj_mat(X, method = "euclidian")
CM = consensus_matrix(Adj, max.cluster=3, max.itter=10)
Result = cc_cluster_count(CM, plot.cdf=FALSE)
Relabeling clusters based on cluster similarities
cluster_relabel(x1, x2)
x1 |
clustering vector 1. Zero elements are considered unclustered samples |
x2 |
clustering vector 2. Zero elements are considered unclustered samples |
When several clusterings are performed, their cluster labels may not match each other. To perform majority voting, the clusterings need to be relabeled based on label similarities.
data frame of relabeled clusters
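As an illustration of relabeling, the sketch below maps each label of x2 to the label of x1 it overlaps most, using a greedy pass over the contingency table; labels of x2 left unmatched get fresh labels. Greedy matching is a stand-in chosen for brevity (an optimal assignment, for example the Hungarian algorithm, could be used instead), and relabel_greedy is a hypothetical name, not the package's implementation.

```r
# Hypothetical sketch: relabel x2 so that its labels agree with x1 as far
# as possible, via a greedy match on the label contingency table.
relabel_greedy <- function(x1, x2) {
  keep <- x1 != 0 & x2 != 0                 # ignore unclustered samples
  tab <- table(x2[keep], x1[keep])          # rows: x2 labels, cols: x1 labels
  map <- integer(0)
  while (nrow(tab) > 0 && ncol(tab) > 0) {
    # pair of still-unmatched labels with the largest overlap
    idx <- which(tab == max(tab), arr.ind = TRUE)[1, ]
    map[rownames(tab)[idx[1]]] <- as.integer(colnames(tab)[idx[2]])
    tab <- tab[-idx[1], -idx[2], drop = FALSE]
  }
  out <- x2
  next_label <- max(x1) + 1                 # fresh labels for unmatched ones
  for (lab in sort(unique(x2[x2 != 0]))) {
    key <- as.character(lab)
    if (key %in% names(map)) {
      out[x2 == lab] <- map[[key]]
    } else {
      out[x2 == lab] <- next_label
      next_label <- next_label + 1
    }
  }
  out
}

x1 <- c(1, 1, 1, 2, 2, 2)
x2 <- c(2, 2, 2, 1, 1, 1)
relabel_greedy(x1, x2)   # 1 1 1 2 2 2
```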
X = gaussian_clusters()$X
x1 = kmeans(X, 5)$cluster
x2 = kmeans(X, 5)$cluster
clusters = cluster_relabel(x1, x2)
Calculate the Co-cluster matrix for a given set of clustering results.
coCluster_matrix(X, verbos = TRUE)
X |
clustering matrix of Nsamples x Nclusterings. Zero elements are considered unclustered samples |
verbos |
binary value for verbosity (default = |
The co-cluster matrix, or consensus matrix (CM), is the consensus mechanism described in Monti et al. (2003).
The normalized co-clustering frequency of every pair of samples (Nsamples x Nsamples)
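Following the definition in Monti et al. (2003), the consensus value of a pair is the number of clusterings that put both samples in the same cluster, divided by the number of clusterings in which both samples were clustered. A minimal base-R sketch, assuming zeros mark unclustered samples as described above and that pairs never clustered together get 0; co_cluster_sketch is a hypothetical name.

```r
# Hypothetical sketch of a Monti-style consensus (co-cluster) matrix.
co_cluster_sketch <- function(X) {
  X <- as.matrix(X)
  n <- nrow(X)
  together <- matrix(0, n, n)   # sum of connectivity matrices
  present  <- matrix(0, n, n)   # sum of indicator matrices
  for (r in seq_len(ncol(X))) {
    x <- X[, r]
    both <- outer(x != 0, x != 0)        # 1 if both samples were clustered
    same <- outer(x, x, "==") & both     # same nonzero label
    together <- together + same
    present  <- present + both
  }
  cm <- together / present
  cm[present == 0] <- 0                  # never clustered together: 0
  cm
}

Clustering <- cbind(c(1, 1, 1, 2, 2, 2), c(1, 1, 2, 1, 2, 2))
CM <- co_cluster_sketch(Clustering)
print(CM)
```

With this clustering, samples 1 and 3 agree in one of the two clusterings, so their consensus value is 0.5.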
Clustering = cbind(c(1,1,1,2,2,2), c(1,1,2,1,2,2))
coCluster_matrix(Clustering, verbos = FALSE)
Build connectivity matrix
connectivity_matrix(clusters)
clusters |
a vector of cluster labels. Zero elements mean that the sample was absent during clustering |
The connectivity matrix (M) is a binary N-by-N matrix, where M[i,j] = 1 if samples i and j are in the same cluster. Reference: Monti et al. (2003), "Consensus Clustering: A Resampling-Based Method for Class Discovery and Visualization of Gene Expression Microarray Data", Machine Learning.
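A one-function sketch of this definition in base R, assuming that a sample with label 0 (absent) is connected to nothing, including itself; connectivity_sketch is a hypothetical name.

```r
# Hypothetical sketch: binary connectivity matrix of one clustering.
connectivity_sketch <- function(clusters) {
  clustered <- clusters != 0
  same <- outer(clusters, clusters, "==") & outer(clustered, clustered, "&")
  same * 1L                          # logical -> 0/1 integer matrix
}

M <- connectivity_sketch(c(1, 1, 1, 2, 2, 2))
print(M)
```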
Connectivity matrix
con_mat = connectivity_matrix(c(1,1,1,2,2,2))
Calculate consensus matrix for data perturbation consensus clustering
consensus_matrix( X, max.cluster = 5, resample.ratio = 0.7, max.itter = 100, clustering.method = "hclust", adj.conv = TRUE, verbos = TRUE )
X |
adjacency matrix of size Nsample x Nsample |
max.cluster |
maximum number of clusters |
resample.ratio |
the fraction of the data to use at each iteration. |
max.itter |
maximum number of iterations at each |
clustering.method |
base clustering method: |
adj.conv |
binary value to apply soft thresholding (default= |
verbos |
binary value for verbosity (default= |
Performs data perturbation consensus clustering and obtains the consensus matrix, following the consensus clustering algorithm of Monti et al. (2003).
This function will be removed in a future release and is replaced by consensus_matrix_data_prtrb().
list of consensus matrices for each k
X = gaussian_clusters()$X
Adj = adj_mat(X, method = "euclidian")
CM = consensus_matrix(Adj, max.cluster=3, max.itter=10, verbos = FALSE)
Calculate consensus matrix for data perturbation consensus clustering
consensus_matrix_data_prtrb( X, max.cluster = 5, resample.ratio = 0.7, max.itter = 100, clustering.method = "hclust", adj.conv = TRUE, verbos = TRUE )
X |
adjacency matrix of size Nsample x Nsample |
max.cluster |
maximum number of clusters |
resample.ratio |
the fraction of the data to use at each iteration. |
max.itter |
maximum number of iterations at each |
clustering.method |
base clustering method: |
adj.conv |
binary value to apply soft thresholding (default= |
verbos |
binary value for verbosity (default= |
Performs data perturbation consensus clustering and obtains the consensus matrix, following the consensus clustering algorithm of Monti et al. (2003).
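The resampling scheme can be sketched for a single k as follows. This is a simplified illustration under stated assumptions, not the package's code: the adjacency is turned into a distance as 1 - adj, hierarchical clustering with Ward linkage plays the role of the base method, no soft thresholding is applied, and consensus_for_k is a hypothetical name. Repeating it for each k up to max.cluster gives one consensus matrix per k.

```r
# Hypothetical sketch of resampling-based consensus for one k.
# Assumptions: distance = 1 - adjacency, Ward-linkage hclust as base method.
consensus_for_k <- function(adj, k, resample.ratio = 0.7, max.itter = 100) {
  n <- nrow(adj)
  m <- max(k, round(resample.ratio * n))    # subsample size (at least k)
  together <- matrix(0, n, n)               # times i and j clustered together
  present  <- matrix(0, n, n)               # times i and j were both sampled
  for (it in seq_len(max.itter)) {
    idx <- sort(sample.int(n, m))           # perturb the data: subsample
    d <- as.dist(1 - adj[idx, idx])
    lab <- cutree(hclust(d, method = "ward.D"), k = k)
    together[idx, idx] <- together[idx, idx] + outer(lab, lab, "==")
    present[idx, idx]  <- present[idx, idx] + 1
  }
  cm <- together / present
  cm[present == 0] <- 0                     # pairs never sampled together
  cm
}

set.seed(1)
X <- rbind(matrix(rnorm(20, mean = 0, sd = 0.1), ncol = 2),
           matrix(rnorm(20, mean = 3, sd = 0.1), ncol = 2))
D <- as.matrix(dist(X))
adj <- 1 - D / max(D)                       # toy similarity in [0, 1]
CM <- lapply(2:3, function(k) consensus_for_k(adj, k, max.itter = 50))
round(CM[[1]][c(1, 2, 11), c(1, 2, 11)], 2)
```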
list of consensus matrices for each k
X = gaussian_clusters()$X
Adj = adj_mat(X, method = "euclidian")
CM = consensus_matrix_data_prtrb(Adj, max.cluster=3, max.itter=10, verbos = FALSE)
Calculate consensus matrix for multi-data consensus clustering
consensus_matrix_multiview( X, max.cluster = 5, sample.set = NA, clustering.method = "hclust", adj.conv = TRUE, verbos = TRUE )
X |
list of adjacency matrices for different cohorts (or views). |
max.cluster |
maximum number of clusters |
sample.set |
vector of samples on which the clustering is applied. |
clustering.method |
base clustering method: |
adj.conv |
binary value to apply soft threshold (default= |
verbos |
binary value for verbosity (default= |
Performs multi-data consensus clustering and obtains the consensus matrix, following the consensus clustering algorithm of Monti et al. (2003).
list of consensus matrices for each k
data = multiview_clusters(n = c(40,40,40), hidden.dim = 2, observed.dim = c(2,2,2), sd.max = .1, sd.noise = 0, hidden.r.range = c(.5,1))
X_observation = data[["observation"]]
Adj = list()
for (i in 1:length(X_observation)) Adj[[i]] = adj_mat(X_observation[[i]], method = "euclidian")
CM = consensus_matrix_multiview(Adj, max.cluster = 4, verbos = FALSE)
Generate clusters of data points from Gaussian distribution with randomly generated parameters
gaussian_clusters( n = c(50, 50), dim = 2, sd.max = 0.1, sd.noise = 0.01, r.range = c(0.1, 1) )
n |
vector of number of data points in each cluster
The length of |
dim |
number of dimensions |
sd.max |
maximum standard deviation of clusters |
sd.noise |
standard deviation of the added noise |
r.range |
the range (min, max) of distance of cluster centers from the origin |
a list of data points (X) and cluster labels (class)
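A rough base-R sketch of such a generator, for readers who want to reproduce similar toy data. The sampling details are assumptions, not taken from the package: each center lies in a random direction at a radius drawn uniformly from r.range, each cluster is isotropic with a standard deviation drawn uniformly from (0, sd.max), and Gaussian noise with standard deviation sd.noise is added to every coordinate. gaussian_clusters_sketch is a hypothetical name.

```r
# Hypothetical sketch of a random Gaussian-cluster generator.
gaussian_clusters_sketch <- function(n = c(50, 50), dim = 2, sd.max = 0.1,
                                     sd.noise = 0.01, r.range = c(0.1, 1)) {
  X <- NULL
  lab <- integer(0)
  for (j in seq_along(n)) {
    direction <- rnorm(dim)
    direction <- direction / sqrt(sum(direction^2))        # random unit vector
    center <- direction * runif(1, r.range[1], r.range[2])  # radius in r.range
    s <- runif(1, 0, sd.max)                                # cluster spread
    pts <- matrix(rnorm(n[j] * dim, sd = s), ncol = dim)
    X <- rbind(X, sweep(pts, 2, center, "+"))               # shift to center
    lab <- c(lab, rep(j, n[j]))
  }
  X <- X + matrix(rnorm(length(X), sd = sd.noise), nrow = nrow(X))  # noise
  list(X = X, class = lab)
}

set.seed(42)
data <- gaussian_clusters_sketch()
table(data$class)
```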
data = gaussian_clusters()
X = data$X
y = data$class
Generate clusters of data points from Gaussian distribution with given parameters
gaussian_clusters_with_param(n, center, sigma)
n |
vector of number of data points in each cluster
The length of |
center |
matrix of centers Ncluster x dim |
sigma |
list of dim x dim covariance matrices. The length of sigma should equal the number of clusters. |
matrix of Nsamples x (dim + 1). The last column contains the cluster labels.
center = rbind(c(0,0), c(1,1))
sigma = list(diag(c(1,1)), diag(2,2))
gaussian_clusters_with_param(c(10, 10), center, sigma)
Generate clusters of data points from Gaussian-mixture-model distributions with randomly generated parameters
gaussian_mixture_clusters( n = c(50, 50), dim = 2, sd.max = 0.1, sd.noise = 0.01, r.range = c(0.1, 1), mixture.range = c(1, 4), mixture.sep = 0.5 )
n |
vector of number of data points in each cluster
The length of |
dim |
number of dimensions |
sd.max |
maximum standard deviation of clusters |
sd.noise |
standard deviation of the added noise |
r.range |
the range (min, max) of distance of cluster centers from the origin |
mixture.range |
range (min, max) of the number of Gaussian-mixtures. |
mixture.sep |
scalar indicating the separability between the mixtures. |
a list of data points (X) and cluster labels (class)
data = gaussian_mixture_clusters()
X = data$X
y = data$class
Generation mechanism for data perturbation consensus clustering
generate_data_prtrb( X, cluster.method = "pam", k = 3, resample.ratio = 0.7, rep = 10, distance.method = "euclidian", adj.conv = TRUE, func )
X |
input data Nsample x Nfeatures |
cluster.method |
base clustering method: |
k |
number of clusters |
resample.ratio |
the fraction of the data to use at each iteration. |
rep |
maximum number of iterations at each |
distance.method |
method for distance calculation:
|
adj.conv |
binary value to apply soft thresholding (default= |
func |
user-defined function required if |
Performs clustering on perturbed (resampled) sample sets, following the consensus clustering algorithm of Monti et al. (2003).
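A minimal sketch of the generation step, assuming samples left out of a repeat are marked with 0 (consistent with the zero convention used elsewhere in this manual). k-means from the stats package stands in for the default PAM to keep the example dependency-free, and generate_data_prtrb_sketch is a hypothetical name.

```r
# Hypothetical sketch of data-perturbation generation: each repeat clusters
# a random subset of samples; samples left out get label 0.
generate_data_prtrb_sketch <- function(X, k = 3, resample.ratio = 0.7, rep = 10) {
  n <- nrow(X)
  m <- round(resample.ratio * n)             # samples kept per repeat
  out <- matrix(0L, nrow = n, ncol = rep)    # 0 = not sampled in that repeat
  for (r in seq_len(rep)) {
    idx <- sample.int(n, m)
    out[idx, r] <- kmeans(X[idx, , drop = FALSE], centers = k, nstart = 5)$cluster
  }
  out
}

set.seed(7)
X <- matrix(rnorm(200), ncol = 2)
Clusters <- generate_data_prtrb_sketch(X)
colSums(Clusters != 0)                       # clustered samples per repeat
```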
matrix of clusterings Nsample x Nrepeat
X = gaussian_clusters()$X
Clusters = generate_data_prtrb(X)
Generate a set of data points from Gaussian distribution
generate_gaussian_data(n, center = 0, sigma = 1, label = NA)
n |
number of generated data points |
center |
data center of desired dimension |
sigma |
covariance matrix |
label |
cluster label |
Generated data points from Gaussian distribution with given parameters
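One standard way to draw such points, sketched in base R: if the rows of Z are independent standard normal vectors and R = chol(sigma) (so that t(R) %*% R equals sigma), then the rows of Z %*% R have covariance sigma, and adding center shifts the mean. Appending the label as a last column mirrors gaussian_clusters_with_param and is an assumption here; generate_gaussian_sketch is a hypothetical name.

```r
# Hypothetical sketch: n draws from N(center, sigma) via the Cholesky factor.
generate_gaussian_sketch <- function(n, center = 0, sigma = 1, label = NA) {
  sigma <- as.matrix(sigma)
  d <- length(center)
  Z <- matrix(rnorm(n * d), nrow = n, ncol = d)   # iid N(0, 1) entries
  X <- Z %*% chol(sigma)                           # rows now have covariance sigma
  X <- sweep(X, 2, center, "+")                    # shift rows to the center
  cbind(X, label = label)                          # label as last column
}

set.seed(3)
pts <- generate_gaussian_sketch(10, center = c(0, 0), sigma = diag(c(1, 1)), label = 1)
dim(pts)   # 10 x 3
```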
generate_gaussian_data(10, center=c(0,0), sigma=diag(c(1,1)), label=1)
Multiple method generation
generate_method_prtrb( X, cluster.method = "pam", range.k = c(2, 5), sample.k.method = "random", rep = 10, distance.method = "euclidian", func )
X |
input data Nsample x Nfeatures |
cluster.method |
base clustering method: |
range.k |
vector of minimum and maximum values for k |
sample.k.method |
method for the choice of k at each repeat |
rep |
number of repeats |
distance.method |
method for distance calculation:
|
func |
user-defined function required if |
At each repeat, k is either drawn from a discrete uniform distribution between range.k[1] and range.k[2] or chosen within that range by the best silhouette width. Clustering is then applied and the result is returned.
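The random choice of k can be sketched as below. The silhouette-based option is omitted, k-means is an assumed default base method, and method_prtrb_sketch is a hypothetical name; its func argument takes (X, k), the same shape as cluster_func in the multi_cluster_gen example.

```r
# Hypothetical sketch of method perturbation with a random k per repeat.
method_prtrb_sketch <- function(X, range.k = c(2, 5), rep = 10,
                                func = function(X, k) kmeans(X, k, nstart = 5)$cluster) {
  ks <- seq(range.k[1], range.k[2])
  out <- matrix(0L, nrow = nrow(X), ncol = rep)
  for (r in seq_len(rep)) {
    # k ~ discrete Uniform{range.k[1], ..., range.k[2]}
    k <- if (length(ks) == 1) ks else sample(ks, 1)
    out[, r] <- func(X, k)                    # cluster all samples with this k
  }
  out
}

set.seed(11)
X <- matrix(rnorm(200), ncol = 2)
Clusters <- method_prtrb_sketch(X)
apply(Clusters, 2, function(v) length(unique(v)))   # k used in each repeat
```

The length check avoids the R pitfall where sample(ks, 1) with a single number ks samples from 1:ks instead.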
matrix of clusterings Nsample x Nrepeat
X = gaussian_clusters()$X
Clusters = generate_method_prtrb(X)
Multiview generation
generate_multiview( X, cluster.method = "pam", range.k = c(2, 5), sample.k.method = "random", rep = 10, distance.method = "euclidian", sample.set = NA, func )
X |
list of input data matrices of Sample x feature or distance matrices.
The length of |
cluster.method |
base clustering method: |
range.k |
vector of minimum and maximum values for k |
sample.k.method |
method for the choice of k at each repeat |
rep |
number of repeats |
distance.method |
method for distance calculation:
|
sample.set |
vector of samples on which the clustering is applied. Can be names or indices.
If |
func |
user-defined function required if |
At each repeat, k is either drawn from a discrete uniform distribution between range.k[1] and range.k[2] or chosen within that range by the best silhouette width. Clustering is then applied and the result is returned.
matrix of clusterings Nsample x Nrepeat
data = multiview_clusters(n = c(40,40,40), hidden.dim = 2, observed.dim = c(2,2,2), sd.max = .1, sd.noise = 0, hidden.r.range = c(.5,1))
X_observation = data[["observation"]]
Clusters = multiview_pam_gen(X_observation)
Hierarchical clustering from adjacency matrix
hir_clust_from_adj_mat( adj.mat, k = 2, alpha = 1, adj.conv = TRUE, method = "ward.D" )
adj.mat |
adjacency matrix |
k |
number of clusters (default=2) |
alpha |
soft threshold (considered if |
adj.conv |
binary value to apply soft thresholding (default=TRUE) |
method |
agglomeration (linkage) method (default: |
Applies hierarchical clustering to the adjacency matrix.
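A minimal sketch of hierarchical clustering from an adjacency matrix in base R. It assumes the (optionally soft-thresholded) affinity is converted to a distance as 1 - affinity before calling hclust; the package may use a different conversion, and hclust_from_adj_sketch is a hypothetical name.

```r
# Hypothetical sketch: hierarchical clustering from an adjacency matrix.
hclust_from_adj_sketch <- function(adj.mat, k = 2, alpha = 1, adj.conv = TRUE,
                                   method = "ward.D") {
  # optional soft thresholding with the adj_conv formula
  aff <- if (adj.conv) exp(-(1 - adj.mat)^2 / (2 * alpha^2)) else adj.mat
  d <- as.dist(1 - aff)                       # assumed similarity -> distance
  cutree(hclust(d, method = method), k = k)   # cut the tree into k groups
}

Adj_mat <- rbind(c(0.0, 0.9, 0.0), c(0.9, 0.0, 0.2), c(0.0, 0.2, 0.0))
hclust_from_adj_sketch(Adj_mat)
```

On this small matrix, samples 1 and 2 (adjacency 0.9) end up together and sample 3 forms its own cluster.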
vector of clusters
Adj_mat = rbind(c(0.0,0.9,0.0), c(0.9,0.0,0.2), c(0.0,0.2,0.0))
hir_clust_from_adj_mat(Adj_mat)
Build indicator matrix
indicator_matrix(clusters)
clusters |
a vector of cluster labels. Zero elements mean that the sample was absent during clustering |
The indicator matrix (I) is a binary N-by-N matrix, where I[i,j] = 1 if samples i and j were both included in the clustering. Reference: Monti et al. (2003), "Consensus Clustering: A Resampling-Based Method for Class Discovery and Visualization of Gene Expression Microarray Data", Machine Learning.
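In base R this definition reduces to an outer product of the "clustered" flags; indicator_sketch is a hypothetical name used only for illustration. In Monti et al. (2003), dividing the sum of connectivity matrices by the sum of indicator matrices over all clusterings gives the consensus matrix.

```r
# Hypothetical sketch: I[i, j] = 1 when samples i and j both have a
# nonzero label (were both present in the clustering).
indicator_sketch <- function(clusters) {
  clustered <- clusters != 0
  outer(clustered, clustered, "&") * 1L
}

I <- indicator_sketch(c(1, 1, 1, 0, 0, 1))
print(I)
```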
Indicator matrix
ind_mat = indicator_matrix(c(1,1,1,0,0,1))
Similarity between different clusters
label_similarity(x1, x2)
x1 |
clustering vector 1. Zero elements are considered unclustered samples |
x2 |
clustering vector 2. Zero elements are considered unclustered samples |
When several clusterings are performed, their cluster labels may not match each other. To find correspondences between clusters, the similarity between every pair of labels is calculated.
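The manual does not state which similarity measure is used. As one plausible choice, the sketch below computes the Jaccard index (size of intersection over size of union) between the sample sets of each pair of labels, ignoring unclustered samples; label_similarity_sketch is a hypothetical name.

```r
# Hypothetical sketch: Jaccard similarity between every label of x1 and
# every label of x2, skipping samples that are 0 in either vector.
label_similarity_sketch <- function(x1, x2) {
  keep <- x1 != 0 & x2 != 0                  # drop unclustered samples
  x1 <- x1[keep]
  x2 <- x2[keep]
  l1 <- sort(unique(x1))
  l2 <- sort(unique(x2))
  sim <- matrix(0, length(l1), length(l2),
                dimnames = list(as.character(l1), as.character(l2)))
  for (a in seq_along(l1)) {
    for (b in seq_along(l2)) {
      inA <- x1 == l1[a]
      inB <- x2 == l2[b]
      sim[a, b] <- sum(inA & inB) / sum(inA | inB)   # Jaccard index
    }
  }
  sim
}

S <- label_similarity_sketch(c(1, 1, 1, 2, 2, 2), c(2, 2, 2, 1, 1, 2))
print(round(S, 3))
```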
matrix of similarities between clustering labels
X = gaussian_clusters()$X
x1 = kmeans(X, 5)$cluster
x2 = kmeans(X, 5)$cluster
Sim = label_similarity(x1, x2)
Logit function
Logit(x)
x |
numeric scalar input |
Logit(x) = log(x/(1-x))
y = Logit(0.5)
Consensus mechanism based on majority voting
majority_voting(X)
X |
clustering matrix of Nsamples x Nclusterings. Zero elements are considered unclustered samples |
Perform majority voting as a consensus mechanism.
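A minimal base-R sketch of the voting step, assuming the columns of X already use consistent labels (for example after cluster_relabel); this sketch does not align labels itself. The tie rule (smallest label wins) is an assumption, and majority_voting_sketch is a hypothetical name.

```r
# Hypothetical sketch: per-sample majority vote over aligned clusterings.
# Zeros are ignored; a sample never clustered stays 0.
majority_voting_sketch <- function(X) {
  apply(as.matrix(X), 1, function(row) {
    votes <- row[row != 0]
    if (length(votes) == 0) return(0)
    counts <- table(votes)                      # labels sorted ascending
    as.numeric(names(counts)[which.max(counts)])
  })
}

X <- cbind(c(1, 1, 2, 0),
           c(1, 2, 2, 0),
           c(1, 1, 2, 3))
majority_voting_sketch(X)   # 1 1 2 3
```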
the vector of consensus clustering result
X = gaussian_clusters()$X
x1 = kmeans(X, 5)$cluster
x2 = kmeans(X, 5)$cluster
x3 = kmeans(X, 5)$cluster
clusters = majority_voting(cbind(x1,x2,x3))
Multiple cluster generation
multi_cluster_gen(X, func, rep = 10, param, method = "random")
X |
input data Nsample x Nfeatures or a distance matrix |
func |
custom function that accepts |
rep |
number of repeats |
param |
vector of parameters |
method |
method for the choice of k at each repeat |
At each repeat, k is either drawn from a discrete uniform distribution between param[1] and param[2] or chosen within that range by the best silhouette width. Clustering is then applied and the result is returned.
matrix of clusterings Nsample x Nrepeat
X = gaussian_clusters()$X
cluster_func = function(X, k){return(stats::kmeans(X, k)$cluster)}
Clusters = multi_cluster_gen(X, cluster_func, param = c(2,3))
Multiple K-means generation
multi_kmeans_gen(X, rep = 10, range.k = c(2, 5), method = "random")
X |
input data Nsample x Nfeatures |
rep |
number of repeats |
range.k |
vector of minimum and maximum values for k |
method |
method for the choice of k at each repeat |
At each repeat, k is either drawn from a discrete uniform distribution between range.k[1] and range.k[2] or chosen within that range by the best silhouette width. k-means clustering is then applied and the result is returned.
matrix of clusterings Nsample x Nrepeat
X = gaussian_clusters()$X
Clusters = multi_kmeans_gen(X)
Multiple PAM (K-medoids) generation
multi_pam_gen( X, rep = 10, range.k = c(2, 5), is.distance = FALSE, method = "random" )
X |
input data Nsample x Nfeatures or distance matrix. |
rep |
number of repeats |
range.k |
vector of minimum and maximum values for k |
is.distance |
binary value indicating if the input |
method |
method for the choice of k at each repeat |
At each repeat, k is either drawn from a discrete uniform distribution between range.k[1] and range.k[2] or chosen within that range by the best silhouette width. PAM clustering is then applied and the result is returned.
matrix of clusterings Nsample x Nrepeat
X = gaussian_clusters()$X
Clusters = multi_pam_gen(X)
Multiview cluster generation
multiview_cluster_gen( X, func, rep = 10, param, is.distance = FALSE, sample.set = NA )
X |
List of input data matrices of Sample x feature or distance matrices.
The length of |
func |
custom function that accepts |
rep |
number of repeats |
param |
vector of parameters |
is.distance |
binary value indicating if the input |
sample.set |
vector of samples on which the clustering is applied. Can be names or indices.
if |
At each repeat, k is either drawn from a discrete uniform distribution over the range given by param or chosen within that range by the best silhouette width. Clustering is then applied to each view and the results are returned.
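The combination step can be sketched as follows, assuming every view contains the same samples in the same row order (the sample.set alignment is not shown). The func argument follows the (X, rep, param) shape of cluster_func in this section's example; multiview_gen_sketch and random_k_kmeans are hypothetical names.

```r
# Hypothetical sketch of multiview generation: run a generator on each view
# and bind the per-view Nsample x rep blocks side by side.
multiview_gen_sketch <- function(X, func, rep = 10, param) {
  do.call(cbind, lapply(X, function(view) func(view, rep, param)))
}

# Example generator with the (X, rep, param) signature: k drawn at random
# from param[1]..param[2] for each repeat, then k-means.
random_k_kmeans <- function(X, rep, param) {
  sapply(seq_len(rep), function(r) {
    k <- sample(seq(param[1], param[2]), 1)
    kmeans(X, k, nstart = 5)$cluster
  })
}

set.seed(5)
views <- list(matrix(rnorm(60), nrow = 30), matrix(rnorm(90), nrow = 30))
Clusters <- multiview_gen_sketch(views, random_k_kmeans, rep = 4, param = c(2, 4))
dim(Clusters)   # 30 x 8: 4 repeats for each of 2 views
```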
matrix of clusterings Nsample x (Nrepeat x Nviews)
data = multiview_clusters(n = c(40,40,40), hidden.dim = 2, observed.dim = c(2,2,2), sd.max = .1, sd.noise = 0, hidden.r.range = c(.5,1))
X_observation = data[["observation"]]
cluster_func = function(X,rep,param){return(multi_kmeans_gen(X,rep=rep,range.k=param))}
Clusters = multiview_cluster_gen(X_observation, func = cluster_func, rep = 10, param = c(2,4))
Generate multiview clusters from Gaussian distributions with randomly generated parameters
multiview_clusters( n = c(50, 50), hidden.dim = 2, observed.dim = c(2, 2, 3), sd.max = 0.1, sd.noise = 0.01, hidden.r.range = c(0.1, 1) )
n |
vector of number of data points in each cluster
The length of |
hidden.dim |
scalar number of dimensions of the hidden state |
observed.dim |
vector of the number of dimensions of the generated clusters.
The length of |
sd.max |
maximum standard deviation of clusters |
sd.noise |
standard deviation of the added noise |
hidden.r.range |
the range (min, max) of the distance of cluster centers from the origin in the hidden space. |
a list of data points (X) and cluster labels (class)
data = multiview_clusters()
Multiview K-means generation
multiview_kmeans_gen(X, rep = 10, range.k = c(2, 5), method = "random")
X |
List of input data matrices of Sample x feature. The length of |
rep |
number of repeats |
range.k |
vector of minimum and maximum values for k |
method |
method for the choice of k at each repeat |
At each repeat, k is either drawn from a discrete uniform distribution between range.k[1] and range.k[2] or chosen within that range by the best silhouette width. k-means clustering is then applied and the result is returned.
matrix of clusterings Nsample x (Nrepeat x Nviews)
data = multiview_clusters(n = c(40,40,40), hidden.dim = 2, observed.dim = c(2,2,2), sd.max = .1, sd.noise = 0, hidden.r.range = c(.5,1))
X_observation = data[["observation"]]
Clusters = multiview_kmeans_gen(X_observation)
Multiview PAM (K-medoids) generation
multiview_pam_gen( X, rep = 10, range.k = c(2, 5), is.distance = FALSE, method = "random", sample.set = NA )
X |
List of input data matrices of Sample x feature or distance matrices.
The length of |
rep |
number of repeats |
range.k |
vector of minimum and maximum values for k |
is.distance |
binary value indicating if the input |
method |
method for the choice of k at each repeat |
sample.set |
vector of samples on which the clustering is applied. Can be names or indices.
if |
At each repeat, k is either drawn from a discrete uniform distribution between range.k[1] and range.k[2] or chosen within that range by the best silhouette width. PAM clustering is then applied and the result is returned.
matrix of clusterings Nsample x (Nrepeat x Nviews)
data = multiview_clusters(n = c(40,40,40), hidden.dim = 2, observed.dim = c(2,2,2), sd.max = .1, sd.noise = 0, hidden.r.range = c(.5,1))
X_observation = data[["observation"]]
Clusters = multiview_pam_gen(X_observation)
PAM (k-medoids) clustering from adjacency matrix
pam_clust_from_adj_mat(adj.mat, k = 2, alpha = 1, adj.conv = TRUE)
adj.mat |
adjacency matrix |
k |
number of clusters (default=2) |
alpha |
soft threshold (considered if |
adj.conv |
binary value to apply soft thresholding (default=TRUE) |
apply PAM (k-medoids) clustering on the adjacency matrix
vector of clusters
Adj_mat = rbind(c(0.0,0.9,0.0), c(0.9,0.0,0.2), c(0.0,0.2,0.0))
pam_clust_from_adj_mat(Adj_mat)
Spectral clustering from adjacency matrix
spect_clust_from_adj_mat( adj.mat, k = 2, max.eig = 10, alpha = 1, adj.conv = TRUE, do.plot = FALSE )
adj.mat |
adjacency matrix |
k |
number of clusters (default=2) |
max.eig |
maximum number of eigenvectors in use (default = 10). |
alpha |
soft threshold (considered if |
adj.conv |
binary value to apply soft thresholding (default = |
do.plot |
binary value to produce a plot (default = |
Applies spectral clustering to the adjacency matrix.
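For orientation, normalized spectral clustering in the Ng-Jordan-Weiss style described by von Luxburg (2007) can be sketched in base R as below. The details are assumptions rather than the package's implementation: optional soft thresholding with the adj_conv formula, zero self-affinity, no max.eig cap and no plotting. spectral_sketch is a hypothetical name.

```r
# Hypothetical sketch of normalized spectral clustering (Ng-Jordan-Weiss style).
spectral_sketch <- function(adj.mat, k = 2, alpha = 1, adj.conv = TRUE) {
  W <- if (adj.conv) exp(-(1 - adj.mat)^2 / (2 * alpha^2)) else adj.mat
  diag(W) <- 0                                  # no self-loops
  d_inv_sqrt <- diag(1 / sqrt(rowSums(W)))      # D^(-1/2)
  M <- d_inv_sqrt %*% W %*% d_inv_sqrt          # D^(-1/2) W D^(-1/2)
  # largest eigenvalues of M correspond to smallest of L_sym = I - M
  U <- eigen(M, symmetric = TRUE)$vectors[, seq_len(k), drop = FALSE]
  U <- U / sqrt(rowSums(U^2))                   # normalize each row to unit length
  kmeans(U, centers = k, nstart = 10)$cluster
}

A <- matrix(0.05, 6, 6)
A[1:3, 1:3] <- 0.9
A[4:6, 4:6] <- 0.9
set.seed(2)
cl <- spectral_sketch(A, k = 2, adj.conv = FALSE)
print(cl)
```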
vector of clusters
Adj_mat = rbind(c(0.0,0.9,0.0), c(0.9,0.0,0.2), c(0.0,0.2,0.0))
hir_clust_from_adj_mat(Adj_mat)