In QDMR, the threshold for DMRs was determined from the methylation probability model.
To model the effect of experimental variability, we simulated the distribution of entropy for uniformly methylated regions. We computed the fold change between the replicate-dependent difference from the average level across replicates and the theoretical maximum range of methylation. This fold change is assumed to follow a normal distribution with mean equal to zero and some unknown, but 'small', standard deviation (SD). The experimental variability can therefore be estimated from the corresponding methylation levels.
To model a uniformly methylated region, we assume that the region has an average methylation level across all samples and then allow the methylation levels in individual samples to follow a narrow distribution of random fold changes from that mean level. The entropy in the current work is independent of the average methylation across all samples because it is derived from the processed methylation value. The biological variability modeled in this approach is therefore expressed relative to the average methylation level across all samples. The fold change between the sample-dependent difference from the average level and the theoretical maximum range of methylation was defined as (m - mean)/(MAX - MIN), where m is the methylation level in a sample, mean is the average level across all samples, and MAX and MIN are the theoretical maximum and minimum methylation levels. It was assumed in this study that the fold change follows a normal distribution with mean equal to zero and some unknown, but 'small', SD. Thus, the SD can be used to indicate the degree of biological variation.
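The fold change definition above can be illustrated with a minimal Python sketch. The function name `fold_changes` and the default scale (MAX = 100, MIN = 0) are assumptions made for this illustration; the text does not fix the units of the methylation level.

```python
import numpy as np

def fold_changes(levels, max_level=100.0, min_level=0.0):
    """Return (m - mean) / (MAX - MIN) for each sample of one region.

    max_level and min_level stand for the theoretical maximum and minimum
    methylation levels; the 0-100 scale is an assumption of this sketch.
    """
    levels = np.asarray(levels, dtype=float)
    return (levels - levels.mean()) / (max_level - min_level)

# Three hypothetical samples of one region, centred on a mean level of 50.
fc = fold_changes([43.0, 50.0, 57.0])
print(fc)
```

By construction the fold changes of a region sum to zero, so only their spread (the SD) carries information about the variation across samples.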
If the SD equals zero, the methylation levels in all samples are the same and equal to the mean level. The larger the SD, the greater the methylation difference across samples. Setting SD=0.07 corresponds to a relatively small amount of variation: for a mean level of 50 on a 0 to 100 scale, methylation levels lie between 43 and 57 in 68% of the samples, between 36 and 64 in 95% of the samples, and between 29 and 71 in over 99% of the samples.
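The quoted ranges follow from the usual one, two and three SD intervals of a normal distribution. The short Python check below assumes a mean level of 50 and a 0 to 100 methylation scale, which is what the numbers in the text imply; `methylation_interval` is a hypothetical helper name.

```python
def methylation_interval(mean_level, sd, k, max_level=100.0, min_level=0.0):
    """Range of methylation levels within k SDs of the mean.

    sd is the SD of the fold change, so it is rescaled by (MAX - MIN)
    to get back to methylation units.
    """
    half_width = k * sd * (max_level - min_level)
    return mean_level - half_width, mean_level + half_width

for k in (1, 2, 3):
    low, high = methylation_interval(50.0, 0.07, k)
    print(k, round(low), round(high))
# 1 43 57
# 2 36 64
# 3 29 71
```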
Take the determination of the DMR threshold for 16 samples as an example. In total, 80 000 random values (5000 rows and 16 columns) were generated from the normal distribution model with mean=0 and SD=0.07, so that 5000 uniformly methylated regions across 16 samples were modeled. The entropy of each of these regions was then calculated. The entropy value at p = 0.05 (one-sided) of the distribution of the 5000 entropies, which was normal, was taken as a threshold. This process was repeated 10 times, producing 10 thresholds with a mean (SD) of 5.326 (0.022). This mean was used as the threshold for DMR identification. Regions with entropy lower than the threshold are defined as DMRs, while the remaining regions are non-differentially methylated regions (N-DMRs).
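The simulation procedure can be sketched in Python as follows. This is an illustration under stated assumptions, not the QDMR implementation: it uses the plain Shannon entropy of relative methylation levels as a stand-in for the entropy computed from the processed methylation value, it places the regions at an assumed mean level of 0.5 on a 0 to 1 scale, and it takes the lower 5% quantile as the one-sided cut-off. It will therefore not reproduce the value 5.326 reported above. The names `shannon_entropy` and `simulate_threshold` are hypothetical.

```python
import numpy as np

def shannon_entropy(levels):
    """Shannon entropy (in bits) of each row's relative methylation levels.

    Stand-in for the entropy used in the text, which is derived from a
    processed methylation value not reproduced here.
    """
    p = levels / levels.sum(axis=1, keepdims=True)
    safe = np.where(p > 0, p, 1.0)  # log2(1) = 0, so zero entries contribute 0
    return -(p * np.log2(safe)).sum(axis=1)

def simulate_threshold(n_samples=16, n_regions=5000, sd=0.07, n_repeats=10,
                       mean_level=0.5, alpha=0.05, seed=0):
    """Mean and SD of the simulated entropy thresholds over n_repeats runs."""
    rng = np.random.default_rng(seed)
    thresholds = []
    for _ in range(n_repeats):
        # Fold changes of uniformly methylated regions: N(0, sd).
        fold = rng.normal(0.0, sd, size=(n_regions, n_samples))
        # On a 0-1 scale MAX - MIN = 1, so level = mean + fold change.
        levels = np.clip(mean_level + fold, 0.0, 1.0)
        entropies = shannon_entropy(levels)
        # Assumed lower-tail cut-off: regions below it would be called DMRs.
        thresholds.append(np.quantile(entropies, alpha))
    thresholds = np.array(thresholds)
    return thresholds.mean(), thresholds.std(ddof=1)

mean_thr, sd_thr = simulate_threshold()
print(mean_thr, sd_thr)
```

Calling `simulate_threshold(n_samples=n)` for other values of `n` gives a threshold per sample count in the same way, again under the stand-in entropy.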
With this method, thresholds were produced for sample numbers ranging from 2 to 100 (list below).