Apply a filter to remove genes with consistently low intensity values or low variance across the samples. This technique is especially useful for genome-wide arrays, because often only a minority of genes are expressed at all in the cell type under consideration.
The inter-quartile range (IQR) measures the spread of the middle 50% of the intensity scores for each gene. It is defined as the 75th percentile minus the 25th percentile, and it plays an important role in the graphical method known as the boxplot. The advantage of the IQR is that it is easy to compute and that extreme scores in the distribution have much less impact on it than on the full range. This strength is also a weakness: because the IQR ignores the upper and lower quarters of the data, it can suffer as a measure of variability. The basic idea is to study variability while eliminating scores that are likely to be accidents. The boxplot allows for this distinction and is an important tool for exploring data.
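The definition above can be sketched in a few lines of Python. This is only an illustration: the intensity values are invented, and the quartiles use linear interpolation (the "inclusive" method of the standard library), which is one of several common conventions and may differ slightly from the one used by a given analysis package.

```python
from statistics import quantiles

def iqr(values):
    """Inter-quartile range: 75th percentile minus 25th percentile."""
    # method="inclusive" interpolates linearly between order statistics.
    q1, _median, q3 = quantiles(values, n=4, method="inclusive")
    return q3 - q1

# Hypothetical log2 intensities for one gene across eight samples;
# the value 9.8 plays the role of an extreme score.
gene = [7.1, 7.3, 7.2, 7.4, 9.8, 7.0, 7.5, 7.2]

spread = iqr(gene)                    # about 0.25
full_range = max(gene) - min(gene)    # about 2.8, dominated by the outlier
print(f"IQR = {spread:.3f}, range = {full_range:.3f}")
```

The single extreme value inflates the range more than tenfold relative to the IQR, which is the robustness property described above.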
Typically, IQR filtering removes genes whose intensity scores have an inter-quartile range of less than 0.5 on the log base 2 scale.
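As a minimal sketch of this rule, the following Python snippet keeps only genes whose IQR is at least 0.5 on the log2 scale. The gene names, the intensity values, and the helper name `filter_by_iqr` are hypothetical, and the quartile convention (linear interpolation) is an assumption.

```python
from statistics import quantiles

IQR_CUTOFF = 0.5  # log2 units, the threshold quoted in the text

def iqr(values):
    """Inter-quartile range using linear interpolation between order statistics."""
    q1, _median, q3 = quantiles(values, n=4, method="inclusive")
    return q3 - q1

def filter_by_iqr(expression, cutoff=IQR_CUTOFF):
    """Keep genes whose IQR across samples is at least `cutoff`."""
    return {gene: vals for gene, vals in expression.items() if iqr(vals) >= cutoff}

# Hypothetical log2 intensities: gene name -> one value per sample.
expression = {
    "geneA": [5.0, 5.1, 5.0, 5.2, 5.1, 5.0],        # nearly flat, removed
    "geneB": [6.0, 8.5, 6.2, 8.7, 6.1, 8.4],        # varies across samples, kept
    "geneC": [10.0, 10.2, 10.1, 10.3, 10.0, 10.1],  # high but flat, removed
}

kept = filter_by_iqr(expression)
print(sorted(kept))
```

Note that the highly expressed but invariant gene is removed as well, since this filter looks only at spread and not at the intensity level.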
A useful approach is to combine the IQR filter with a filter that removes genes with consistently low intensity values.
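One way to combine the two filters is sketched below in Python. The text does not specify how "consistently low" is defined, so the intensity floor and the minimum number of samples above it are purely illustrative assumptions, as are the gene names and values.

```python
from statistics import quantiles

IQR_CUTOFF = 0.5        # log2 units, the threshold quoted in the text
INTENSITY_FLOOR = 6.0   # hypothetical log2 intensity floor (assumption)
MIN_SAMPLES_ABOVE = 2   # hypothetical number of samples that must exceed it (assumption)

def passes_iqr(values, cutoff=IQR_CUTOFF):
    """True if the inter-quartile range is at least `cutoff`."""
    q1, _median, q3 = quantiles(values, n=4, method="inclusive")
    return (q3 - q1) >= cutoff

def passes_intensity(values, floor=INTENSITY_FLOOR, min_samples=MIN_SAMPLES_ABOVE):
    """True unless the gene is consistently low: enough samples must exceed the floor."""
    return sum(v > floor for v in values) >= min_samples

def combined_filter(expression):
    """Keep genes that pass both the intensity filter and the IQR filter."""
    return {
        gene: vals
        for gene, vals in expression.items()
        if passes_intensity(vals) and passes_iqr(vals)
    }

# Hypothetical log2 intensities: gene name -> one value per sample.
expression = {
    "low_flat":      [3.0, 3.1, 3.0, 3.2, 3.1, 3.0],
    "low_noisy":     [2.0, 4.5, 2.2, 4.7, 2.1, 4.4],   # variable, but never above the floor
    "high_flat":     [10.0, 10.2, 10.1, 10.3, 10.0, 10.1],
    "high_variable": [7.0, 9.5, 7.2, 9.7, 7.1, 9.4],
}

selected = combined_filter(expression)
print(sorted(selected))
```

Here the intensity filter removes a gene that the IQR filter alone would keep (`low_noisy`), while the IQR filter removes a gene that the intensity filter alone would keep (`high_flat`).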
Gene selection by IQR seems to lead to a higher concentration of differentially expressed genes, whereas for the intensity-based criterion, the effect is less pronounced.
- Gentleman, R.; Carey, V.; Huber, W.; Irizarry, R.; Dudoit, S. (Eds.) (2005), Bioinformatics and Computational Biology Solutions Using R and Bioconductor. Springer Publications, pp. 232-233.