In a test example from an ocean time series sampling program, data preprocessing identified several outliers that, upon re-examination, were found to result from systematic errors. Clustering analysis of ARISA data from different sampling times, based on data binned with the dynamic programming algorithm, revealed important features of the biodiversity of the microbial communities.
A number of community profiling approaches have been widely used in environmental ecology to study microbial community composition and its variation. Automated Ribosomal Intergenic Spacer Analysis (ARISA) is one such technique. ARISA has been used to study microbial communities at different times and places by exploiting length heterogeneity in the 16S-23S rRNA intergenic spacer. Owing to errors in sampling, random mutations during PCR amplification, and, most likely the dominant factor, variation in readings from the equipment used to analyze fragment sizes, the data read directly from the fragment analyzer should not be used for downstream statistical analysis. No optimal data preprocessing method is available. A commonly used approach is to bin the measured lengths of the 16S-23S intergenic spacer. We have developed a dynamic programming-based binning method for ARISA data analysis that minimizes the overall differences between replicates from the same sampling location and time.
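
The binning idea can be illustrated with a minimal sketch. It is not the published algorithm: the text above does not spell out the objective function or the constraints, so the following are assumptions made for illustration only. Each replicate is a mapping from fragment length to relative abundance, a bin is a contiguous run of sorted fragment lengths no wider than a hypothetical `max_width`, and the cost of a bin is the sum of pairwise absolute differences between the replicates' total abundance in that bin. The function name `bin_fragments` is likewise hypothetical.

```python
from itertools import combinations


def bin_fragments(replicates, max_width):
    """Partition fragment lengths into contiguous bins by dynamic programming.

    replicates: list of dicts {fragment_length: relative_abundance}
    max_width:  assumed upper limit on the span of one bin
    Returns (bins, total_cost), where bins is a list of (low, high) tuples.
    """
    # All distinct fragment lengths observed in any replicate, sorted.
    lengths = sorted({x for rep in replicates for x in rep})
    n = len(lengths)

    def cost(i, j):
        # Assumed cost of the bin lengths[i:j]: pairwise absolute
        # differences between the replicates' summed abundance.
        totals = [sum(rep.get(x, 0.0) for x in lengths[i:j])
                  for rep in replicates]
        return sum(abs(a - b) for a, b in combinations(totals, 2))

    # best[k] = minimal cost of binning the first k lengths;
    # back[k] = start index of the last bin in that optimal binning.
    best = [0.0] + [float("inf")] * n
    back = [0] * (n + 1)
    for k in range(1, n + 1):
        for i in range(k - 1, -1, -1):
            if lengths[k - 1] - lengths[i] > max_width:
                break  # widening the bin further would exceed max_width
            candidate = best[i] + cost(i, k)
            if candidate < best[k]:
                best[k], back[k] = candidate, i

    # Walk the back-pointers to recover the bin boundaries.
    bins = []
    k = n
    while k > 0:
        i = back[k]
        bins.append((lengths[i], lengths[k - 1]))
        k = i
    bins.reverse()
    return bins, best[n]
```

For example, with two replicates whose peaks are shifted by less than one base pair, `bin_fragments([{400.1: 0.5, 520.0: 0.5}, {400.9: 0.5, 520.4: 0.5}], max_width=1.0)` groups each shifted pair into one bin, so the replicates agree exactly after binning. The width limit matters in this sketch: without it, a single bin covering every fragment would trivially minimize the difference between replicates.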