Optimal choice of word length when comparing two Markov sequences using a chi-square statistic.


The chi-square statistic has been suggested to compare two sequences. However, it is not known how best to choose the word length k in such studies.
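To make the role of the word length k concrete, the following is a minimal sketch of a chi-square comparison of word counts between two sequences. It assumes a standard two-sample Pearson homogeneity statistic on overlapping k-mer counts, which is not necessarily the exact statistic studied here; the function names `kmer_counts` and `chi_square_kmer` are hypothetical and chosen only for illustration.

```python
from collections import Counter


def kmer_counts(seq, k):
    """Count overlapping words of length k in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def chi_square_kmer(seq_a, seq_b, k):
    """Two-sample Pearson chi-square statistic on k-mer counts.

    Assumption: expected counts come from the pooled word frequencies
    of both sequences (a generic homogeneity test, used for illustration).
    """
    if len(seq_a) < k or len(seq_b) < k:
        raise ValueError("both sequences must contain at least one word of length k")
    counts_a = kmer_counts(seq_a, k)
    counts_b = kmer_counts(seq_b, k)
    n_a = sum(counts_a.values())  # number of words in sequence A
    n_b = sum(counts_b.values())  # number of words in sequence B
    total = n_a + n_b
    stat = 0.0
    # Only words observed in at least one sequence contribute.
    for word in set(counts_a) | set(counts_b):
        pooled = counts_a[word] + counts_b[word]
        exp_a = pooled * n_a / total
        exp_b = pooled * n_b / total
        stat += (counts_a[word] - exp_a) ** 2 / exp_a
        stat += (counts_b[word] - exp_b) ** 2 / exp_b
    return stat


if __name__ == "__main__":
    a = "ACGTACGTTTGACA"
    b = "ACGGGCGTATGACC"
    for k in (1, 2, 3):
        print(k, chi_square_kmer(a, b, k))
```

Because overlapping words within a sequence are dependent, the value returned here should not be assumed to follow the nominal chi-square null distribution; the sketch only shows how the statistic changes as k varies.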

Our studies provide guidelines on choosing the optimal word length for the comparison of Markov sequences.

)+1 for both long sequences and next-generation sequencing (NGS) read data. The orders of the Markov chains may be unknown, and several methods have been developed to estimate them from both long sequences and NGS reads. We study the power loss of the statistics when the estimated orders are used in place of the true orders. The power loss is shown to be minimal for some of the order estimators.

