.NET Reference Guide | Aho-Corasick String Searching | InformIT
Aho-Corasick String Searching
Last updated Nov 6, 2009.
If you want to search for multiple strings in a large amount of text, the algorithm of choice is the Aho-Corasick algorithm, published by Alfred V. Aho and Margaret J. Corasick in 1975. It’s a kind of dictionary matching algorithm that makes a single pass through the input text, matching all occurrences of all strings in that single pass. There is some small up-front cost for preprocessing the strings being searched for, but that’s easily offset by the algorithm’s reduced time complexity. Whereas the brute force algorithm has N * M complexity, Aho-Corasick is linear in the length of the patterns plus the length of the text to be searched, plus the number of output matches.