An OIer who don't know suffix automaton, is like a gamer that doesn't play Genshin Impact. - Said by me
Suffix Automaton (SAM) is the ultimate algorithm for strings in Olympiad of Informatics. But its nonetheless useless because of obvious reasons. I personally prefers suffix array but SAM is too powerful so why not have a try?
This blog entry is very un-detailed because its for personal use. Please refer to https://oi-wiki.org/string/sam/ for a detailed analysis of SAM.
Brief
There are two automatons in Olympiad of Informatics.
- ACAM
- SAM
I think in nature, the -AMs store the information of the strings in a
compressed Directed Acyclic Graph
, while having another
Tree
to show the connection between each substring.
In other words, you can either choose the
Directed Acyclic Graph
or the Tree
to solve
the problem. The latter is often better.
Notations
\(\text{len}(s)\) length of
string
s.
\(\delta(v,s)\) transition from
node
\(v\) by a path of
string
\(s\).
\(\text{endpos}(s)\) The
set
of end points in the original string that has
string
\(s\) as a
substring.
\(\text{parent}(v)\) parent of
node
\(v\) in the tree
\(\text{max}(v)\) the longest
substring represented by the node
\(v\).
\(\text{min}(v)\) the shortest
substring represented by the node
\(v\).
DAG
SAM
stores the information about the suffix strings
using a similar structure as the Trie
tree. However,
"useless" nodes are merged so that it becomes a DAG
.
SAM
's DAG
contains all substrings of the
original string. Represented by a path from the root node \(\varphi\).
Denote the transition from one node to another node in the
DAG
as \(\delta(v,c)\)
where \(v\) is a node and \(c\) is a character. This also means to add
a character at the back of the strings at node \(v\).
Yes, one node can represent multiple strings in SAM
which is very special. This is very likely because SAM
is
highly compressed with memory complexity of \(O(N)\).
But what rule do we merge two nodes? We compare the \(\text{endpos}(u)\) and \(\text{endpos}(v)\). If they are the same, they are merged.
Tree
\(\text{parent}(v) = f\) if and only if
\[\text{endpos}(\max(v)) \subsetneq \text{endpos}(\max(f))\]
or if and only if
\[\text{len}(\min (v)) = \text{len}(\max(f))+1\]
And the \(\text{parent}(v)\) function forms a tree.
Complexity
Completely Linear.
Generalised
How about multiple strings? A incorrect way is to connect the
multiple strings with a #
. But the time complexity could be
very bad.
Here is a Method by 刘研绎,2015《后缀自动机在字典树上的拓展》
Steps:
Build a
trie
treeBFS
the root of thetrie
tree and record the parent node for each nodeBuild
SAM
using theBFS
order.
Code
Accepted on Luogu 1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
using namespace std;
const int maxn = 2e6+5;
int a[maxn],c[maxn],sz[maxn];
int ans,n;
string s;
struct SAM {
int lst, pter;
int ch[maxn*2][26];
int fa[maxn*2];
int l[maxn*2];
void ins (int c) {
int p = lst;int np = ++pter;
lst = np;l[np] = l[p]+1;
while (p&&!ch[p][c]) {
ch[p][c] = np;
p = fa[p];
}
sz[np] = 1;
if (!p) {
fa[np] = 1;
return;
}
int q = ch[p][c];
if (l[p]+1==l[q]) {
fa[np] = q;
return;
}
int nq = ++pter;
l[nq] = l[p]+1;
memcpy(ch[nq],ch[q],sizeof(ch[q]));
fa[nq] = fa[q];
fa[q] = fa[np] = nq;
while (ch[p][c]==q) {
ch[p][c] = nq;
p = fa[p];
}
}
void build () {
cin>>s;
int len = s.size();
s = "#"+s;
lst = pter = 1;
for (int i=1;i<=len;i++) ins(s[i]-'a');
}
void calc () {
for (int i=1;i<=pter;i++) c[l[i]]++;
for (int i=1;i<=pter;i++) c[i] += c[i-1];
for (int i=1;i<=pter;i++) a[c[l[i]]--] = i;
for (int i=pter;i>0;i--){
int p = a[i];
sz[fa[p]] += sz[p];
if (sz[p]>1) ans = max(ans,sz[p]*l[p]);
}
cout<<ans<<endl;
}
};
SAM sam;
signed main () {
sam.build();
sam.calc();
}
Reference
https://oi-wiki.org/string/general-sam/
https://oi-wiki.org/string/sam/