KarL05/Aiyiyi's Blog
Generalized Suffix Automaton Study Notes

An OIer who don't know suffix automaton, is like a gamer that doesn't play Genshin Impact. - Said by me

Suffix Automaton (SAM) is the ultimate algorithm for strings in Olympiad of Informatics. But its nonetheless useless because of obvious reasons. I personally prefers suffix array but SAM is too powerful so why not have a try?

This blog entry is very un-detailed because its for personal use. Please refer to https://oi-wiki.org/string/sam/ for a detailed analysis of SAM.


Brief

There are two automatons in Olympiad of Informatics.

  1. ACAM
  2. SAM

I think in nature, the -AMs store the information of the strings in a compressed Directed Acyclic Graph, while having another Tree to show the connection between each substring.

In other words, you can either choose the Directed Acyclic Graph or the Tree to solve the problem. The latter is often better.


Notations

\(\text{len}(s)\) length of string s.

\(\delta(v,s)\) transition from node \(v\) by a path of string \(s\).

\(\text{endpos}(s)\) The set of end points in the original string that has string \(s\) as a substring.

\(\text{parent}(v)\) parent of node \(v\) in the tree

\(\text{max}(v)\) the longest substring represented by the node \(v\).

\(\text{min}(v)\) the shortest substring represented by the node \(v\).


DAG

SAM stores the information about the suffix strings using a similar structure as the Trie tree. However, "useless" nodes are merged so that it becomes a DAG.

SAM's DAG contains all substrings of the original string. Represented by a path from the root node \(\varphi\).

Denote the transition from one node to another node in the DAG as \(\delta(v,c)\) where \(v\) is a node and \(c\) is a character. This also means to add a character at the back of the strings at node \(v\).

Yes, one node can represent multiple strings in SAM which is very special. This is very likely because SAM is highly compressed with memory complexity of \(O(N)\).

But what rule do we merge two nodes? We compare the \(\text{endpos}(u)\) and \(\text{endpos}(v)\). If they are the same, they are merged.


Tree

\(\text{parent}(v) = f\) if and only if

\[\text{endpos}(\max(v)) \subsetneq \text{endpos}(\max(f))\]

or if and only if

\[\text{len}(\min (v)) = \text{len}(\max(f))+1\]

And the \(\text{parent}(v)\) function forms a tree.


Complexity

Completely Linear.


Generalised

How about multiple strings? A incorrect way is to connect the multiple strings with a #. But the time complexity could be very bad.

Here is a Method by 刘研绎,2015《后缀自动机在字典树上的拓展》

Steps:

  1. Build a trie tree

  2. BFS the root of the trie tree and record the parent node for each node

  3. Build SAM using the BFS order.


Code

Accepted on Luogu

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
#include"bits/stdc++.h"
#define int long long
using namespace std;

const int maxn = 2e6+5;
int a[maxn],c[maxn],sz[maxn];
int ans,n;
string s;

struct SAM {
int lst, pter;
int ch[maxn*2][26];
int fa[maxn*2];
int l[maxn*2];
void ins (int c) {
int p = lst;int np = ++pter;
lst = np;l[np] = l[p]+1;
while (p&&!ch[p][c]) {
ch[p][c] = np;
p = fa[p];
}
sz[np] = 1;
if (!p) {
fa[np] = 1;
return;
}
int q = ch[p][c];
if (l[p]+1==l[q]) {
fa[np] = q;
return;
}
int nq = ++pter;
l[nq] = l[p]+1;
memcpy(ch[nq],ch[q],sizeof(ch[q]));
fa[nq] = fa[q];
fa[q] = fa[np] = nq;
while (ch[p][c]==q) {
ch[p][c] = nq;
p = fa[p];
}
}
void build () {
cin>>s;
int len = s.size();
s = "#"+s;
lst = pter = 1;
for (int i=1;i<=len;i++) ins(s[i]-'a');
}
void calc () {
for (int i=1;i<=pter;i++) c[l[i]]++;
for (int i=1;i<=pter;i++) c[i] += c[i-1];
for (int i=1;i<=pter;i++) a[c[l[i]]--] = i;
for (int i=pter;i>0;i--){
int p = a[i];
sz[fa[p]] += sz[p];
if (sz[p]>1) ans = max(ans,sz[p]*l[p]);
}
cout<<ans<<endl;
}
};
SAM sam;

signed main () {
sam.build();
sam.calc();
}


Reference

https://oi-wiki.org/string/general-sam/

https://oi-wiki.org/string/sam/