Basic Graph Algorithms
References:
- Chapter 5: Basic Graph Algorithms
- Chapter 6: Depth-First Search
- Wikipedia: Glossary of graph theory
- CS 137 - Graph Theory - Lecture 2
- COMPSCI 330: Design and Analysis of Algorithms - Lecture #5
- DFS Edge Classification
- Lecture 8: DFS and Topological Sort
- Lecture 21: Shortest Paths
- Lecture 20: Minimum Spanning Trees
- Math 575 - Problem Set 9
1. Definition
1.1 Simple Graph
Definition: Graphs without loops and parallel edges are often called simple graphs.
- loop: an edge from a vertex to itself
- parallel edges: multiple edges with the same endpoints
- Note that "endpoint" means an end of an edge; it is not necessarily the terminal vertex (ending point) and can just as well be the starting vertex.
Non-simple graphs are sometimes called multigraphs.
Despite the formal definitional gap, most algorithms for simple graphs extend to non-simple graphs with little or no modification.
The definition of a graph is a pair of sets G = (V, E): a set V of vertices and a set E of edges.
We may (informally) use V and E to also denote |V| and |E|, e.g. in running times such as O(V + E).
In this note we use the following terms:
- “Graph”, which generally means an undirected graph
- “Digraph”, which is used for directed graphs specifically
1.2 Neighbor and Degree
In this note we use the following notations for edges:
- {u, v} can be a directed or undirected edge; u → v represents a directed edge only.
Definition: If there is an edge {u, v} in an undirected graph, then u is a neighbor of v and vice versa.
- For an undirected graph: the degree of u is the number of u's neighbors.
- For a digraph:
  - If there is an edge u → v, then u is a predecessor of v, and v is a successor of u.
  - The in-degree of v is the number of v's predecessors.
  - The out-degree of u is the number of u's successors.
Note that the word "predecessor" has no corresponding verb.
1.3 Subgraph
Definition: A graph G′ = (V′, E′) is a subgraph of G = (V, E) if V′ ⊆ V and E′ ⊆ E.
1.4 Walk, Path, and Cycle
Definition: A walk is a sequence of vertices v₀ → v₁ → … → v_k in which every consecutive pair is joined by an edge.
- v₀ and v_k are the endpoints of the walk; v₁, …, v_{k−1} are internal vertices.
- The length of a walk is the number of edges, thus k.
A walk can consist of only a single node, with length 0.
Definition: A walk is closed if its two endpoints coincide (v₀ = v_k); otherwise it is open.
Definition: A path is a walk that does not repeat vertices.
- So by definition, a path must be an open walk.
Definition: A cycle is a closed walk that does not repeat vertices except the endpoints.
- A cycle has at least 1 edge; it is not allowed to have a length of 0.
Cycles of small length have special names:
- a cycle of length 1 is a loop;
- a cycle of length 2 is a digon (a pair of parallel undirected edges in a multigraph, or a pair of antiparallel edges in a digraph);
- a cycle of length 3 is a triangle.
1.5 Connectivity and Component
Definition: A graph is connected if there is a path between every pair of vertices.
Definition: A component of a graph G is a maximal connected subgraph of G.
- By "maximal", we mean you cannot add any vertex or edge of G to the subgraph while keeping it connected.
- So the term "connected components" is actually redundant; components are connected by definition.
- Two vertices are in the same component if and only if there is a path between them.
Practically, the definition of component is only suitable for undirected graphs.
Definition: A digraph is strongly connected if there is a directed path from every vertex to every other vertex.
1.6 Acyclicity
Definition: An undirected graph is acyclic if it contains no cycles, i.e. amongst all its subgraphs, none is a cycle. Undirected acyclic graphs are also called forests.
Note that a forest is not necessarily disconnected; it may be either connected or disconnected.
Definition: A tree is an undirected, connected, acyclic graph, or equivalently, one component of a forest.
Definition: A spanning tree of a graph G is a subgraph that is a tree and contains every vertex of G.
Lemma: A graph has a spanning tree if and only if it is connected.
Definition: A spanning forest of G is a collection of spanning trees, one for each component of G.
Definition: A digraph is acyclic if it does not contain a directed cycle. Directed acyclic graphs are often called DAGs.
Any vertex in a DAG that has no incoming edges is called a source; any vertex with no outgoing edges is called a sink. Every DAG has at least one source and one sink, but may have more than one of each. For example, in a graph with vertices but no edges, every vertex is both a source and a sink.
2. Traversing Connected Graphs / DFS / BFS
To keep things simple, we’ll consider only undirected graphs here, although the algorithms also work for directed graphs with minimal changes.
The simplest graph-traversal algorithm is DFS, depth-first search. This algorithm can be written either recursively or iteratively. It’s exactly the same algorithm either way; the only difference is that we can actually see the “recursion” stack in the non-recursive version.
# v: a source vertex
def RecursiveDFS(v):
if v is unmarked:
mark v
for each edge {v,w}:
RecursiveDFS(w)
# s: a source vertex
def IterativeDFS(s):
PUSH(s)
while the stack is not empty:
v <- POP
if v is unmarked:
mark v
for each edge {v,w}:
PUSH(w)
N.B. The final for loop of IterativeDFS can be slightly improved:
# s: a source vertex
def IterativeDFS(s):
PUSH(s)
while the stack is not empty:
v <- POP
if v is unmarked:
mark v
for each edge {v,w}:
if w is unmarked:
PUSH(w)
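As a concrete sketch, the improved iterative DFS above can be written in runnable Python; the adjacency-list dict used here is an assumption of this example, not a representation fixed by the notes:

```python
def iterative_dfs(graph, s):
    """Return the vertices reachable from s, in the order they are marked."""
    seen = set()
    order = []
    stack = [s]
    while stack:
        v = stack.pop()
        if v not in seen:
            seen.add(v)
            order.append(v)
            for w in graph[v]:
                if w not in seen:   # the small optimization from the notes
                    stack.append(w)
    return order

# Example: a triangle plus an isolated vertex "d".
g = {"a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b"], "d": []}
order = iterative_dfs(g, "a")
```

Only the component of the start vertex is visited, so `"d"` never appears in the output.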
The generic graph traversal algorithm stores a set of candidate edges in some data structure that I’ll call a “bag”. The only important properties of a bag are that we can put stuff into it and then later take stuff back out. A stack is a particular type of bag, but certainly not the only one.
N.B. If you replace the stack above with a queue, you will get BFS, breadth-first search. And the breadth-first spanning tree formed by the parent edges contains shortest paths from the start vertex s
to every other vertex in its connected component.
Another refinement worth noting: have the stack store pairs (v, parent(v)). This is not for efficiency, but to make it easy to recover paths quickly after the traversal.
# s: a source vertex
def Traverse(s):
put (None, s) in bag # the source vertex has no parent
while the bag is not empty:
take (p, v) from the bag
if v is unmarked:
mark v
parent(v) <- p
for each edge {v,w}:
put (v, w) into the bag
For any node v, the path of parent edges v → parent(v) → parent(parent(v)) → … eventually leads back to s.
Traverse(s) only visits the nodes in the connected component of the start vertex s. If we want to visit all the nodes in every component, run Traverse on every node. (Let's call it TraverseAll.) Since Traverse computes a spanning tree of one component, TraverseAll computes a spanning forest of the whole graph.
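A minimal runnable sketch of Traverse with (parent, vertex) pairs, plus the path recovery it enables; the dict representation and the helper name path_to_root are assumptions of this example:

```python
def traverse(graph, s):
    """Generic traversal storing (parent, vertex) pairs; returns parent links."""
    parent = {}
    bag = [(None, s)]          # a list used as a stack => DFS; a queue => BFS
    while bag:
        p, v = bag.pop()
        if v not in parent:    # "unmarked" == not yet assigned a parent
            parent[v] = p
            for w in graph[v]:
                bag.append((v, w))
    return parent

def path_to_root(parent, v):
    """Follow parent edges from v back to the source."""
    path = [v]
    while parent[path[-1]] is not None:
        path.append(parent[path[-1]])
    return path

g = {"s": ["a"], "a": ["s", "b"], "b": ["a"]}
parent = traverse(g, "s")
```

The parent links form exactly the spanning tree of the component containing s.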
3. BFS/DFS and Spanning Trees
First, two more concepts. For graphs we speak of predecessors and successors; for trees we use ancestors and descendants, and beyond that, proper ancestors and proper descendants.
Suppose T is a tree rooted at s, and u and v are vertices of T.
Definition:
- u is an ancestor of v if u lies on the path in T from the root to v; a vertex can be an ancestor of itself.
- u is a descendant of v if v is an ancestor of u; a vertex can be a descendant of itself.
- u is a proper ancestor of v if u is an ancestor of v and u ≠ v; u is a proper descendant of v if u is a descendant of v and u ≠ v.
Both DFS and BFS can generate spanning trees from graphs.
Suppose RecursiveDFS(s) produces a depth-first spanning tree T, where s is the root of T.
Lemma 1. For any node v, the vertices that are marked during the execution of RecursiveDFS(v) are the proper descendants of v in T.
Lemma 2. For every edge {v, w} in G, either v is an ancestor of w in T, or v is a descendant of w in T.
Proof: Assume without loss of generality that v is marked before w. Then w is unmarked when RecursiveDFS(v) is invoked, but marked when RecursiveDFS(v) returns, so Lemma 1 implies that w is a proper descendant of v in T.
4. Preorder and Postorder Labeling
def PrePostLabel(G):
for all vertices v:
unmark v
clock <- 0
for all vertices v:
if v is unmarked:
clock <- LabelComponent(v, clock)
def LabelComponent(v, clock):
mark v
prev(v) <- clock
clock <- clock + 1
for each edge {v,w}:
if w is unmarked:
clock <- LabelComponent(w, clock)
post(v) <- clock
clock <- clock + 1
return clock
For example, suppose we have a triangle abc and the DFS runs a → b → c; the execution goes like this:
mark a √
prev(a) = 0;
mark b √
prev(b) = 1;
mark c √
prev(c) = 2;
post(c) = 3;
post(b) = 4;
post(a) = 5;
return 6;
|      | a | b | c |
|------|---|---|---|
| prev | 0 | 1 | 2 |
| post | 5 | 4 | 3 |
It may be easier to think of prev(a) as the "start time of accessing a" (the time just before accessing a), and post(a) as the "finish time of accessing a" (the time just after accessing a).
Consider two vertices a and b with prev(a) < prev(b). Lemma 1 implies that if prev(b) < post(a), then b is a proper descendant of a and post(b) < post(a); otherwise prev(b) > post(a) (in which case prev(b) is necessarily assigned after post(a)) and the two intervals are disjoint.
Thus, for any two vertices a and b, the intervals [prev(a), post(a)] and [prev(b), post(b)] are either disjoint or nested; in particular, if the edge {a, b} exists, the two intervals must be nested.
5. Acyclicity in Directed Graphs
Lemma 2 implies that any depth-first spanning tree T partitions the edges of G into two classes:
- tree edges, which appear in T, and
- back(ward) edges, which connect some node in T to one of its ancestors.
For digraphs this can be refined further. Given a depth-first spanning tree T of a digraph G, we call an edge u → v of G:
- a tree edge, if u → v is an edge in T.
  - That is, every edge of T is a tree edge of G.
- if u → v is an edge of G but not of T:
  - a forward edge, if u is a proper ancestor of v in T.
    - Note that ancestry is judged in T: the edge u → v is in G but not in T, while T contains a path u → … → v.
  - a back(ward) edge, if u is a proper descendant of v in T.
    - Again judged in T: u → v is in G, but in T the direction runs v → … → u, which reveals a cycle.
  - a cross edge, otherwise.
    - That is, u → v is in G, but in T, u is neither an ancestor nor a descendant of v.
An example makes this clearer:
- The solid edges in the figure on the right form T, so every solid edge is a tree edge.
- An edge in G but not in T whose tail is an ancestor of its head in T is a forward edge.
- An edge in G but not in T whose tail is a descendant of its head in T is a backward edge.
- An edge in G but not in T whose tail is neither an ancestor nor a descendant of its head in T is a cross edge.
| edge type of a → b | relationship of a and b in T | prev / post order |
|---|---|---|
| tree / forward | a is an ancestor of b | prev(a) < prev(b) < post(b) < post(a) |
| back(ward) | a is a descendant of b | prev(b) < prev(a) < post(a) < post(b) |
| cross | neither | prev(a) < post(a) < prev(b) < post(b), assuming a is visited first |
Observation: G has a directed cycle if and only if there is a backward edge with respect to any depth-first spanning tree.
Besides this approach, the book also provides an algorithm that detects cycles during the DFS itself:
def IsAcyclic(G):
add a source vertex s
for all vertices v ≠ s:
add edge s → v
status(v) <- New
return IsAcyclicDFS(s)
def IsAcyclicDFS(v):
status(v) <- Active
for each edge v → w:
if status(w) == Active:
return False
else if status(w) == New:
if IsAcyclicDFS(w) == False:
return False
status(v) <- Done
return True
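A runnable Python rendering of IsAcyclic; instead of the explicitly added source vertex s, this sketch simply loops over all vertices still marked New, which has the same effect:

```python
NEW, ACTIVE, DONE = 0, 1, 2

def is_acyclic(graph):
    """Detect a directed cycle during DFS using New/Active/Done states."""
    status = {v: NEW for v in graph}

    def dfs(v):
        status[v] = ACTIVE
        for w in graph[v]:
            if status[w] == ACTIVE:              # back edge => cycle
                return False
            if status[w] == NEW and not dfs(w):
                return False
        status[v] = DONE
        return True

    # Looping over all New vertices plays the role of the added source s.
    return all(dfs(v) for v in graph if status[v] == NEW)

dag = {"a": ["b"], "b": ["c"], "c": []}
cyc = {"a": ["b"], "b": ["c"], "c": ["a"]}
```

An Active vertex is one whose DFS call is still on the recursion stack, so reaching it again means we found a directed cycle.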
6. Topological Sort
A topological ordering of a digraph G is a total order of the vertices such that every edge points forward: for every edge u → v, vertex u comes before v in the order.
A topological ordering is clearly impossible if the graph G has a directed cycle, so assume G is a DAG.
Idea: Run DFS on every vertex (i.e. DFSAll), then sort the vertices in decreasing order of finishing time post(v).
Correctness of the Idea: By Lemma 2, for every edge u → v in a DAG we have post(u) > post(v), so decreasing post order puts u before v.
Improvement: However, you don't actually need to sort. Look at the implementation of DFSAll:
def DFS(v):
mark v
prev(v) <- clock++
for each edge v → w:
if w is unmarked:
parent(w) <- v
DFS(w)
post(v) <- clock++
def DFSAll(G):
clock <- 0
for each vertex v:
if v is not marked:
DFS(v)
Looking at the final result: the finishing times post(v) change with the order in which DFSAll visits the vertices, but for every edge u → v we always end up with post(u) > post(v). In this sense, DFSAll(G) and DFS(source) are essentially the same, and both run in O(V + E).
Consider a simple graph a → b → c:
- If DFSAll(G) visits in the order c, b, a, the result is post(c)=1, post(b)=3, post(a)=5.
- If DFSAll(G) visits in the order b (which covers c), then a, the result is post(c)=2, post(b)=3, post(a)=5.
- If DFSAll(G) visits in the order a (which covers b and c), the result is post(c)=3, post(b)=4, post(a)=5.
So you only need to record the vertices in the order they finish; reading that list backwards (i.e. in decreasing post(v)) is a topological order, with no sorting needed.
RT: Topological sort therefore has the same running time as DFSAll, namely O(V + E).
7. Strongly Connected Components (SCC)
In a digraph G, u can reach v if there is a directed path from u to v.
In a digraph G, u and v are strongly connected if u can reach v and v can reach u.
If we write u ≡ v for "u and v are strongly connected", then ≡ is an equivalence relation:
- Reflexivity: u ≡ u, i.e. every vertex is strongly connected to itself.
- Transitivity: if u ≡ v and v ≡ x, then u ≡ x.
- Symmetry: if u ≡ v, then v ≡ u.
In a digraph G, a strongly connected component (SCC) is a maximal subgraph in which every pair of vertices is strongly connected.
- Note that "maximal" here is not a special requirement for SCCs; a component is by definition a maximal subgraph.
- In a drawing of an undirected graph, components are easy to recognize: each is a separate, self-contained subgraph, e.g. an entire path or an entire cycle.
- SCCs are different: of a single path or cycle, possibly only a part is strongly connected, but that part alone already satisfies the definition of an SCC.
We can see SCCs as the equivalence classes of ≡.
For any digraph G, the strong component graph scc(G) is obtained by contracting each SCC of G to a single vertex; scc(G) is always a DAG.
Let C be an SCC of G. We call C a source SCC if it has no incoming edges in scc(G), and a sink SCC if it has no outgoing edges in scc(G).
So we can come up with an idea to find all SCCs in G:
- find a sink SCC C of G, and remove it from G;
- find a sink SCC of the remaining graph, and remove it;
- …
- until G is empty.
So the problem becomes "to find a sink SCC". Moreover, once we have any vertex of a sink SCC, a DFS from that vertex yields the whole sink SCC, so the problem further reduces to "to find a vertex in a sink SCC".
- Digress: how to find the SCC containing a given vertex v of a graph G?
  - DFS(v, G) finds all vertices reachable from v.
  - DFS(v, rev(G)) finds all vertices that can reach v.
  - The intersection of the two vertex sets is the SCC containing v.
Lemma 4. The vertex with the largest finishing time (i.e. post(x)) lies in a source SCC of G.
Lemma 5. For any edge u → v in G: if post(u) < post(v), then u and v are strongly connected.
It is easy to check that any directed graph G has exactly the same strongly connected components as its reversal rev(G).
Actually you even don’t have to repeat the “find-remove” procedure. The Kosaraju algorithm could find all SCCs this way:
- Step 1: DFSAll(rev(G)) outputs the vertices in reverse (decreasing) order of finishing time. Suppose the order is a → b → … → z:
  - It is a source → sink order in rev(G),
    - i.e. post(a) > post(b) > … > post(z) in rev(G).
  - It is a sink → source order in G,
    - i.e. post(a) < post(b) < … < post(z) in G.
- Step 2: Run DFSAll(G) on the vertices in the order above.
  - Each call to DFS in the loop of DFSAll(G) visits exactly one strong component of G:
    - If a cannot reach any other vertex, then {a} is an SCC by itself.
      - Enter the next iteration of DFSAll(G), which calls DFS on the next unvisited vertex.
    - If a can reach some vertex i, then because post(a) > post(i) in rev(G), a and i are strongly connected (Lemma 5 applied to rev(G)).
      - The current DFS call in DFSAll(G) goes further and finally discovers exactly one SCC.
8. Shortest Paths
Given a weighted digraph G with a source vertex s and a target vertex t, find the shortest path from s to t, i.e. the path minimizing the total weight of its edges.
A more general problem is called single source shortest path, or SSSP: find the shortest path from s to every other vertex.
N.B. Throughout this post, we will explicitly consider only directed graphs. All of the algorithms described in this lecture also work for undirected graphs with some minor modifications, but only if negative edges are prohibited. On the other hand, it's OK for directed graphs to have negative edges in this problem. However, negative cycles, which make the problem meaningless, are prohibited.
8.1 The Only SSSP Algorithm
Let's define:
- dist(b) is the length of the tentative shortest path from s to b, or ∞ if there is no such path.
  - tentative: [ˈten(t)ətiv], not certain or fixed.
- pred(b) is the predecessor of b in the tentative shortest path, or NULL if there is no such vertex.
def is_tense(a → b):
return dist(a) + w(a → b) < dist(b)
# `is_tense` means dist(b) better go through a → b because it's shorter
def relax(a → b):
dist(b) <- dist(a) + w(a → b)
pred(b) <- a
# `relax` means "yes! now I go through a → b"
If no edge in G is tense, then for every vertex x, dist(x) is the length of the predecessor path s → … → pred(pred(x)) → pred(x) → x, which is, at the same time, the shortest path from s to x.
def InitSSSP(s):
dist(s) <- 0
pred(s) <- NULL
for all vertices v != s:
dist(v) <- ∞
pred(v) <- NULL
def GenericSSSP(s):
InitSSSP(s)
put s in the bag
while the bag is not empty:
take u from the bag
for all edges u → v:
if u → v is tense:
Relax(u → v)
put v in the bag
Just as with graph traversal, different "bag" data structures give us different algorithms. There are three obvious choices to try: a stack, a queue, and a priority queue. Unfortunately, if we use a stack, the resulting algorithm can perform exponentially many relaxation steps in the worst case.
8.2 Dijkstra's Algorithm
If we implement the bag using a priority queue, where the priority of a vertex v is its tentative distance dist(v), we obtain Dijkstra's algorithm.
- A priority queue is different from a “normal” queue, because instead of being a “first-in-first-out” data structure, values come out in order by priority.
- This means that every time we "take u from the bag", we take the one with minimal dist(u).
  - In fact, this can be seen as a greedy algorithm.
Dijkstra’s algorithm is particularly well-behaved if the graph has NO negative-weight edges. In this case, it’s not hard to show (by induction, of course) that the vertices are scanned in increasing order of their shortest-path distance from s
. It follows that each vertex is scanned at most once, and thus that each edge is relaxed at most once.
Since the priority of each vertex in the priority queue is its tentative distance from s, the algorithm performs a decreasePriority operation every time an edge is relaxed. Thus, the algorithm performs at most E decreasePriority operations. Similarly, there are at most V enqueue and getMinPriority operations. Thus, if we implement the priority queue with a Fibonacci heap, the total running time of Dijkstra's algorithm is O(E + V log V).
8.3 Shimbel's Algorithm
If we replace the heap in Dijkstra’s algorithm with a FIFO queue, we obtain Shimbel’s Algorithm. Shimbel’s algorithm is efficient even if there are negative edges, and it can be used to quickly detect the presence of negative cycles. If there are no negative edges, however, Dijkstra’s algorithm is faster.
def ShimbelSSSP(s):
InitSSSP(s)
repeat V times: # actually, repeating (V-1) times is enough
for every edge u → v:
if u → v is tense:
Relax(u → v)
A simple inductive argument implies the following invariant for every repeat index i
and vertex v
: After i
phases of the algorithm, dist(v)
is at most the length of the shortest walk from s
to v
consisting of at most i
edges.
Why does repeating V−1 times suffice? A path from s to any vertex t has at most V−1 edges (a path does not repeat vertices, and it generally does not even pass through all V of them), so V−1 rounds of relaxing every edge are enough.
For further explanation, see: Bellman-Ford algorithm - Why can edges be updated out of order?.
In each phase, we scan each vertex at most once, so we relax each edge at most once; hence the running time of a single phase is O(E), and the total running time of Shimbel's algorithm is O(VE).
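A runnable sketch of Shimbel's (i.e. Bellman-Ford) algorithm over an edge list; the final negative-cycle check is an addition of this example, not part of the pseudocode above:

```python
INF = float("inf")

def shimbel_sssp(vertices, edges, s):
    """Bellman-Ford / Shimbel; edges is a list of (u, v, w) triples."""
    dist = {v: INF for v in vertices}
    pred = {v: None for v in vertices}
    dist[s] = 0
    for _ in range(len(vertices) - 1):     # V - 1 phases suffice
        for u, v, w in edges:
            if dist[u] + w < dist[v]:      # u → v is tense
                dist[v] = dist[u] + w      # relax
                pred[v] = u
    # One extra phase: any still-tense edge implies a negative cycle.
    has_neg_cycle = any(dist[u] + w < dist[v] for u, v, w in edges)
    return dist, pred, has_neg_cycle

V = ["s", "a", "b"]
E = [("s", "a", 4), ("a", "b", -2), ("s", "b", 3)]
dist, pred, neg = shimbel_sssp(V, E, "s")
```

The negative edge a → b is handled without any trouble: the path s → a → b of length 2 beats the direct edge of length 3.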
9. All-Pairs Shortest Paths (APSP)
Now, for every pair of vertices (u, v), we want:
- dist(u, v), the length of the shortest path from u to v;
- pred(u, v), the second-to-last vertex on that shortest path, i.e. the vertex before v on the path.
The output of our shortest path algorithms will be a pair of V × V arrays encoding all V² distances and predecessors.
9.1 Intuition 1: Run SSSP for every vertex
- Shimbel's: V × O(VE) = O(V²E).
- Dijkstra's: V × O(E + V log V) = O(VE + V² log V).
For graphs with negative edge weights, Dijkstra's algorithm can take exponential time, so we can't get this improvement directly.
9.2 Johnson's Algorithm
Johnson's starting point is simple: how do we get rid of negative edges while keeping the shortest paths unchanged? Once this is solved, we can run Dijkstra's from every vertex.
We perform a reweighting: pick a label c(v) for every vertex and set w′(u → v) = c(u) + w(u → v) − c(v). Along any path from u to v the c-terms telescope, so every u-v path changes in length by the same amount c(u) − c(v); the reweighting therefore does not affect which path is shortest.
Notice that w′(u → v) ≥ 0 is exactly the condition that u → v is not tense when dist = c (compare is_tense(a → b)). So we run Shimbel once from an added source and take c(v) to be the resulting dist(v); afterwards no edge is tense, i.e. every reweighted edge length is non-negative.
Why Shimbel and not Dijkstra? Because Shimbel doesn't care if the edge weights are negative.
def JohnsonAPSP(G=(V,E), w):
# Step 1: O(V)
add a source s, connect it to all v ∈ V, with edge length 0
# Step 2: O(EV)
c[] <- Shimbel(G+s)
# Step 3: O(E). Reweighting
for all u → v ∈ E:
w(u → v) = c(u) + w(u → v) − c(v)
# Step 4: O(VE + V^2 log V). Dijkstra for every vertex
for all v ∈ V:
Dijkstra(v)
# Step 5: O(E). Recover from reweighting
for all u → v ∈ E:
dist(u,v) = dist(u,v) + c(v) − c(u)
RT: O(VE + V² log V), dominated by Step 4.
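The five steps can be sketched in runnable Python. This example compresses Steps 1-2 by initializing every c(v) to 0, which is equivalent to running Shimbel from the added zero-weight source; the inlined Dijkstra is the lazy-deletion variant:

```python
import heapq

INF = float("inf")

def johnson_apsp(vertices, edges):
    """Johnson's APSP: reweight via Bellman-Ford, then Dijkstra per vertex."""
    # Steps 1-2: c(v) = shortest distance from a virtual source with 0-edges.
    c = {v: 0 for v in vertices}
    for _ in range(len(vertices)):
        for u, v, w in edges:
            if c[u] + w < c[v]:
                c[v] = c[u] + w
    # Step 3: reweight; every edge length is now non-negative.
    adj = {v: [] for v in vertices}
    for u, v, w in edges:
        adj[u].append((v, c[u] + w - c[v]))
    # Step 4: Dijkstra from every vertex; Step 5: undo the reweighting.
    dist = {}
    for s in vertices:
        d = {s: 0}
        pq = [(0, s)]
        while pq:
            du, u = heapq.heappop(pq)
            if du > d.get(u, INF):
                continue                       # stale entry
            for v, w in adj[u]:
                if du + w < d.get(v, INF):
                    d[v] = du + w
                    heapq.heappush(pq, (d[v], v))
        for t in vertices:
            dist[s, t] = d[t] + c[t] - c[s] if t in d else INF
    return dist

V = ["a", "b", "c"]
E = [("a", "b", -1), ("b", "c", 2), ("a", "c", 4)]
dist = johnson_apsp(V, E)
```

Note how the negative edge a → b survives the round trip: the reweighted graph is non-negative, yet the recovered distance dist(a, b) is still −1.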
9.3 Intuition 2: Dynamic Programming
The natural recurrence sets dist(u, v) = 0 if u = v, and otherwise dist(u, v) = min over edges x → v of (dist(u, x) + w(x → v)).
Unfortunately, this recurrence doesn’t work! To compute dist(u,v)
, we may need to compute dist(u,x)
for every other vertex x
. But to compute dist(u,x)
, we may need to compute dist(u,v)
. We’re stuck in an infinite loop.
To avoid this circular dependency, we need an additional parameter that decreases at each recursion, eventually reaching zero at the base case. One possibility is to include the number of edges in the shortest path as this third magic parameter, just as we did in the dynamic programming formulation of Shimbel’s algorithm. Let dist(u, v, k)
denote the length of the shortest path from u
to v
that uses at most k
edges. Since we know that the shortest path between any two vertices has at most V − 1 edges, dist(u, v, V−1)
is the actual shortest-path distance. As in the single-source setting, we have the following recurrence:
def DP_APSP(V, E, w):
for all u ∈ V:
for all v ∈ V:
if u = v:
dist[u,v,0] = 0
else:
dist[u,v,0] = ∞
for k = 1 to V-1:
for all u ∈ V:
dist[u,u,k] = 0
for all v ∈ V and v ≠ u:
dist[u,v,k] = ∞
for all x → v ∈ E:
if dist[u,v,k] > dist[u,x,k−1] + w(x → v):
dist[u,v,k] = dist[u,x,k−1] + w(x → v)
RT: O(V²E), using an O(V³)-size table.
Improvement: Shimbel APSP reduces the DP table to two dimensions.
Just as in the dynamic programming development of Shimbel’s single-source algorithm, we don’t actually need the inner loop over vertices v
, and we only need a two-dimensional table. After the k
^th iteration of the main loop in the following algorithm, dist[u,v]
lies between the true shortest path distance from u
to v
and the value dist[u,v,k]
computed in the previous algorithm.
def Shimbel_APSP(V, E, w):
for all u ∈ V:
for all v ∈ V:
if u = v:
dist[u,v] = 0
else:
dist[u,v] = ∞
for k = 1 to V-1:
for all u ∈ V:
for all x → v ∈ E:
if dist[u,v] > dist[u,x] + w(x → v):
dist[u,v] = dist[u,x] + w(x → v)
RT: still O(V²E), but now with only an O(V²)-size table.
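A runnable sketch of Shimbel_APSP with the two-dimensional table; the tuple-keyed dict standing in for the V × V array is an assumption of this example:

```python
INF = float("inf")

def shimbel_apsp(vertices, edges):
    """APSP via V-1 rounds of edge relaxation over a single 2-D table."""
    dist = {(u, v): (0 if u == v else INF)
            for u in vertices for v in vertices}
    for _ in range(len(vertices) - 1):
        for u in vertices:
            for x, v, w in edges:
                if dist[u, x] + w < dist[u, v]:   # x → v is tense w.r.t. u
                    dist[u, v] = dist[u, x] + w
    return dist

V = ["a", "b", "c"]
E = [("a", "b", 1), ("b", "c", 1)]
dist = shimbel_apsp(V, E)
```

As the notes say, after round k each entry lies between the true distance and the three-dimensional dist[u,v,k], so V−1 rounds still suffice.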
9.4 Intuition 3: DP + Divide and Conquer
def DC_DP_APSP(V, E, w):
for all u ∈ V:
for all v ∈ V:
dist[u,v,0] = w(u → v)
for k = 1 to (log V):
for all u ∈ V:
for all v ∈ V:
dist[u,v,k] = ∞
for all x ∈ V:
if dist[u,v,k] > dist[u,x,k−1] + dist[x,v,k−1]:
dist[u,v,k] = dist[u,x,k−1] + dist[x,v,k−1]
RT: O(V³ log V), with an O(V² log V)-size table.
Improvement: a Shimbel-style version reduces the size of the DP table.
def DC_Shimbel_APSP(V, E, w):
for all u ∈ V:
for all v ∈ V:
dist[u,v] = w(u → v)
for k = 1 to (log V):
for all u ∈ V:
for all v ∈ V:
for all x ∈ V:
if dist[u,v] > dist[u,x] + dist[x,v]:
dist[u,v] = dist[u,x] + dist[x,v]
RT: still O(V³ log V), but with only an O(V²)-size table.
9.5 Floyd-Warshall: a DP Algorithm
All the DP algorithms above are still a factor of log V (or more) slower than Johnson's. Floyd-Warshall uses a smarter third parameter: instead of limiting the number of edges on the path, it limits which vertices may appear as intermediate vertices.
Number the vertices arbitrarily from 1 to V, and let dist(u, v, r) denote the length of the shortest path from u to v in which every intermediate vertex is numbered at most r.
def FloydWarshall(V, E, w):
for all u ∈ V:
for all v ∈ V:
dist[u,v,0] = w(u → v)
for r = 1 to V:
for all u ∈ V:
for all v ∈ V:
dist[u,v,r] = min(dist[u,v,r-1], dist[u,r,r-1] + dist[r,v,r-1])
Just like our earlier algorithms, we can simplify the algorithm by removing the third dimension of the memoization table. Also, because the vertex numbering was chosen arbitrarily, there’s no reason to refer to it explicitly in the pseudocode.
def FloydWarshall(V, E, w):
for all u ∈ V:
for all v ∈ V:
dist[u,v] = w(u → v)
for all r ∈ V:
for all u ∈ V:
for all v ∈ V:
if dist[u,v] > dist[u,r] + dist[r,v]:
dist[u,v] = dist[u,r] + dist[r,v]
RT: O(V³).
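A runnable sketch of the simplified Floyd-Warshall; representing the input edge weights as a dict is an assumption of this example:

```python
INF = float("inf")

def floyd_warshall(vertices, w):
    """Floyd-Warshall; w is a dict {(u, v): weight} of directed edges."""
    dist = {(u, v): (0 if u == v else w.get((u, v), INF))
            for u in vertices for v in vertices}
    for r in vertices:                 # allow r as an intermediate vertex
        for u in vertices:
            for v in vertices:
                if dist[u, r] + dist[r, v] < dist[u, v]:
                    dist[u, v] = dist[u, r] + dist[r, v]
    return dist

V = ["a", "b", "c"]
w = {("a", "b"): 5, ("b", "c"): -2, ("a", "c"): 4}
dist = floyd_warshall(V, w)
```

Negative edges (but not negative cycles) are fine here, and the vertex "numbering" is simply whatever order the list provides.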
9.6 APSP in unweighted undirected graphs: Seidel's Algorithm
To be continued.
10. Minimum Spanning Trees
As with shortest paths, the "minimum" in MST refers to the total weight of the tree's edges, w(T) = Σ_{e ∈ T} w(e).
P.S. Mind the wording:
- "minimum" can be used as an adjective; it is a quantitative representation of the smallest amount needed.
- "minimal" is a qualitative characteristic, although it is sometimes also used quantitatively.
To keep things simple, I'll assume that all the edge weights are distinct. This assumption implies that the minimum spanning tree is unique.
10.1 The Only MST Algorithm
The generic minimum spanning tree algorithm MAINTAINS an acyclic subgraph F of the input graph G, which we call the intermediate spanning forest. F is always a subgraph of the minimum spanning tree of G, and every component of F is a minimum spanning tree of its vertices.
The intermediate spanning forest F induces two special types of edges:
- An edge is USELESS if it is not in F but both its endpoints are in the same component of F.
  - A simple example: a component of F spans the triangle ABC, but only the edges A−B and B−C have been taken; the remaining edge A−C is then a useless edge.
- For each component of F, we associate a SAFE edge: the minimum-weight edge with exactly one endpoint in that component.
  - Different components might or might not have different safe edges.
  - The algorithm proceeds by repeatedly adding safe edges to F, turning the forest into a single tree.
- Some edges are neither safe nor useless; we call these edges undecided.
Lemma 1. The MST contains every safe edge.
Note that safe edges are defined with respect to the components of F.
Proof: In fact we prove the following stronger statement: for any subset S of the vertices of G, the minimum-weight edge e with exactly one endpoint in S is in the MST.
Suppose, for contradiction, that the MST T does not contain e. Adding e to T creates a cycle, and this cycle must cross the boundary of S through some other edge e′ with exactly one endpoint in S.
Note that there cannot be a tie: the edge weights are distinct, so w(e) < w(e′). Then T − e′ + e is a spanning tree of smaller weight than T.
Contradiction.
Lemma 2. The MST contains no useless edge.
Proof: Adding any useless edge to F would create a cycle.
Note that the two endpoints of a useless edge lie in the same component of F, and a component is connected, so the endpoints are already joined by a path; adding a direct edge between them necessarily creates a cycle.
Our generic minimum spanning tree algorithm repeatedly adds one or more safe edges to the evolving forest F.
10.2 Borůvka's Algorithm
Borůvka's algorithm can be summarized in one line:
Borůvka: Add ALL the safe edges and recurse.
We can find all the safe edges in the graph in O(E) time.
def Borvka(G):
Ef <- Φ # empty set
F = (V, Ef)
while F is not connected:
S <- all safe edges
Ef <- Ef + S
return F
RT: O(E log V).
- To find all safe edges, just examine every edge, checking which component each endpoint lies in.
- To test whether F is connected, you can actually just count the number of edges, using this fact:
  - A forest with n nodes and c components has n − c edges.
    - I proved a related observation in homework 1: if a connected graph with n vertices has n − 1 edges, it's a tree.
- How many iterations does the while loop run?
  - Each iteration reduces the number of components of F by at least a factor of two; the worst case occurs when the components coalesce in pairs.
    - coalesce: [ˌkoʊəˈles], (of separate elements) to join into a single mass or whole.
    - The worst case is pairwise merging, e.g. A merges with B, C merges with D, and so on.
    - The best case is more like a chain reaction: A merges with B, which then merges with C, then with D, and so on.
  - Since F initially has V components, the while loop iterates at most O(log V) times.
- In total: O(E) work per iteration × O(log V) iterations = O(E log V).
In short, if you ever need to implement a minimum-spanning-tree algorithm, use Borůvka. On the other hand, if you want to prove things about minimum spanning trees effectively, you really need to know the next two algorithms as well.
10.3 Prim's Algorithm
Initially, T consists of a single arbitrary vertex of the graph.
Jarník: Repeatedly add T's safe edge to T.
def Prim(G):
T <- {v} # an arbitrary vertex
while |T| < |V|:
e <- a safe edge with one endpoint in T
T <- T + e
return T
To implement Jarník's algorithm, we keep all the edges adjacent to T in a priority queue.
Similar to Dijkstra's algorithm, if we implement the priority queue with a Fibonacci heap, the total running time would be O(E + V log V).
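A runnable sketch of Jarník/Prim with Python's heapq standing in for the priority queue (a binary heap, so O(E log V) rather than the Fibonacci-heap bound):

```python
import heapq

def prim_mst(graph, root):
    """Jarník/Prim: graph[u] is a list of (weight, v); returns MST edges."""
    in_tree = {root}
    pq = [(w, root, v) for w, v in graph[root]]
    heapq.heapify(pq)                 # edges leaving the tree, keyed by weight
    tree = []
    while pq:
        w, u, v = heapq.heappop(pq)
        if v in in_tree:
            continue                  # the edge has become useless
        in_tree.add(v)                # (u, v) was T's safe edge
        tree.append((u, v, w))
        for w2, x in graph[v]:
            if x not in in_tree:
                heapq.heappush(pq, (w2, v, x))
    return tree

g = {
    "a": [(1, "b"), (3, "c")],
    "b": [(1, "a"), (2, "c")],
    "c": [(3, "a"), (2, "b")],
}
tree = prim_mst(g, "a")
```

The heap may contain stale (useless) edges; popping and discarding them plays the role of deleting them from the queue.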
10.4 Kruskal's Algorithm
Kruskal: Scan all edges in increasing weight order; if an edge is safe, add it to F.
def Kruskal(G):
sort E w.r.t. w
Ef <- Φ # empty set
F = (V, Ef)
for i <- 1 to E:
if E[i] is not useless:
Ef <- Ef + E[i]
return F
RT: O(E log E) = O(E log V), dominated by sorting the edges.
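A runnable sketch of Kruskal with a union-find standing in for the "not useless" test (an edge is useless exactly when its endpoints are already in the same component):

```python
def kruskal_mst(n, edges):
    """Kruskal with union-find; vertices 0..n-1, edges as (w, u, v)."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    mst = []
    for w, u, v in sorted(edges):           # increasing weight order
        ru, rv = find(u), find(v)
        if ru != rv:                        # not useless => safe at this point
            parent[ru] = rv
            mst.append((w, u, v))
    return mst

edges = [(3, 0, 2), (1, 0, 1), (2, 1, 2)]
mst = kruskal_mst(3, edges)
```

Since the edges are scanned in increasing weight order, every edge that survives the uselessness test is the lightest edge leaving its component, i.e. safe.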
11. Matroids (to be continued)
11.1 Definitions (to be continued)
A matroid M = (U, I) is a finite ground set U together with a collection I of subsets of U that satisfies three axioms:
- Non-emptiness: The empty set ∅ is in I. (Thus, I is not itself empty.)
- Heredity: If a set X is an element of I, then any subset of X is also in I.
- Exchange (a.k.a. Augmentation): If X and Y are two sets in I where |X| > |Y|, then there is an element x ∈ X \ Y such that Y ∪ {x} is in I.
- The sets in I are typically called independent sets.
  - Therefore, the three axioms can also be stated as:
    - Non-emptiness: The empty set is independent.
    - Heredity: If a set X is independent, then any subset of X is also independent.
    - Exchange (a.k.a. Augmentation): If X and Y are two independent sets where |X| > |Y|, then there is an element x ∈ X \ Y such that Y ∪ {x} is also independent.
- The union of all sets in I is called the ground set.
  - In set theory, a collection F of subsets of a given set S is called a family of subsets of S.
  - Therefore, a matroid is a family of subsets of its ground set.
- A maximal independent set is called a basis.
  - "Maximal" means it is not a proper subset of any other independent set.
  - The exchange property implies that every basis of a matroid has the same cardinality (i.e. size).
  - The rank of a matroid is the size of its bases.
- A subset of the ground set that is not in I is a dependent set.
  - E.g. the ground set itself is a dependent set whenever it is not independent.
- A dependent set is called a circuit if every proper subset of it is independent.
Here are several other examples of matroids; some of these we will see again later.
- Linear matroid: Let A be any matrix. A subset of columns is independent if and only if the corresponding columns of A are linearly independent.
- Uniform matroid U_{k,n}: A subset X of an n-element ground set is independent if and only if |X| ≤ k. Any subset of size k is a basis; any subset of size k + 1 is a circuit.
- Matching matroid: Let G be an arbitrary undirected graph. A subset S of vertices is independent if there is a matching in G that covers S.
TODO: more examples from the lecture notes and my notebook to be added.
11.2 Matroid Optimization Problem (to be continued)
Now suppose each element of the ground set of a matroid M is given a non-negative weight. The matroid optimization problem is to compute a maximum-weight basis of M.
Here is a greedy algorithm:
# Given ground set U and matroid I
def GreedyMatroidOPT(U, I, w):
B = Φ # empty set
# O(n log n)
sort U w.r.t w in decreasing order
# O(n)
for each u ∈ U:
# F(n)
if B + {u} ∈ I: # i.e. B + {u} is independent
B = B + {u}
return B
Doesn't this look a lot like Kruskal's algorithm?
RT: O(n log n) for sorting plus n calls to the independence oracle, i.e. O(n log n + n · F(n)).
TODO: add the proof.
12. Matching (to be continued)
12.1 The Maximum Matching Problem
Let G = (V, E) be an undirected graph. A set M ⊆ E is a matching if no two edges of M share an endpoint.
A vertex v is matched (or covered) by M if some edge of M is incident to v; otherwise v is unmatched (or exposed).
In the maximum matching problem we are asked to find a matching M of maximum size.
12.2 Alternating and Augmenting Paths (to be continued)
Let M be a matching in G. A path P in G is M-alternating if its edges alternate between edges in M and edges not in M.
E.g. the three paths in the figure below are alternating paths.
If P is an M-alternating path, then M ⊕ P denotes the symmetric difference (M \ P) ∪ (P \ M): the edges of P in M are removed, and the edges of P not in M are added.
An augmenting path (more precisely, an M-augmenting path) is an M-alternating path that starts and ends at vertices unmatched by M.
Lemma 2.3 If M is a matching and P is an M-alternating path, then M ⊕ P is also a matching.
Here you should read M ⊕ P as the symmetric difference described above.
Below are three alternating paths; the thick edges are the matching:
- CASE 1: If P starts and ends in vertices unmatched by M (i.e. P is M-augmenting; e.g., the top path in the figure), then |M ⊕ P| = |M| + 1, i.e., M ⊕ P is a larger matching.
- CASE 2: If P starts with an edge that does not belong to M and ends with an edge of M (e.g., the middle path in the figure), then |M ⊕ P| = |M|.
- CASE 3: If P starts and ends with edges of M (see the bottom path in the figure), then |M ⊕ P| = |M| − 1.
TODO: proof of CASE 1
Lemma 2.5 Let
Lemma 2.6 Let
TODO: proof
Theorem 2.7 Let M be a matching in G. Then M is a maximum matching if and only if there is no M-augmenting path in G.
TODO: proof
Theorem 2.7 suggests the following simple algorithm for finding a maximum matching:
def MaxMatching(G):
M = Φ # empty set
while exists an M-aug path P:
M = M ⴲ P
return M
- The while loop runs at most O(V) iterations, since each augmentation increases |M| by one and |M| ≤ V/2.
- For bipartite graphs, the AltBFS algorithm finds an M-augmenting path (or reports that none exists) in O(V + E) time.
TODO: bipartite graphs and the AltBFS algorithm.
13. Testing Polynomial Identity (to be continued)
13.1 The Schwartz-Zippel Algorithm (to be continued)
To test whether two polynomials P and Q are identical, it suffices to test whether the polynomial P − Q is identically zero.
First, the concept of the degree of a polynomial:
The degree of a polynomial is the highest degree of its terms when the polynomial is expressed in its canonical form consisting of a linear combination of monomials. The degree of a term is the sum of the exponents of the variables that appear in it.
For example, the degree of a term like x²y is 2 + 1 = 3, and the degree of the whole polynomial equals the maximum degree over all of its terms.
The idea of the algorithm is very simple: assign random values r₁, …, r_n to the variables and evaluate the polynomial; if the polynomial is not identically zero, the evaluation is unlikely to be zero.
Claim 2.1 If Q(x₁, …, x_n) is a non-zero polynomial of total degree d, and r₁, …, r_n are chosen independently and uniformly at random from a finite set S, then Pr[Q(r₁, …, r_n) = 0] ≤ d / |S|.
TODO: add the proof.
Digress: Permutation / Matrix Determinant
onto function: A function f: A → B is onto (surjective) if every element of B is the image of at least one element of A.
one-to-one function: A function f: A → B is one-to-one (injective) if it maps distinct elements of A to distinct elements of B.
bijection: a function that is both one-to-one and onto. Such functions are also referred to as bijective.
A permutation of a set S is a bijection from S to itself.
TODO: Prove that
Given an n × n matrix A, its determinant is det(A) = Σ_σ sgn(σ) ∏_{i=1}^{n} A_{i,σ(i)}, where σ ranges over all permutations of {1, …, n}.
13.2 Application to Bipartite Matching
Although bipartite matching is easy to solve in deterministic polynomial time using flow techniques, it remains an important open question whether there exists an efficient parallel deterministic algorithm for this problem.
Bipartite matching is the following problem: given a bipartite graph G = (U ∪ V, E) with |U| = |V| = n, determine whether G contains a perfect matching.
Definition 2.2 (Tutte matrix) The Tutte matrix of G is the n × n matrix A with entries A_{ij} = x_{ij} if (u_i, v_j) ∈ E and A_{ij} = 0 otherwise, where the x_{ij} are distinct variables.
Claim 2.3 det(A) is not identically zero if and only if G has a perfect matching.
Moreover, such a perfect matching can be recovered from the non-zero terms of det(A).
14. Eulerian Graph
Eulerian path: a walk that traverses every edge of the graph exactly once.
Eulerian cycle: a closed Eulerian path.
Eulerian graph: a graph containing an Eulerian cycle.
Thm: A connected graph is Eulerian if and only if every vertex has even degree.
Proof: Suppose every degree is even. We will show that there is an Euler cycle by induction on the number of edges in the graph.
The base case is a graph G with no edges, where the empty cycle trivially works.
Now suppose we have a connected graph G with at least one edge in which every vertex has even degree. Start at any vertex a and walk along unused edges until returning to a; since every vertex has even degree, whenever the walk enters a vertex other than a it can also leave it, so the walk can only get stuck back at a. Call the resulting closed walk C.
Let G′ be the graph obtained from G by removing the edges of C. Every vertex of G′ still has even degree, so by the induction hypothesis each component of G′ has an Euler cycle. Let the components that share vertices with C, in order of first contact along C, have Euler cycles C₁, …, C_k, meeting C at vertices a₁, …, a_k.
Since G is connected, every component of G′ shares at least one vertex with C.
We describe an Euler cycle in G as follows:
- Start at a;
- follow C until reaching a₁, then follow the entire C₁, ending back at a₁;
  - follow C until reaching a₂, then follow the entire C₂, ending back at a₂; and so on;
    - follow C until reaching a_k, then follow the entire C_k, ending back at a_k.
- Finish off C, ending at a.
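The proof's peel-off-a-closed-walk idea is essentially Hierholzer's algorithm; here is an iterative runnable sketch for undirected graphs in which every vertex has even degree:

```python
def euler_cycle(graph, start):
    """Hierholzer's algorithm; graph maps each vertex to its neighbor list
    (undirected: each edge appears in both lists)."""
    # Copy adjacency lists so edges can be consumed.
    adj = {v: list(ns) for v, ns in graph.items()}
    stack, cycle = [start], []
    while stack:
        v = stack[-1]
        if adj[v]:
            w = adj[v].pop()        # take an unused edge v-w
            adj[w].remove(v)        # remove its twin copy
            stack.append(w)
        else:
            cycle.append(stack.pop())   # dead end: emit vertex
    return cycle[::-1]

# K3 (a triangle): every vertex has even degree 2.
g = {"a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b"]}
cycle = euler_cycle(g, "a")
```

The stack plays the role of the walk C; vertices popped at dead ends splice the sub-cycles C₁, …, C_k together automatically.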
Thm: Every odd-order complete graph is Eulerian.
- Let K_n be the complete graph on n vertices.
- Then K_n is Eulerian if and only if n is odd (each vertex has degree n − 1, which is even exactly when n is odd).
- If n is even, then K_n is traversable (has an Eulerian path) iff n = 2.
  - If a graph is not traversable, it cannot be Eulerian.
15. Hamiltonian Path
Hamiltonian path: a simple path that contains all vertices.
Hamiltonian cycle: a simple cycle that contains all vertices.
- Here "simple" means containing each vertex exactly once (except that a cycle repeats its first vertex at the end).
Hamiltonian path/cycle problems are of determining whether a Hamiltonian path/cycle exists in a given graph (whether directed or undirected). Both problems are NP-complete.
There is a simple relation between the problems of finding a Hamiltonian path and a Hamiltonian cycle.
- In one direction, the Hamiltonian path problem for a graph G is equivalent to the Hamiltonian cycle problem in the graph obtained from G by adding a new vertex and connecting it to all vertices of G. Thus, finding a Hamiltonian path cannot be significantly slower (in the worst case, as a function of the number of vertices) than finding a Hamiltonian cycle.
- In the other direction, the Hamiltonian cycle problem for a graph G is equivalent to the Hamiltonian path problem in the graph obtained by copying one vertex v of G to a new vertex v′, that is, letting v′ have the same neighbourhood as v, and by adding two dummy vertices of degree one, connected to v and v′, respectively.
  - The Hamiltonian cycle problem is also a special case of the TSP (travelling salesman problem).
A brute-force algorithm: try every permutation of the vertices (that's O(n!)) and check whether consecutive vertices are adjacent.
The Bellman-Held-Karp dynamic programming algorithm solves the problem in O(2ⁿ n²) time. (Note that n! grows much faster than 2ⁿ.)
Base case: for every vertex v, the single-vertex path ending at v exists.
DP table: (# of vertex subsets) × (# of possible last vertices) = 2ⁿ × n cells.
RT: each cell takes O(n) time to fill, so the total is O(2ⁿ · n · n) = O(2ⁿ n²).
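A runnable sketch of the Bellman-Held-Karp recurrence for Hamiltonian path existence (the bitmask encoding, function name, and boolean-matrix input are assumptions of this example):

```python
from itertools import combinations

def held_karp_hamiltonian_path(n, adj):
    """Does a Hamiltonian path exist? adj is an n x n boolean adjacency
    matrix; runs in O(2^n * n^2) time."""
    # dp[(S, v)] = True iff some path visits exactly the vertex set S
    # (a bitmask) and ends at v.
    dp = {}
    for v in range(n):
        dp[(1 << v, v)] = True            # base case: single-vertex paths
    for size in range(2, n + 1):
        for subset in combinations(range(n), size):
            S = 0
            for v in subset:
                S |= 1 << v
            for v in subset:
                dp[(S, v)] = any(
                    adj[u][v] and dp.get((S ^ (1 << v), u), False)
                    for u in subset if u != v
                )
    full = (1 << n) - 1
    return any(dp.get((full, v), False) for v in range(n))

path3 = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]   # the path graph 0-1-2
disc  = [[0, 0, 0], [0, 0, 1], [0, 1, 0]]   # vertex 0 is isolated
```

Each of the 2ⁿ × n cells is filled by scanning at most n predecessors, matching the O(2ⁿ n²) bound above.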