MST Review
Input: Undirected graph G = (V, E), edge costs c_e.
Output: Min-cost spanning tree (no cycles, connected).
Assumptions: G is connected, distinct edge costs.
Cut Property: If e is the cheapest edge crossing some cut (A, B), then e belongs to the MST.
Kruskal’s MST Algorithm
Sort edges in order of increasing cost.
[Rename edges 1, 2, …, m so that c_1 < c_2 < … < c_m.]
T = (empty set)
For i = 1 to m:  [O(m) iterations]
    If T + {i} has no cycles:  [O(n) time to check for a cycle:
    use BFS or DFS in the graph (V, T), which contains <= n-1 edges]
        Add i to T.
Return T.
Running time of straightforward implementation:
[m = # of edges, n = # of vertices] O(m log n) + O(mn) = O(mn).
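The straightforward implementation above can be sketched as follows (a minimal sketch, not the course's code: vertices are assumed to be labeled 0..n-1, edges given as (cost, u, v) tuples, and the function name is my own).

```python
# Straightforward Kruskal: O(m log n) for sorting + O(mn) for cycle checks.
def kruskal_slow(n, edges):
    """n = # of vertices (labeled 0..n-1); edges = list of (cost, u, v)."""
    edges = sorted(edges)                 # O(m log n) sorting step
    adj = {v: [] for v in range(n)}       # adjacency lists of the graph (V, T)
    T = []
    for cost, u, v in edges:              # O(m) iterations
        # Cycle check: DFS in (V, T) to see whether v is already reachable
        # from u. (V, T) has <= n-1 edges, so each check costs O(n).
        stack, seen = [u], {u}
        while stack:
            x = stack.pop()
            for y in adj[x]:
                if y not in seen:
                    seen.add(y)
                    stack.append(y)
        if v not in seen:                 # adding (u, v) creates no cycle
            T.append((u, v, cost))
            adj[u].append(v)
            adj[v].append(u)
    return T
```

On a 4-vertex example with edge costs 1..5, this returns the 3 MST edges of total cost 6.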
Plan: Data structure for O(1)-time cycle checks => O(m log n) time.
The Union-Find Data Structure
Maintain a partition of a set of objects.
FIND(x): Return the name of the group that x belongs to.
UNION(C_i, C_j): Fuse groups C_i, C_j into a single one.
Motivation: O(1)-time cycle checks in Kruskal's algorithm.
Objects = vertices.
Groups = connected components with respect to the chosen edges T.
Adding new edge (u, v) to T <=> fusing the connected components of u and v.
Idea #1: Maintain one linked structure per connected component of (V, T).
    Each component has an arbitrary leader vertex.
Invariant: Each vertex points to the leader of its component.
Key point: Given edge (u, v), can check whether u & v are already in the same component in O(1) time.
[If and only if the leader pointers of u, v match, i.e., FIND(u) = FIND(v).]
=> O(1)-time cycle checks!
Running Time of Fast Implementation
Scorecard:
    O(m log n) time for sorting.
    O(m) time for cycle checks [O(1) per check].
    O(n log n) time overall for leader pointer updates.
=> O(m log n) total [matching Prim's algorithm].
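The fast implementation can be sketched with the eager leader-pointer scheme above (names are my own; when two components fuse, the smaller one is relabeled, so each vertex's leader changes at most log2(n) times, giving the O(n log n) bound on pointer updates).

```python
# Fast Kruskal with eager union-find: leader pointer per vertex,
# O(1) cycle check via FIND(u) != FIND(v).
def kruskal_fast(n, edges):
    """n = # of vertices (labeled 0..n-1); edges = list of (cost, u, v)."""
    leader = list(range(n))               # leader pointer of each vertex
    members = {v: [v] for v in range(n)}  # component members, keyed by leader
    T = []
    for cost, u, v in sorted(edges):      # O(m log n) sorting step
        if leader[u] != leader[v]:        # O(1) cycle check
            T.append((u, v, cost))
            a, b = leader[u], leader[v]
            if len(members[a]) < len(members[b]):
                a, b = b, a               # always relabel the smaller component
            for x in members[b]:          # UNION: fuse component b into a
                leader[x] = a
            members[a].extend(members.pop(b))
    return T
```

It returns the same tree as the slow version, but each cycle check is a single pointer comparison.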
Clustering [unsupervised learning]
Informal goal: Given n "points" [Web pages, images, genome fragments, etc.], classify them into "coherent groups".
Assumptions:
(1) As input, given a (dis)similarity measure: a distance d(p, q) between each point pair.
(2) Symmetric [i.e., d(p, q) = d(q, p)].
Examples: Euclidean distance, genome similarity, etc.
Goal: same cluster <=> "nearby".
Max-Spacing k-Clusterings
Assume: We know k = # of clusters desired.
[In practice, can experiment with a range of values.]
Call points p & q separated if they're assigned to different clusters.
Definition: The spacing of a k-clustering is the minimum of d(p, q) over separated pairs p, q.
Problem statement: Given a distance measure d and k, compute the k-clustering with maximum spacing.
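To make the spacing definition concrete, here is a small brute-force check (function name and example points are my own): it takes the minimum distance over all separated pairs.

```python
# Spacing of a clustering = min over separated pairs (p, q) of d(p, q).
def spacing(points, cluster_of, d):
    """points: list of point ids; cluster_of: dict point -> cluster label;
    d: symmetric distance function."""
    return min(d(p, q)
               for i, p in enumerate(points)
               for q in points[i + 1:]
               if cluster_of[p] != cluster_of[q])
```

For points 0, 1, 5, 6 on a line with clusters {0, 1} and {5, 6}, the closest separated pair is (1, 5), so the spacing is 4.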
A Greedy Algorithm
Initially, each point is in a separate cluster.
Repeat until only k clusters remain:
    Let p, q = closest pair of separated points.
    (This pair determines the current spacing.)
    Merge the clusters containing p & q into a single cluster.
Note: Just like Kruskal's MST algorithm, but stopped early.
    Points <-> vertices, distances <-> edge costs, point pairs <-> edges.
Called single-link clustering.
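The greedy algorithm above can be sketched as Kruskal-style merges stopped early (a minimal sketch under my own naming, reusing the leader-pointer union-find idea):

```python
# Single-link clustering: merge closest separated pair until k clusters remain.
def single_link(points, d, k):
    """points: list of point ids; d: symmetric distance; k: # of clusters."""
    leader = {p: p for p in points}
    members = {p: [p] for p in points}
    # All point pairs, sorted by increasing distance (the "edges").
    pairs = sorted((d(p, q), p, q)
                   for i, p in enumerate(points) for q in points[i + 1:])
    num_clusters = len(points)
    for dist, p, q in pairs:
        if num_clusters == k:
            break
        a, b = leader[p], leader[q]
        if a != b:                        # p, q currently separated -> merge
            if len(members[a]) < len(members[b]):
                a, b = b, a               # relabel the smaller cluster
            for x in members[b]:
                leader[x] = a
            members[a].extend(members.pop(b))
            num_clusters -= 1
    return members                        # dict: leader -> cluster members
```

On points 0, 1, 5, 6, 20 on a line with k = 2, it merges (0, 1), (5, 6), then the two resulting clusters, leaving {0, 1, 5, 6} and {20}.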