15.1 Graph Foundation

Created Date: 2025-05-18

15.1.1 What is Graph?

A graph is a type of nonlinear data structure, consisting of vertices and edges. A graph \(G\) can be abstractly represented as a collection of a set of vertices \(V\) and a set of edges \(E\). The following example shows a graph containing 5 vertices and 7 edges:

\(V = \{1, 2, 3, 4, 5\}\)

\(E = \{(1, 2), (1, 3), (1, 5), (2, 3), (2, 4), (2, 5), (4, 5)\}\)

\(G = \{V, E\}\)

If vertices are viewed as nodes and edges as references (pointers) connecting the nodes, graphs can be seen as a data structure that extends from linked lists.

Compared to linear relationships (linked lists) and divide-and-conquer relationships (trees), network relationships (graphs) are more complex due to their higher degree of freedom, as show below:

15.1.2 Common Types

Graphs can be divided into undirected graphs and directed graphs depending on whether edges have direction:

  • In undirected graphs, edges represent a "bidirectional" connection between two vertices, for example, the "friends" in Facebook.

  • In directed graphs, edges have directionality, that is, the edges \(A \rightarrow B\) and \(A \leftarrow B\) are independent of each other. For example, the "follow" and "followed" relationship on Instagram.

Depending on whether all vertices are connected, graphs can be divided into connected graphs and disconnected graphs:

  • For connected graphs, it is possible to reach any other vertex starting from an arbitrary vertex.

  • For disconnected graphs, there is at least one vertex that cannot be reached from an arbitrary starting vertex.

We can also add a weight variable to edges, For example, in Instagram, the system sorts your follower and following list by the level of interaction between you and other users (likes, views, comments, etc.). Such an interaction network can be represented by a weighted graph.

Graph data structures include the following commonly used terms.

15.1.3 Representation of Graphs

Common representations of graphs include "adjacency matrix" and "adjacency list". The following examples use undirected graphs.

Adjacency matrix

Let the number of vertices in the graph be \(n\), the adjacency matrix uses an \(n \times n\), matrix to represent the graph, where each row (column) represents a vertex, and the matrix elements represent edges, with 1 or 0 indicating whether there is an edge between two vertices.

Let the adjacency matrix be \(M\), and the list of vertices be \(V\), then the matrix element \(M[i, j] = 1\)indicates there is an edge between vertex \(V[i]\) and \(V[j]\), conversely \(M[i, j] = 0\) indicates there is no edge between the two vertices.

Adjacency matrices have the following characteristics.

  • A vertex cannot be connected to itself, so the elements on the main diagonal of the adjacency matrix are meaningless.

  • For undirected graphs, edges in both directions are equivalent, thus the adjacency matrix is symmetric with regard to the main diagonal.

  • By replacing the elements of the adjacency matrix from 1 and 0 to weights, we can represent weighted graphs.

When representing graphs with adjacency matrices, it is possible to directly access matrix elements to obtain edges, resulting in efficient operations of addition, deletion, lookup, and modification, all with a time complexity of \(O(1)\). However, the space complexity of the matrix is \(O(n^2)\), which consumes more memory.

15.1.4 Basic Graph Operations

15.1.5 Graph Traversal

15.1.5.1 Breadth-first Search

Breadth-first search is a near-to-far traversal method, starting from a certain node, always prioritizing the visit to the nearest vertices and expanding outwards layer by layer. As shown in below figure, starting from the top left vertex, first traverse all adjacent vertices of that vertex, then traverse all adjacent vertices of the next vertex, and so on, until all vertices have been visited.

BFS is usually implemented with the help of a queue, as shown in the code below. The queue is "first in, first out", which aligns with the BFS idea of traversing "from near to far":

  1. Add the starting vertex startVet to the queue and start the loop.

  2. In each iteration of the loop, pop the vertex at the front of the queue and record it as visited, then add all adjacent vertices of that vertex to the back of the queue.

  3. Repeat step 2 until all vertices have been visited.

To prevent revisiting vertices, we use a hash set visited to record which nodes have been visited:

Time complexity: All vertices will be enqueued and dequeued once, using \(O(|V|)\) time; in the process of traversing adjacent vertices, since it is an undirected graph, all edges will be visited 2 times, using \(O(2|E|)\) time; overall using \(O(|V| + |E|)\) time.

15.1.5.2 Depth-first Search