{
"cells": [
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"## Simultaneous Iteration for Eigenvalue Computation\n",
"\n",
"We want to implement an iterative method to simultaneously calculate the $p \\leq n$ dominant eigenvalues of a matrix $A\\in \\mathbb C^{n\\times n}$, which we assume to be Hermitian.\n",
"The method relies on the reduced QR decomposition, in the form:\n",
"\n",
"$$ QR = M,$$\n",
"where:\n",
"- $M\\in\\mathbb C^{n\\times p}$.\n",
"- $Q\\in \\mathbb C^{n\\times p}$ and its columns form an orthonormal set in $\\mathbb C^n$ with respect to the usual inner product.\n",
"- $R\\in \\mathbb C^{p\\times p}$ is an upper triangular matrix with real and positive diagonal entries.\n",
"\n",
"#### Question 1\n",
"Provide the reduced QR decomposition of $M$ when $p=1$.\n",
"\n",
"#### Question 2\n",
"By writing\n",
"$$ Q = \\begin{pmatrix} \\widetilde Q &\\mathbf{q}\\end{pmatrix},\\qquad R = \\begin{pmatrix} \\widetilde R & \\mathbf{r} \\\\ 0 & \\alpha\\end{pmatrix}, \\qquad M = \\begin{pmatrix} \\widetilde{M} & \\mathbf m\\end{pmatrix}$$\n",
"with $\\widetilde Q,\\widetilde M \\in \\mathbb C^{n\\times (p-1)}$, $\\widetilde R\\in\\mathbb C^{(p-1)\\times (p-1)}$, $\\mathbf r \\in \\mathbb C^{p-1}$, $\\alpha\\in \\R_+$, and $\\mathbf m\\in \\mathbb C^n$, provide the QR decomposition of $M$ given the decomposition $\\widetilde Q \\widetilde R = \\widetilde M$ of the matrix formed by the first $(p-1)$ columns of $M$. (Use the properties of the QR decomposition).\n",
"\n",
"\n",
"#### Question 3\n",
"Implement a method to calculate the reduced QR decomposition."
]
},
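{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"Before implementing `myQR`, the defining properties above can be checked numerically on a small example using `LinearAlgebra.qr`. Note that `qr` does not guarantee real positive diagonal entries for $R$; the sketch below (an illustration, not a solution to Question 3) corrects this with a diagonal phase change, using the fact that $QR = (QD)(D^{-1}R)$ for any unitary diagonal matrix $D$:\n",
"\n",
"```julia\n",
"using LinearAlgebra\n",
"\n",
"M = randn(ComplexF64, 6, 3)  # small test matrix with p ≤ n\n",
"F = qr(M)                    # reference decomposition from LinearAlgebra\n",
"Q, R = Matrix(F.Q), F.R      # reduced (thin) factors\n",
"\n",
"# Rescale so that the diagonal of R is real and positive\n",
"D = Diagonal(sign.(diag(R)))\n",
"Q, R = Q * D, D' * R\n",
"\n",
"@show norm(Q' * Q - I)       # columns of Q are orthonormal\n",
"@show norm(Q * R - M)        # QR reproduces M up to roundoff\n",
"```"
]
},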
{
"cell_type": "code",
"execution_count": null,
"id": "07cea03a",
"metadata": {},
"outputs": [],
"source": [
"using LinearAlgebra\n",
"\n",
"@views function myQR(M)\n",
" n,p = size(M)\n",
"\n",
" @assert p <= n \"Error: p > n\"\n",
" ## Implement the decomposition here\n",
" Q = zero(M)\n",
" R = zeros(eltype(M),p,p)\n",
"\n",
" return Q,R\n",
"end"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"M = randn(ComplexF64,100,50)\n",
"M = Complex{BigFloat}.(M)\n",
"Q,R = myQR(M)\n",
"\n",
"@show norm(Q'Q-I) # Check that the columns of Q are orthogonal\n",
"@show all(isreal.(diag(R))) # Check that the diagonal of R is real\n",
"@show all(real.(diag(R)) .>= 0) # and positive\n",
"@show norm(Q*R-M) # Check that QR = M up to roundoff error\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"@time qr(M) # Implementation of LinearAlgebra\n",
"@time myQR(M) # Our implementation"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Question 4\n",
"The simultaneous iteration method is defined, given an initial condition $X_0\\in\\mathbb C^{n\\times p}$, by the recurrence relation:\n",
"$$ X_{n+1} R_{n+1} = A X_n,$$\n",
"where $X_{n+1} R_{n+1}$ is the reduced QR decomposition of $AX_n \\in \\mathbb C^{n\\times p}$. It can be shown that for a Hermitian matrix $A$ and appropriate initial conditions $X_0$, the matrix $X_n$ essentially converges column by column to the matrix formed by the $p$ associated eigenvectors (see lecture notes for a proof).\n",
"\n",
"Implement a function `myEigen` in Julia that takes as arguments: a matrix `M` (assumed Hermitian), an integer `nev` (the number of eigenvalues to compute), an integer `niter` (the number of iterations), and an initial matrix `X₀`. The function should return a vector of `nev` eigenvalues and a matrix formed by an approximation of the dominant `nev` eigenvectors of `M` in columns.\n",
"\n",
"For matrices with entries that are not `BigFloat`, you can compare the results with those given by `LinearAlgebra.eigen` for debugging purposes."
]
},
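{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"For orientation, a single step of the recurrence can be sketched as follows, using `LinearAlgebra.qr` as a stand-in for your `myQR` (an assumption for illustration only; `qr` does not enforce a positive diagonal, which does not affect the span of the columns):\n",
"\n",
"```julia\n",
"using LinearAlgebra\n",
"\n",
"A = Symmetric(randn(10, 10))  # a Hermitian (here real symmetric) test matrix\n",
"X = randn(10, 3)              # current iterate\n",
"\n",
"# One step of simultaneous iteration: orthonormalize the columns of A*X\n",
"F = qr(A * X)\n",
"X = Matrix(F.Q)               # next iterate, with orthonormal columns\n",
"\n",
"# Rayleigh quotients give the corresponding eigenvalue approximations\n",
"λapprox = diag(X' * A * X)\n",
"```"
]
},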
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"n,nev = 10,3\n",
"\n",
"## Seed for the random number generator\n",
"using Random\n",
"Random.seed!(2023)\n",
"##\n",
"\n",
"X = randn(n,n)/sqrt(n)\n",
"X₀ = randn(n,nev)/sqrt(n)\n",
"\n",
"\n",
"function myEigen(M,nev,niter,X₀)\n",
" ## Implement the iteration here\n",
" return λs,us\n",
"end\n",
"\n",
"M = BigFloat.(I + X'X)\n",
"\n",
"@time λs,us=myEigen(M,nev,500,X₀)\n",
"\n",
"display(λs)\n",
"display(us)\n",
"display(M*us - λs' .*us)"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Question 5\n",
"\n",
"Study numerically the convergence speed of the first 3 eigenvectors towards their limit as a function of the number of iterations of the method. It can be assumed that all eigenvectors have converged after a large number (let's say 10000) of iterations to obtain reference values. The difference between two vectors $u$ and $v$ will be measured in terms of the angle formed by the generated subspaces:\n",
"$$ \\theta_{u,v} = \\arccos\\left(\\frac{|u^\\intercal v|}{\\|u\\|\\|v\\|}\\right).$$\n",
"Compare the error at the $k$-th iteration and the quantities $(\\lambda_i/\\lambda_{i+1})^{-k},\\, i=1,2,3$, on a graph with a logarithmic $y$-scale.\n",
"\n",
"To ensure numerically that $0\\leq \\frac{|u^\\intercal v|}{\\|u\\|\\|v\\|}\\leq 1$, the `clamp` function can be used."
]
},
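{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"The angle between the one-dimensional subspaces spanned by two vectors can be computed with a small helper (`subspace_angle` is a hypothetical name, not part of the exercise), using `clamp` as suggested to guard against roundoff pushing the ratio marginally above 1:\n",
"\n",
"```julia\n",
"using LinearAlgebra\n",
"\n",
"# Angle between span(u) and span(v)\n",
"subspace_angle(u, v) = acos(clamp(abs(dot(u, v)) / (norm(u) * norm(v)), 0, 1))\n",
"\n",
"u = [1.0, 0.0]\n",
"v = [1.0, 1.0]\n",
"@show subspace_angle(u, v)  # ≈ π/4\n",
"```"
]
},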
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"@time λs,us=myEigen(M,n,10000,randn(n,n)) # on calcule les valeurs et vecteurs propres \"convergés\"\n",
"display(M*us - λs' .*us)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"nev=3\n",
"\n",
"iters = 1:200\n",
"errs = [Float64[] for i=1:nev]\n",
"\n",
"for niter in iters\n",
" λhats,uhats = myEigen(M,nev,niter,X₀)\n",
" ## Calculate errors here\n",
"end\n",
"\n",
"using Plots,LaTeXStrings\n",
"colors = (:red,:blue,:green)\n",
"pl = plot(xlabel=\"Itération k\", ylabel=\"Erreur essentielle\",yaxis=:log,legend=:bottomleft)\n",
"\n",
"for i=1:nev\n",
" plot!(pl,iters,errs[i],label=L\"θ_{%$i}\",color=colors[i])\n",
" plot!(pl,k->(λs[i]/λs[i+1])^-k,label=L\"(λ_{%$i}/λ_{%$(i+1)})^{-k}\",linestyle=:dot,color=colors[i])\n",
"end\n",
"\n",
"pl"
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "0f7dc8eb",
"metadata": {},
"source": [
"# PageRank Algorithm"
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "2530e31e",
"metadata": {},
"source": [
"PageRank is an algorithm used to assign a score to the vertices of a directed graph. It is employed by numerous search engines to rank search results. In this context, the directed graph encodes the links between pages on the World Wide Web: the vertices of the directed graph represent web pages, and the edges represent connections between pages—there is an edge from page $i$ to page $j$ if page $i$ contains a hyperlink to page $j$.\n",
"\n",
"Consider a directed graph $G(V, E)$ with vertices $V = \\{1, \\dotsc, n\\}$ and edges $E$. The graph can be represented by its adjacency matrix $\\mathsf A \\in \\{0, 1\\}^{n \\times n}$, where the entries are given by\n",
"$$\n",
"a_{ij} =\n",
"\\begin{cases}\n",
" 1 & \\text{if there is an edge from $i$ to $j$,} \\\\\n",
" 0 & \\text{otherwise.}\n",
"\\end{cases}\n",
"$$\n",
"The idea of the PageRank algorithm, in its simplest form, is to assign scores $r_i$ to the vertices by solving the following system of equations:\n",
"$$ \\tag{PageRank}\n",
" \\forall i \\in V, \\qquad\n",
" r_i\n",
" = \\sum_{j \\in \\mathcal N(i)} \\frac{r_j}{o_j}.\n",
"$$\n",
"where $o_j$ is the out-degree of vertex $j$, i.e., the number of edges originating from $j$. Here, the sum applies to the set of nodes in $\\mathcal N(i)$, representing the set of incoming neighbors of vertex $i$, i.e., those with an edge pointing towards vertex $i$.\n",
"\n",
"Let $\\mathbf r = \\begin{pmatrix} r_1 & \\dots & r_n \\end{pmatrix}^T$. It can be shown that solving the system (PageRank) is equivalent to solving the following problem:\n",
"$$ \\tag{PageRank-vector}\n",
" \\mathbf r =\n",
" \\mathsf A^T\n",
" \\begin{pmatrix}\n",
" \\frac{1}{o_1} & & \\\\\n",
" & \\ddots & \\\\\n",
" & & \\frac{1}{o_n}\n",
" \\end{pmatrix}\n",
" \\mathbf r =: \\mathsf A^T \\mathsf O^{-1} \\mathbf r.\n",
"$$\n",
"In other words, the problem is to find an eigenvector with eigenvalue 1 of the matrix $\\mathsf M = \\mathsf A^T \\mathsf O^{-1}$. Note that at this stage, we have not proven the existence or uniqueness of a solution to this equation. The question of the uniqueness of a solution is related to the graph's connectivity and will not be addressed here. However, we will demonstrate that a solution to the problem exists.\n",
"\n",
"**Remark.** The matrix $\\mathsf O^{-1} \\mathsf A$ is the transition matrix of a random walk on the directed graph, where at each step, a move is made towards an outgoing neighbor with equal probability for each of them. Solving (PageRank-vector) is equivalent to finding a stationary distribution of this random walk.\n",
"\n",
"**Questions:**\n",
"\n",
"1. Notice that $\\mathsf M$ is a left stochastic matrix, meaning that the sum of the elements in each column is equal to 1.\n",
"\n",
"2. Prove that the eigenvalues of any matrix $\\mathsf B \\in \\mathbb R^{n \\times n}$ coincide with those of $\\mathsf B^T$. You can use the fact that $\\det(\\mathsf B) = \\det(\\mathsf B^T)$.\n",
"\n",
"3. Using the previous points, show that 1 is an eigenvalue, and $\\rho(\\mathsf M) = 1$. For the second part, find a subordinate matrix norm such that $\\lVert\\mathsf M\\rVert= 1$. This demonstrates the existence of a solution to (PageRank-vector) and also proves that 1 is the dominant eigenvalue of $\\mathsf M$.\n",
"\n",
"4. Implement the PageRank algorithm without using the `SparseArrays` package to rank the pages of the English-language Wikipedia as it was in 2013 (see instructions below for downloading the dataset). After assigning a score to all pages, print the top 10 ranked pages, which should be as follows:\n",
" ```\n",
" United States, United Kingdom, World War II, Latin, France, Germany, English language, China, Canada, India\n",
" ```\n",
"\n",
"5. Write a function `search(keyword)` to perform a search in the database. Here is an example of what this function could return:\n",
" ```\n",
" julia> search(\"Newton\")\n",
" 47-element Vector{String}:\n",
" \"Isaac Newton\"\n",
" \"Newton (unit)\"\n",
" \"Newton's laws of motion\"\n",
" …\n",
" ```"
]
},
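{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"As a small illustration of point 1 above, one can assemble $\\mathsf M = \\mathsf A^T \\mathsf O^{-1}$ for a toy 3-page graph (a hypothetical example, unrelated to the Wikipedia data) and check that every column sums to 1:\n",
"\n",
"```julia\n",
"using LinearAlgebra\n",
"\n",
"# Toy graph: 1 → 2, 1 → 3, 2 → 3, 3 → 1\n",
"A = [0 1 1;\n",
"     0 0 1;\n",
"     1 0 0]\n",
"o = vec(sum(A, dims=2))    # out-degrees\n",
"M = A' * Diagonal(1 ./ o)  # M = Aᵀ O⁻¹\n",
"\n",
"@show sum(M, dims=1)       # each column sums to 1: M is left stochastic\n",
"```"
]
},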
{
"attachments": {},
"cell_type": "markdown",
"id": "f870c2a9",
"metadata": {},
"source": [
"**Additional indications**:\n",
"\n",
"- Use the following code to download the data."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "d9602dc0",
"metadata": {},
"outputs": [],
"source": [
"using LinearAlgebra\n",
"import Downloads\n",
"import Tar\n",
"\n",
"# URL where data can be downloaded\n",
"url = \"https://urbain.vaes.uk/static/wikidata.tar\"\n",
"\n",
"# Download the data\n",
"filename = \"wikidata.tar\"\n",
"isfile(filename) || Downloads.download(url, filename)\n",
"\n",
"# Extract data into directory `wikidata`\n",
"dirname = \"wikidata\"\n",
"isdir(dirname) || Tar.extract(filename, dirname)"
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "f6124b73",
"metadata": {},
"source": [
"This dataset contains a subset of publicly available data here. To limit computation time, only 5% of the highest-rated articles have been retained for this exercise. Once the archive is decompressed, the dataset can be loaded into Julia as follows:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "b3025b8a",
"metadata": {},
"outputs": [],
"source": [
"import CSV\n",
"import DataFrames\n",
"\n",
"# Read nodes and edges into data frames\n",
"nodes_dataframe = CSV.read(\"$dirname/names.csv\", DataFrames.DataFrame)\n",
"edges_dataframe = CSV.read(\"$dirname/edges.csv\", DataFrames.DataFrame)\n",
"\n",
"# Convert data to matrices\n",
"nodes = Matrix(nodes_dataframe)\n",
"edges = Matrix(edges_dataframe)\n",
"\n",
"# The data structures should be self-explanatory\n",
"edges_dataframe"
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "a5eee91d",
"metadata": {},
"source": [
" - You can use either the simplified version of the algorithm given in [PageRank-vector](#pagerank), or the enhanced version with a damping factor described on Wikipedia.\n",
"\n",
"- Since the dominant eigenvalue is known *a priori*, the following stopping criterion can be used:\n",
" $$\n",
" \\frac{\\lVert\\mathsf M \\widehat{\\mathbf r} - \\widehat{\\mathbf r}\\rVert_1}{\\lVert\\widehat{\\mathbf r}\\rVert_1} < 10^{-10}.\n",
" $$\n",
" where $\\widehat{\\mathbf r}$ is an approximation of the eigenvector corresponding to the dominant eigenvalue.\n",
"\n",
"- All methods for solving [PageRank-vector](#pagerank) require computing matrix-vector products with the matrix $\\mathsf M$. Since this matrix is very large, it cannot be stored as a dense matrix of type `Matrix`. To solve the problem, it is necessary to define a sparse matrix structure. You can, for example, use the COO sparse matrix format, which is the simplest. Another option is to use the CSR format, which allows more efficient matrix-vector product calculations. It will be useful, for solving the eigenvalue problem,\n",
"\n",
" * to define the matrix-vector multiplication method `*(A::my_sparse_matrix, b::Vector{Float64})`.\n",
"\n",
" * to define a function `my_sparse` that constructs a sparse matrix from vectors `rows`, `cols`, and `vals` in COO format. Obviously, if you choose to use the COO format, this function is trivial."
]
},
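{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"With a matrix-vector product available, a fixed point of (PageRank-vector) can be approximated by the power method with the stopping criterion above. The sketch below (function name and defaults are illustrative choices) works with any matrix type that supports `*` and `size`, so it can be tested on a small dense matrix before switching to your sparse structure:\n",
"\n",
"```julia\n",
"using LinearAlgebra\n",
"\n",
"# Power method for the dominant eigenvector; since the dominant\n",
"# eigenvalue of M is known to be 1, the iteration is simply r ← M*r.\n",
"function pagerank_power(M; tol=1e-10, maxiter=100_000)\n",
"    r = fill(1.0 / size(M, 2), size(M, 2))\n",
"    for _ in 1:maxiter\n",
"        Mr = M * r\n",
"        norm(Mr - r, 1) / norm(r, 1) < tol && return Mr\n",
"        r = Mr / norm(Mr, 1)\n",
"    end\n",
"    return r\n",
"end\n",
"```"
]
},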
{
"cell_type": "code",
"execution_count": null,
"id": "062e7d05",
"metadata": {},
"outputs": [],
"source": [
"# Necessary to overload `*`\n",
"import Base: *\n",
"\n",
"# Modify if necessary\n",
"struct my_sparse_matrix\n",
" rows::Vector{Int}\n",
" cols::Vector{Int}\n",
" vals::Vector{Float64}\n",
" m::Int\n",
" n::Int\n",
"end\n",
"\n",
"function *(A::my_sparse_matrix, b::Vector{Float64})\n",
" # Fill me\n",
"end\n",
"\n",
"# Constructor from parameters in COO format (modify if necessary)\n",
"function my_sparse(rows, cols, vals, m, n)\n",
" my_sparse_matrix(rows, cols, vals, m, n)\n",
"end\n",
"\n",
"# Test the code\n",
"m, n = 4, 3\n",
"R = [2, 2, 2, 3, 3]\n",
"C = [1, 2, 3, 1, 3]\n",
"V = [5., 6., 7., 8., 9.]\n",
"A = my_sparse(R, C, V, m, n)\n",
"b = [1.; 1.; 1.]\n",
"@assert A*b == [0.; 18.; 17.; 0.] \"Multiplication does not work!\""
]
},
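{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"For reference, the COO matrix-vector product can be sketched as a free function (`coo_matvec` is a hypothetical helper name; it can be adapted into the `*` method above): each stored entry $a_{ij} = v$ contributes $v\\, b_j$ to row $i$ of the result.\n",
"\n",
"```julia\n",
"# Sketch of a COO matrix-vector product\n",
"function coo_matvec(rows, cols, vals, m, b)\n",
"    y = zeros(m)\n",
"    for k in eachindex(vals)\n",
"        y[rows[k]] += vals[k] * b[cols[k]]\n",
"    end\n",
"    return y\n",
"end\n",
"\n",
"coo_matvec([2, 2, 2, 3, 3], [1, 2, 3, 1, 3], [5., 6., 7., 8., 9.], 4, [1., 1., 1.])  # [0.0, 18.0, 17.0, 0.0]\n",
"```"
]
},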
{
"attachments": {},
"cell_type": "markdown",
"id": "e76caa60",
"metadata": {},
"source": [
"**Note.** Although it is requested not to use `SparseArrays` in your final code, using this package for testing is encouraged. It can be useful, for example, to implement PageRank first with a sparse matrix constructed with `SparseArrays` before using your own sparse matrix structure. As a reminder, a sparse matrix can be constructed from COO format parameters as follows using `SparseArrays`:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "bdd7304e",
"metadata": {},
"outputs": [],
"source": [
"import SparseArrays\n",
"A = SparseArrays.sparse(R, C, V, m, n)\n",
"@assert A*b == [0.; 18.; 17.; 0.] \"Multiplication does not work!\""
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "1aa0c435",
"metadata": {},
"source": [
"You can organize the rest of your code in three cells:\n",
"\n",
"- Construct the matrix $\\mathsf M$:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "709cdf94",
"metadata": {},
"outputs": [],
"source": [
"# Fill me"
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "dec46ca7",
"metadata": {},
"source": [
"- Solve the PageRank problem (the `sortperm` function may be useful here) and print the names of the top 10 ranked pages:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "5a9c8322",
"metadata": {},
"outputs": [],
"source": [
"# Fill me"
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "021d3952",
"metadata": {},
"source": [
"- Define and test the `search` function\n",
"(The functions `filter` and `occursin` can be useful here):"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "595b9f95",
"metadata": {},
"outputs": [],
"source": [
"function search(keyword)\n",
" # Fill me\n",
"end\n",
"search(\"Newton\")"
]
}
],
"metadata": {
"jupytext": {
"encoding": "# -*- coding: utf-8 -*-"
},
"kernelspec": {
"display_name": "Julia 1.9.0",
"language": "julia",
"name": "julia-1.9"
},
"language_info": {
"file_extension": ".jl",
"mimetype": "application/julia",
"name": "julia",
"version": "1.9.0"
}
},
"nbformat": 4,
"nbformat_minor": 2
}