{
 "cells": [
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Simultaneous Iteration for Eigenvalue Computation\n",
    "\n",
    "We want to implement an iterative method to simultaneously calculate the $p \\leq n$ dominant eigenvalues of a matrix $A\\in \\mathbb C^{n\\times n}$, which we assume to be Hermitian.\n",
    "The method relies on the reduced QR decomposition, in the form:\n",
    "\n",
    "$$ QR = M,$$\n",
    "where:\n",
    "- $M\\in\\mathbb C^{n\\times p}$.\n",
    "- $Q\\in \\mathbb C^{n\\times p}$ and its columns form an orthonormal set in $\\mathbb C^n$ with respect to the usual inner product.\n",
    "- $R\\in \\mathbb C^{p\\times p}$ is an upper triangular matrix with real and positive diagonal entries.\n",
    "\n",
    "#### Question 1\n",
    "Provide the reduced QR decomposition of $M$ when $p=1$.\n",
    "\n",
    "#### Question 2\n",
    "By writing\n",
    "$$ Q = \\begin{pmatrix} \\widetilde Q &\\mathbf{q}\\end{pmatrix},\\qquad R = \\begin{pmatrix} \\widetilde R & \\mathbf{r} \\\\ 0 & \\alpha\\end{pmatrix}, \\qquad M = \\begin{pmatrix} \\widetilde{M} & \\mathbf m\\end{pmatrix}$$\n",
    "with $\\widetilde Q,\\widetilde M \\in \\mathbb C^{n\\times (p-1)}$, $\\widetilde R\\in\\mathbb C^{(p-1)\\times (p-1)}$, $\\mathbf r \\in \\mathbb C^{p-1}$, $\\alpha\\in \\R_+$, and $\\mathbf m\\in \\mathbb C^n$, provide the QR decomposition of $M$ given the decomposition $\\widetilde Q \\widetilde R = \\widetilde M$ of the matrix formed by the first $(p-1)$ columns of $M$. (Use the properties of the QR decomposition).\n",
    "\n",
    "\n",
    "#### Question 3\n",
    "Implement a method to calculate the reduced QR decomposition."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "07cea03a",
   "metadata": {},
   "outputs": [],
   "source": [
    "using LinearAlgebra\n",
    "\n",
    "@views function myQR(M)\n",
    "    n,p = size(M)\n",
    "\n",
    "    @assert p <= n \"Error: p > n\"\n",
    "    ## Implement the decomposition here\n",
    "    Q = zero(M)\n",
    "    R = zeros(eltype(M),p,p)\n",
    "\n",
    "    return Q,R\n",
    "end"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "M = randn(ComplexF64,100,50)\n",
    "M = Complex{BigFloat}.(M)\n",
    "Q,R = myQR(M)\n",
    "\n",
    "@show norm(Q'Q-I) # Check that the columns of Q are orthogonal\n",
    "@show all(isreal.(diag(R))) # Check that the diagonal of R is real\n",
    "@show all(real.(diag(R)) .>= 0) # and positive\n",
    "@show norm(Q*R-M) # Check that QR = M up to roundoff error\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "@time qr(M) # Implementation of LinearAlgebra\n",
    "@time myQR(M) # Our implementation"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Question 4\n",
    "The simultaneous iteration method is defined, given an initial condition $X_0\\in\\mathbb C^{n\\times p}$, by the recurrence relation:\n",
    "$$ X_{n+1} R_{n+1} = A X_n,$$\n",
    "where $X_{n+1} R_{n+1}$ is the reduced QR decomposition of $AX_n \\in \\mathbb C^{n\\times p}$. It can be shown that for a Hermitian matrix $A$ and appropriate initial conditions $X_0$, the matrix $X_n$ essentially converges column by column to the matrix formed by the $p$ associated eigenvectors (see lecture notes for a proof).\n",
    "\n",
    "Implement a function `myEigen` in Julia that takes as arguments: a matrix `M` (assumed Hermitian), an integer `nev` (the number of eigenvalues to compute), an integer `niter` (the number of iterations), and an initial matrix `X₀`. The function should return a vector of `nev` eigenvalues and a matrix formed by an approximation of the dominant `nev` eigenvectors of `M` in columns.\n",
    "\n",
    "For matrices with entries that are not `BigFloat`, you can compare the results with those given by `LinearAlgebra.eigen` for debugging purposes."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "n,nev = 10,3\n",
    "\n",
    "## Seed for the random number generator\n",
    "using Random\n",
    "Random.seed!(2023)\n",
    "##\n",
    "\n",
    "X = randn(n,n)/sqrt(n)\n",
    "X₀ = randn(n,nev)/sqrt(n)\n",
    "\n",
    "\n",
    "function myEigen(M,nev,niter,X₀)\n",
    "    ## Implement the iteration here\n",
    "    return λs,us\n",
    "end\n",
    "\n",
    "M = BigFloat.(I + X'X)\n",
    "\n",
    "@time λs,us=myEigen(M,nev,500,X₀)\n",
    "\n",
    "display(λs)\n",
    "display(us)\n",
    "display(M*us - λs' .*us)"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Question 5\n",
    "\n",
    "Study numerically the convergence speed of the first 3 eigenvectors towards their limit as a function of the number of iterations of the method. It can be assumed that all eigenvectors have converged after a large number (let's say 10000) of iterations to obtain reference values. The difference between two vectors $u$ and $v$ will be measured in terms of the angle formed by the generated subspaces:\n",
    "$$ \\theta_{u,v} = \\arccos\\left(\\frac{|u^\\intercal v|}{\\|u\\|\\|v\\|}\\right).$$\n",
    "Compare the error at the $k$-th iteration and the quantities $(\\lambda_i/\\lambda_{i+1})^{-k},\\, i=1,2,3$, on a graph with a logarithmic $y$-scale.\n",
    "\n",
    "To ensure numerically that $0\\leq \\frac{|u^\\intercal v|}{\\|u\\|\\|v\\|}\\leq 1$, the `clamp` function can be used."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "@time λs,us=myEigen(M,n,10000,randn(n,n)) # on calcule les valeurs et vecteurs propres \"convergés\"\n",
    "display(M*us - λs' .*us)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "nev=3\n",
    "\n",
    "iters = 1:200\n",
    "errs = [Float64[] for i=1:nev]\n",
    "\n",
    "for niter in iters\n",
    "    λhats,uhats = myEigen(M,nev,niter,X₀)\n",
    "    ## Calculate errors here\n",
    "end\n",
    "\n",
    "using Plots,LaTeXStrings\n",
    "colors = (:red,:blue,:green)\n",
    "pl = plot(xlabel=\"Itération k\", ylabel=\"Erreur essentielle\",yaxis=:log,legend=:bottomleft)\n",
    "\n",
    "for i=1:nev\n",
    "    plot!(pl,iters,errs[i],label=L\"θ_{%$i}\",color=colors[i])\n",
    "    plot!(pl,k->(λs[i]/λs[i+1])^-k,label=L\"(λ_{%$i}/λ_{%$(i+1)})^{-k}\",linestyle=:dot,color=colors[i])\n",
    "end\n",
    "\n",
    "pl"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "0f7dc8eb",
   "metadata": {},
   "source": [
    "# PageRank Algorithm"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "2530e31e",
   "metadata": {},
   "source": [
    "PageRank is an algorithm used to assign a score to the vertices of a directed graph. It is employed by numerous search engines to rank search results. In this context, the directed graph encodes the links between pages on the World Wide Web: the vertices of the directed graph represent web pages, and the edges represent connections between pages—there is an edge from page $i$ to page $j$ if page $i$ contains a hyperlink to page $j$.\n",
    "\n",
    "Consider a directed graph $G(V, E)$ with vertices $V = \\{1, \\dotsc, n\\}$ and edges $E$. The graph can be represented by its adjacency matrix $\\mathsf A \\in \\{0, 1\\}^{n \\times n}$, where the entries are given by\n",
    "$$\n",
    "a_{ij} =\n",
    "\\begin{cases}\n",
    "    1 & \\text{if there is an edge from $i$ to $j$,} \\\\\n",
    "    0 & \\text{otherwise.}\n",
    "\\end{cases}\n",
    "$$\n",
    "The idea of the PageRank algorithm, in its simplest form, is to assign scores $r_i$ to the vertices by solving the following system of equations:\n",
    "$$ \\tag{PageRank}\n",
    "    \\forall i \\in V, \\qquad\n",
    "    r_i\n",
    "    = \\sum_{j \\in \\mathcal N(i)} \\frac{r_j}{o_j}.\n",
    "$$\n",
    "where $o_j$ is the out-degree of vertex $j$, i.e., the number of edges originating from $j$. Here, the sum applies to the set of nodes in $\\mathcal N(i)$, representing the set of incoming neighbors of vertex $i$, i.e., those with an edge pointing towards vertex $i$.\n",
    "\n",
    "Let $\\mathbf r = \\begin{pmatrix} r_1 & \\dots & r_n \\end{pmatrix}^T$. It can be shown that solving the system <a href=\"#pagerank\">(PageRank)</a> is equivalent to solving the following problem:\n",
    "$$ \\tag{PageRank-vector}\n",
    "    \\mathbf r =\n",
    "    \\mathsf A^T\n",
    "    \\begin{pmatrix}\n",
    "        \\frac{1}{o_1} & &  \\\\\n",
    "                      & \\ddots & \\\\\n",
    "                      & & \\frac{1}{o_n}\n",
    "    \\end{pmatrix}\n",
    "    \\mathbf r =:  \\mathsf A^T \\mathsf O^{-1} \\mathbf r.\n",
    "$$\n",
    "In other words, the problem is to find an eigenvector with eigenvalue 1 of the matrix $\\mathsf M = \\mathsf A^T \\mathsf O^{-1}$. Note that at this stage, we have not proven the existence or uniqueness of a solution to this equation. The question of the uniqueness of a solution is related to the graph's connectivity and will not be addressed here. However, we will demonstrate that a solution to the problem exists.\n",
    "\n",
    "**Remark.** The matrix $\\mathsf O^{-1} \\mathsf A$ is the transition matrix of a random walk on the directed graph, where at each step, a move is made towards an outgoing neighbor with equal probability for each of them. Solving <a href=\"#pagerank\">(PageRank-vector)</a> is equivalent to finding a stationary distribution of this random walk.\n",
    "\n",
    "**Questions:**\n",
    "\n",
    "1. Notice that $\\mathsf M$ is a left stochastic matrix, meaning that the sum of the elements in each column is equal to 1.\n",
    "\n",
    "2. Prove that the eigenvalues of any matrix $\\mathsf B \\in \\mathbb R^{n \\times n}$ coincide with those of $\\mathsf B^T$. You can use the fact that $\\det(\\mathsf B) = \\det(\\mathsf B^T)$.\n",
    "\n",
    "3. Using the previous points, show that 1 is an eigenvalue, and $\\rho(\\mathsf M) = 1$. For the second part, find a subordinate matrix norm such that $\\lVert\\mathsf M\\rVert= 1$. This demonstrates the existence of a solution to <a href=\"#pagerank\">(PageRank-vector)</a> and also proves that 1 is the dominant eigenvalue of $\\mathsf M$.\n",
    "\n",
    "4. Implement the PageRank algorithm without using the `SparseArrays` package to rank the pages of the English-language Wikipedia as it was in 2013 (see instructions below for downloading the dataset). After assigning a score to all pages, print the top 10 ranked pages, which should be as follows:\n",
    "   ```\n",
    "   United States, United Kingdom, World War II, Latin, France, Germany, English language, China, Canada, India\n",
    "   ```\n",
    "\n",
    "5. Write a function `search(keyword)` to perform a search in the database. Here is an example of what this function could return:\n",
    "   ```\n",
    "   julia> search(\"Newton\")\n",
    "   47-element Vector{String}:\n",
    "    \"Isaac Newton\"\n",
    "    \"Newton (unit)\"\n",
    "    \"Newton's laws of motion\"\n",
    "    …\n",
    "   ```"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "f870c2a9",
   "metadata": {},
   "source": [
    "**Additional indications**:\n",
    "\n",
    "- Use the following code to download the data."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "d9602dc0",
   "metadata": {},
   "outputs": [],
   "source": [
    "using LinearAlgebra\n",
    "import Downloads\n",
    "import Tar\n",
    "\n",
    "# URL where data can be downloaded\n",
    "url = \"https://urbain.vaes.uk/static/wikidata.tar\"\n",
    "\n",
    "# Download the data\n",
    "filename = \"wikidata.tar\"\n",
    "isfile(filename) || Downloads.download(url, filename)\n",
    "\n",
    "# Extract data into directory `wikidata`\n",
    "dirname = \"wikidata\"\n",
    "isdir(dirname) || Tar.extract(filename, dirname)"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "f6124b73",
   "metadata": {},
   "source": [
    "This dataset contains a subset of publicly available data <a href=\"https://snap.stanford.edu/data/enwiki-2013.html\">here</a>. To limit computation time, only 5% of the highest-rated articles have been retained for this exercise. Once the archive is decompressed, the dataset can be loaded into Julia as follows:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "b3025b8a",
   "metadata": {},
   "outputs": [],
   "source": [
    "import CSV\n",
    "import DataFrames\n",
    "\n",
    "# Read nodes and edges into data frames\n",
    "nodes_dataframe = CSV.read(\"$dirname/names.csv\", DataFrames.DataFrame)\n",
    "edges_dataframe = CSV.read(\"$dirname/edges.csv\", DataFrames.DataFrame)\n",
    "\n",
    "# Convert data to matrices\n",
    "nodes = Matrix(nodes_dataframe)\n",
    "edges = Matrix(edges_dataframe)\n",
    "\n",
    "# The data structures should be self-explanatory\n",
    "edges_dataframe"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "a5eee91d",
   "metadata": {},
   "source": [
    " - You can use either the simplified version of the algorithm given in [PageRank-vector](#pagerank), or the enhanced version with a damping factor described on Wikipedia.\n",
    "\n",
    "- Since the dominant eigenvalue is known *a priori*, the following stopping criterion can be used:\n",
    "  $$\n",
    "      \\frac{\\lVert\\mathsf M \\widehat{\\mathbf r} - \\widehat{\\mathbf r}\\rVert_1}{\\lVert\\widehat{\\mathbf r}\\rVert_1} < 10^{-10}.\n",
    "  $$\n",
    "  where $\\widehat{\\mathbf r}$ is an approximation of the eigenvector corresponding to the dominant eigenvalue.\n",
    "\n",
    "- All methods for solving [PageRank-vector](#pagerank) require computing matrix-vector products with the matrix $\\mathsf M$. Since this matrix is very large, it cannot be stored as a dense matrix of type `Matrix`. To solve the problem, it is necessary to define a sparse matrix structure. You can, for example, use the COO sparse matrix format, which is the simplest. Another option is to use the CSR format, which allows more efficient matrix-vector product calculations. It will be useful, for solving the eigenvalue problem,\n",
    "\n",
    "  * to define the matrix-vector multiplication method `*(A::my_sparse_matrix, b::Vector{Float64})`.\n",
    "\n",
    "  * to define a function `my_sparse` that constructs a sparse matrix from vectors `rows`, `cols`, and `vals` in COO format. Obviously, if you choose to use the COO format, this function is trivial."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "062e7d05",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Necessary to overload `*`\n",
    "import Base: *\n",
    "\n",
    "# Modify if necessary\n",
    "struct my_sparse_matrix\n",
    "    rows::Vector{Int}\n",
    "    cols::Vector{Int}\n",
    "    vals::Vector{Float64}\n",
    "    m::Int\n",
    "    n::Int\n",
    "end\n",
    "\n",
    "function *(A::my_sparse_matrix, b::Vector{Float64})\n",
    "    # Fill me\n",
    "end\n",
    "\n",
    "# Constructor from parameters in COO format (modify if necessary)\n",
    "function my_sparse(rows, cols, vals, m, n)\n",
    "    my_sparse_matrix(rows, cols, vals, m, n)\n",
    "end\n",
    "\n",
    "# Test the code\n",
    "m, n = 4, 3\n",
    "R = [2, 2, 2, 3, 3]\n",
    "C = [1, 2, 3, 1, 3]\n",
    "V = [5., 6., 7., 8., 9.]\n",
    "A = my_sparse(R, C, V, m, n)\n",
    "b = [1.; 1.; 1.]\n",
    "@assert A*b == [0.; 18.; 17.; 0.] \"Multiplication does not work!\""
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "e76caa60",
   "metadata": {},
   "source": [
    "**Note.** Although it is requested not to use `SparseArrays` in your final code, using this package for testing is encouraged. It can be useful, for example, to implement PageRank first with a sparse matrix constructed with `SparseArrays` before using your own sparse matrix structure. As a reminder, a sparse matrix can be constructed from COO format parameters as follows using `SparseArrays`:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "bdd7304e",
   "metadata": {},
   "outputs": [],
   "source": [
    "import SparseArrays\n",
    "A = SparseArrays.sparse(R, C, V, m, n)\n",
    "@assert A*b == [0.; 18.; 17.; 0.] \"Multiplication does not work!\""
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "1aa0c435",
   "metadata": {},
   "source": [
    "You can organize the rest of your code in three cells:\n",
    "\n",
    "- Construct the matrix $\\mathsf M$:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "709cdf94",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Fill me"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "dec46ca7",
   "metadata": {},
   "source": [
    "- Solve the PageRank problem (the `sortperm` function may be useful here) and print the names of the top 10 ranked pages:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "5a9c8322",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Fill me"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "021d3952",
   "metadata": {},
   "source": [
    "- Define and test the `search` function\n",
    "(The functions `filter` and `occursin` can be useful here):"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "595b9f95",
   "metadata": {},
   "outputs": [],
   "source": [
    "function search(keyword)\n",
    "    # Fill me\n",
    "end\n",
    "search(\"Newton\")"
   ]
  }
 ],
 "metadata": {
  "jupytext": {
   "encoding": "# -*- coding: utf-8 -*-"
  },
  "kernelspec": {
   "display_name": "Julia 1.9.0",
   "language": "julia",
   "name": "julia-1.9"
  },
  "language_info": {
   "file_extension": ".jl",
   "mimetype": "application/julia",
   "name": "julia",
   "version": "1.9.0"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}