{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "# Introduction to Distributed Computing\n",
    "\n",
    "## Feng Li\n",
    "\n",
    "### Central University of Finance and Economics\n",
    "\n",
    "### [feng.li@cufe.edu.cn](mailto:feng.li@cufe.edu.cn)\n",
    "### Course home page: [https://feng.li/distcomp](https://feng.li/distcomp)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "## Why Distributed Systems\n",
    "\n",
    "- Moore’s law has served us well for the past decades.\n",
    "\n",
    "- But building ever-bigger single servers (such as IBM supercomputers) is no longer necessarily the best solution to large-scale problems in industry.\n",
    "\n",
    "- An alternative that has gained popularity is to tie many low-end/commodity machines together into a single functional **distributed system**.\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "## Distributed Computing is Everywhere \n",
    "\n",
    "\n",
    "![Search](./figures/eg1.png)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "![Stocks](./figures/eg2.png)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "![image](./figures/eg3.png)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "## The performance of distributed systems\n",
    "\n",
    "\n",
    "- A high-end machine with four I/O channels, each with a throughput of 100 MB/s, needs nearly three hours to read a 4 TB data set!\n",
    "\n",
    "\n",
    "- With a distributed system, the same data set is divided into smaller blocks (typically 64 MB) that are spread among many machines in the cluster via the **Distributed File System**, so the cluster can read them in parallel."
   ]
  },
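  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "The three-hour figure on the previous slide is simple arithmetic. As a rough check, assuming decimal units (1 TB = $10^{6}$ MB) and that all four channels read at full speed in parallel:\n",
    "\n",
    "$$\n",
    "t_{\\text{single}} = \\frac{4 \\times 10^{6}\\,\\text{MB}}{4 \\times 100\\,\\text{MB/s}} = 10^{4}\\,\\text{s} \\approx 2.8\\,\\text{hours}\n",
    "$$\n",
    "\n",
    "As an idealized illustration only, suppose the blocks were spread evenly over $N$ machines with the same I/O capacity, and ignore network, scheduling, and replication overhead. The value $N = 100$ is a hypothetical choice, not a figure from the course material:\n",
    "\n",
    "$$\n",
    "t_{\\text{cluster}} \\approx \\frac{t_{\\text{single}}}{N} = \\frac{10^{4}\\,\\text{s}}{100} = 100\\,\\text{s}\n",
    "$$\n",
    "\n",
    "Real clusters fall short of this bound, but the scaling with $N$ is what makes the distributed approach attractive."
   ]
  },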
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "## Google's seminal papers on distributed computing\n",
    "\n",
    "![image](./figures/google-papers.png)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "## The _move-code-to-data_ philosophy\n",
    "\n",
    "- A traditional supercomputer requires repeated transmission of data between clients and servers. This works fine for computationally intensive work, but for data-intensive processing, the data become too large to be moved around easily.\n",
    "\n",
    "\n",
    "- A distributed system focuses on **moving code to data**.\n",
    "\n",
    "- The clients send only the programs to be executed, and these programs are usually small.\n",
    "\n",
    "- More importantly, data are broken up and distributed across the cluster, and as much as possible, computation on a piece of data takes place on the same machine where that piece of data resides.\n",
    "\n",
    "- The whole process is known as **MapReduce**."
   ]
  },
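  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "To make the move-code-to-data idea concrete, here is a minimal single-process sketch of a MapReduce-style word count. It is not how Hadoop or any real framework is implemented: the function names `map_phase`, `shuffle`, and `reduce_phase` and the sample `blocks` are hypothetical, and in a real cluster each map call would run on the machine that stores its block.\n",
    "\n",
    "```python\n",
    "from collections import defaultdict\n",
    "\n",
    "def map_phase(block):\n",
    "    # Emit a (word, 1) pair for every word in one block of text.\n",
    "    return [(word.lower(), 1) for word in block.split()]\n",
    "\n",
    "def shuffle(mapped_blocks):\n",
    "    # Group all emitted values by key, as a framework would between phases.\n",
    "    groups = defaultdict(list)\n",
    "    for pairs in mapped_blocks:\n",
    "        for key, value in pairs:\n",
    "            groups[key].append(value)\n",
    "    return groups\n",
    "\n",
    "def reduce_phase(key, values):\n",
    "    # Combine all values collected for one key into a single count.\n",
    "    return key, sum(values)\n",
    "\n",
    "# Hypothetical blocks, standing in for pieces of a file stored on different machines.\n",
    "blocks = ['big data big cluster', 'data moves less than code', 'big code']\n",
    "\n",
    "mapped = [map_phase(b) for b in blocks]  # in a cluster: runs where each block resides\n",
    "grouped = shuffle(mapped)\n",
    "counts = dict(reduce_phase(k, v) for k, v in grouped.items())\n",
    "print(counts['big'], counts['data'], counts['code'])\n",
    "```\n",
    "\n",
    "Only the small map and reduce functions would need to travel to the machines; the blocks stay where they are stored."
   ]
  },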
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "## The ecosystem for distributed computing\n",
    "\n",
    "![image](./figures/hadoop_ecosystem.png)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "## Google's First Distributed Computer\n",
    "\n",
    "![image](./figures/google-first-computer.jpeg)"
   ]
  }
 ],
 "metadata": {
  "celltoolbar": "Slideshow",
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.9.17"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}