{ "cells": [ { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Introduction to Distributed Computing\n", "\n", "## Feng Li\n", "\n", "### Central University of Finance and Economics\n", "\n", "### [feng.li@cufe.edu.cn](feng.li@cufe.edu.cn)\n", "### Course home page: [https://feng.li/distcomp](https://feng.li/distcomp)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Why Distributed Systems\n", "\n", "- Moore’s law suited us well for the past decades.\n", "\n", "- But building bigger and bigger single servers (like IBM supercomputer) is no longer necessarily the best solution to large-scale problems in industry.\n", "\n", "- An alternative that has gained popularity is to tie together many low-end/commodity machines together as a single functional **distributed system**.\n" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Distributed Computing is Everywhere \n", "\n", "\n", "![Search](./figures/eg1.png)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "![Stocks](./figures/eg2.png)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "![image](./figures/eg3.png)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## The performance of distributed systems\n", "\n", "\n", "- A high-end machine with four I/O channels each having a throughput of 100 MB/sec will require three hours to read a 4 TB data set! \n", "\n", "\n", "- With a distributed system, this same data set will be divided into smaller (typically 64 MB) blocks that are spread among many machines in the cluster via the **Distributed File System**." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Google's seminal paper for distributed computing\n", "\n", "![image](./figures/google-papers.png)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## The _move-code-to-data_ philosophy\n", "\n", "- The traditional supercomputer requires repeat transmissions of data between clients and servers. This works fine for computationally intensive work, but for data-intensive processing, the size of data becomes too large to be moved around easily. \n", "\n", "\n", "- A distributed systems focuses on **moving code to data**. \n", "\n", "- The clients send only the programs to be executed, and these programs are usually small.\n", "\n", "- More importantly, data are broken up and distributed across the cluster, and as much as possible, computation on a piece of data takes place on the same machine where that piece of data resides.\n", "\n", "- The whole process is known as **MapReduce**." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## The ecosystem for distributed computing\n", "\n", "![image](./figures/hadoop_ecosystem.png)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Google's First Distributed Computer\n", "\n", "![image](./figures/google-first-computer.jpeg)" ] } ], "metadata": { "celltoolbar": "Slideshow", "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.17" } }, "nbformat": 4, "nbformat_minor": 4 }