{ "cells": [ { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Fundamental Modules for Statistical Modelling\n", "\n", "Feng Li\n", "\n", "School of Statistics and Mathematics\n", "\n", "Central University of Finance and Economics\n", "\n", "[feng.li@cufe.edu.cn](mailto:feng.li@cufe.edu.cn)\n", "\n", "[https://feng.li/python](https://feng.li/python)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Install different version of Python modules \n", "\n", "\n", "- What if you want to install an application and leave it be? If an application works, any change in its libraries or the versions of those libraries can break the application. Also, what if you can’t install packages into the global site-packages directory, due to not having permissions to change the host python environment?\n", "\n", "- Sometimes it is useful to have different versions of Python modules installed at different places. A quick solution is to install Python packages at current working directory." ] }, { "cell_type": "code", "execution_count": 19, "metadata": { "slideshow": { "slide_type": "slide" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Looking in indexes: https://mirrors.163.com/pypi/simple/\n", "Collecting pandas==1.2.1\n", " Using cached https://mirrors.163.com/pypi/packages/c9/56/f415b4148622f469263ad2ece8bdf757972e94ffc97cb750dd8b79b04d43/pandas-1.2.1-cp39-cp39-manylinux1_x86_64.whl (9.7 MB)\n", "Collecting pytz>=2017.3\n", " Using cached https://mirrors.163.com/pypi/packages/70/94/784178ca5dd892a98f113cdd923372024dc04b8d40abe77ca76b5fb90ca6/pytz-2021.1-py2.py3-none-any.whl (510 kB)\n", "Collecting numpy>=1.16.5\n", " Using cached https://mirrors.163.com/pypi/packages/7a/4c/dd00ce768b0f0f7de5c486cbd9f5b922bc3af2f3a5da30121d7f7dc03130/numpy-1.21.2-cp39-cp39-manylinux_2_12_x86_64.manylinux2010_x86_64.whl (15.8 MB)\n", "Collecting python-dateutil>=2.7.3\n", " Using cached https://mirrors.163.com/pypi/packages/d4/70/d60450c3dd48ef87586924207ae8907090de0b306af2bce5d134d78615cb/python_dateutil-2.8.1-py2.py3-none-any.whl (227 kB)\n", "Collecting six>=1.5\n", " Using cached https://mirrors.163.com/pypi/packages/ee/ff/48bde5c0f013094d729fe4b0316ba2a24774b3ff1c52d924a8a4cb04078a/six-1.15.0-py2.py3-none-any.whl (10 kB)\n", "Installing collected packages: six, pytz, python-dateutil, numpy, pandas\n", "Successfully installed numpy-1.21.2 pandas-1.2.1 python-dateutil-2.8.1 pytz-2021.1 six-1.15.0\n" ] } ], "source": [ "! pip3 install pandas==1.2.1 -I -t ." ] }, { "cell_type": "code", "execution_count": 21, "metadata": { "slideshow": { "slide_type": "slide" } }, "outputs": [ { "data": { "text/plain": [ "'1.2.1'" ] }, "execution_count": 21, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import pandas\n", "pandas.__version__" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "- In all these cases, [`virtualenv`](https://virtualenv.pypa.io/en/latest/) can also help you. It creates an environment that has its own installation directories, that doesn’t share libraries with other virtualenv environments (and optionally doesn’t access the globally installed libraries either)." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Python modules for Statistics\n", "\n", "## NumPy\n", "\n", "`NumPy` is short for Numerical Python, is the foundational package for scientific computing in Python. It contains among other things:\n", "\n", "- a powerful N-dimensional array object\n", "- sophisticated (broadcasting) functions\n", "- tools for integrating C/C++ and Fortran code\n", "- useful linear algebra, Fourier transform, and random number capabilities\n", "\n", "Besides its obvious scientific uses, NumPy can also be used as an efficient multi-dimensional container of generic data. Arbitrary data-types can be defined. This allows NumPy to seamlessly and speedily integrate with a wide variety of databases. \n", "\n", "- [NumPy Reference](https://numpy.org/doc/stable/reference/)\n", "- [NumPy User Guide](https://numpy.org/doc/stable/user/index.html)\n" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## SciPy \n", "\n", "`SciPy` is a collection of packages addressing a number of different standard problem domains in scientific computing. Here is a sampling of the packages included:\n", "\n", "- `scipy.integrate` : numerical integration routines and differential equation solvers.\n", "- `scipy.linalg` : linear algebra routines and matrix decompositions extending beyond those provided in `numpy.linalg`.\n", "- `scipy.optimize` : function optimizers (minimizers) and root finding algorithms." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "- `scipy.signal` : signal processing tools.\n", "- `scipy.sparse` : sparse matrices and sparse linear system solvers.\n", "- `scipy.special` : wrapper around SPECFUN, a Fortran library implementing many common mathematical functions, such as the gamma function.\n", "- `scipy.stats` : standard continuous and discrete probability distributions (density functions, samplers, continuous distribution functions), various statistical tests, and more descriptive statistics.\n", "- ... \n", "\n", "[SciPy Reference Guide](https://docs.scipy.org/doc/scipy/)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Linear Algebra\n", "\n", "Linear algebra can be done conveniently via `scipy.linalg`. When SciPy is built using the optimized ATLAS LAPACK and BLAS libraries, it has very fast linear algebra capabilities. If you dig deep enough, all of the raw lapack and blas libraries are available for your use for even more speed. In this section, some easier-to-use interfaces to these routines are described.\n", "\n", "All of these linear algebra routines expect an object that can be converted into a 2-dimensional array. The output of these routines is also a two-dimensional array." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Matrices and n-dimensional array" ] }, { "cell_type": "code", "execution_count": 22, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "array([[1, 2],\n", " [3, 4]])" ] }, "execution_count": 22, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import numpy as np\n", "from scipy import linalg\n", "A = np.array([[1,2],[3,4]])\n", "A" ] }, { "cell_type": "code", "execution_count": 23, "metadata": { "slideshow": { "slide_type": "slide" } }, "outputs": [ { "data": { "text/plain": [ "array([[-2. , 1. ],\n", " [ 1.5, -0.5]])" ] }, "execution_count": 23, "metadata": {}, "output_type": "execute_result" } ], "source": [ "linalg.inv(A) # inverse of a matrix" ] }, { "cell_type": "code", "execution_count": 24, "metadata": { "slideshow": { "slide_type": "slide" } }, "outputs": [ { "data": { "text/plain": [ "array([[5, 6]])" ] }, "execution_count": 24, "metadata": {}, "output_type": "execute_result" } ], "source": [ "b = np.array([[5,6]]) #2D array\n", "b" ] }, { "cell_type": "code", "execution_count": 25, "metadata": { "slideshow": { "slide_type": "slide" } }, "outputs": [ { "data": { "text/plain": [ "array([[5],\n", " [6]])" ] }, "execution_count": 25, "metadata": {}, "output_type": "execute_result" } ], "source": [ "b.T" ] }, { "cell_type": "code", "execution_count": 26, "metadata": { "slideshow": { "slide_type": "slide" } }, "outputs": [ { "data": { "text/plain": [ "array([[ 5, 12],\n", " [15, 24]])" ] }, "execution_count": 26, "metadata": {}, "output_type": "execute_result" } ], "source": [ "A*b #not matrix multiplication!" ] }, { "cell_type": "code", "execution_count": 27, "metadata": { "slideshow": { "slide_type": "slide" } }, "outputs": [ { "data": { "text/plain": [ "array([[17],\n", " [39]])" ] }, "execution_count": 27, "metadata": {}, "output_type": "execute_result" } ], "source": [ "A.dot(b.T) #matrix multiplication" ] }, { "cell_type": "code", "execution_count": 28, "metadata": { "slideshow": { "slide_type": "slide" } }, "outputs": [ { "data": { "text/plain": [ "array([5, 6])" ] }, "execution_count": 28, "metadata": {}, "output_type": "execute_result" } ], "source": [ "b = np.array([5,6]) #1D array\n", "b" ] }, { "cell_type": "code", "execution_count": 29, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "array([5, 6])" ] }, "execution_count": 29, "metadata": {}, "output_type": "execute_result" } ], "source": [ "b.T #not matrix transpose!" ] }, { "cell_type": "code", "execution_count": 30, "metadata": { "slideshow": { "slide_type": "slide" } }, "outputs": [ { "data": { "text/plain": [ "array([17, 39])" ] }, "execution_count": 30, "metadata": {}, "output_type": "execute_result" } ], "source": [ "A.dot(b) #does not matter for multiplication" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "- Numpy array can easily convet to Pandas, and vise versa." ] }, { "cell_type": "code", "execution_count": 58, "metadata": { "slideshow": { "slide_type": "slide" } }, "outputs": [ { "data": { "text/html": [ "
\n", " | 0 | \n", "1 | \n", "2 | \n", "
---|---|---|---|
0 | \n", "1 | \n", "2 | \n", "3 | \n", "
1 | \n", "4 | \n", "5 | \n", "6 | \n", "