{ "cells": [ { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true }, "source": [ "# Python modules for Statistics (Python统计模块)\n", "\n", "## NumPy\n", "\n", "`NumPy`, short for Numerical Python, is the foundational package for scientific computing in Python. It contains, among other things:\n", "\n", "- a powerful N-dimensional array object\n", "- sophisticated (broadcasting) functions\n", "- tools for integrating C/C++ and Fortran code\n", "- useful linear algebra, Fourier transform, and random number capabilities\n", "\n", "Besides its obvious scientific uses, NumPy can also be used as an efficient multi-dimensional container of generic data. Arbitrary data types can be defined. This allows NumPy to seamlessly and speedily integrate with a wide variety of databases. \n", "\n", "- [NumPy Reference](http://docs.scipy.org/doc/numpy/reference/)\n", "- [NumPy User Guide](http://docs.scipy.org/doc/numpy/user/index.html)\n", "\n", "## SciPy \n", "\n", "`SciPy` is a collection of packages addressing a number of different standard problem domains in scientific computing. 
Here is a sampling of the packages included:\n", "\n", "- `scipy.cluster` : clustering algorithms.\n", "- `scipy.fftpack` : Fast Fourier Transform routines.\n", "- `scipy.integrate` : numerical integration routines and differential equation solvers.\n", "- `scipy.interpolate` : interpolation and smoothing splines.\n", "- `scipy.linalg` : linear algebra routines and matrix decompositions extending beyond those provided in `numpy.linalg`.\n", "- `scipy.ndimage` : N-dimensional image processing.\n", "- `scipy.optimize` : function optimizers (minimizers) and root finding algorithms.\n", "- `scipy.signal` : signal processing tools.\n", "- `scipy.sparse` : sparse matrices and sparse linear system solvers.\n", "- `scipy.spatial` : spatial data structures and algorithms.\n", "- `scipy.special` : wrapper around SPECFUN, a Fortran library implementing many common mathematical functions, such as the gamma function.\n", "- `scipy.stats` : standard continuous and discrete probability distributions (density functions, samplers, cumulative distribution functions), various statistical tests, and more descriptive statistics.\n", "- `scipy.weave` : tool for using inline C++ code to accelerate array computations (available only in older SciPy releases).\n", "\n", "[SciPy Reference Guide](http://docs.scipy.org/doc/scipy/reference/)\n", "\n", "## pandas\n", "\n", "`pandas` provides rich data structures and functions designed to make working with structured data fast, easy, and expressive. It is, as you will see, one of the critical ingredients enabling Python to be a powerful and productive data analysis environment. pandas combines the high performance array-computing features of `NumPy` with the flexible data manipulation capabilities of spreadsheets and relational databases (such as SQL). 
It provides sophisticated indexing functionality to make it easy to reshape, slice and dice, perform aggregations, and select subsets of data.\n", "\n", "pandas consists of the following things\n", "\n", "- A set of labeled array data structures, the primary of which are Series and DataFrame\n", "- Index objects enabling both simple axis indexing and multi-level / hierarchical axis indexing\n", "- An integrated group by engine for aggregating and transforming data sets\n", "- Date range generation (date_range) and custom date offsets enabling the implementation of customized frequencies\n", "- Input/Output tools: loading tabular data from flat files (CSV, delimited, Excel 2003), and saving and loading pandas objects from the fast and efficient PyTables/HDF5 format.\n", "- Memory-efficient “sparse” versions of the standard data structures for storing data that is mostly missing or mostly constant (some fixed value)\n", "- Moving window statistics (rolling mean, rolling standard deviation, etc.)\n", "- Static and moving window linear and panel regression\n", "\n", "[pandas Documentation](http://pandas.pydata.org/pandas-docs/version/0.17.0/)\n", "\n", "\n", "## matplotlib\n", "`matplotlib` is the most popular Python library for producing plots and other 2D data visualizations. It was originally created by John D. Hunter (JDH) and is now maintained by a large team of developers. It is well-suited for creating plots suitable for publication. It integrates well with IPython, thus providing a comfortable interactive environment for plotting and exploring data. 
The plots are also interactive; you can zoom in on a section of the plot and pan around the plot using the toolbar in the plot window.\n", "\n", "\n", "- [matplotlib User Guide](http://matplotlib.org/1.4.3/users/index.html)\n", "- [matplotlib Gallery](http://matplotlib.org/1.4.3/gallery.html)\n" ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true }, "source": [ "# Log in to a Linux server and transfer data to and from it (访问远程Linux主机)\n", "\n", "Assume you have a Linux server that you can log in to with host address `11.22.33.44`, SSH port `22` (the default port), user name `myusername`, and password `mysecret`.\n", "\n", "## If your local computer is Windows (如果接入计算机是Windows)\n", "\n", "- To log in to a Linux server from Windows, download [PuTTY](http://www.chiark.greenend.org.uk/~sgtatham/putty/download.html) and install it on your Windows machine. Follow the software's instructions to log in.\n", "\n", "- To send and receive files, download [FileZilla](https://filezilla-project.org/) and install it on your Windows machine. FileZilla works on Linux and Mac as well.\n", "\n", "- Start FileZilla and type `11.22.33.44` in **Host**, `myusername` in **Username**, `mysecret` in **Password**, and `22` in **Port**. Now click **Quickconnect** to log in to the server. 
\n", "\n", "- Then you can send and receive data by dragging and dropping files to and from your server's home folder.\n", "\n", "## If your local computer is Mac or Linux (如果接入计算机是苹果或者Linux)\n", "\n", "### Log in to the server from your Mac or Linux (从苹果电脑登陆到Linux服务器)\n", "\n", "- Start a terminal on your local computer and open the file `~/.ssh/config` (create it if it does not exist).\n", "```\n", "    emacs ~/.ssh/config\n", "```\n", "\n", "- Copy the following information to the file and save the file.\n", "```\n", "    Host myserver1\n", "    Hostname 11.22.33.44\n", "    Port 22\n", "    User myusername\n", "```\n", "\n", "- Now in your local computer's terminal, you can log in to your server directly (answer `yes` to any prompt during your first login).\n", "```\n", "    ssh myserver1\n", "```\n", "\n", "### Send data to the Linux server and back from your Mac or Linux (用你的苹果或者Linux电脑传送和接收数据)\n", "\n", "- If you use Linux, check whether `rsync` is installed on your local computer by running `rsync --version` in a terminal. If it is not installed, install it with `sudo apt-get install rsync`. Mac has rsync installed by default. \n", "\n", "- If you have a file called `stocks.csv` in your local computer's folder `~/Desktop/`, send it to your Linux server's folder `~/myproject/` by launching a terminal on your local computer and typing\n", "```\n", "    rsync -av ~/Desktop/stocks.csv myserver1:myproject/\n", "```\n", "\n", "- If you have a file called `stocks.csv` in your server's folder `~/myproject/`, copy it to your local computer's folder `~/Desktop/` by launching a terminal on your local computer and typing\n", "```\n", "    rsync -av myserver1:myproject/stocks.csv ~/Desktop\n", "```\n", "\n", "- Type `man rsync` to see the complete manual of `rsync`." ] }, { "cell_type": "markdown", "metadata": { "collapsed": true, "deletable": true, "editable": true }, "source": [ "# Installing Python modules (安装Python模块)\n", "\n", "\n", "A lot of well-known packages are available in your Linux distribution. 
If you want to install, for example, `numpy` for Python 3 on Debian/Ubuntu, launch a terminal and type\n", "\n", "```\n", "    sudo apt-get install python3-numpy\n", "```\n", "\n", "To install packages from PyPI (the Python Package Index), consult the [Python Packaging User Guide](https://python-packaging-user-guide.readthedocs.org/en/latest/installing/).\n" ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true }, "source": [ "# Working with data (Python数据操作)" ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true }, "source": [ "## Read and write data in Python with `stdin` and `stdout` (利用标准输入输出读写数据)" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "collapsed": false, "deletable": true, "editable": true, "scrolled": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "0\n", "[]\n" ] } ], "source": [ "#! /usr/bin/env python3\n", "# line_count.py\n", "import sys\n", "count = 0\n", "data = []\n", "for line in sys.stdin:\n", "    count += 1\n", "    data.append(line)\n", "print(count) # print goes to sys.stdout\n", "print(data)" ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true }, "source": [ "Then launch a terminal and make your Python script executable. After that, pipe a test file into the script:\n", "\n", "    chmod +x line_count.py\n", "    cat L3-Python-for-Statistical-Modeling.html | ./line_count.py\n" ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true }, "source": [ "## Read from and write to files directly (直接读写数据)\n", "\n", "You can also explicitly read from and write to files directly in your code. Python makes working with files pretty simple." 
] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true }, "source": [ "- The first step to working with a text file is to obtain a file object using `open()`\n", "\n", "    'r' means read-only\n", "    \n", "        file_for_reading = open('reading_file.txt', 'r')\n", "    \n", "    'w' is write -- will destroy the file if it already exists!\n", "    \n", "        file_for_writing = open('writing_file.txt', 'w')\n", "    \n", "    'a' is append -- for adding to the end of the file\n", "\n", "        file_for_appending = open('appending_file.txt', 'a')\n", "    \n", "- The second step is to do something with the file.\n", "- Don't forget to close your files when you're done.\n", "    \n", "        file_for_writing.close()\n", "    \n", "**Note** Because it is easy to forget to close your files, you should always use them in a **with** block, at the end of which they will be closed automatically:\n", "\n", "    with open(filename, 'r') as f:\n", "        data = function_that_gets_data_from(f)" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "collapsed": false, "deletable": true, "editable": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "1\n" ] } ], "source": [ "#! 
/usr/bin/env python3\n", "# hash_check.py\n", "import re\n", "starts_with_hash = 0\n", "\n", "# Look at each line in the file and use a regex to check whether it starts with '#';\n", "# if it does, add 1 to the count.\n", "\n", "with open('line_count.py','r') as file:\n", "    for line in file:\n", "        if re.match(\"^#\",line):\n", "            starts_with_hash += 1\n", "print(starts_with_hash)" ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true }, "source": [ "## Read a CSV file (读取CSV文件)\n", "\n", "If your file has no headers (which means you probably want each row as a list, and which places the burden on you to know what's in each column), you can use `csv.reader()` in the `csv` module to iterate over the rows, each of which will be an appropriately split list.\n", "\n", "If your file has headers, you can either skip the header row (with an initial call to `next(reader)`) or get each row as a `dict` (with the headers as keys) by using `csv.DictReader()` in the `csv` module. For example, for a tab-separated file `stocks.csv` that looks like this:\n", "\n", "    symbol\tdate\tclosing_price\n", "    AAPL\t2015-01-23\t112.98\n", "    AAPL\t2015-01-22\t112.4\n", "    AAPL\t2015-01-21\t109.55\n", "    AAPL\t2015-01-20\t108.72\n", "    AAPL\t2015-01-16\t105.99\n", "    AAPL\t2015-01-15\t106.82\n", "    AAPL\t2015-01-14\t109.8\n", "    AAPL\t2015-01-13\t110.22\n", "    AAPL\t2015-01-12\t109.25\n" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "collapsed": true, "deletable": true, "editable": true }, "outputs": [], "source": [ "#! 
/usr/bin/env python3\n", "\n", "import csv\n", "\n", "data = {'date':[], 'symbol':[], 'closing_price' : []}\n", "with open('stocks.csv', 'r') as f:\n", "    reader = csv.DictReader(f, delimiter='\\t')\n", "    for row in reader:\n", "        data['date'].append(row[\"date\"])\n", "        data['symbol'].append(row[\"symbol\"])\n", "        data['closing_price'].append(float(row[\"closing_price\"]))" ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "collapsed": false, "deletable": true, "editable": true, "scrolled": true }, "outputs": [ { "data": { "text/plain": [ "dict_keys(['symbol', 'date', 'closing_price'])" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "data.keys()" ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true }, "source": [ "Alternatively, `pandas` provides the `read_csv()` function to read CSV files." ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "collapsed": false, "deletable": true, "editable": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "16556\n", "<class 'pandas.core.frame.DataFrame'>\n" ] } ], "source": [ "#! /usr/bin/env python3\n", "\n", "import pandas\n", "\n", "data2 = pandas.read_csv('stocks.csv', delimiter='\\t',header=None)\n", "print(len(data2))\n", "print(type(data2))" ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true }, "source": [ "The pandas I/O API is a set of top-level `reader` functions, accessed like `read_csv()`, that generally return a pandas object. These functions include\n", "\n", "    read_excel\n", "    read_hdf\n", "    read_sql\n", "    read_json\n", "    read_msgpack (experimental)\n", "    read_html\n", "    read_gbq (experimental)\n", "    read_stata\n", "    read_sas\n", "    read_clipboard\n", "    read_pickle\n", "    \n", "See [pandas IO tools](http://pandas.pydata.org/pandas-docs/stable/io.html) for a detailed explanation." 
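, "\n", "\n", "A minimal sketch of the reader/writer pairing, assuming a small tab-separated sample held in memory (the `sample` string and the `buf` buffer are hypothetical stand-ins for real files). It reads the sample with `read_csv()`, letting the first row supply the column names, and writes it back out with the matching `to_csv()` method:\n", "\n", "```python\n", "import io\n", "import pandas as pd\n", "\n", "# Hypothetical in-memory sample; a real file path works the same way\n", "sample = 'symbol\\tdate\\tclosing_price\\nAAPL\\t2015-01-23\\t112.98\\nAAPL\\t2015-01-22\\t112.4\\n'\n", "\n", "# The first row is used as the header by default\n", "df = pd.read_csv(io.StringIO(sample), sep='\\t', parse_dates=['date'])\n", "print(df.dtypes)\n", "\n", "# Write the table back out as comma-separated text\n", "buf = io.StringIO()\n", "df.to_csv(buf, index=False)\n", "print(buf.getvalue())\n", "```\n", "\n", "Passing `parse_dates` is optional; without it the `date` column is kept as plain strings."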
] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true }, "source": [ "# Linear Algebra (线性代数)\n", "\n", "Linear algebra can be done conveniently via `scipy.linalg`. When SciPy is built using the optimized ATLAS LAPACK and BLAS libraries, it has very fast linear algebra capabilities. If you dig deep enough, all of the raw LAPACK and BLAS routines are available for your use for even more speed. In this section, some easier-to-use interfaces to these routines are described.\n", "\n", "All of these linear algebra routines expect an object that can be converted into a two-dimensional array. The output of these routines is also a two-dimensional array." ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true }, "source": [ "## Matrices and n-dimensional arrays (矩阵和多维数组)" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "collapsed": false, "deletable": true, "editable": true }, "outputs": [ { "data": { "text/plain": [ "array([[1, 2],\n", " [3, 4]])" ] }, "execution_count": 1, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import numpy as np\n", "from scipy import linalg\n", "A = np.array([[1,2],[3,4]])\n", "A" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "collapsed": false, "deletable": true, "editable": true }, "outputs": [ { "data": { "text/plain": [ "array([[-2. , 1. 
],\n", " [ 1.5, -0.5]])" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "linalg.inv(A) # inverse of a matrix" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "collapsed": false, "deletable": true, "editable": true }, "outputs": [ { "data": { "text/plain": [ "array([[5, 6]])" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "b = np.array([[5,6]]) #2D array\n", "b" ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "collapsed": false, "deletable": true, "editable": true }, "outputs": [ { "data": { "text/plain": [ "array([[5],\n", " [6]])" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "b.T" ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "collapsed": false, "deletable": true, "editable": true }, "outputs": [ { "data": { "text/plain": [ "array([[ 5, 12],\n", " [15, 24]])" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "A*b #not matrix multiplication!" 
] }, { "cell_type": "code", "execution_count": 6, "metadata": { "collapsed": false, "deletable": true, "editable": true }, "outputs": [ { "data": { "text/plain": [ "array([[17],\n", " [39]])" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "A.dot(b.T) #matrix multiplication" ] }, { "cell_type": "code", "execution_count": 12, "metadata": { "collapsed": false, "deletable": true, "editable": true }, "outputs": [ { "data": { "text/plain": [ "array([5, 6])" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "b = np.array([5,6]) #1D array\n", "b" ] }, { "cell_type": "code", "execution_count": 13, "metadata": { "collapsed": false, "deletable": true, "editable": true }, "outputs": [ { "data": { "text/plain": [ "array([5, 6])" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "b.T #not matrix transpose!" ] }, { "cell_type": "code", "execution_count": 14, "metadata": { "collapsed": false, "deletable": true, "editable": true }, "outputs": [ { "data": { "text/plain": [ "array([17, 39])" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "A.dot(b) #does not matter for multiplication" ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true }, "source": [ "## Solving linear system (求解线性方程组)" ] }, { "cell_type": "code", "execution_count": 15, "metadata": { "collapsed": false, "deletable": true, "editable": true }, "outputs": [ { "data": { "text/plain": [ "array([[1, 2],\n", " [3, 4]])" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import numpy as np\n", "from scipy import linalg\n", "A = np.array([[1,2],[3,4]])\n", "A" ] }, { "cell_type": "code", "execution_count": 16, "metadata": { "collapsed": false, "deletable": true, "editable": true }, "outputs": [ { "data": { "text/plain": [ "array([[5],\n", " [6]])" ] }, "execution_count": 16, 
"metadata": {}, "output_type": "execute_result" } ], "source": [ "b = np.array([[5],[6]])\n", "b" ] }, { "cell_type": "code", "execution_count": 17, "metadata": { "collapsed": false, "deletable": true, "editable": true }, "outputs": [ { "data": { "text/plain": [ "array([[-4. ],\n", " [ 4.5]])" ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" } ], "source": [ "linalg.inv(A).dot(b) #slow" ] }, { "cell_type": "code", "execution_count": 18, "metadata": { "collapsed": false, "deletable": true, "editable": true }, "outputs": [ { "data": { "text/plain": [ "array([[ 0.],\n", " [ 0.]])" ] }, "execution_count": 18, "metadata": {}, "output_type": "execute_result" } ], "source": [ "A.dot(linalg.inv(A).dot(b))-b #check" ] }, { "cell_type": "code", "execution_count": 19, "metadata": { "collapsed": false, "deletable": true, "editable": true }, "outputs": [ { "data": { "text/plain": [ "array([[-4. ],\n", " [ 4.5]])" ] }, "execution_count": 19, "metadata": {}, "output_type": "execute_result" } ], "source": [ "np.linalg.solve(A,b) #fast" ] }, { "cell_type": "code", "execution_count": 20, "metadata": { "collapsed": false, "deletable": true, "editable": true }, "outputs": [ { "data": { "text/plain": [ "array([[ 0.],\n", " [ 0.]])" ] }, "execution_count": 20, "metadata": {}, "output_type": "execute_result" } ], "source": [ "A.dot(np.linalg.solve(A,b))-b #check" ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true }, "source": [ "## Determinant (行列式)" ] }, { "cell_type": "code", "execution_count": 21, "metadata": { "collapsed": false, "deletable": true, "editable": true }, "outputs": [ { "data": { "text/plain": [ "-2.0" ] }, "execution_count": 21, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import numpy as np\n", "from scipy import linalg\n", "A = np.array([[1,2],[3,4]])\n", "linalg.det(A)" ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true }, "source": [ "## Least-squares 
problems and pseudo-inverses (最小二乘和广义逆)" ] }, { "cell_type": "code", "execution_count": 22, "metadata": { "collapsed": true, "deletable": true, "editable": true }, "outputs": [], "source": [ "import numpy as np\n", "from scipy import linalg\n", "import matplotlib.pyplot as plt" ] }, { "cell_type": "code", "execution_count": 23, "metadata": { "collapsed": true, "deletable": true, "editable": true }, "outputs": [], "source": [ "c1, c2 = 5.0, 2.0\n", "i = np.r_[1:11]\n", "xi = 0.1*i\n", "yi = c1*np.exp(-xi) + c2*xi\n", "zi = yi + 0.05 * np.max(yi) * np.random.randn(len(yi))" ] }, { "cell_type": "code", "execution_count": 24, "metadata": { "collapsed": true, "deletable": true, "editable": true }, "outputs": [], "source": [ "A = np.c_[np.exp(-xi)[:, np.newaxis], xi[:, np.newaxis]]\n", "c, resid, rank, sigma = linalg.lstsq(A, zi)" ] }, { "cell_type": "code", "execution_count": 25, "metadata": { "collapsed": true, "deletable": true, "editable": true }, "outputs": [], "source": [ "xi2 = np.r_[0.1:1.0:100j]\n", "yi2 = c[0]*np.exp(-xi2) + c[1]*xi2" ] }, { "cell_type": "code", "execution_count": 26, "metadata": { "collapsed": true, "deletable": true, "editable": true }, "outputs": [], "source": [ "plt.plot(xi,zi,'x',xi2,yi2)\n", "plt.axis([0,1.1,3.0,5.5])\n", "plt.xlabel('$x_i$')\n", "plt.title('Data fitting with linalg.lstsq')\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true }, "source": [ "## Eigenvalues and eigenvectors (特征向量和特征值)" ] }, { "cell_type": "code", "execution_count": 27, "metadata": { "collapsed": false, "deletable": true, "editable": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "(-0.372281323269+0j) (5.37228132327+0j)\n", "[-0.82456484 0.56576746]\n", "[-0.41597356 -0.90937671]\n", "[ 1. 
1.]\n", "5.551115123125783e-17\n" ] } ], "source": [ "import numpy as np\n", "from scipy import linalg\n", "A = np.array([[1,2],[3,4]])\n", "la,v = linalg.eig(A)\n", "l1,l2 = la\n", "print(l1, l2) #eigenvalues\n", "\n", "print(v[:,0]) #first eigenvector\n", "\n", "print(v[:,1]) #second eigenvector\n", "\n", "print(np.sum(abs(v**2),axis=0)) #each eigenvector has unit norm\n", "\n", "v1 = np.array(v[:,0]).T\n", "print(linalg.norm(A.dot(v1)-l1*v1)) #check the computation" ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true }, "source": [ "## Singular Value Decomposition (SVD) (奇异值分解)" ] }, { "cell_type": "code", "execution_count": 28, "metadata": { "collapsed": true, "deletable": true, "editable": true }, "outputs": [], "source": [ "import numpy as np\n", "from scipy import linalg\n", "A = np.array([[1,2,3],[4,5,6]])" ] }, { "cell_type": "code", "execution_count": 29, "metadata": { "collapsed": true, "deletable": true, "editable": true }, "outputs": [], "source": [ "M,N = A.shape\n", "U,s,Vh = linalg.svd(A)\n", "Sig = linalg.diagsvd(s,M,N)" ] }, { "cell_type": "code", "execution_count": 30, "metadata": { "collapsed": false, "deletable": true, "editable": true }, "outputs": [ { "data": { "text/plain": [ "array([[-0.3863177 , -0.92236578],\n", " [-0.92236578, 0.3863177 ]])" ] }, "execution_count": 30, "metadata": {}, "output_type": "execute_result" } ], "source": [ "U" ] }, { "cell_type": "code", "execution_count": 31, "metadata": { "collapsed": false, "deletable": true, "editable": true, "scrolled": true }, "outputs": [ { "data": { "text/plain": [ "array([[ 9.508032 , 0. , 0. ],\n", " [ 0. , 0.77286964, 0. 
]])" ] }, "execution_count": 31, "metadata": {}, "output_type": "execute_result" } ], "source": [ "Sig" ] }, { "cell_type": "code", "execution_count": 32, "metadata": { "collapsed": false, "deletable": true, "editable": true }, "outputs": [ { "data": { "text/plain": [ "array([[-0.42866713, -0.56630692, -0.7039467 ],\n", " [ 0.80596391, 0.11238241, -0.58119908],\n", " [ 0.40824829, -0.81649658, 0.40824829]])" ] }, "execution_count": 32, "metadata": {}, "output_type": "execute_result" } ], "source": [ "Vh" ] }, { "cell_type": "code", "execution_count": 33, "metadata": { "collapsed": false, "deletable": true, "editable": true, "scrolled": true }, "outputs": [ { "data": { "text/plain": [ "array([[ 1., 2., 3.],\n", " [ 4., 5., 6.]])" ] }, "execution_count": 33, "metadata": {}, "output_type": "execute_result" } ], "source": [ "U.dot(Sig.dot(Vh)) #check computation" ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true }, "source": [ "## QR decomposition (QR分解)\n", "\n", "The command for QR decomposition is `linalg.qr`." ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true }, "source": [ "## LU decomposition (LU分解)\n", " \n", "The SciPy command for this decomposition is `linalg.lu`. If the goal of the LU decomposition is to solve linear systems, use `linalg.lu_factor` followed by repeated calls to `linalg.lu_solve`, one for each new right-hand side." ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true }, "source": [ "## Cholesky decomposition (乔列斯基分解)\n", "\n", "The command `linalg.cholesky` computes the Cholesky factorization. To solve systems of equations with a Cholesky factorization, there are also the `linalg.cho_factor` and `linalg.cho_solve` routines, which work similarly to their LU decomposition counterparts." 
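, "\n", "\n", "A minimal sketch of the factor-once, solve-many pattern described for LU and Cholesky, assuming a small symmetric positive-definite matrix and two right-hand sides chosen purely for illustration:\n", "\n", "```python\n", "import numpy as np\n", "from scipy import linalg\n", "\n", "# Illustrative symmetric positive-definite matrix and right-hand sides\n", "A = np.array([[4.0, 2.0], [2.0, 3.0]])\n", "b1 = np.array([1.0, 2.0])\n", "b2 = np.array([0.0, 1.0])\n", "\n", "# LU: factor once, then reuse the factorization for each right-hand side\n", "lu, piv = linalg.lu_factor(A)\n", "x1 = linalg.lu_solve((lu, piv), b1)\n", "x2 = linalg.lu_solve((lu, piv), b2)\n", "\n", "# Cholesky: same pattern, valid only for symmetric positive-definite A\n", "c, low = linalg.cho_factor(A)\n", "y1 = linalg.cho_solve((c, low), b1)\n", "\n", "# Check that both approaches solve the systems\n", "print(np.allclose(A.dot(x1), b1), np.allclose(A.dot(x2), b2))\n", "print(np.allclose(x1, y1))\n", "```\n", "\n", "Reusing the factorization avoids repeating the expensive decomposition step when only the right-hand side changes."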
] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true }, "source": [ "# Statistical Distributions (统计分布函数)\n", "\n", "A large number of probability distributions as well as a growing library of statistical functions are available in `scipy.stats`. See http://docs.scipy.org/doc/scipy/reference/stats.html for a complete list." ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true }, "source": [ "Generate random numbers from a normal distribution:" ] }, { "cell_type": "code", "execution_count": 34, "metadata": { "collapsed": false, "deletable": true, "editable": true }, "outputs": [], "source": [ "from scipy.stats import norm\n", "r = norm.rvs(loc=0, scale=1, size=1000)" ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true }, "source": [ "Calculate the first four moments (mean, variance, skewness, kurtosis):" ] }, { "cell_type": "code", "execution_count": 35, "metadata": { "collapsed": true, "deletable": true, "editable": true }, "outputs": [], "source": [ "mean, var, skew, kurt = norm.stats(moments='mvsk')" ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true }, "source": [ "Display the probability density function (pdf):" ] }, { "cell_type": "code", "execution_count": 36, "metadata": { "collapsed": false, "deletable": true, "editable": true }, "outputs": [], "source": [ "import numpy as np\n", "import matplotlib.pyplot as plt\n", "x = np.linspace(norm.ppf(0.01), # ppf is the percent point function (inverse of the cdf)\n", "                norm.ppf(0.99), 100)\n", "\n", "fig, ax = plt.subplots(1, 1)\n", "ax.plot(x, norm.pdf(x),\n", "        'r-', lw=5, alpha=0.6, label='norm pdf')\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true }, "source": [ "And compare with the histogram of the sample:" ] }, { "cell_type": "code", "execution_count": 37, "metadata": { "collapsed": false, "deletable": true, "editable": true }, "outputs": [], "source": [ "fig, ax = plt.subplots(1, 1)\n", "ax.hist(r, density=True, 
histtype='stepfilled', alpha=0.2, label='...')\n", "ax.legend(loc='best', frameon=False)\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true }, "source": [ "# Linear regression model (线性回归模型)\n", "\n", "This example computes a least-squares regression for two sets of measurements." ] }, { "cell_type": "code", "execution_count": 38, "metadata": { "collapsed": false, "deletable": true, "editable": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "{'slope': -0.16344304227778697, 'intercept': 0.60919656607207551}\n", "{'p_value': 0.65616905736353337, 'r-squared': 0.029999999999999999}\n" ] } ], "source": [ "from scipy import stats\n", "import numpy as np\n", "x = np.random.random(10)\n", "y = np.random.random(10)\n", "slope, intercept, r_value, p_value, std_err = stats.linregress(x,y)\n", "print({'slope':slope,'intercept':intercept})\n", "print({'p_value':p_value,'r-squared':round(r_value**2,2)})" ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true }, "source": [ "## Optimization (优化)\n", "\n", "The `minimize` function provides a common interface to unconstrained and constrained minimization algorithms for multivariate scalar functions in `scipy.optimize`" ] }, { "cell_type": "code", "execution_count": 39, "metadata": { "collapsed": false, "deletable": true, "editable": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Optimization terminated successfully.\n", " Current function value: 0.000000\n", " Iterations: 339\n", " Function evaluations: 571\n", "[ 1. 1. 1. 1. 
1.]\n" ] } ], "source": [ "import numpy as np\n", "from scipy.optimize import minimize\n", "\n", "## Define the function\n", "def rosen(x):\n", "    \"\"\"The Rosenbrock function\"\"\"\n", "    return sum(100.0*(x[1:]-x[:-1]**2.0)**2.0 + (1-x[:-1])**2.0)\n", "\n", "x0 = np.array([1.3, 0.7, 0.8, 1.9, 1.2])\n", "\n", "## Calling the minimize() function\n", "res = minimize(rosen, x0, method='nelder-mead',\n", "               options={'xatol': 1e-8, 'disp': True})\n", "print(res.x)" ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true }, "source": [ "# Data Visualization (数据可视化)" ] }, { "cell_type": "code", "execution_count": 43, "metadata": { "collapsed": false, "deletable": true, "editable": true }, "outputs": [], "source": [ "from matplotlib import pyplot as plt\n", "years = [1950, 1960, 1970, 1980, 1990, 2000, 2010]\n", "gdp = [300.2, 543.3, 1075.9, 2862.5, 5979.6, 10289.7, 14958.3]\n", "# create a line chart, years on x-axis, gdp on y-axis\n", "fig = plt.figure()\n", "plt.plot(years, gdp, color='green', marker='o', linestyle='solid')\n", "# add a title\n", "plt.title(\"Nominal GDP\")\n", "# add a label to the y-axis\n", "plt.ylabel(\"Billions of $\")\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true }, "source": [ "## 3D Plot (3D绘图)" ] }, { "cell_type": "code", "execution_count": 41, "metadata": { "collapsed": true, "deletable": true, "editable": true }, "outputs": [], "source": [ "import numpy as np\n", "from scipy import special\n", "def drumhead_height(n, k, distance, angle, t):\n", "    kth_zero = special.jn_zeros(n, k)[-1]\n", "    return np.cos(t) * np.cos(n*angle) * special.jn(n, distance*kth_zero)\n", "theta = np.r_[0:2*np.pi:50j]\n", "radius = np.r_[0:1:50j]\n", "x = np.array([r * np.cos(theta) for r in radius])\n", "y = np.array([r * np.sin(theta) for r in radius])\n", "z = np.array([drumhead_height(1, 1, r, theta, 0.5) for r in radius])" ] }, { "cell_type": "code", "execution_count": 42, "metadata": { "collapsed": true, 
"deletable": true, "editable": true }, "outputs": [], "source": [ "import matplotlib.pyplot as plt\n", "from mpl_toolkits.mplot3d import Axes3D\n", "from matplotlib import cm\n", "fig = plt.figure()\n", "ax = Axes3D(fig)\n", "ax.plot_surface(x, y, z, rstride=1, cstride=1, cmap=cm.jet)\n", "ax.set_xlabel('X')\n", "ax.set_ylabel('Y')\n", "ax.set_zlabel('Z')\n", "plt.show()" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.5.3" } }, "nbformat": 4, "nbformat_minor": 0 }