{ "cells": [ { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Data Wrangling with Pandas\n", "\n", "Feng Li\n", "\n", "School of Statistics and Mathematics\n", "\n", "Central University of Finance and Economics\n", "\n", "[feng.li@cufe.edu.cn](mailto:feng.li@cufe.edu.cn)\n", "\n", "[https://feng.li/python](https://feng.li/python)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "\n", "## Introduction to Pandas\n", "\n", "- Python is a terrific platform for statistical data analysis partly because of the features of the language itself, but also because of a rich suite of 3rd party packages that provide robust and flexible data structures, efficient implementations of mathematical and statistical functions, and facitities for generating publication-quality graphics. \n", "\n", "- Pandas is at the top of the \"scientific stack\", because it allows data to be imported, manipulated and exported so easily. In contrast, NumPy supports the bottom of the stack with fundamental infrastructure for array operations, mathematical calculations, and random number generation. \n", "\n", "- We will cover both of these in some detail before getting down to the business of analyzing data." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "- **Pandas** is a Python package providing fast, flexible, and expressive data structures designed to work with *relational* or *labeled* data both. It is a fundamental high-level building block for doing practical, real world data analysis in Python. \n", "\n", "- Pandas is well suited for:\n", "\n", " - Tabular data with heterogeneously-typed columns, as in an SQL table or Excel spreadsheet\n", " - Ordered and unordered (not necessarily fixed-frequency) time series data.\n", " - Arbitrary matrix data (homogeneously typed or heterogeneous) with row and column labels\n", " - Any other form of observational / statistical data sets. The data actually need not be labeled at all to be placed into a pandas data structure" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Key features of Pandas:\n", " \n", "- Easy handling of **missing data**\n", "- **Size mutability**: columns can be inserted and deleted from DataFrame and higher dimensional objects\n", "- Automatic and explicit **data alignment**: objects can be explicitly aligned to a set of labels, or the data can be aligned automatically\n", "- Powerful, flexible **group by functionality** to perform split-apply-combine operations on data sets\n", "- Intelligent label-based **slicing, fancy indexing, and subsetting** of large data sets\n", "- Intuitive **merging and joining** data sets\n" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "- Flexible **reshaping and pivoting** of data sets\n", "- **Hierarchical labeling** of axes\n", "- Robust **IO tools** for loading data from flat files, Excel files, databases, and HDF5\n", "- **Time series functionality**: date range generation and frequency conversion, moving window statistics, moving window linear regressions, date shifting and lagging, etc." ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "scrolled": false, "slideshow": { "slide_type": "slide" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Looking in indexes: https://mirrors.163.com/pypi/simple/\n", "Requirement already satisfied: pandas in /usr/local/lib/python3.9/dist-packages (1.3.4)\n", "Collecting pandas\n", " Using cached https://mirrors.163.com/pypi/packages/48/b4/1081d66b71c4dfc1bc1e19d6f2abbf93ed42f69df7703eb323742d45423e/pandas-1.3.4-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (11.5 MB)\n", " Using cached https://mirrors.163.com/pypi/packages/03/ea/98d488a4047b3fd8075b5c1e00469ad42d715e2c1e4fd15fa1ffaef8d635/pandas-1.3.3-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (11.5 MB)\n", "Requirement already satisfied: pytz>=2017.3 in /usr/lib/python3/dist-packages (from pandas) (2021.3)\n", "Requirement already satisfied: python-dateutil>=2.7.3 in /usr/lib/python3/dist-packages (from pandas) (2.8.1)\n", "Requirement already satisfied: numpy>=1.17.3 in /usr/local/lib/python3.9/dist-packages (from pandas) (1.21.2)\n" ] } ], "source": [ "# Install pandas within a terminal\n", "# ! pip3 install pandas -U \n", "! pip install pandas -U " ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "import pandas as pd" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Pandas Data Structures: Series\n", "\n", "A **Series** is a single vector of data (like a NumPy array) with an *index* that labels each element in the vector." ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "0 632\n", "1 1638\n", "2 569\n", "3 115\n", "dtype: int64" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "counts = pd.Series([632, 1638, 569, 115])\n", "counts" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "- If an index is not specified, a default sequence of integers is assigned as the index. A NumPy array comprises the values of the `Series`, while the index is a pandas `Index` object." ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "array([ 632, 1638, 569, 115])" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "counts.values" ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "RangeIndex(start=0, stop=4, step=1)" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "counts.index" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "- We can assign meaningful labels to the index, if they are available:" ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "Firmicutes 632\n", "Proteobacteria 1638\n", "Actinobacteria 569\n", "Bacteroidetes 115\n", "dtype: int64" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "bacteria = pd.Series([632, 1638, 569, 115], \n", " index=['Firmicutes', 'Proteobacteria', 'Actinobacteria', 'Bacteroidetes'])\n", "\n", "bacteria" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "- These labels can be used to refer to the values in the `Series`." ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "569" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "bacteria['Actinobacteria']" ] }, { "cell_type": "code", "execution_count": 27, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "Index(['Firmicutes', 'Proteobacteria', 'Actinobacteria', 'Bacteroidetes'], dtype='object', name='phylum')" ] }, "execution_count": 27, "metadata": {}, "output_type": "execute_result" } ], "source": [ "bacteria.index" ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "[False, True, True, False]" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "[name.endswith('bacteria') for name in bacteria.index]" ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "Proteobacteria 1638\n", "Actinobacteria 569\n", "dtype: int64" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "bacteria[[name.endswith('bacteria') for name in bacteria.index]]" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "- Notice that the indexing operation preserved the association between the values and the corresponding indices. We can still use **positional indexing** if we wish." ] }, { "cell_type": "code", "execution_count": 10, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "632" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "bacteria[0]" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "- We can give both the **array of values** and **the index** meaningful labels themselves:" ] }, { "cell_type": "code", "execution_count": 11, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "phylum\n", "Firmicutes 632\n", "Proteobacteria 1638\n", "Actinobacteria 569\n", "Bacteroidetes 115\n", "Name: counts, dtype: int64" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "bacteria.name = 'counts'\n", "bacteria.index.name = 'phylum'\n", "bacteria" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "- We can also filter according to the values in the `Series`:" ] }, { "cell_type": "code", "execution_count": 12, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "phylum\n", "Proteobacteria 1638\n", "Name: counts, dtype: int64" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "bacteria[bacteria>1000]" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "- A `Series` can be thought of as an **ordered key-value** store. In fact, we can create one from a `dict`:" ] }, { "cell_type": "code", "execution_count": 13, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "Firmicutes 632\n", "Proteobacteria 1638\n", "Actinobacteria 569\n", "Bacteroidetes 115\n", "dtype: int64" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "bacteria_dict = {'Firmicutes': 632, 'Proteobacteria': 1638, 'Actinobacteria': 569, 'Bacteroidetes': 115}\n", "pd.Series(bacteria_dict)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "- Notice that the `Series` is created in **key-sorted order**. If we pass a custom index to `Series`, it will select the corresponding values from the dict, and treat indices without corrsponding values as missing. Pandas uses the `NaN` (not a number) type for missing values." ] }, { "cell_type": "code", "execution_count": 14, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "Cyanobacteria NaN\n", "Firmicutes 632.0\n", "Proteobacteria 1638.0\n", "Actinobacteria 569.0\n", "dtype: float64" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "bacteria2 = pd.Series(bacteria_dict, index=['Cyanobacteria','Firmicutes','Proteobacteria','Actinobacteria'])\n", "bacteria2" ] }, { "cell_type": "code", "execution_count": 15, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "Cyanobacteria True\n", "Firmicutes False\n", "Proteobacteria False\n", "Actinobacteria False\n", "dtype: bool" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "bacteria2.isnull()" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "- Critically, the labels are used to **align data** when used in operations with other Series objects:" ] }, { "cell_type": "code", "execution_count": 16, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "Actinobacteria 1138.0\n", "Bacteroidetes NaN\n", "Cyanobacteria NaN\n", "Firmicutes 1264.0\n", "Proteobacteria 3276.0\n", "dtype: float64" ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "bacteria + bacteria2" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Pandas Data Structures: DataFrame\n", "\n", "Inevitably, we want to be able to store, view and manipulate data that is *multivariate*, where for every index there are multiple fields or columns of data, often of varying data type.\n", "\n", "A `DataFrame` is a tabular data structure, encapsulating multiple series like columns in a spreadsheet. Data are stored internally as a 2-dimensional object, but the `DataFrame` allows us to represent and manipulate higher-dimensional data." ] }, { "cell_type": "code", "execution_count": 17, "metadata": { "slideshow": { "slide_type": "slide" } }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
valuepatientphylum
06321Firmicutes
116381Proteobacteria
25691Actinobacteria
31151Bacteroidetes
44332Firmicutes
511302Proteobacteria
67542Actinobacteria
75552Bacteroidetes
\n", "
" ], "text/plain": [ " value patient phylum\n", "0 632 1 Firmicutes\n", "1 1638 1 Proteobacteria\n", "2 569 1 Actinobacteria\n", "3 115 1 Bacteroidetes\n", "4 433 2 Firmicutes\n", "5 1130 2 Proteobacteria\n", "6 754 2 Actinobacteria\n", "7 555 2 Bacteroidetes" ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" } ], "source": [ "data = pd.DataFrame({'value':[632, 1638, 569, 115, 433, 1130, 754, 555],\n", " 'patient':[1, 1, 1, 1, 2, 2, 2, 2],\n", " 'phylum':['Firmicutes', 'Proteobacteria', 'Actinobacteria', \n", " 'Bacteroidetes', 'Firmicutes', 'Proteobacteria', \n", " 'Actinobacteria', 'Bacteroidetes']})\n", "data" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Indexing" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "- Notice the `DataFrame` is sorted by column name. We can change the order by indexing them in the order we desire:" ] }, { "cell_type": "code", "execution_count": 18, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
phylumvaluepatient
0Firmicutes6321
1Proteobacteria16381
2Actinobacteria5691
3Bacteroidetes1151
4Firmicutes4332
5Proteobacteria11302
6Actinobacteria7542
7Bacteroidetes5552
\n", "
" ], "text/plain": [ " phylum value patient\n", "0 Firmicutes 632 1\n", "1 Proteobacteria 1638 1\n", "2 Actinobacteria 569 1\n", "3 Bacteroidetes 115 1\n", "4 Firmicutes 433 2\n", "5 Proteobacteria 1130 2\n", "6 Actinobacteria 754 2\n", "7 Bacteroidetes 555 2" ] }, "execution_count": 18, "metadata": {}, "output_type": "execute_result" } ], "source": [ "data[['phylum','value','patient']]" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "- A `DataFrame` has a second index, representing the columns:" ] }, { "cell_type": "code", "execution_count": 19, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "Index(['value', 'patient', 'phylum'], dtype='object')" ] }, "execution_count": 19, "metadata": {}, "output_type": "execute_result" } ], "source": [ "data.columns" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "- If we wish to access columns, we can do so either **by dict-like indexing** or **by attribute**:" ] }, { "cell_type": "code", "execution_count": 20, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "0 632\n", "1 1638\n", "2 569\n", "3 115\n", "4 433\n", "5 1130\n", "6 754\n", "7 555\n", "Name: value, dtype: int64" ] }, "execution_count": 20, "metadata": {}, "output_type": "execute_result" } ], "source": [ "data['value']" ] }, { "cell_type": "code", "execution_count": 21, "metadata": { "slideshow": { "slide_type": "slide" } }, "outputs": [ { "data": { "text/plain": [ "0 632\n", "1 1638\n", "2 569\n", "3 115\n", "4 433\n", "5 1130\n", "6 754\n", "7 555\n", "Name: value, dtype: int64" ] }, "execution_count": 21, "metadata": {}, "output_type": "execute_result" } ], "source": [ "data.value" ] }, { "cell_type": "code", "execution_count": 22, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "pandas.core.series.Series" ] }, "execution_count": 22, "metadata": {}, "output_type": "execute_result" } ], "source": [ "type(data.value)" ] }, { "cell_type": "code", "execution_count": 23, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "pandas.core.frame.DataFrame" ] }, "execution_count": 23, "metadata": {}, "output_type": "execute_result" } ], "source": [ "type(data[['value']])" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "- `.iloc[ ]` Purely integer-location based indexing for selection by position.\n", "\n", " - With a **scalar integer** will retrieve the given row" ] }, { "cell_type": "code", "execution_count": 30, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "value 632\n", "patient 1\n", "phylum Firmicutes\n", "Name: 0, dtype: object" ] }, "execution_count": 30, "metadata": {}, "output_type": "execute_result" } ], "source": [ "data.iloc[0]" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "- With a slice object for row selection. Note that **stop of the slice are not included** `[2, 5)`." ] }, { "cell_type": "code", "execution_count": 64, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
valuepatientphylum
25691Actinobacteria
31151Bacteroidetes
44332Firmicutes
\n", "
" ], "text/plain": [ " value patient phylum\n", "2 569 1 Actinobacteria\n", "3 115 1 Bacteroidetes\n", "4 433 2 Firmicutes" ] }, "execution_count": 64, "metadata": {}, "output_type": "execute_result" } ], "source": [ "data.iloc[2:5]" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "- List Comprehensions is also possible for row selection" ] }, { "cell_type": "code", "execution_count": 36, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
valuepatientphylum
06321Firmicutes
25691Actinobacteria
44332Firmicutes
67542Actinobacteria
\n", "
" ], "text/plain": [ " value patient phylum\n", "0 632 1 Firmicutes\n", "2 569 1 Actinobacteria\n", "4 433 2 Firmicutes\n", "6 754 2 Actinobacteria" ] }, "execution_count": 36, "metadata": {}, "output_type": "execute_result" } ], "source": [ "data.iloc[lambda x: x.index % 2 == 0]" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "- With a **list of integers** will retrieve the given row" ] }, { "cell_type": "code", "execution_count": 31, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
valuepatientphylum
116381Proteobacteria
25691Actinobacteria
\n", "
" ], "text/plain": [ " value patient phylum\n", "1 1638 1 Proteobacteria\n", "2 569 1 Actinobacteria" ] }, "execution_count": 31, "metadata": {}, "output_type": "execute_result" } ], "source": [ "data.iloc[[1,2]]" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "- With **a boolean list** mask the same length as the index." ] }, { "cell_type": "code", "execution_count": 35, "metadata": { "scrolled": true, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
valuepatientphylum
06321Firmicutes
25691Actinobacteria
44332Firmicutes
67542Actinobacteria
\n", "
" ], "text/plain": [ " value patient phylum\n", "0 632 1 Firmicutes\n", "2 569 1 Actinobacteria\n", "4 433 2 Firmicutes\n", "6 754 2 Actinobacteria" ] }, "execution_count": 35, "metadata": {}, "output_type": "execute_result" } ], "source": [ "data.iloc[[True, False,True, False,True, False,True, False]]" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "- **Indexing both axes**\n", "\n", "You can mix the indexer types for the index and columns. Use `:` to select the entire axis." ] }, { "cell_type": "code", "execution_count": 38, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "1" ] }, "execution_count": 38, "metadata": {}, "output_type": "execute_result" } ], "source": [ "data.iloc[0, 1] # With scalar integers" ] }, { "cell_type": "code", "execution_count": 43, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
valuephylum
0632Firmicutes
4433Firmicutes
\n", "
" ], "text/plain": [ " value phylum\n", "0 632 Firmicutes\n", "4 433 Firmicutes" ] }, "execution_count": 43, "metadata": {}, "output_type": "execute_result" } ], "source": [ "data.iloc[[0, 4], [0, 2]] # With lists of integers." ] }, { "cell_type": "code", "execution_count": 45, "metadata": { "slideshow": { "slide_type": "slide" } }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
valuepatient
116381
25691
\n", "
" ], "text/plain": [ " value patient\n", "1 1638 1\n", "2 569 1" ] }, "execution_count": 45, "metadata": {}, "output_type": "execute_result" } ], "source": [ "data.iloc[1:3, 0:2]" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "- `.loc[ ]` Access a group of rows and columns by label(s) or a boolean array." ] }, { "cell_type": "code", "execution_count": 47, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
max_speedshield
cobra12
viper45
sidewinder78
\n", "
" ], "text/plain": [ " max_speed shield\n", "cobra 1 2\n", "viper 4 5\n", "sidewinder 7 8" ] }, "execution_count": 47, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df = pd.DataFrame([[1, 2], [4, 5], [7, 8]],\n", " index=['cobra', 'viper', 'sidewinder'],\n", " columns=['max_speed', 'shield'])\n", "df" ] }, { "cell_type": "code", "execution_count": 48, "metadata": { "slideshow": { "slide_type": "slide" } }, "outputs": [ { "data": { "text/plain": [ "max_speed 4\n", "shield 5\n", "Name: viper, dtype: int64" ] }, "execution_count": 48, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.loc['viper'] # Single label returns the row as a Series." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "- Slice with labels for row and single label for column. Note that **both the start and stop of the slice are included**." ] }, { "cell_type": "code", "execution_count": 52, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "cobra 1\n", "viper 4\n", "Name: max_speed, dtype: int64" ] }, "execution_count": 52, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.loc['cobra':'viper', 'max_speed']" ] }, { "cell_type": "code", "execution_count": 49, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
max_speedshield
viper45
sidewinder78
\n", "
" ], "text/plain": [ " max_speed shield\n", "viper 4 5\n", "sidewinder 7 8" ] }, "execution_count": 49, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.loc[['viper', 'sidewinder']] # List of labels returns a DataFrame." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "- Boolean list with the same length as the row axis" ] }, { "cell_type": "code", "execution_count": 53, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
max_speedshield
sidewinder78
\n", "
" ], "text/plain": [ " max_speed shield\n", "sidewinder 7 8" ] }, "execution_count": 53, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.loc[[False, False, True]]" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "- Conditional that returns a boolean Series" ] }, { "cell_type": "code", "execution_count": 55, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
max_speedshield
sidewinder78
\n", "
" ], "text/plain": [ " max_speed shield\n", "sidewinder 7 8" ] }, "execution_count": 55, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.loc[df['shield'] > 6]" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "- Slice with integer labels for rows using `.loc[ ]`. Note that **both the start and stop of the slice are included**." ] }, { "cell_type": "code", "execution_count": 66, "metadata": { "scrolled": true, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
valuepatientphylum
06321Firmicutes
116381Proteobacteria
25691Actinobacteria
31151Bacteroidetes
44332Firmicutes
502Proteobacteria
67542Actinobacteria
75552Bacteroidetes
\n", "
" ], "text/plain": [ " value patient phylum\n", "0 632 1 Firmicutes\n", "1 1638 1 Proteobacteria\n", "2 569 1 Actinobacteria\n", "3 115 1 Bacteroidetes\n", "4 433 2 Firmicutes\n", "5 0 2 Proteobacteria\n", "6 754 2 Actinobacteria\n", "7 555 2 Bacteroidetes" ] }, "execution_count": 66, "metadata": {}, "output_type": "execute_result" } ], "source": [ "data" ] }, { "cell_type": "code", "execution_count": 68, "metadata": { "scrolled": false, "slideshow": { "slide_type": "slide" } }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
valuepatientphylum
31151Bacteroidetes
44332Firmicutes
502Proteobacteria
\n", "
" ], "text/plain": [ " value patient phylum\n", "3 115 1 Bacteroidetes\n", "4 433 2 Firmicutes\n", "5 0 2 Proteobacteria" ] }, "execution_count": 68, "metadata": {}, "output_type": "execute_result" } ], "source": [ "data.loc[3:5]" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Set values" ] }, { "cell_type": "code", "execution_count": 60, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
max_speedshield
cobra12
viper450
sidewinder750
\n", "
" ], "text/plain": [ " max_speed shield\n", "cobra 1 2\n", "viper 4 50\n", "sidewinder 7 50" ] }, "execution_count": 60, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Set value for all items matching the list of labels\n", "df.loc[['viper', 'sidewinder'], ['shield']] = 50\n", "df" ] }, { "cell_type": "code", "execution_count": 61, "metadata": { "slideshow": { "slide_type": "slide" } }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
max_speedshield
cobra1010
viper450
sidewinder750
\n", "
" ], "text/plain": [ " max_speed shield\n", "cobra 10 10\n", "viper 4 50\n", "sidewinder 7 50" ] }, "execution_count": 61, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.loc['cobra'] = 10 # Set value for an entire row\n", "\n", "df" ] }, { "cell_type": "code", "execution_count": 62, "metadata": { "slideshow": { "slide_type": "slide" } }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
max_speedshield
cobra3010
viper3050
sidewinder3050
\n", "
" ], "text/plain": [ " max_speed shield\n", "cobra 30 10\n", "viper 30 50\n", "sidewinder 30 50" ] }, "execution_count": 62, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.loc[:, 'max_speed'] = 30 # Set value for an entire column\n", "df" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "- It is important to note that the Series returned when a DataFrame is indexted is merely a **view** on the DataFrame, and not a copy of the data itself. So you must be cautious when manipulating this data:" ] }, { "cell_type": "code", "execution_count": 56, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "0 632\n", "1 1638\n", "2 569\n", "3 115\n", "4 433\n", "5 1130\n", "6 754\n", "7 555\n", "Name: value, dtype: int64" ] }, "execution_count": 56, "metadata": {}, "output_type": "execute_result" } ], "source": [ "vals = data.value\n", "vals" ] }, { "cell_type": "code", "execution_count": 57, "metadata": { "slideshow": { "slide_type": "slide" } }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "/tmp/ipykernel_229928/1693880163.py:1: SettingWithCopyWarning: \n", "A value is trying to be set on a copy of a slice from a DataFrame\n", "\n", "See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy\n", " vals[5] = 0\n" ] }, { "data": { "text/plain": [ "0 632\n", "1 1638\n", "2 569\n", "3 115\n", "4 433\n", "5 0\n", "6 754\n", "7 555\n", "Name: value, dtype: int64" ] }, "execution_count": 57, "metadata": {}, "output_type": "execute_result" } ], "source": [ "vals[5] = 0\n", "vals" ] }, { "cell_type": "code", "execution_count": 58, "metadata": { "slideshow": { "slide_type": "slide" } }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
valuepatientphylum
06321Firmicutes
116381Proteobacteria
25691Actinobacteria
31151Bacteroidetes
44332Firmicutes
502Proteobacteria
67542Actinobacteria
75552Bacteroidetes
\n", "
" ], "text/plain": [ " value patient phylum\n", "0 632 1 Firmicutes\n", "1 1638 1 Proteobacteria\n", "2 569 1 Actinobacteria\n", "3 115 1 Bacteroidetes\n", "4 433 2 Firmicutes\n", "5 0 2 Proteobacteria\n", "6 754 2 Actinobacteria\n", "7 555 2 Bacteroidetes" ] }, "execution_count": 58, "metadata": {}, "output_type": "execute_result" } ], "source": [ "data" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "slide" } }, "outputs": [], "source": [ "vals = data.value.copy()\n", "vals[5] = 1000\n", "data" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "- We can create or modify columns by assignment:" ] }, { "cell_type": "code", "execution_count": 69, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "/tmp/ipykernel_229928/2998967180.py:1: SettingWithCopyWarning: \n", "A value is trying to be set on a copy of a slice from a DataFrame\n", "\n", "See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy\n", " data.value[3] = 14\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
valuepatientphylum
06321Firmicutes
116381Proteobacteria
25691Actinobacteria
3141Bacteroidetes
44332Firmicutes
502Proteobacteria
67542Actinobacteria
75552Bacteroidetes
\n", "
" ], "text/plain": [ " value patient phylum\n", "0 632 1 Firmicutes\n", "1 1638 1 Proteobacteria\n", "2 569 1 Actinobacteria\n", "3 14 1 Bacteroidetes\n", "4 433 2 Firmicutes\n", "5 0 2 Proteobacteria\n", "6 754 2 Actinobacteria\n", "7 555 2 Bacteroidetes" ] }, "execution_count": 69, "metadata": {}, "output_type": "execute_result" } ], "source": [ "data.value[3] = 14\n", "data" ] }, { "cell_type": "code", "execution_count": 72, "metadata": { "slideshow": { "slide_type": "slide" } }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
valuepatientphylumyear
06321Firmicutes2013
116381Proteobacteria2013
25691Actinobacteria2013
3141Bacteroidetes2013
44332Firmicutes2013
502Proteobacteria2013
67542Actinobacteria2013
75552Bacteroidetes2013
\n", "
" ], "text/plain": [ " value patient phylum year\n", "0 632 1 Firmicutes 2013\n", "1 1638 1 Proteobacteria 2013\n", "2 569 1 Actinobacteria 2013\n", "3 14 1 Bacteroidetes 2013\n", "4 433 2 Firmicutes 2013\n", "5 0 2 Proteobacteria 2013\n", "6 754 2 Actinobacteria 2013\n", "7 555 2 Bacteroidetes 2013" ] }, "execution_count": 72, "metadata": {}, "output_type": "execute_result" } ], "source": [ "data['year'] = 2013\n", "data" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "- But note, we cannot use the attribute indexing method to add a new column:" ] }, { "cell_type": "code", "execution_count": 73, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
valuepatientphylumyear
06321Firmicutes2013
116381Proteobacteria2013
25691Actinobacteria2013
3141Bacteroidetes2013
44332Firmicutes2013
502Proteobacteria2013
67542Actinobacteria2013
75552Bacteroidetes2013
\n", "
" ], "text/plain": [ " value patient phylum year\n", "0 632 1 Firmicutes 2013\n", "1 1638 1 Proteobacteria 2013\n", "2 569 1 Actinobacteria 2013\n", "3 14 1 Bacteroidetes 2013\n", "4 433 2 Firmicutes 2013\n", "5 0 2 Proteobacteria 2013\n", "6 754 2 Actinobacteria 2013\n", "7 555 2 Bacteroidetes 2013" ] }, "execution_count": 73, "metadata": {}, "output_type": "execute_result" } ], "source": [ "data.treatment = 1\n", "data" ] }, { "cell_type": "code", "execution_count": 74, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "1" ] }, "execution_count": 74, "metadata": {}, "output_type": "execute_result" } ], "source": [ "data.treatment" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "- Specifying a `Series` as a new columns cause its values to be added according to the `DataFrame`'s index:" ] }, { "cell_type": "code", "execution_count": 75, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "0 0\n", "1 0\n", "2 0\n", "3 0\n", "4 1\n", "5 1\n", "dtype: int64" ] }, "execution_count": 75, "metadata": {}, "output_type": "execute_result" } ], "source": [ "treatment = pd.Series([0]*4 + [1]*2)\n", "treatment" ] }, { "cell_type": "code", "execution_count": 76, "metadata": { "scrolled": true, "slideshow": { "slide_type": "slide" } }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
valuepatientphylumyeartreatment
06321Firmicutes20130.0
116381Proteobacteria20130.0
25691Actinobacteria20130.0
3141Bacteroidetes20130.0
44332Firmicutes20131.0
502Proteobacteria20131.0
67542Actinobacteria2013NaN
75552Bacteroidetes2013NaN
\n", "
" ], "text/plain": [ " value patient phylum year treatment\n", "0 632 1 Firmicutes 2013 0.0\n", "1 1638 1 Proteobacteria 2013 0.0\n", "2 569 1 Actinobacteria 2013 0.0\n", "3 14 1 Bacteroidetes 2013 0.0\n", "4 433 2 Firmicutes 2013 1.0\n", "5 0 2 Proteobacteria 2013 1.0\n", "6 754 2 Actinobacteria 2013 NaN\n", "7 555 2 Bacteroidetes 2013 NaN" ] }, "execution_count": 76, "metadata": {}, "output_type": "execute_result" } ], "source": [ "data['treatment'] = treatment\n", "data" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "- Other Python data structures (ones without an index) need to be the same length as the `DataFrame`. The following produces an error." ] }, { "cell_type": "code", "execution_count": 79, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "# month = ['Jan', 'Feb', 'Mar', 'Apr']\n", "# data['month'] = month" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "- We can extract the underlying data as a simple `ndarray` by accessing the `values` attribute:" ] }, { "cell_type": "code", "execution_count": 80, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "array([[632, 1, 'Firmicutes', 2013, 0.0],\n", " [1638, 1, 'Proteobacteria', 2013, 0.0],\n", " [569, 1, 'Actinobacteria', 2013, 0.0],\n", " [14, 1, 'Bacteroidetes', 2013, 0.0],\n", " [433, 2, 'Firmicutes', 2013, 1.0],\n", " [0, 2, 'Proteobacteria', 2013, 1.0],\n", " [754, 2, 'Actinobacteria', 2013, nan],\n", " [555, 2, 'Bacteroidetes', 2013, nan]], dtype=object)" ] }, "execution_count": 80, "metadata": {}, "output_type": "execute_result" } ], "source": [ "data.values" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Lab\n", "\n", "- What are the differences between `.iloc[ ]` and ``loc[ ]``?\n", "- Within `.iloc[ ]` and ``loc[ ]``, when shall we (not) use `[ ]`? \n", "- When doing indexing, what is the sign `:` used for?\n" ] } ], "metadata": { "celltoolbar": "Slideshow", "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.8" }, "rise": { "auto_select": "first", "autolaunch": false, "enable_chalkboard": true, "start_slideshow_at": "selected", "theme": "black" } }, "nbformat": 4, "nbformat_minor": 1 }