Feng Li

School of Statistics and Mathematics

Central University of Finance and Economics

"The simple graph has brought more information to the data analyst’s mind than any other device."

— John Tukey

`mpg` data frame

`mpg` contains observations collected by the US Environmental Protection Agency on 38 models of car. You can see more details via `?mpg`. Among the variables in `mpg` are:

- `displ`, a car's engine size, in litres.
- `hwy`, a car's fuel efficiency on the highway, in miles per gallon (mpg). A car with a low fuel efficiency consumes more fuel than a car with a high fuel efficiency when they travel the same distance.
- ...
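Before plotting, it can help to take a quick look at the two variables described above. The following is a minimal sketch, assuming the ggplot2 package (which ships the `mpg` data) is installed; the names `mpg_sub` and `n_models` are only illustrative.

```r
library(ggplot2)  # provides the mpg data frame

# Keep only the two variables discussed above (mpg_sub is an illustrative name)
mpg_sub <- mpg[, c("displ", "hwy")]

# Structure and summary statistics of engine size and highway mileage
str(mpg_sub)
summary(mpg_sub)

# Number of distinct car models recorded in the data
n_models <- length(unique(mpg$model))
n_models
```

`summary()` reports the range and quartiles of each variable, which gives a first impression of the scales that appear on the axes of the plots that follow.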

Practice: run the examples below and read the help page for `plot` (`?plot`).

In [1]:

```
library(ggplot2)
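# Scatterplot of highway mileage against engine size (no attach() needed)
plot(hwy ~ displ, data = mpg)
# Add the fitted least-squares regression line
abline(lm(hwy ~ displ, data = mpg))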
title("Regression of MPG on engine size")
```

In [2]:

```
hist(mpg$hwy)
```

In [3]:

```
d <- density(mpg$hwy) # returns the density data
plot(d)
```

In [4]:

```
# Count the cars per manufacturer and draw the counts as a pie chart
car.table <- table(mpg$manufacturer)
pie(car.table)
```

In [5]:

```
# Boxplot of MPG
boxplot(mpg$hwy, main = 'Boxplot of MPG')
# Boxplot of MPG by Car Cylinders
boxplot(hwy ~ cyl, data = mpg, main = "Car Mileage Data",
        xlab = "Number of Cylinders", ylab = "Miles Per Gallon")
```

In [6]:

```
# install.packages("corrplot")
data(mtcars)
library(corrplot)
M <- cor(mtcars)
corrplot(M, addCoef.col = "grey")
```

corrplot 0.92 loaded

In [7]:

```
library(forecast)
library(fpp)
plot(ausbeer)
```

In [8]:

```
library(ggplot2)
ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy))
```

With ggplot2, you begin a plot with the function `ggplot()`. `ggplot()` creates a coordinate system that you can add layers to. `geom_point()` adds a layer of points to your plot, which creates a scatterplot. You can specify the color, size and shape of these points. Each geom function in ggplot2 takes a `mapping` argument.
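The layering idea can be seen by building a plot in two steps. This is a minimal sketch; the object names `base` and `scatter` are illustrative, and the particular color, size and shape values are arbitrary choices.

```r
library(ggplot2)

# ggplot() alone sets up the data and coordinate system; no layers yet
base <- ggplot(data = mpg)

# geom_point() adds one layer; its mapping argument ties variables to x and y,
# while color, size and shape are fixed values set outside aes()
scatter <- base +
  geom_point(mapping = aes(x = displ, y = hwy),
             color = "steelblue", size = 2, shape = 17)

length(base$layers)     # number of layers before adding a geom
length(scatter$layers)  # number of layers after adding geom_point()
scatter                 # draw the scatterplot
```

Storing the plot in an object before adding layers makes it easy to reuse the same base with different geoms.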

- Run `ggplot(data = mpg)`. What do you see?
- How many rows are in `mpg`? How many columns?
- What does the `drv` variable describe? Read the help for `?mpg` to find out.
- Make a scatterplot of `hwy` vs `cyl`.
- What happens if you make a scatterplot of `class` vs `drv`? Why is the plot not useful?
- What happens for the outliers?

In [9]:

```
ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy, color = class))
```

In [10]:

```
# Set color and size manually (outside aes()) for all points
ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy), color = 2, size = 3)
# Set the point shape manually
ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy), shape = 18)
```

- What's gone wrong with the code in cell In [11] below? Why are the points not blue?
- Which variables in `mpg` are categorical? Which variables are continuous?
- Map a continuous variable to `color`. How does it behave differently for categorical vs. continuous variables?
- What happens if you use something other than a variable name as the color, like `aes(colour = displ < 5)`?

In [11]:

```
ggplot(data = mpg) +
geom_point(mapping = aes(x = displ, y = hwy, color = "blue"))
```
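One way to investigate the question about the non-blue points is to compare the colors ggplot2 actually computes when `color` is placed inside versus outside `aes()`. The sketch below is illustrative only; `p_mapped` and `p_set` are hypothetical names, and `layer_data()` is used just to inspect the computed aesthetics.

```r
library(ggplot2)

# color inside aes(): "blue" is treated as a data value (a one-level category)
p_mapped <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy, color = "blue"))

# color outside aes(): "blue" is used directly as the point color
p_set <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy), color = "blue")

# layer_data() returns the aesthetics computed for a layer
unique(layer_data(p_mapped)$colour)  # color chosen by the default discrete scale
unique(layer_data(p_set)$colour)     # the color that was set manually
```

Comparing the two results shows whether the string was passed through a color scale or used as given.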

- Another way to add a variable to a plot, particularly useful for categorical variables, is to split your plot into facets: subplots that each display one subset of the data.
- To facet your plot by a single variable, use `facet_wrap()`. The first argument should be a formula, which you create with `~` followed by a variable name.

In [12]:

```
ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy)) +
  facet_wrap(~ class, nrow = 2)
```

In [13]:

```
library(GGally)
ggpairs(subset(mtcars, select = c(1, 3, 4, 5, 6)))
```