R Data Import and Export¶

Feng Li

School of Statistics and Mathematics

Central University of Finance and Economics

feng.li@cufe.edu.cn

https://feng.li/statcomp

>>> Link to Python version 1

Plain Text Format¶

  • Read a file in table format (.csv, .txt, .dat) and creates a data frame from it.

    • read.table()
    • read.csv()
    • read.csv2()
  • Write a file in text format.

    • write.table()
    • write.csv()
    • write.csv2()
In [1]:
mytable <- read.table(file = "data/table1.txt", sep = "\t", header = TRUE)
mytable
A data.frame: 50 × 5
STATEY1Y2X1X2
<chr><dbl><dbl><dbl><dbl>
AL2206.02186.0 92.7 91.4
AK 0.7 0.7151.0149.0
AZ 73.0 74.0 61.0 56.0
AR3620.03737.0 86.3 91.8
CA7472.07444.0 63.4 58.4
CO 788.0 873.0 77.8 73.0
CT1029.0 948.0106.0104.0
DE 168.0 164.0117.0113.0
FL2568.02537.0 62.0 57.2
GA4302.04301.0 80.6 80.8
HI 227.5 224.5 85.0 85.5
ID 187.0 203.0 79.1 72.9
IL 793.0 809.0 65.0 70.5
IN5445.05290.0 62.7 60.1
IA2151.02247.0 56.5 53.0
KS 404.0 389.0 54.5 47.8
KY 412.0 483.0 67.7 73.5
LA 273.0 254.0115.0115.0
ME1069.01070.0101.0 97.0
MD 885.0 898.0 76.6 75.4
MA 235.0 237.0105.0102.0
MI1406.01396.0 58.0 53.8
MN2499.02697.0 57.7 54.0
MS1434.01468.0 87.8 86.7
MO1580.01622.0 55.4 51.5
MT 172.0 164.0 68.0 66.0
NE1202.01400.0 50.3 48.9
NV 2.2 1.8 53.9 52.7
NH 43.0 49.0109.0104.0
NJ 442.0 491.0 85.0 83.0
NM 283.0 302.0 74.0 70.0
NY 975.0 987.0 68.1 64.0
NC3033.03045.0 82.8 78.7
ND 51.0 45.0 55.2 48.0
OH4667.04637.0 59.1 54.7
OK 869.0 830.0101.0100.0
OR 652.0 686.0 77.0 74.6
PA4976.05130.0 61.0 52.0
RI 53.0 50.0102.0 99.0
SC1422.01420.0 70.1 65.9
SD 435.0 602.0 48.0 45.8
TN 277.0 279.0 71.0 80.7
TX3317.03356.0 76.7 72.6
UT 456.0 486.0 64.0 59.0
VT 31.0 30.0106.0102.0
VA 934.0 988.0 86.3 81.2
WA1287.01313.0 74.1 71.5
WV 136.0 174.0104.0109.0
WI 910.0 873.0 60.1 54.0
WY 1.7 1.7 83.0 83.0
In [3]:
as.matrix(mytable[, 2:5]) # Convert to a Matrix format
A matrix: 50 × 4 of type dbl
Y1Y2X1X2
2206.02186.0 92.7 91.4
0.7 0.7151.0149.0
73.0 74.0 61.0 56.0
3620.03737.0 86.3 91.8
7472.07444.0 63.4 58.4
788.0 873.0 77.8 73.0
1029.0 948.0106.0104.0
168.0 164.0117.0113.0
2568.02537.0 62.0 57.2
4302.04301.0 80.6 80.8
227.5 224.5 85.0 85.5
187.0 203.0 79.1 72.9
793.0 809.0 65.0 70.5
5445.05290.0 62.7 60.1
2151.02247.0 56.5 53.0
404.0 389.0 54.5 47.8
412.0 483.0 67.7 73.5
273.0 254.0115.0115.0
1069.01070.0101.0 97.0
885.0 898.0 76.6 75.4
235.0 237.0105.0102.0
1406.01396.0 58.0 53.8
2499.02697.0 57.7 54.0
1434.01468.0 87.8 86.7
1580.01622.0 55.4 51.5
172.0 164.0 68.0 66.0
1202.01400.0 50.3 48.9
2.2 1.8 53.9 52.7
43.0 49.0109.0104.0
442.0 491.0 85.0 83.0
283.0 302.0 74.0 70.0
975.0 987.0 68.1 64.0
3033.03045.0 82.8 78.7
51.0 45.0 55.2 48.0
4667.04637.0 59.1 54.7
869.0 830.0101.0100.0
652.0 686.0 77.0 74.6
4976.05130.0 61.0 52.0
53.0 50.0102.0 99.0
1422.01420.0 70.1 65.9
435.0 602.0 48.0 45.8
277.0 279.0 71.0 80.7
3317.03356.0 76.7 72.6
456.0 486.0 64.0 59.0
31.0 30.0106.0102.0
934.0 988.0 86.3 81.2
1287.01313.0 74.1 71.5
136.0 174.0104.0109.0
910.0 873.0 60.1 54.0
1.7 1.7 83.0 83.0
In [2]:
write.table(mytable, file = "mysavedtable.txt", sep = ",") # save with comma separator

Microsoft Excel¶

  • Install and load the openxlsx package.
  • Read Excel files with read.xlsx() function

      read.xlsx(xlsxFile, sheet, startRow = 1, colNames = TRUE, rowNames = FALSE, ...)
  • Write Excel files

    write.xlsx(x, file, asTable = FALSE, overwrite = TRUE, ...)
In [10]:
# install.packages("openxlsx")
Installing package into ‘/home/fli/.R/library’
(as ‘lib’ is unspecified)

In [9]:
require("openxlsx")
mydata <- read.xlsx("data/UNHBDScore.xlsx",
                    sheet = 1,
                    startRow = 1,
                    colNames = TRUE)
mydata
A data.frame: 20 × 11
TeamSlides/ReportCodesVideoUse.public.dataUse.big.dataThemeInnovationMethodsPresentationVisualization
<chr><chr><chr><chr><chr><chr><dbl><dbl><dbl><dbl><dbl>
1Scipopulis YesYesYesYesYes43344
2Analysts Against Poverty YesYesYesYesNo 33232
3Upside Down YesYesNo YesYes22212
4safe-encounter.ai YesYesYesYesYes32131
5Big Data and Disasters - ADB YesYesYesYesYes54544
6Datains - Nippon Koei YesYesYesYesYes53344
7Frame YesYesYesYesYes53232
8GC-TICS YesYesYesYesYes33342
9Meaning Maker YesYesYesYesYes43344
10Satria Garuda YesYesYesYesYes32221
11Vegetation Growth YesYesYesYesYes33332
12YOLO (You Only Live Once) YesYesYesYesYes44333
13ZU_ProMinds YesYesYesYesYes33333
14GEMy YesYesYesYesYes43332
15Not wee Data YesYesYesYesYes33332
16MKIT_AIMS2018 No YesNo YesYes22211
17Explorers YesYesYesYesYes44555
18Multiverse of Data YesYesYesYesYes43433
19QUT Centre for Data Science No YesNo YesYes32211
20UAE Diversified Big Data Expert TeamYesYesYesYesYes44323

Read and Write MAT Files¶

  • Install and load the R.matlab package.
  • No Matlab installation required.
  • Methods readMat() and writeMat() for reading and writing MAT files.
In [11]:
# install.packages(R.matlab)
library("R.matlab")
readMat("data/RajanData.mat")
R.matlab v3.7.0 (2022-08-25 21:52:34 UTC) successfully loaded. See ?R.matlab for help.


Attaching package: ‘R.matlab’


The following objects are masked from ‘package:base’:

    getOption, isOpen


$X
A matrix: 4405 × 5 of type dbl
10.15283128 1.04527506.046561 0.08774734
10.59341052 1.64792983.939774 0.18836954
10.24067336 2.85774545.757311 0.24105133
10.37908270 0.68596074.942442 0.10643459
10.22463373 0.81535653.644353 0.10099751
10.28268554 2.00069005.687355 0.10290431
10.16413260 0.85672307.444632 0.07588428
10.64726939 1.05477386.007547 0.10247049
10.10341836 0.87605566.427695-0.04920358
10.12944697 2.07064806.069407 0.14460620
10.10986147 1.48786376.850210 0.18828204
10.22716095 0.88281604.145006 0.10592493
10.10519763 3.66748372.216809 0.41223156
10.88186163 1.01948524.973619 0.31369309
10.44649656 4.17629868.968512 0.31245069
10.68209559 0.90980374.418684 0.08361230
10.09845585 1.25328364.988076 0.12320417
10.30317060 0.89657705.179500 0.10322459
10.27682052 1.26509144.286548 0.11287619
10.19914411 1.00422233.970990 0.05652562
10.21840969 0.90084264.563952 0.08914537
10.36891574 0.77645821.599792 0.04900973
10.0831839416.75166802.215719 0.03011832
10.07090499 3.29403234.797549 0.19508910
10.30679425 0.90371794.584620 0.05960136
10.13470736 1.21516196.310098 0.05064859
10.04878252 1.03638316.610284 0.10701220
10.43966075 2.03060094.536859 0.28045713
10.10106310 0.79128123.201648 0.12439597
10.16227839 0.83579083.221033 0.06801826
⋮⋮⋮⋮⋮
10.231630571.4541071 1.5096176-0.45443400
10.014125141.9036772 4.4757333 0.10464538
10.103291013.1218296 3.6441697 0.20205401
10.069157152.6923948 3.2967624 0.44818360
10.247965274.5749457 0.4187103-0.25908844
10.063443154.3150962 4.2255336 0.17914748
10.061351373.0522308 3.3308107 0.24496727
10.040141194.1432166 2.8499548 0.14002478
10.010685104.4860544 1.1749559 0.10708674
10.072082576.1268956 3.6317269 0.11618639
10.023403384.1160044 2.0272268 0.13732224
10.367952173.1915709 3.2789544 0.10186379
10.330900410.8523913 6.4572217 0.05928239
10.306857091.4329358 4.5266571 0.23338493
10.319437521.6375911 7.9338800 0.16894862
10.664187331.2743732 9.3798299 0.12295684
10.240864220.9936688 9.4355879 0.04494273
10.692665640.9936683 6.1993483 0.17269039
10.898999560.1482478 5.6734504 0.04860495
10.933145900.3128039 6.0399737 0.07985335
10.707253700.9642831 7.9493340 0.18984009
10.606401931.4674369 6.7021385 0.09358254
10.714170701.3699697 5.3074902 0.20615239
10.124618271.174256310.2837032 0.07930871
10.357185101.8335471 6.7345917 0.17267889
10.543076371.1319547 5.1073566 0.08069173
10.499455220.7121362 6.1899834 0.04285418
10.955068710.7826259 4.3393540 0.05073775
10.271265701.4070131 4.5467993 0.02018504
10.552913841.2432457 7.7006338 0.09380608
$y
A matrix: 4405 × 1 of type dbl
0.48061406
0.49789298
0.08513211
0.72337394
0.87229293
0.25224938
0.65029221
0.82784745
0.96586876
0.29379814
0.25159485
0.69433447
0.08579703
0.52781013
0.12396548
0.82881142
0.25703244
0.52647353
0.47460682
0.56260611
0.69551821
0.27355197
0.05781200
0.13346465
0.57969422
0.69786571
0.23100059
0.13815073
0.32689204
0.79901292
⋮
0.14383677
0.30137774
0.32939544
0.68419846
0.07815812
0.06176844
0.07326558
0.04638021
0.03154198
0.04460804
0.01265112
0.18559067
0.79289621
0.36263210
0.33696546
0.60398084
0.65082213
0.56969612
0.43006658
0.18838975
0.57448000
0.44272956
0.53810913
0.67301374
0.30994720
0.48864363
0.65195421
0.93939365
0.19348394
0.45197044
$yName
A matrix: 1 × 1 of type chr
debtratio
$Description
A matrix: 1 × 1 of type chr
Rajan-Zingales JF 1995 data, but with definitions from Cook, Kieschnick and McCullough, Journal of Empirical Finance 2008
$XName
A matrix: 1 × 5
consttangmbtrlogsaleprofit

Minitab, S-PLUS, SAS, SPSS, Stata, Systat¶

The recommended R package foreign provides import facilities for files produced by these statistical systems.

  • Function read.xport() reads a file in SAS Transport (XPORT) format and return a list of data frames.

  • Function read.mtp() imports a 'Minitab Portable Worksheet'. This returns the components of the worksheet as an R list

  • Function read.spss() can read files created by the 'save' and 'export' commands in SPSS

  • Files from Stata can be read and written by functions read.dta() and write.dta().

Images¶

  • JPEG Format

    install.packages("jpeg")
    require("jpeg")
    readJPEG()
    rasterImage()
  • Packages bmp, jpeg and png read the formats after which they are named. See also packages biOps and Momocs, and Bioconductor package EBImage.

In [13]:
# install.packages("jpeg")
library("jpeg")
   dim(myprofile)
image(1:1539, 1:1154, myprofile[, , 1])
image(1:1539, 1:1154, myprofile[, , 2])
image(1:1539, 1:1154, myprofile[, , 3])
  1. 1539
  2. 1154
  3. 3
In [14]:
plot(c(100, 250), c(300, 450), type = "n", xlab = "", ylab = "")
rasterImage(myprofile, 100, 300, 150, 350, interpolate = FALSE)
rasterImage(myprofile, 100, 400, 150, 450)
rasterImage(myprofile, 200, 300, 200 + xinch(.5),
            300 + yinch(.3), interpolate = FALSE)
rasterImage(myprofile, 200, 400, 250, 450,
            angle = 15, interpolate = FALSE)

R and Database¶

  • SQL has limited numerical and statistical features. For example, it has no least squares fitting procedures, and to find quantiles requires a sophisticated query.

  • Not only are basic statistical functions missing from SQL, but in many cases the numerical algorithms used in the basic aggregate functions are not implemented to safeguard numerical accuracy.

  • For these reasons, it may be desirable or even necessary to perform a statistical analysis in a statistical package rather than in the database. One way to do this, is to extract the data from the database and import it into statistical software.

  • The RODBC package provides access to databases (including Microsoft Access and Microsoft SQL Server) through an ODBC interface.

      odbcConnect(dsn, uid="", pwd="")
      sqlFetch(channel, sqtable)
      sqlQuery(channel, query)
      sqlSave(channel, mydf, tablename = sqtable, append = FALSE)
      sqlDrop(channel, sqtable)
  • Use the RODBC package to read Oracle Database.

  • The DBI package in R provides a uniform, client- side interface to different database management systems, such as MySQL, PostgreSQL, and Oracle.

  • The RMySQL package provides an interface to MySQL.

  • The ROracle package provides an interface for Oracle.

  • The RJDBC package provides access to databases through a JDBC interface.

Further Suggested Read¶

  • R Data Import/Export

    http://cran.r-project.org/doc/manuals/r-release/R-data.pdf