Using Intel compiler and Intel MKL in R

Linking R against an external BLAS library can speed up matrix calculations. This topic has been discussed many times in the R Installation and Administration manual, on the R mailing list, and in other places.

Here are two more official documents from Intel that I thought might be useful.

The first document gives examples of how to link MKL with R in different situations. The second offers a very convenient way of working out the correct linking parameters under various conditions, which I found very useful.

To compile R with the Intel compiler, please refer to the R Installation and Administration manual.

Below is a simple benchmark on my Linux system showing how much one can gain by linking MKL and/or compiling R with the Intel compiler.

Configure flags

Add the following lines to “config.site”. You may change the configure parameters depending on your own situation.

## Make sure the Intel compiler is installed and loaded. This can be done
## in .bashrc, e.g.
## . /opt/intel/bin/compilervars.sh intel64

MKL_LIB_PATH=/opt/intel/mkl/lib/intel64

## Use intel compiler
CC='icc -std=c99'
CFLAGS='-g -O3 -wd188 -ip '

F77='ifort'
FFLAGS='-g -O3 '

CXX='icpc'
CXXFLAGS='-g -O3 '

FC='ifort'
FCFLAGS='-g -O3 '

## MKL with GNU OpenMP threading, GCC
# MKL=" -L${MKL_LIB_PATH}                         \
#       -Wl,--start-group                         \
#           -lmkl_gf_lp64                         \
#           -lmkl_gnu_thread                      \
#           -lmkl_core                            \
#       -Wl,--end-group                           \
#       -lgomp -lpthread"

## MKL with Intel OpenMP threading, ICC
# MKL=" -L${MKL_LIB_PATH}                         \
#       -Wl,--start-group                         \
#           -lmkl_intel_lp64                      \
#           -lmkl_intel_thread                    \
#           -lmkl_core                            \
#       -Wl,--end-group                           \
#       -liomp5 -lpthread"

## MKL sequential, ICC
MKL=" -L${MKL_LIB_PATH}                         \
      -Wl,--start-group                         \
          -lmkl_intel_lp64                      \
          -lmkl_sequential                      \
          -lmkl_core                            \
      -Wl,--end-group"

BLAS_LIBS="$MKL"
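
Before running configure, it may be worth confirming that the libraries named in the MKL link line are actually present under MKL_LIB_PATH. The following is only a minimal sketch and not part of the original setup: the function name check_mkl_libs is hypothetical, and it assumes the shared-library naming libNAME.so used on Linux (a static-only MKL installation would have .a files instead).

```shell
#!/bin/sh
# Hypothetical helper: report which MKL libraries are missing from a directory.
# Usage: check_mkl_libs DIR NAME...   (NAME without the "lib" prefix or ".so" suffix)
# Returns 0 if every library is found, 1 otherwise.
check_mkl_libs() {
    dir=$1
    shift
    missing=0
    for name in "$@"; do
        if [ ! -e "$dir/lib$name.so" ]; then
            echo "missing: $dir/lib$name.so"
            missing=1
        fi
    done
    return "$missing"
}

# Example for the sequential configuration (path taken from config.site):
# check_mkl_libs /opt/intel/mkl/lib/intel64 mkl_intel_lp64 mkl_sequential mkl_core
```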

Then compile and install R as follows:

./configure --with-blas --with-lapack
make
make install
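
After installation, one way to check that MKL really was picked up is to look at the dynamic dependencies of the installed R with ldd. This is a sketch under assumptions rather than a definitive recipe: the helper name uses_mkl is hypothetical, and the commented example assumes that R is on the PATH and was built without --enable-R-shlib, so that the BLAS dependency shows up on the R executable itself (with a shared libR, inspect lib/libR.so under the R home directory instead).

```shell
#!/bin/sh
# Hypothetical helper: succeed if ldd-style output on stdin mentions an MKL library.
uses_mkl() {
    grep -q 'libmkl_'
}

# Example (assumptions as described above):
# ldd "$(R RHOME)/bin/exec/R" | uses_mkl && echo "MKL is linked"
```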

System information

  • Debian Wheezy AMD64
  • Intel(R) Core(TM) i7-2700K CPU @ 3.50GHz
  • 16G RAM

Matrix calculation benchmark without MKL (gcc 4.7.2)

   R Benchmark 2.5
   ===============
Number of times each test is run__________________________:  3

   I. Matrix calculation
   ---------------------
Creation, transp., deformation of a 2500x2500 matrix (sec):  0.455000000000001 
2400x2400 normal distributed random matrix ^1000____ (sec):  0.383000000000002 
Sorting of 7,000,000 random values__________________ (sec):  0.647666666666667 
2800x2800 cross-product matrix (b = a' * a)_________ (sec):  10.75 
Linear regr. over a 3000x3000 matrix (c = a \ b')___ (sec):  5.02266666666667 
                      --------------------------------------------
                 Trimmed geom. mean (2 extremes eliminated):  1.13963496799737 

   II. Matrix functions
   --------------------
FFT over 2,400,000 random values____________________ (sec):  0.392666666666666 
Eigenvalues of a 640x640 random matrix______________ (sec):  0.73766666666666 
Determinant of a 2500x2500 random matrix____________ (sec):  3.30266666666667 
Cholesky decomposition of a 3000x3000 matrix________ (sec):  3.872 
Inverse of a 1600x1600 random matrix________________ (sec):  3.04166666666667 
                      --------------------------------------------
                Trimmed geom. mean (2 extremes eliminated):  1.94959995852139 

   III. Programmation
   ------------------
3,500,000 Fibonacci numbers calculation (vector calc)(sec):  0.663333333333346 
Creation of a 3000x3000 Hilbert matrix (matrix calc) (sec):  0.315333333333323 
Grand common divisors of 400,000 pairs (recursion)__ (sec):  1.74266666666667 
Creation of a 500x500 Toeplitz matrix (loops)_______ (sec):  0.471666666666674 
Escoufier's method on a 45x45 matrix (mixed)________ (sec):  0.381 
                      --------------------------------------------
                Trimmed geom. mean (2 extremes eliminated):  0.492149816487262 

Total time for all 15 tests_________________________ (sec):  32.179 
Overall mean (sum of I, II and III trimmed means/3)_ (sec):  1.03023476346231

Matrix calculation benchmark with Intel compiler (without MKL)

   R Benchmark 2.5
   ===============
Number of times each test is run__________________________:  3

   I. Matrix calculation
   ---------------------
Creation, transp., deformation of a 2500x2500 matrix (sec):  0.438333333333333 
2400x2400 normal distributed random matrix ^1000____ (sec):  0.362666666666666 
Sorting of 7,000,000 random values__________________ (sec):  0.625666666666666 
2800x2800 cross-product matrix (b = a' * a)_________ (sec):  6.06 
Linear regr. over a 3000x3000 matrix (c = a \ b')___ (sec):  2.66333333333333 
                      --------------------------------------------
                 Trimmed geom. mean (2 extremes eliminated):  0.900584248749399 

   II. Matrix functions
   --------------------
FFT over 2,400,000 random values____________________ (sec):  0.372 
Eigenvalues of a 640x640 random matrix______________ (sec):  0.456999999999996 
Determinant of a 2500x2500 random matrix____________ (sec):  1.85666666666667 
Cholesky decomposition of a 3000x3000 matrix________ (sec):  1.44933333333334 
Inverse of a 1600x1600 random matrix________________ (sec):  1.85266666666667 
                      --------------------------------------------
                Trimmed geom. mean (2 extremes eliminated):  1.07060004219009 

   III. Programmation
   ------------------
3,500,000 Fibonacci numbers calculation (vector calc)(sec):  0.510333333333335 
Creation of a 3000x3000 Hilbert matrix (matrix calc) (sec):  0.308666666666667 
Grand common divisors of 400,000 pairs (recursion)__ (sec):  1.581 
Creation of a 500x500 Toeplitz matrix (loops)_______ (sec):  0.408000000000001 
Escoufier's method on a 45x45 matrix (mixed)________ (sec):  0.285000000000011 
                      --------------------------------------------
                Trimmed geom. mean (2 extremes eliminated):  0.400560336912059 

Total time for all 15 tests_________________________ (sec):  19.2306666666667 
Overall mean (sum of I, II and III trimmed means/3)_ (sec):  0.728237740489568

Matrix calculation benchmark with sequential MKL (gcc 4.7.2)

   R Benchmark 2.5
   ===============
Number of times each test is run__________________________:  3

   I. Matrix calculation
   ---------------------
Creation, transp., deformation of a 2500x2500 matrix (sec):  0.458333333333333 
2400x2400 normal distributed random matrix ^1000____ (sec):  0.378 
Sorting of 7,000,000 random values__________________ (sec):  0.643666666666666 
2800x2800 cross-product matrix (b = a' * a)_________ (sec):  0.922666666666667 
Linear regr. over a 3000x3000 matrix (c = a \ b')___ (sec):  0.482999999999999 
                      --------------------------------------------
                 Trimmed geom. mean (2 extremes eliminated):  0.522311832408545 

   II. Matrix functions
   --------------------
FFT over 2,400,000 random values____________________ (sec):  0.406666666666666 
Eigenvalues of a 640x640 random matrix______________ (sec):  0.288999999999997 
Determinant of a 2500x2500 random matrix____________ (sec):  0.497 
Cholesky decomposition of a 3000x3000 matrix________ (sec):  0.438000000000002 
Inverse of a 1600x1600 random matrix________________ (sec):  0.37866666666667 
                      --------------------------------------------
                Trimmed geom. mean (2 extremes eliminated):  0.407058274866339 

   III. Programmation
   ------------------
3,500,000 Fibonacci numbers calculation (vector calc)(sec):  0.648999999999996 
Creation of a 3000x3000 Hilbert matrix (matrix calc) (sec):  0.306000000000002 
Grand common divisors of 400,000 pairs (recursion)__ (sec):  1.785 
Creation of a 500x500 Toeplitz matrix (loops)_______ (sec):  0.455333333333328 
Escoufier's method on a 45x45 matrix (mixed)________ (sec):  0.375 
                      --------------------------------------------
                Trimmed geom. mean (2 extremes eliminated):  0.480324939124224 

Total time for all 15 tests_________________________ (sec):  8.46533333333333 
Overall mean (sum of I, II and III trimmed means/3)_ (sec):  0.467419897853855

Matrix calculation benchmark with Intel compiler and sequential MKL

   R Benchmark 2.5
   ===============
Number of times each test is run__________________________:  3

   I. Matrix calculation
   ---------------------
Creation, transp., deformation of a 2500x2500 matrix (sec):  0.475333333333333 
2400x2400 normal distributed random matrix ^1000____ (sec):  0.369 
Sorting of 7,000,000 random values__________________ (sec):  0.637000000000002 
2800x2800 cross-product matrix (b = a' * a)_________ (sec):  0.884666666666665 
Linear regr. over a 3000x3000 matrix (c = a \ b')___ (sec):  0.451333333333332 
                      --------------------------------------------
                 Trimmed geom. mean (2 extremes eliminated):  0.515084369178734 

   II. Matrix functions
   --------------------
FFT over 2,400,000 random values____________________ (sec):  0.372666666666667 
Eigenvalues of a 640x640 random matrix______________ (sec):  0.285999999999999 
Determinant of a 2500x2500 random matrix____________ (sec):  0.504 
Cholesky decomposition of a 3000x3000 matrix________ (sec):  0.429 
Inverse of a 1600x1600 random matrix________________ (sec):  0.370333333333332 
                      --------------------------------------------
                Trimmed geom. mean (2 extremes eliminated):  0.389753671609465 

   III. Programmation
   ------------------
3,500,000 Fibonacci numbers calculation (vector calc)(sec):  0.474000000000001 
Creation of a 3000x3000 Hilbert matrix (matrix calc) (sec):  0.309333333333332 
Grand common divisors of 400,000 pairs (recursion)__ (sec):  1.522 
Creation of a 500x500 Toeplitz matrix (loops)_______ (sec):  0.431000000000002 
Escoufier's method on a 45x45 matrix (mixed)________ (sec):  0.267999999999994 
                      --------------------------------------------
                Trimmed geom. mean (2 extremes eliminated):  0.398315717938976 

Total time for all 15 tests_________________________ (sec):  7.78366666666666 
Overall mean (sum of I, II and III trimmed means/3)_ (sec):  0.430822797869348
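
To put the four runs side by side, the total times can be turned into speedup factors relative to the plain gcc build without MKL. The snippet below is just a small arithmetic sketch; the helper name speedup is hypothetical, and the numbers are the "Total time for all 15 tests" values reported above.

```shell
#!/bin/sh
# Hypothetical helper: ratio of a baseline time to a measured time, to two decimals.
speedup() {
    awk -v base="$1" -v t="$2" 'BEGIN { printf "%.2f\n", base / t }'
}

# Total times (sec) for all 15 tests, copied from the four runs above.
baseline=32.179                    # gcc 4.7.2, without MKL
echo "Intel compiler, no MKL:     $(speedup "$baseline" 19.2306666666667)x"
echo "gcc + sequential MKL:       $(speedup "$baseline" 8.46533333333333)x"
echo "Intel compiler + seq. MKL:  $(speedup "$baseline" 7.78366666666666)x"
```

This prints factors of 1.67x, 3.80x and 4.13x. On this machine, linking MKL accounts for most of the gain, and nearly all of it comes from sections I and II; the section III timings barely change between the runs.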

Comments

3 responses to “Using Intel compiler and Intel MKL in R”

  1. Ander

    When installing additional R packages, how do I set things up so that compilation takes advantage of MKL?

    1. You don’t have to do any additional setup. “R CMD INSTALL pkg” (or install.packages("pkg")) will automatically use the linked MKL.

  2. Sang

    Thanks for this information.
    Clear, concise, and worked for me!
