Using Intel compiler and Intel MKL in R

Linking R against an external BLAS library can speed up matrix calculations considerably. This topic has been discussed many times in the R Installation and Administration manual, on the R mailing lists, and elsewhere.

Here are two more official documents from Intel that I thought might be useful.

The first document gives examples of how to link MKL with R in different situations. The second offers a convenient way to work out the correct link parameters under various conditions, which I found very useful.

For how to compile R with the Intel compiler, please refer to R Installation and Administration.

Below is a simple benchmark on my Linux system showing how much one can gain by linking MKL and/or compiling R with the Intel compiler.

Configure flags

Add the following lines to “config.site”. You may change the configure parameters depending on your own situation.

## Make sure the Intel compiler is installed and its environment is loaded.
## This can be done in .bashrc, e.g.
## . /opt/intel/bin/compilervars.sh intel64

MKL_LIB_PATH=/opt/intel/mkl/lib/intel64

## Use the Intel compilers
CC='icc -std=c99'
CFLAGS='-g -O3 -wd188 -ip '

F77='ifort'
FFLAGS='-g -O3 '

CXX='icpc'
CXXFLAGS='-g -O3 '

FC='ifort'
FCFLAGS='-g -O3 '

## MKL threaded with GNU OpenMP, GCC
# MKL=" -L${MKL_LIB_PATH}                         \
#       -Wl,--start-group                         \
#           -lmkl_gf_lp64                         \
#           -lmkl_gnu_thread                      \
#           -lmkl_core                            \
#       -Wl,--end-group                           \
#       -lgomp -lpthread"

## MKL threaded with Intel OpenMP, ICC
# MKL=" -L${MKL_LIB_PATH}                         \
#       -Wl,--start-group                         \
#           -lmkl_intel_lp64                      \
#           -lmkl_intel_thread                    \
#           -lmkl_core                            \
#       -Wl,--end-group                           \
#       -liomp5 -lpthread"

## MKL sequential, ICC
MKL=" -L${MKL_LIB_PATH}                         \
      -Wl,--start-group                         \
          -lmkl_intel_lp64                      \
          -lmkl_sequential                      \
          -lmkl_core                            \
      -Wl,--end-group"

BLAS_LIBS="$MKL"
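
Before running configure, it can save a failed build to confirm that the libraries named in the link line are actually present under MKL_LIB_PATH. The following is a minimal sketch, not part of the original setup: the helper name check_mkl_libs is hypothetical, and it assumes that each -lNAME flag corresponds to a file libNAME.so or libNAME.a in that directory.

```shell
#!/bin/sh
# Hypothetical helper: report which MKL libraries are missing from a directory.
# Usage: check_mkl_libs DIR NAME...   (NAME as in the linker flag -lNAME)
check_mkl_libs() {
    dir=$1
    shift
    missing=0
    for name in "$@"; do
        if [ -e "$dir/lib$name.so" ] || [ -e "$dir/lib$name.a" ]; then
            echo "found:   lib$name"
        else
            echo "missing: lib$name"
            missing=1
        fi
    done
    return "$missing"
}

# Libraries used by the sequential link line; the default path is the one
# assumed in config.site and may differ on your system.
MKL_LIB_PATH=${MKL_LIB_PATH:-/opt/intel/mkl/lib/intel64}
check_mkl_libs "$MKL_LIB_PATH" mkl_intel_lp64 mkl_sequential mkl_core ||
    echo "Some libraries were not found; adjust MKL_LIB_PATH before running configure."
```

For one of the threaded variants, pass the corresponding library names instead (for example mkl_intel_thread in place of mkl_sequential).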

Then compile and install R as follows:

./configure --with-blas --with-lapack
make
make install
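
Once the build finishes, one way to confirm that MKL was actually picked up is to inspect the shared-library dependencies of the installed R. This is a rough sketch rather than an official check: the helper name links_mkl is hypothetical, and it assumes a Linux system with ldd, an R on the PATH, and MKL linked dynamically (a static link would not show up in ldd output). Depending on the configure options, the BLAS dependency may sit on libR.so, libRblas.so, or the R executable, so all three are examined.

```shell
#!/bin/sh
# Hypothetical helper: succeed if ldd-style output on stdin mentions an MKL library.
links_mkl() {
    grep -q 'libmkl_'
}

if command -v R >/dev/null 2>&1 && command -v ldd >/dev/null 2>&1; then
    rhome=$(R RHOME)
    # Collect the dependencies of whichever of these files exist in this installation.
    deps=$(
        for f in "$rhome/lib/libR.so" "$rhome/lib/libRblas.so" "$rhome/bin/exec/R"; do
            if [ -e "$f" ]; then
                ldd "$f" 2>/dev/null || true
            fi
        done
    )
    if printf '%s\n' "$deps" | links_mkl; then
        echo "R appears to be linked against MKL"
    else
        echo "no MKL library found in the ldd output"
    fi
else
    echo "R or ldd not found on PATH; skipping the check"
fi
```

If the check passes but R fails to start with a missing-library error, the MKL directory is probably not on the runtime library path; sourcing compilervars.sh as in the config.site comments normally takes care of that.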

System information

  • Debian Wheezy AMD64
  • Intel(R) Core(TM) i7-2700K CPU @ 3.50GHz
  • 16G RAM

Matrix calculation benchmark without MKL (gcc 4.7.2)

   R Benchmark 2.5
   ===============
Number of times each test is run__________________________:  3

   I. Matrix calculation
   ---------------------
Creation, transp., deformation of a 2500x2500 matrix (sec):  0.455000000000001 
2400x2400 normal distributed random matrix ^1000____ (sec):  0.383000000000002 
Sorting of 7,000,000 random values__________________ (sec):  0.647666666666667 
2800x2800 cross-product matrix (b = a' * a)_________ (sec):  10.75 
Linear regr. over a 3000x3000 matrix (c = a \ b')___ (sec):  5.02266666666667 
                      --------------------------------------------
                 Trimmed geom. mean (2 extremes eliminated):  1.13963496799737 

   II. Matrix functions
   --------------------
FFT over 2,400,000 random values____________________ (sec):  0.392666666666666 
Eigenvalues of a 640x640 random matrix______________ (sec):  0.73766666666666 
Determinant of a 2500x2500 random matrix____________ (sec):  3.30266666666667 
Cholesky decomposition of a 3000x3000 matrix________ (sec):  3.872 
Inverse of a 1600x1600 random matrix________________ (sec):  3.04166666666667 
                      --------------------------------------------
                Trimmed geom. mean (2 extremes eliminated):  1.94959995852139 

   III. Programmation
   ------------------
3,500,000 Fibonacci numbers calculation (vector calc)(sec):  0.663333333333346 
Creation of a 3000x3000 Hilbert matrix (matrix calc) (sec):  0.315333333333323 
Grand common divisors of 400,000 pairs (recursion)__ (sec):  1.74266666666667 
Creation of a 500x500 Toeplitz matrix (loops)_______ (sec):  0.471666666666674 
Escoufier's method on a 45x45 matrix (mixed)________ (sec):  0.381 
                      --------------------------------------------
                Trimmed geom. mean (2 extremes eliminated):  0.492149816487262 

Total time for all 15 tests_________________________ (sec):  32.179 
Overall mean (sum of I, II and III trimmed means/3)_ (sec):  1.03023476346231

Matrix calculation benchmark with Intel compiler (without MKL)

   R Benchmark 2.5
   ===============
Number of times each test is run__________________________:  3

   I. Matrix calculation
   ---------------------
Creation, transp., deformation of a 2500x2500 matrix (sec):  0.438333333333333 
2400x2400 normal distributed random matrix ^1000____ (sec):  0.362666666666666 
Sorting of 7,000,000 random values__________________ (sec):  0.625666666666666 
2800x2800 cross-product matrix (b = a' * a)_________ (sec):  6.06 
Linear regr. over a 3000x3000 matrix (c = a \ b')___ (sec):  2.66333333333333 
                      --------------------------------------------
                 Trimmed geom. mean (2 extremes eliminated):  0.900584248749399 

   II. Matrix functions
   --------------------
FFT over 2,400,000 random values____________________ (sec):  0.372 
Eigenvalues of a 640x640 random matrix______________ (sec):  0.456999999999996 
Determinant of a 2500x2500 random matrix____________ (sec):  1.85666666666667 
Cholesky decomposition of a 3000x3000 matrix________ (sec):  1.44933333333334 
Inverse of a 1600x1600 random matrix________________ (sec):  1.85266666666667 
                      --------------------------------------------
                Trimmed geom. mean (2 extremes eliminated):  1.07060004219009 

   III. Programmation
   ------------------
3,500,000 Fibonacci numbers calculation (vector calc)(sec):  0.510333333333335 
Creation of a 3000x3000 Hilbert matrix (matrix calc) (sec):  0.308666666666667 
Grand common divisors of 400,000 pairs (recursion)__ (sec):  1.581 
Creation of a 500x500 Toeplitz matrix (loops)_______ (sec):  0.408000000000001 
Escoufier's method on a 45x45 matrix (mixed)________ (sec):  0.285000000000011 
                      --------------------------------------------
                Trimmed geom. mean (2 extremes eliminated):  0.400560336912059 

Total time for all 15 tests_________________________ (sec):  19.2306666666667 
Overall mean (sum of I, II and III trimmed means/3)_ (sec):  0.728237740489568

Matrix calculation benchmark with sequential MKL (gcc 4.7.2)

   R Benchmark 2.5
   ===============
Number of times each test is run__________________________:  3

   I. Matrix calculation
   ---------------------
Creation, transp., deformation of a 2500x2500 matrix (sec):  0.458333333333333 
2400x2400 normal distributed random matrix ^1000____ (sec):  0.378 
Sorting of 7,000,000 random values__________________ (sec):  0.643666666666666 
2800x2800 cross-product matrix (b = a' * a)_________ (sec):  0.922666666666667 
Linear regr. over a 3000x3000 matrix (c = a \ b')___ (sec):  0.482999999999999 
                      --------------------------------------------
                 Trimmed geom. mean (2 extremes eliminated):  0.522311832408545 

   II. Matrix functions
   --------------------
FFT over 2,400,000 random values____________________ (sec):  0.406666666666666 
Eigenvalues of a 640x640 random matrix______________ (sec):  0.288999999999997 
Determinant of a 2500x2500 random matrix____________ (sec):  0.497 
Cholesky decomposition of a 3000x3000 matrix________ (sec):  0.438000000000002 
Inverse of a 1600x1600 random matrix________________ (sec):  0.37866666666667 
                      --------------------------------------------
                Trimmed geom. mean (2 extremes eliminated):  0.407058274866339 

   III. Programmation
   ------------------
3,500,000 Fibonacci numbers calculation (vector calc)(sec):  0.648999999999996 
Creation of a 3000x3000 Hilbert matrix (matrix calc) (sec):  0.306000000000002 
Grand common divisors of 400,000 pairs (recursion)__ (sec):  1.785 
Creation of a 500x500 Toeplitz matrix (loops)_______ (sec):  0.455333333333328 
Escoufier's method on a 45x45 matrix (mixed)________ (sec):  0.375 
                      --------------------------------------------
                Trimmed geom. mean (2 extremes eliminated):  0.480324939124224 

Total time for all 15 tests_________________________ (sec):  8.46533333333333 
Overall mean (sum of I, II and III trimmed means/3)_ (sec):  0.467419897853855

Matrix calculation benchmark with Intel compiler and sequential MKL

   R Benchmark 2.5
   ===============
Number of times each test is run__________________________:  3

   I. Matrix calculation
   ---------------------
Creation, transp., deformation of a 2500x2500 matrix (sec):  0.475333333333333 
2400x2400 normal distributed random matrix ^1000____ (sec):  0.369 
Sorting of 7,000,000 random values__________________ (sec):  0.637000000000002 
2800x2800 cross-product matrix (b = a' * a)_________ (sec):  0.884666666666665 
Linear regr. over a 3000x3000 matrix (c = a \ b')___ (sec):  0.451333333333332 
                      --------------------------------------------
                 Trimmed geom. mean (2 extremes eliminated):  0.515084369178734 

   II. Matrix functions
   --------------------
FFT over 2,400,000 random values____________________ (sec):  0.372666666666667 
Eigenvalues of a 640x640 random matrix______________ (sec):  0.285999999999999 
Determinant of a 2500x2500 random matrix____________ (sec):  0.504 
Cholesky decomposition of a 3000x3000 matrix________ (sec):  0.429 
Inverse of a 1600x1600 random matrix________________ (sec):  0.370333333333332 
                      --------------------------------------------
                Trimmed geom. mean (2 extremes eliminated):  0.389753671609465 

   III. Programmation
   ------------------
3,500,000 Fibonacci numbers calculation (vector calc)(sec):  0.474000000000001 
Creation of a 3000x3000 Hilbert matrix (matrix calc) (sec):  0.309333333333332 
Grand common divisors of 400,000 pairs (recursion)__ (sec):  1.522 
Creation of a 500x500 Toeplitz matrix (loops)_______ (sec):  0.431000000000002 
Escoufier's method on a 45x45 matrix (mixed)________ (sec):  0.267999999999994 
                      --------------------------------------------
                Trimmed geom. mean (2 extremes eliminated):  0.398315717938976 

Total time for all 15 tests_________________________ (sec):  7.78366666666666 
Overall mean (sum of I, II and III trimmed means/3)_ (sec):  0.430822797869348
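
To put the four runs side by side, the total times reported above can be turned into speedup factors relative to the plain gcc build. The sketch below only re-uses the totals printed by the benchmark (rounded to three decimals); the row labels, the helper name speedups, and the use of awk are my own choices for illustration.

```shell
#!/bin/sh
# Read "label<TAB>seconds" rows and print each total with its speedup
# relative to the first row (here: gcc without MKL).
speedups() {
    awk -F'\t' 'NR == 1 { base = $2 }
                { printf "%-22s %7.3f s  %5.2fx\n", $1, $2, base / $2 }'
}

# Totals of the 15 tests from the four benchmark runs.
printf '%s\t%s\n' \
    'gcc, no MKL'          32.179 \
    'icc, no MKL'          19.231 \
    'gcc, sequential MKL'   8.465 \
    'icc, sequential MKL'   7.784 | speedups
```

On these numbers the Intel compiler alone gives about 1.67x, sequential MKL with gcc about 3.80x, and both together about 4.13x, so most of the gain comes from MKL and the compiler adds comparatively little on top.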
