Linking R against an external BLAS library can speed up matrix calculations. This topic has been discussed many times in the R Installation and Administration manual, on the R mailing lists, and elsewhere.
Here are two more official documents from Intel that I thought might be useful.
The first gives examples of how to link MKL with R in different situations. The second offers a very convenient way of working out the correct link parameters under various conditions, which I found very useful.
For how to compile R with the Intel compiler, please refer to the R Installation and Administration manual.
Below is a simple benchmark on my Linux system showing how much one can gain by linking MKL and/or compiling R with the Intel compiler.
Configure flags
Add the following lines to “config.site”. You may change the configure parameters depending on your own situation.
    ## Make sure the Intel compiler is installed and loaded, which can be set in .bashrc
    ## as e.g.
    ## . /opt/intel/bin/compilervars.sh intel64
    MKL_LIB_PATH=/opt/intel/mkl/lib/intel64

    ## Use the Intel compiler
    CC='icc -std=c99'
    CFLAGS='-g -O3 -wd188 -ip '
    F77='ifort'
    FFLAGS='-g -O3 '
    CXX='icpc'
    CXXFLAGS='-g -O3 '
    FC='ifort'
    FCFLAGS='-g -O3 '

    ## MKL with GNU OpenMP threading, GCC
    # MKL=" -L${MKL_LIB_PATH} \
    #  -Wl,--start-group \
    #  -lmkl_gf_lp64 \
    #  -lmkl_gnu_thread \
    #  -lmkl_core \
    #  -Wl,--end-group \
    #  -lgomp -lpthread"

    ## MKL with Intel OpenMP threading, ICC
    # MKL=" -L${MKL_LIB_PATH} \
    #  -Wl,--start-group \
    #  -lmkl_intel_lp64 \
    #  -lmkl_intel_thread \
    #  -lmkl_core \
    #  -Wl,--end-group \
    #  -liomp5 -lpthread"

    ## MKL sequential, ICC
    MKL=" -L${MKL_LIB_PATH} \
     -Wl,--start-group \
     -lmkl_intel_lp64 \
     -lmkl_sequential \
     -lmkl_core \
     -Wl,--end-group"

    BLAS_LIBS="$MKL"
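Before running configure, it can help to confirm that the libraries named on the link line are really present under MKL_LIB_PATH. The following is only a minimal sketch: `check_mkl_libs` is a hypothetical helper (not part of R or MKL), and the directory in the usage comment is the one assumed in the config above.

```shell
# check_mkl_libs DIR LIB...
# Hypothetical helper: prints every library that has neither a shared (.so)
# nor a static (.a) file in DIR, and returns non-zero if any is missing.
check_mkl_libs() {
    dir=$1
    shift
    missing=0
    for lib in "$@"; do
        if [ ! -e "$dir/lib$lib.so" ] && [ ! -e "$dir/lib$lib.a" ]; then
            echo "missing: lib$lib"
            missing=1
        fi
    done
    return "$missing"
}

# Example call for the sequential link line (the path is an assumption):
# check_mkl_libs /opt/intel/mkl/lib/intel64 mkl_intel_lp64 mkl_sequential mkl_core
```

If anything is reported as missing, adjust MKL_LIB_PATH or the chosen link line before building.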
Then compile and install R as follows:
./configure --with-blas --with-lapack
make
make install
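After installation, one way to check that the build really picked up MKL is to look at the shared libraries the R binary depends on. This is a rough sketch under some assumptions: R is on the PATH, MKL was linked dynamically (a static link will not show up in `ldd`), and the binary lives at one of the two locations tried below, which may differ between builds. `uses_mkl` is a hypothetical helper name.

```shell
# uses_mkl reads `ldd` output on stdin and succeeds if an MKL library is listed.
uses_mkl() {
    grep -q 'libmkl_'
}

# Inspect the installed R, if there is one on the PATH.
if command -v R >/dev/null 2>&1; then
    r_home=$(R RHOME)
    # Candidate locations (assumptions): the plain executable, or libR.so
    # when R was configured with --enable-R-shlib.
    for f in "$r_home/bin/exec/R" "$r_home/lib/libR.so"; do
        [ -e "$f" ] || continue
        if ldd "$f" | uses_mkl; then
            echo "$f: linked against MKL"
        else
            echo "$f: no MKL library found"
        fi
    done
fi
```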
System information
- Debian Wheezy AMD64
- Intel(R) Core(TM) i7-2700K CPU @ 3.50GHz
- 16G RAM
Matrix calculation benchmark without MKL (gcc 4.7.2)
    R Benchmark 2.5
    ===============
    Number of times each test is run__________________________: 3

    I. Matrix calculation
    ---------------------
    Creation, transp., deformation of a 2500x2500 matrix (sec): 0.455000000000001
    2400x2400 normal distributed random matrix ^1000____ (sec): 0.383000000000002
    Sorting of 7,000,000 random values__________________ (sec): 0.647666666666667
    2800x2800 cross-product matrix (b = a' * a)_________ (sec): 10.75
    Linear regr. over a 3000x3000 matrix (c = a \ b')___ (sec): 5.02266666666667
    --------------------------------------------
    Trimmed geom. mean (2 extremes eliminated): 1.13963496799737

    II. Matrix functions
    --------------------
    FFT over 2,400,000 random values____________________ (sec): 0.392666666666666
    Eigenvalues of a 640x640 random matrix______________ (sec): 0.73766666666666
    Determinant of a 2500x2500 random matrix____________ (sec): 3.30266666666667
    Cholesky decomposition of a 3000x3000 matrix________ (sec): 3.872
    Inverse of a 1600x1600 random matrix________________ (sec): 3.04166666666667
    --------------------------------------------
    Trimmed geom. mean (2 extremes eliminated): 1.94959995852139

    III. Programmation
    ------------------
    3,500,000 Fibonacci numbers calculation (vector calc)(sec): 0.663333333333346
    Creation of a 3000x3000 Hilbert matrix (matrix calc) (sec): 0.315333333333323
    Grand common divisors of 400,000 pairs (recursion)__ (sec): 1.74266666666667
    Creation of a 500x500 Toeplitz matrix (loops)_______ (sec): 0.471666666666674
    Escoufier's method on a 45x45 matrix (mixed)________ (sec): 0.381
    --------------------------------------------
    Trimmed geom. mean (2 extremes eliminated): 0.492149816487262

    Total time for all 15 tests_________________________ (sec): 32.179
    Overall mean (sum of I, II and III trimmed means/3)_ (sec): 1.03023476346231
Matrix calculation benchmark with Intel compiler (without MKL)
    R Benchmark 2.5
    ===============
    Number of times each test is run__________________________: 3

    I. Matrix calculation
    ---------------------
    Creation, transp., deformation of a 2500x2500 matrix (sec): 0.438333333333333
    2400x2400 normal distributed random matrix ^1000____ (sec): 0.362666666666666
    Sorting of 7,000,000 random values__________________ (sec): 0.625666666666666
    2800x2800 cross-product matrix (b = a' * a)_________ (sec): 6.06
    Linear regr. over a 3000x3000 matrix (c = a \ b')___ (sec): 2.66333333333333
    --------------------------------------------
    Trimmed geom. mean (2 extremes eliminated): 0.900584248749399

    II. Matrix functions
    --------------------
    FFT over 2,400,000 random values____________________ (sec): 0.372
    Eigenvalues of a 640x640 random matrix______________ (sec): 0.456999999999996
    Determinant of a 2500x2500 random matrix____________ (sec): 1.85666666666667
    Cholesky decomposition of a 3000x3000 matrix________ (sec): 1.44933333333334
    Inverse of a 1600x1600 random matrix________________ (sec): 1.85266666666667
    --------------------------------------------
    Trimmed geom. mean (2 extremes eliminated): 1.07060004219009

    III. Programmation
    ------------------
    3,500,000 Fibonacci numbers calculation (vector calc)(sec): 0.510333333333335
    Creation of a 3000x3000 Hilbert matrix (matrix calc) (sec): 0.308666666666667
    Grand common divisors of 400,000 pairs (recursion)__ (sec): 1.581
    Creation of a 500x500 Toeplitz matrix (loops)_______ (sec): 0.408000000000001
    Escoufier's method on a 45x45 matrix (mixed)________ (sec): 0.285000000000011
    --------------------------------------------
    Trimmed geom. mean (2 extremes eliminated): 0.400560336912059

    Total time for all 15 tests_________________________ (sec): 19.2306666666667
    Overall mean (sum of I, II and III trimmed means/3)_ (sec): 0.728237740489568
Matrix calculation benchmark with sequential MKL (gcc 4.7.2)
    R Benchmark 2.5
    ===============
    Number of times each test is run__________________________: 3

    I. Matrix calculation
    ---------------------
    Creation, transp., deformation of a 2500x2500 matrix (sec): 0.458333333333333
    2400x2400 normal distributed random matrix ^1000____ (sec): 0.378
    Sorting of 7,000,000 random values__________________ (sec): 0.643666666666666
    2800x2800 cross-product matrix (b = a' * a)_________ (sec): 0.922666666666667
    Linear regr. over a 3000x3000 matrix (c = a \ b')___ (sec): 0.482999999999999
    --------------------------------------------
    Trimmed geom. mean (2 extremes eliminated): 0.522311832408545

    II. Matrix functions
    --------------------
    FFT over 2,400,000 random values____________________ (sec): 0.406666666666666
    Eigenvalues of a 640x640 random matrix______________ (sec): 0.288999999999997
    Determinant of a 2500x2500 random matrix____________ (sec): 0.497
    Cholesky decomposition of a 3000x3000 matrix________ (sec): 0.438000000000002
    Inverse of a 1600x1600 random matrix________________ (sec): 0.37866666666667
    --------------------------------------------
    Trimmed geom. mean (2 extremes eliminated): 0.407058274866339

    III. Programmation
    ------------------
    3,500,000 Fibonacci numbers calculation (vector calc)(sec): 0.648999999999996
    Creation of a 3000x3000 Hilbert matrix (matrix calc) (sec): 0.306000000000002
    Grand common divisors of 400,000 pairs (recursion)__ (sec): 1.785
    Creation of a 500x500 Toeplitz matrix (loops)_______ (sec): 0.455333333333328
    Escoufier's method on a 45x45 matrix (mixed)________ (sec): 0.375
    --------------------------------------------
    Trimmed geom. mean (2 extremes eliminated): 0.480324939124224

    Total time for all 15 tests_________________________ (sec): 8.46533333333333
    Overall mean (sum of I, II and III trimmed means/3)_ (sec): 0.467419897853855
Matrix calculation benchmark with Intel compiler and sequential MKL
    R Benchmark 2.5
    ===============
    Number of times each test is run__________________________: 3

    I. Matrix calculation
    ---------------------
    Creation, transp., deformation of a 2500x2500 matrix (sec): 0.475333333333333
    2400x2400 normal distributed random matrix ^1000____ (sec): 0.369
    Sorting of 7,000,000 random values__________________ (sec): 0.637000000000002
    2800x2800 cross-product matrix (b = a' * a)_________ (sec): 0.884666666666665
    Linear regr. over a 3000x3000 matrix (c = a \ b')___ (sec): 0.451333333333332
    --------------------------------------------
    Trimmed geom. mean (2 extremes eliminated): 0.515084369178734

    II. Matrix functions
    --------------------
    FFT over 2,400,000 random values____________________ (sec): 0.372666666666667
    Eigenvalues of a 640x640 random matrix______________ (sec): 0.285999999999999
    Determinant of a 2500x2500 random matrix____________ (sec): 0.504
    Cholesky decomposition of a 3000x3000 matrix________ (sec): 0.429
    Inverse of a 1600x1600 random matrix________________ (sec): 0.370333333333332
    --------------------------------------------
    Trimmed geom. mean (2 extremes eliminated): 0.389753671609465

    III. Programmation
    ------------------
    3,500,000 Fibonacci numbers calculation (vector calc)(sec): 0.474000000000001
    Creation of a 3000x3000 Hilbert matrix (matrix calc) (sec): 0.309333333333332
    Grand common divisors of 400,000 pairs (recursion)__ (sec): 1.522
    Creation of a 500x500 Toeplitz matrix (loops)_______ (sec): 0.431000000000002
    Escoufier's method on a 45x45 matrix (mixed)________ (sec): 0.267999999999994
    --------------------------------------------
    Trimmed geom. mean (2 extremes eliminated): 0.398315717938976

    Total time for all 15 tests_________________________ (sec): 7.78366666666666
    Overall mean (sum of I, II and III trimmed means/3)_ (sec): 0.430822797869348
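To put the four runs side by side, the total times can be turned into speedups relative to the plain gcc build without MKL. The snippet below is just a small sketch of that arithmetic; `speedups` and the short labels (icc, gcc+mkl, icc+mkl) are names made up here, while the numbers are the "Total time for all 15 tests" values copied from the outputs above.

```shell
# speedups BASE
# Reads "label seconds" pairs on stdin and prints BASE / seconds for each.
speedups() {
    awk -v base="$1" 'NF == 2 { printf "%s %.2fx\n", $1, base / $2 }'
}

# Baseline: gcc 4.7.2 without MKL, 32.179 sec in total.
speedups 32.179 <<'EOF'
icc 19.2306666666667
gcc+mkl 8.46533333333333
icc+mkl 7.78366666666666
EOF
# prints:
# icc 1.67x
# gcc+mkl 3.80x
# icc+mkl 4.13x
```

On this machine most of the gain in total time comes from linking MKL; adding the Intel compiler on top of MKL changes the total only slightly.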
When installing additional R packages, how do I set things up so that compilation takes advantage of MKL?
You don't have to do any additional setup. "R CMD INSTALL pkg" (or install.packages("pkg")) will automatically use the linked MKL.
Thanks for this information.
Clear, concise, and worked for me!