Linking R with an external BLAS library can speed up matrix calculations. This topic has been discussed many times in the R Installation and Administration manual, on the R mailing lists, and elsewhere.
Here are two more official documents from Intel that I thought might be useful.
The first gives examples of how to link MKL with R in different situations. The second offers a convenient way to work out the correct linking parameters under various conditions, which I found very useful.
To compile R with the Intel compiler, please refer to the R Installation and Administration manual.
Below is a simple benchmark on my Linux system showing how much one can gain by linking MKL, by compiling R with the Intel compiler, or both.
Configure flags
Add the following lines to “config.site”. You may change the configure parameters depending on your own situation.
    ## Make sure the Intel compiler is installed and loaded, which can be set
    ## in .bashrc as e.g.
    ## . /opt/intel/bin/compilervars.sh intel64
    MKL_LIB_PATH=/opt/intel/mkl/lib/intel64

    ## Use the Intel compiler
    CC='icc -std=c99'
    CFLAGS='-g -O3 -wd188 -ip'
    F77='ifort'
    FFLAGS='-g -O3'
    CXX='icpc'
    CXXFLAGS='-g -O3'
    FC='ifort'
    FCFLAGS='-g -O3'

    ## MKL with GNU OpenMP threading, GCC
    ## (mkl_gnu_thread is the threading layer that pairs with libgomp)
    # MKL=" -L${MKL_LIB_PATH} \
    #     -Wl,--start-group \
    #     -lmkl_gf_lp64 \
    #     -lmkl_gnu_thread \
    #     -lmkl_core \
    #     -Wl,--end-group \
    #     -lgomp -lpthread"

    ## MKL with Intel OpenMP threading, ICC
    # MKL=" -L${MKL_LIB_PATH} \
    #     -Wl,--start-group \
    #     -lmkl_intel_lp64 \
    #     -lmkl_intel_thread \
    #     -lmkl_core \
    #     -Wl,--end-group \
    #     -liomp5 -lpthread"

    ## MKL sequential, ICC
    MKL=" -L${MKL_LIB_PATH} \
        -Wl,--start-group \
        -lmkl_intel_lp64 \
        -lmkl_sequential \
        -lmkl_core \
        -Wl,--end-group"

    BLAS_LIBS="$MKL"
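The three MKL variants in config.site differ only in the interface layer, the threading layer, and the OpenMP runtime that follows the link group. As a rough illustration, the sketch below builds the link line from a single mode argument. The function name `mkl_link_line` and the mode names `sequential`, `intel`, and `gnu` are made up for this example and are not part of R or MKL. The `gnu` branch assumes `mkl_gnu_thread`, the MKL threading layer intended for use with libgomp.

```shell
#!/bin/sh
# Hypothetical helper: print an MKL link line for a given threading mode.
# The default path is the one used in config.site; adjust it to your install.
MKL_LIB_PATH="${MKL_LIB_PATH:-/opt/intel/mkl/lib/intel64}"

mkl_link_line() {
    case "$1" in
        sequential)  # no threading, no OpenMP runtime needed
            iface=mkl_intel_lp64; thread=mkl_sequential; extra="" ;;
        intel)       # Intel OpenMP runtime (libiomp5)
            iface=mkl_intel_lp64; thread=mkl_intel_thread; extra="-liomp5 -lpthread" ;;
        gnu)         # GNU OpenMP runtime (libgomp), gfortran interface layer
            iface=mkl_gf_lp64; thread=mkl_gnu_thread; extra="-lgomp -lpthread" ;;
        *)
            echo "unknown threading mode: $1" >&2
            return 1 ;;
    esac
    # ${extra:+ $extra} appends the runtime libraries only when there are any.
    printf '%s\n' "-L${MKL_LIB_PATH} -Wl,--start-group -l${iface} -l${thread} -lmkl_core -Wl,--end-group${extra:+ $extra}"
}

# Example: the sequential variant, as used for the benchmarks in this post.
mkl_link_line sequential
```

The printed string could be assigned to `BLAS_LIBS` in config.site. Keeping the three cases side by side makes it harder to pair a threading layer with the wrong OpenMP runtime.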
Then compile and install R as follows:
./configure --with-blas --with-lapack
make
make install
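After installation it can be worth confirming that the build really picked up MKL. One way is to inspect the shared libraries that the installed R binaries depend on. The helper below is a sketch, not part of the original setup. The function names `blas_libs` and `check_r_blas` are made up for illustration, and the candidate file locations under the R home directory are assumptions that depend on how R was configured (for example, `lib/libR.so` only exists when R is built as a shared library). If MKL was linked statically, `ldd` will not list it.

```shell
#!/bin/sh
# Keep only the lines of `ldd` output that mention BLAS/LAPACK/MKL
# or an OpenMP runtime.
blas_libs() {
    grep -i -E 'mkl|blas|lapack|iomp|gomp'
}

# For each candidate R binary that exists under the given R home
# directory, print the BLAS-related libraries it is linked against.
check_r_blas() {
    r_home=$1
    for f in "$r_home/bin/exec/R" \
             "$r_home/lib/libR.so" \
             "$r_home/lib/libRblas.so" \
             "$r_home/modules/lapack.so"; do
        if [ -e "$f" ]; then
            echo "== $f"
            ldd "$f" | blas_libs || echo "   (no BLAS-related libraries found)"
        fi
    done
}

# Example (requires R on the PATH; `R RHOME` prints the R home directory):
#   check_r_blas "$(R RHOME)"
```

With the sequential configuration one would expect entries such as `libmkl_sequential` and `libmkl_core` to show up for at least one of these files.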
System information
- Debian Wheezy AMD64
- Intel(R) Core(TM) i7-2700K CPU @ 3.50GHz
- 16G RAM
Matrix calculation benchmark without MKL (gcc 4.7.2)
    R Benchmark 2.5
    ===============
    Number of times each test is run__________________________: 3

    I. Matrix calculation
    ---------------------
    Creation, transp., deformation of a 2500x2500 matrix (sec): 0.455000000000001
    2400x2400 normal distributed random matrix ^1000____ (sec): 0.383000000000002
    Sorting of 7,000,000 random values__________________ (sec): 0.647666666666667
    2800x2800 cross-product matrix (b = a' * a)_________ (sec): 10.75
    Linear regr. over a 3000x3000 matrix (c = a \ b')___ (sec): 5.02266666666667
    --------------------------------------------
    Trimmed geom. mean (2 extremes eliminated): 1.13963496799737

    II. Matrix functions
    --------------------
    FFT over 2,400,000 random values____________________ (sec): 0.392666666666666
    Eigenvalues of a 640x640 random matrix______________ (sec): 0.73766666666666
    Determinant of a 2500x2500 random matrix____________ (sec): 3.30266666666667
    Cholesky decomposition of a 3000x3000 matrix________ (sec): 3.872
    Inverse of a 1600x1600 random matrix________________ (sec): 3.04166666666667
    --------------------------------------------
    Trimmed geom. mean (2 extremes eliminated): 1.94959995852139

    III. Programmation
    ------------------
    3,500,000 Fibonacci numbers calculation (vector calc)(sec): 0.663333333333346
    Creation of a 3000x3000 Hilbert matrix (matrix calc) (sec): 0.315333333333323
    Grand common divisors of 400,000 pairs (recursion)__ (sec): 1.74266666666667
    Creation of a 500x500 Toeplitz matrix (loops)_______ (sec): 0.471666666666674
    Escoufier's method on a 45x45 matrix (mixed)________ (sec): 0.381
    --------------------------------------------
    Trimmed geom. mean (2 extremes eliminated): 0.492149816487262

    Total time for all 15 tests_________________________ (sec): 32.179
    Overall mean (sum of I, II and III trimmed means/3)_ (sec): 1.03023476346231
Matrix calculation benchmark with Intel compiler (without MKL)
    R Benchmark 2.5
    ===============
    Number of times each test is run__________________________: 3

    I. Matrix calculation
    ---------------------
    Creation, transp., deformation of a 2500x2500 matrix (sec): 0.438333333333333
    2400x2400 normal distributed random matrix ^1000____ (sec): 0.362666666666666
    Sorting of 7,000,000 random values__________________ (sec): 0.625666666666666
    2800x2800 cross-product matrix (b = a' * a)_________ (sec): 6.06
    Linear regr. over a 3000x3000 matrix (c = a \ b')___ (sec): 2.66333333333333
    --------------------------------------------
    Trimmed geom. mean (2 extremes eliminated): 0.900584248749399

    II. Matrix functions
    --------------------
    FFT over 2,400,000 random values____________________ (sec): 0.372
    Eigenvalues of a 640x640 random matrix______________ (sec): 0.456999999999996
    Determinant of a 2500x2500 random matrix____________ (sec): 1.85666666666667
    Cholesky decomposition of a 3000x3000 matrix________ (sec): 1.44933333333334
    Inverse of a 1600x1600 random matrix________________ (sec): 1.85266666666667
    --------------------------------------------
    Trimmed geom. mean (2 extremes eliminated): 1.07060004219009

    III. Programmation
    ------------------
    3,500,000 Fibonacci numbers calculation (vector calc)(sec): 0.510333333333335
    Creation of a 3000x3000 Hilbert matrix (matrix calc) (sec): 0.308666666666667
    Grand common divisors of 400,000 pairs (recursion)__ (sec): 1.581
    Creation of a 500x500 Toeplitz matrix (loops)_______ (sec): 0.408000000000001
    Escoufier's method on a 45x45 matrix (mixed)________ (sec): 0.285000000000011
    --------------------------------------------
    Trimmed geom. mean (2 extremes eliminated): 0.400560336912059

    Total time for all 15 tests_________________________ (sec): 19.2306666666667
    Overall mean (sum of I, II and III trimmed means/3)_ (sec): 0.728237740489568
Matrix calculation benchmark with sequential MKL (gcc 4.7.2)
    R Benchmark 2.5
    ===============
    Number of times each test is run__________________________: 3

    I. Matrix calculation
    ---------------------
    Creation, transp., deformation of a 2500x2500 matrix (sec): 0.458333333333333
    2400x2400 normal distributed random matrix ^1000____ (sec): 0.378
    Sorting of 7,000,000 random values__________________ (sec): 0.643666666666666
    2800x2800 cross-product matrix (b = a' * a)_________ (sec): 0.922666666666667
    Linear regr. over a 3000x3000 matrix (c = a \ b')___ (sec): 0.482999999999999
    --------------------------------------------
    Trimmed geom. mean (2 extremes eliminated): 0.522311832408545

    II. Matrix functions
    --------------------
    FFT over 2,400,000 random values____________________ (sec): 0.406666666666666
    Eigenvalues of a 640x640 random matrix______________ (sec): 0.288999999999997
    Determinant of a 2500x2500 random matrix____________ (sec): 0.497
    Cholesky decomposition of a 3000x3000 matrix________ (sec): 0.438000000000002
    Inverse of a 1600x1600 random matrix________________ (sec): 0.37866666666667
    --------------------------------------------
    Trimmed geom. mean (2 extremes eliminated): 0.407058274866339

    III. Programmation
    ------------------
    3,500,000 Fibonacci numbers calculation (vector calc)(sec): 0.648999999999996
    Creation of a 3000x3000 Hilbert matrix (matrix calc) (sec): 0.306000000000002
    Grand common divisors of 400,000 pairs (recursion)__ (sec): 1.785
    Creation of a 500x500 Toeplitz matrix (loops)_______ (sec): 0.455333333333328
    Escoufier's method on a 45x45 matrix (mixed)________ (sec): 0.375
    --------------------------------------------
    Trimmed geom. mean (2 extremes eliminated): 0.480324939124224

    Total time for all 15 tests_________________________ (sec): 8.46533333333333
    Overall mean (sum of I, II and III trimmed means/3)_ (sec): 0.467419897853855
Matrix calculation benchmark with Intel compiler and sequential MKL
    R Benchmark 2.5
    ===============
    Number of times each test is run__________________________: 3

    I. Matrix calculation
    ---------------------
    Creation, transp., deformation of a 2500x2500 matrix (sec): 0.475333333333333
    2400x2400 normal distributed random matrix ^1000____ (sec): 0.369
    Sorting of 7,000,000 random values__________________ (sec): 0.637000000000002
    2800x2800 cross-product matrix (b = a' * a)_________ (sec): 0.884666666666665
    Linear regr. over a 3000x3000 matrix (c = a \ b')___ (sec): 0.451333333333332
    --------------------------------------------
    Trimmed geom. mean (2 extremes eliminated): 0.515084369178734

    II. Matrix functions
    --------------------
    FFT over 2,400,000 random values____________________ (sec): 0.372666666666667
    Eigenvalues of a 640x640 random matrix______________ (sec): 0.285999999999999
    Determinant of a 2500x2500 random matrix____________ (sec): 0.504
    Cholesky decomposition of a 3000x3000 matrix________ (sec): 0.429
    Inverse of a 1600x1600 random matrix________________ (sec): 0.370333333333332
    --------------------------------------------
    Trimmed geom. mean (2 extremes eliminated): 0.389753671609465

    III. Programmation
    ------------------
    3,500,000 Fibonacci numbers calculation (vector calc)(sec): 0.474000000000001
    Creation of a 3000x3000 Hilbert matrix (matrix calc) (sec): 0.309333333333332
    Grand common divisors of 400,000 pairs (recursion)__ (sec): 1.522
    Creation of a 500x500 Toeplitz matrix (loops)_______ (sec): 0.431000000000002
    Escoufier's method on a 45x45 matrix (mixed)________ (sec): 0.267999999999994
    --------------------------------------------
    Trimmed geom. mean (2 extremes eliminated): 0.398315717938976

    Total time for all 15 tests_________________________ (sec): 7.78366666666666
    Overall mean (sum of I, II and III trimmed means/3)_ (sec): 0.430822797869348
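To compare the four runs at a glance, the total times reported above can be turned into speedup factors relative to the plain gcc build. The snippet below is a small sketch of that arithmetic. The function name `speedups` is made up for this example, and the totals are copied from the benchmark output and rounded to three decimals.

```shell
#!/bin/sh
# Read "label<TAB>seconds" lines and print each total with its speedup
# relative to the baseline time given as the first argument.
speedups() {
    awk -v base="$1" 'BEGIN { FS = "\t" }
        { printf "%-24s %7.3f s  %.2fx\n", $1, $2, base / $2 }'
}

# Total time for all 15 tests (sec), from the four runs above.
printf '%s\t%s\n' \
    'gcc, no MKL'           32.179 \
    'icc, no MKL'           19.231 \
    'gcc + sequential MKL'   8.465 \
    'icc + sequential MKL'   7.784 |
    speedups 32.179
```

On these totals the Intel compiler alone gives roughly a 1.7x speedup, and sequential MKL gives roughly 3.8x with gcc and 4.1x with the Intel compiler. Most of the gain comes from the matrix tests in sections I and II, while the section III timings change much less.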