The **Chinese translation** was produced by a team led by Professor Yanfei Kang (Beihang University) and Professor Feng Li (Central University of Finance and Economics). The following students were also involved: Cheng Fan, Liu Yu, Long Xiaoyu, Wang Xiaoqian, Zeng Jiayue, Zhang Bohan, and Zhu Shuaidong.

**Abstract:** Feature-based time series representation has attracted substantial attention in a wide range of time series analysis methods. Recently, the use of time series features for forecast model selection and model averaging has been an emerging research focus in the forecasting community. Nonetheless, most of the existing approaches depend on the manual choice of an appropriate set of features. Exploiting machine learning methods to automatically extract features from time series has become crucially important in state-of-the-art time series analysis. In this paper, we introduce an automated approach to extract time series features based on images. Time series are first transformed into recurrence images, from which local features can be extracted using computer vision algorithms. The extracted features are used for forecast model selection and model averaging. Our experiments show that forecasting based on automatically extracted features, with less human intervention and a more comprehensive view of the raw time series data, yields performance comparable to the top methods proposed in the largest forecasting competition (M4).

**Links:** Working Paper

I am organizing a special thematic session (of either 3 or 4 talks of 30 minutes each) on *Monte Carlo Methods for Large Dependent Data*. If you are interested in contributing a talk, please let me know.

Confirmed Speakers:

- Clara Grazian, University of Oxford
- Ruben Loaiza-Maya, Monash University
- Yanfei Kang, Beihang University
- Feng Li, Central University of Finance and Economics

Time: 14:00-17:00 on Nov 31, Dec 7, 14, 21

Venue: Guanghua Building #1, Room 114

Check out this link for more details.

**Working Paper on arXiv**

**Associated R package**: tsgeneration

**Associated Rshiny app**: tsgeneration

The slides mentioned in this talk are available here. A text version of this talk is also available for download.

Full program details are available online.

Slides for the keynote speaker are available here.

https://cran.r-project.org/web/packages/dng/index.html

At the moment, this package includes the distribution function and random number generation for the “split-t” density and its gradient, based on my previous paper “Flexible Modeling of Conditional Distributions using Smooth Mixtures of Asymmetric Student T Densities”, *Journal of Statistical Planning and Inference*. We plan to add more densities.

Thank you Jiayue for the hard work.

Understanding how corporate defaults cluster is particularly important for risk management of portfolios of corporate debt. In this paper, we discuss the dynamic nature of the clustering of credit risk across pairs of firms in the same family corporation in China. We insert the tail-dependence coefficient into the Joe-Clayton copula model directly through a reparameterization to estimate the tail-dependence structure of credit risk. We also use both macroeconomic and firm-specific covariates to study the dynamic nature of the lower tail-dependence coefficient of distance-to-default, which measures the credit risk clustering, and to find the driving forces behind credit risk clustering. Empirical results indicate that both macroeconomic and firm-specific covariates play important roles in the time-varying features of credit risk clustering. However, for different pairwise portfolios, these macroeconomic and firm-specific covariates have different effects.

**Keywords**: Credit risk clustering; Covariate-dependent copulas; tail-dependence; distance-to-default; MCMC.

In our study, we apply our method to a total of 45 pairs of firms. There are 39 pairs showing significant results, three of which are already explained in the paper. The supplementary material shows the empirical results of credit risk clustering across the 36 significant pairs of firms not listed in the paper.

*Modified from http://www.phdcomics.com/comics/archive.php?comicid=1795*

Statistical methods have developed rapidly in the past twenty years. One driving factor is that more and more complicated high-dimensional data require sophisticated data analysis methods. A noticeably successful case is the machine learning field, which is now widely used in industry. Another reason is the dramatic advancement in the statistical computing environment. Computationally intensive methods that in the past could only be run on expensive supercomputers can now be run on a standard PC. This has created an enormous momentum for Bayesian analysis, where complex models are typically analyzed with modern computer-intensive simulation methods.

Traditional linear models with Gaussian assumptions are challenged by the new large, complicated datasets, which have in turn spurred interest in flexible modeling approaches with less restrictive assumptions. Moreover, research has shifted from merely modeling the mean and variance of the data to sophisticated modeling of skewness, tail-dependence, and outliers. However, such work demands efficient inference tools. The development of highly efficient Markov chain Monte Carlo (MCMC) methods has lowered the barrier. The Bayesian approach provides a natural framework for prediction, model comparison, and evaluation of complicated models, and has the additional advantage of being intimately connected with decision making.

In statistics, density estimation is the procedure of estimating an unknown density p(y) from observed data. Density estimation techniques trace back to the use of histograms, later followed by kernel density estimation in which the shape of the data is approximated through a kernel function with a smoothing parameter. However, kernel density estimation suffers from one obstacle, which is the necessary step of specifying the bandwidth.

Mixture models have recently become a popular alternative approach. A mixture density combines several component densities, usually as a weighted sum in which the weights sum to one. Mixture densities can be used to capture data characteristics such as multi-modality, fat tails, and skewness. Figure 1 shows four examples.

[Fig-1: Using a mixture of normal densities (thin lines) to mimic a flexible density (bold line)]
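The weighted-sum construction can be sketched in a few lines of R. This is only an illustration: the function name `dmixnorm` and the three components below are made up for the example and are not the ones used in Figure 1.

```r
# Density of a finite mixture of normals: sum_k w_k * N(y | mu_k, sd_k).
# 'dmixnorm' is a hypothetical helper written for this illustration.
dmixnorm <- function(y, weights, means, sds) {
  stopifnot(length(weights) == length(means),
            length(means) == length(sds),
            abs(sum(weights) - 1) < 1e-8)
  dens <- numeric(length(y))
  for (k in seq_along(weights)) {
    dens <- dens + weights[k] * dnorm(y, mean = means[k], sd = sds[k])
  }
  dens
}

# Made-up components that give a multi-modal, right-skewed shape
w  <- c(0.6, 0.3, 0.1)
mu <- c(-1, 2, 5)
s  <- c(1, 0.7, 2)

y   <- seq(-6, 12, length.out = 400)
mix <- dmixnorm(y, w, mu, s)

# Like any density, the mixture should integrate to (approximately) one
total_mass <- integrate(dmixnorm, -Inf, Inf,
                        weights = w, means = mu, sds = s)$value

# plot(y, mix, type = "l")  # uncomment to draw the mixture density
```

Changing the weights, means, and standard deviations is enough to produce fat tails or several modes, which is the flexibility the paragraph above refers to.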

Conditional density estimation concentrates on modeling the relationship between a response *y* and a set of covariates *x* through a conditional density function p(y|x). In the simplest case, the homoscedastic Gaussian linear regression y = x′ β + ε is trivially equivalent to modeling p(y|x) by a Gaussian density with mean function μ = x′ β and constant variance.

In Bayesian statistics, inference about an unknown quantity θ is based on the posterior p(θ|y), which combines the data information, p(y|θ), with prior beliefs about θ, p(θ). In many simple statistical models with vague priors that play a minimal role in the posterior distribution, Bayesian inference draws similar conclusions to those obtained from a traditional frequentist approach. The Bayesian approach is, however, more easily extended to more complicated models using MCMC simulation techniques. In principle, MCMC can be applied to many hard-to-estimate models. In practice, the quality of the inference depends heavily on how efficient the MCMC algorithm is. This is especially true in nonlinear models with many correlated parameters.
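To make the idea of simulating from p(θ|y) concrete, here is a minimal random-walk Metropolis sketch in R. Everything in it is an assumption chosen for illustration (simulated data, a normal likelihood with known standard deviation, a vague normal prior, and a hand-picked proposal scale); it is not the algorithm used in the work described here.

```r
set.seed(123)
y <- rnorm(50, mean = 2, sd = 1)   # simulated data; sd assumed known

# Log posterior up to a constant: log p(y | theta) + log p(theta)
log_post <- function(theta) {
  sum(dnorm(y, mean = theta, sd = 1, log = TRUE)) +
    dnorm(theta, mean = 0, sd = 10, log = TRUE)   # vague prior (assumption)
}

n_iter <- 5000
draws  <- numeric(n_iter)
theta  <- 0                        # starting value
for (i in seq_len(n_iter)) {
  proposal <- theta + rnorm(1, sd = 0.5)          # random-walk proposal
  # Accept with probability min(1, posterior ratio)
  if (log(runif(1)) < log_post(proposal) - log_post(theta)) {
    theta <- proposal
  }
  draws[i] <- theta
}

post_mean <- mean(draws[-(1:1000)])               # discard burn-in

# In this conjugate toy model the exact posterior mean is available,
# which gives a check on the simulation
exact_mean <- (length(y) * mean(y)) / (length(y) + 1 / 10^2)
```

With a vague prior the posterior mean is close to the sample mean, which matches the remark above that Bayesian and frequentist conclusions tend to agree in simple models. The proposal scale controls how efficiently the chain explores the posterior.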

A key factor for evaluating a method’s performance is to check how it balances the trade-off between goodness-of-fit and overfitting. It is common that if a model wins in goodness-of-fit, it will lose in prediction. Variable selection is a technique that is commonly used in such a context. Historically the purposes for using variable selection are to select meaningful covariates that contribute to the model, inhibit ill-behaved design matrices, and prevent model overfitting. Methods like backward and forward selection are standard routines in most statistical software packages. However, the drawbacks are obvious in those techniques, e.g. the selection depends heavily on the starting points, which becomes more problematic with high dimensional data with many covariates. Most current methods rely on Bayesian variable selection via MCMC. A standard Bayesian variable selection approach is to augment the regression model with a variable selection indicator for each covariate. For the purpose of overcoming problems with overfitting, shrinkage estimation can also be used as an alternative, or even complementary, approach to variable selection. A shrinkage estimator shrinks the regression coefficients towards zero rather than eliminating the covariate completely. One way to select a proper value of the shrinkage is by cross-validation.
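As a small illustration of shrinkage, the sketch below uses ridge regression on simulated data. Ridge is just one possible shrinkage estimator, picked here because it has a closed form; the data, the helper name `ridge`, and the value of the shrinkage parameter are all assumptions made for the example.

```r
set.seed(1)
n <- 50
p <- 10
X <- matrix(rnorm(n * p), n, p)
beta_true <- c(2, -1, rep(0, p - 2))     # only two covariates matter
y <- drop(X %*% beta_true + rnorm(n))

# Ridge estimator (no intercept): (X'X + lambda * I)^{-1} X'y
# lambda = 0 gives ordinary least squares.
ridge <- function(X, y, lambda) {
  drop(solve(crossprod(X) + lambda * diag(ncol(X)), crossprod(X, y)))
}

beta_ols   <- ridge(X, y, lambda = 0)
beta_ridge <- ridge(X, y, lambda = 10)

# Shrinkage pulls the coefficients towards zero without dropping covariates
round(cbind(true = beta_true, ols = beta_ols, ridge = beta_ridge), 3)
```

In practice the value of `lambda` would be chosen by cross-validation, as mentioned above, rather than fixed by hand.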

Modeling the volatility and variability in financial data has been a highly active research area since the seminal paper by Engle introduced the ARCH model in 1982, and there are large financial markets for volatility-based instruments. Financial data, such as stock market returns, are typically heavy-tailed and subject to volatility clustering, i.e. time-varying variance. They also frequently show skewness and kurtosis that evolve in a very persistent fashion, or that result from a financial crisis with unprecedented volatility; see Figure 2 for a model of the degrees of freedom of S&P 500 returns. Modeling such data requires sophisticated MCMC treatment, but in return, we obtain better insights into a situation that other methods can hardly tackle.

[Fig-2: Time series plot of the posterior median and 95% probability intervals for kurtosis in terms of degrees of freedom of the return distribution for S&P 500 stock returns.]

LIDAR, Light Detection And Ranging, is a technique that uses laser-emitted light to detect chemical compounds in the atmosphere. In the dataset we have analyzed, the response variable (logratio) consists of 221 observations on the log ratio of received light from two laser sources: one at the resonance frequency of the target compound, and the other from a frequency off this target frequency. The predictor is the distance traveled before the light is reflected back to its source (range). Our aim is to model the predictive density p(logratio | range). A smooth mixture of asymmetric densities is used to model this predictive density, which involves a large number of parameters; see Figure 3 for the fitted curve with the confidence band. The model is therefore likely to over-fit the data unless model complexity is controlled effectively. Bayesian variable selection in all parameters can lead to important simplifications of the mixture components. Not only does this control complexity for a given number of components, but it also simplifies the existing components if an additional component is added to the model.

[Fig-3: Smooth mixture models for the LIDAR data. The figure displays the actual data overlaid on predictive regions and the predictive mean.]

In finance applications, a firm’s leverage (fraction of external financing) is usually modeled as a function of the proportion of fixed assets, the firm’s market value in relation to its book value, firm sales, and profits. The relationships between leverage and the covariates are highly nonlinear. There are also outliers. Strong nonlinearities seem to be a quite general feature of balance sheet data, but only a handful of articles have suggested using nonlinear/nonparametric models. One attempt is to extend the regression model by introducing many auxiliary variables, known as *splines*. A nonlinear curve/surface can then be constructed by choosing the correct number of splines and placing them at the right locations in covariate space (see Figure 4 for the fitted mean curve and the standard deviation). Nonetheless, correctly allocating the splines in covariate space is not trivial. Bayesian methods treat the locations as unknown parameters, which allocates the splines efficiently and therefore keeps the number of splines to a minimum. Compared with the traditional deterministic spline approach, the Bayesian approach allows the splines to move freely in the covariate space and provides a dynamic surface together with a measure of surface uncertainty.

[Fig-4.1, Fig-4.2: The posterior mean (left) and standard deviation (right) of the posterior surface for the model for firm leverage data. The depth of the color indicates how the leverage varies with book values and profits. The subplot to the right also shows an overlay of the covariate observations.]

The linear regression model, which was considered very advanced in the 1950s, is now standard course content for university students. Data are much more complicated nowadays, not only because the volume has increased but also because the structure is much more complex. Very high-dimensional data that mix numeric variables, character strings, images, or videos are no longer rare. Sophisticated models are essential in such situations. In principle, a more complicated model should be able to capture more complicated data features, but estimating and interpreting such a model is not straightforward. In my view, there is a huge space to explore, both computationally and statistically. Statistical models that can adapt to modern computational architectures already flourish in industry. Techniques like high performance computing will be more widely used in statistics, and statisticians will eventually become familiar with them.

(*I would like to thank Professor Mattias Villani who introduced me to this exciting area.*)

I am humbled to have been selected for this award. My thanks go to everyone at the Cramér Society for reviewing my thesis, and to all the people, including my former supervisor Professor Mattias Villani, who helped me so much during my PhD studies. Unfortunately, I was too busy to fly to Stockholm that week, so I recorded a video presentation for the Cramér Society annual meeting. The video is now available on YouTube, or you can download it from this link.

Read more from my home university.

Here are two more official documents from Intel that I thought might be useful.

The first document gives examples of how to link MKL with R in different situations. The second provides a very convenient way of configuring the correct linking parameters under various conditions, which I found very useful.

For how to compile R with the Intel compiler, please refer to the R Installation and Administration manual.

Below is a simple benchmark test on my Linux system showing how much one can gain by linking MKL and/or compiling R with the Intel compiler.

Add the following lines to “config.site”. You may change the configure parameters depending on your own situation.

    ## Make sure intel compiler is installed and loaded which can be set in .bashrc
    ## as e.g.
    ## . /opt/intel/bin/compilervars.sh intel64
    MKL_LIB_PATH=/opt/intel/mkl/lib/intel64

    ## Use intel compiler
    CC='icc -std=c99'
    CFLAGS='-g -O3 -wd188 -ip '
    F77='ifort'
    FFLAGS='-g -O3 '
    CXX='icpc'
    CXXFLAGS='-g -O3 '
    FC='ifort'
    FCFLAGS='-g -O3 '

    ## MKL with GNU version of Open MP threaded, GCC
    # MKL=" -L${MKL_LIB_PATH} \
    #       -Wl,--start-group \
    #       -lmkl_gf_lp64 \
    #       -lmkl_intel_thread \
    #       -lmkl_core \
    #       -Wl,--end-group \
    #       -lgomp -lpthread"

    ## MKL With Intel MP threaded , ICC
    # MKL=" -L${MKL_LIB_PATH} \
    #       -Wl,--start-group \
    #       -lmkl_intel_lp64 \
    #       -lmkl_intel_thread \
    #       -lmkl_core \
    #       -Wl,--end-group \
    #       -liomp5 -lpthread"

    ## MKL sequential, ICC
    MKL=" -L${MKL_LIB_PATH} \
          -Wl,--start-group \
          -lmkl_intel_lp64 \
          -lmkl_sequential \
          -lmkl_core \
          -Wl,--end-group"

    BLAS_LIBS="$MKL"

And then compile and install R as follows

./configure --with-blas --with-lapack

make

make install
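After installation, it can be worth checking from within R which BLAS and LAPACK libraries the session actually picked up, and timing one of the operations that the benchmark exercises. The sketch below assumes a reasonably recent R, since `extSoftVersion()["BLAS"]` and `La_library()` report these paths only in newer versions (3.4.0 and later, as far as I know); the matrix size is arbitrary.

```r
# Which BLAS/LAPACK shared libraries is this R session using?
blas_path   <- extSoftVersion()["BLAS"]
lapack_path <- La_library()
cat("BLAS:  ", blas_path, "\n")
cat("LAPACK:", lapack_path, "\n")

# Time a cross-product (b = a' * a), one of the BLAS-heavy benchmark tasks
set.seed(1)
n <- 500
a <- matrix(rnorm(n * n), n, n)
elapsed <- system.time(b <- crossprod(a))["elapsed"]
cat("crossprod of a", n, "x", n, "matrix took", elapsed, "seconds\n")
```

If the reported BLAS path still points to R's reference BLAS rather than an MKL library, the linking options in config.site did not take effect.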

- Debian Wheezy AMD64
- Intel(R) Core(TM) i7-2700K CPU @ 3.50GHz
- 16G RAM

    R Benchmark 2.5
    ===============
    Number of times each test is run__________________________: 3

    I. Matrix calculation
    ---------------------
    Creation, transp., deformation of a 2500x2500 matrix (sec): 0.455000000000001
    2400x2400 normal distributed random matrix ^1000____ (sec): 0.383000000000002
    Sorting of 7,000,000 random values__________________ (sec): 0.647666666666667
    2800x2800 cross-product matrix (b = a' * a)_________ (sec): 10.75
    Linear regr. over a 3000x3000 matrix (c = a \ b')___ (sec): 5.02266666666667
    --------------------------------------------
    Trimmed geom. mean (2 extremes eliminated): 1.13963496799737

    II. Matrix functions
    --------------------
    FFT over 2,400,000 random values____________________ (sec): 0.392666666666666
    Eigenvalues of a 640x640 random matrix______________ (sec): 0.73766666666666
    Determinant of a 2500x2500 random matrix____________ (sec): 3.30266666666667
    Cholesky decomposition of a 3000x3000 matrix________ (sec): 3.872
    Inverse of a 1600x1600 random matrix________________ (sec): 3.04166666666667
    --------------------------------------------
    Trimmed geom. mean (2 extremes eliminated): 1.94959995852139

    III. Programmation
    ------------------
    3,500,000 Fibonacci numbers calculation (vector calc)(sec): 0.663333333333346
    Creation of a 3000x3000 Hilbert matrix (matrix calc) (sec): 0.315333333333323
    Grand common divisors of 400,000 pairs (recursion)__ (sec): 1.74266666666667
    Creation of a 500x500 Toeplitz matrix (loops)_______ (sec): 0.471666666666674
    Escoufier's method on a 45x45 matrix (mixed)________ (sec): 0.381
    --------------------------------------------
    Trimmed geom. mean (2 extremes eliminated): 0.492149816487262

    Total time for all 15 tests_________________________ (sec): 32.179
    Overall mean (sum of I, II and III trimmed means/3)_ (sec): 1.03023476346231

    R Benchmark 2.5
    ===============
    Number of times each test is run__________________________: 3

    I. Matrix calculation
    ---------------------
    Creation, transp., deformation of a 2500x2500 matrix (sec): 0.438333333333333
    2400x2400 normal distributed random matrix ^1000____ (sec): 0.362666666666666
    Sorting of 7,000,000 random values__________________ (sec): 0.625666666666666
    2800x2800 cross-product matrix (b = a' * a)_________ (sec): 6.06
    Linear regr. over a 3000x3000 matrix (c = a \ b')___ (sec): 2.66333333333333
    --------------------------------------------
    Trimmed geom. mean (2 extremes eliminated): 0.900584248749399

    II. Matrix functions
    --------------------
    FFT over 2,400,000 random values____________________ (sec): 0.372
    Eigenvalues of a 640x640 random matrix______________ (sec): 0.456999999999996
    Determinant of a 2500x2500 random matrix____________ (sec): 1.85666666666667
    Cholesky decomposition of a 3000x3000 matrix________ (sec): 1.44933333333334
    Inverse of a 1600x1600 random matrix________________ (sec): 1.85266666666667
    --------------------------------------------
    Trimmed geom. mean (2 extremes eliminated): 1.07060004219009

    III. Programmation
    ------------------
    3,500,000 Fibonacci numbers calculation (vector calc)(sec): 0.510333333333335
    Creation of a 3000x3000 Hilbert matrix (matrix calc) (sec): 0.308666666666667
    Grand common divisors of 400,000 pairs (recursion)__ (sec): 1.581
    Creation of a 500x500 Toeplitz matrix (loops)_______ (sec): 0.408000000000001
    Escoufier's method on a 45x45 matrix (mixed)________ (sec): 0.285000000000011
    --------------------------------------------
    Trimmed geom. mean (2 extremes eliminated): 0.400560336912059

    Total time for all 15 tests_________________________ (sec): 19.2306666666667
    Overall mean (sum of I, II and III trimmed means/3)_ (sec): 0.728237740489568

    R Benchmark 2.5
    ===============
    Number of times each test is run__________________________: 3

    I. Matrix calculation
    ---------------------
    Creation, transp., deformation of a 2500x2500 matrix (sec): 0.458333333333333
    2400x2400 normal distributed random matrix ^1000____ (sec): 0.378
    Sorting of 7,000,000 random values__________________ (sec): 0.643666666666666
    2800x2800 cross-product matrix (b = a' * a)_________ (sec): 0.922666666666667
    Linear regr. over a 3000x3000 matrix (c = a \ b')___ (sec): 0.482999999999999
    --------------------------------------------
    Trimmed geom. mean (2 extremes eliminated): 0.522311832408545

    II. Matrix functions
    --------------------
    FFT over 2,400,000 random values____________________ (sec): 0.406666666666666
    Eigenvalues of a 640x640 random matrix______________ (sec): 0.288999999999997
    Determinant of a 2500x2500 random matrix____________ (sec): 0.497
    Cholesky decomposition of a 3000x3000 matrix________ (sec): 0.438000000000002
    Inverse of a 1600x1600 random matrix________________ (sec): 0.37866666666667
    --------------------------------------------
    Trimmed geom. mean (2 extremes eliminated): 0.407058274866339

    III. Programmation
    ------------------
    3,500,000 Fibonacci numbers calculation (vector calc)(sec): 0.648999999999996
    Creation of a 3000x3000 Hilbert matrix (matrix calc) (sec): 0.306000000000002
    Grand common divisors of 400,000 pairs (recursion)__ (sec): 1.785
    Creation of a 500x500 Toeplitz matrix (loops)_______ (sec): 0.455333333333328
    Escoufier's method on a 45x45 matrix (mixed)________ (sec): 0.375
    --------------------------------------------
    Trimmed geom. mean (2 extremes eliminated): 0.480324939124224

    Total time for all 15 tests_________________________ (sec): 8.46533333333333
    Overall mean (sum of I, II and III trimmed means/3)_ (sec): 0.467419897853855

    R Benchmark 2.5
    ===============
    Number of times each test is run__________________________: 3

    I. Matrix calculation
    ---------------------
    Creation, transp., deformation of a 2500x2500 matrix (sec): 0.475333333333333
    2400x2400 normal distributed random matrix ^1000____ (sec): 0.369
    Sorting of 7,000,000 random values__________________ (sec): 0.637000000000002
    2800x2800 cross-product matrix (b = a' * a)_________ (sec): 0.884666666666665
    Linear regr. over a 3000x3000 matrix (c = a \ b')___ (sec): 0.451333333333332
    --------------------------------------------
    Trimmed geom. mean (2 extremes eliminated): 0.515084369178734

    II. Matrix functions
    --------------------
    FFT over 2,400,000 random values____________________ (sec): 0.372666666666667
    Eigenvalues of a 640x640 random matrix______________ (sec): 0.285999999999999
    Determinant of a 2500x2500 random matrix____________ (sec): 0.504
    Cholesky decomposition of a 3000x3000 matrix________ (sec): 0.429
    Inverse of a 1600x1600 random matrix________________ (sec): 0.370333333333332
    --------------------------------------------
    Trimmed geom. mean (2 extremes eliminated): 0.389753671609465

    III. Programmation
    ------------------
    3,500,000 Fibonacci numbers calculation (vector calc)(sec): 0.474000000000001
    Creation of a 3000x3000 Hilbert matrix (matrix calc) (sec): 0.309333333333332
    Grand common divisors of 400,000 pairs (recursion)__ (sec): 1.522
    Creation of a 500x500 Toeplitz matrix (loops)_______ (sec): 0.431000000000002
    Escoufier's method on a 45x45 matrix (mixed)________ (sec): 0.267999999999994
    --------------------------------------------
    Trimmed geom. mean (2 extremes eliminated): 0.398315717938976

    Total time for all 15 tests_________________________ (sec): 7.78366666666666
    Overall mean (sum of I, II and III trimmed means/3)_ (sec): 0.430822797869348

There is a lot of discussion on how to integrate latexdiff with version control systems like git. Suppose your tex documents are under git control and you want to check the changes between two revisions visually (not with the standard git-diff for text files, but as differences shown in a PDF file for two tex files). For a sophisticated solution that integrates with git, you may consider using git-latexdiff.

A simple solution is to run the command

latexdiff <(git show oldcommit:file.tex) file.tex > diff.tex

and then simply run e.g.

pdflatex -interaction=nonstopmode diff.tex

to see the changes in diff.pdf.

I found a solution (or a workaround, to be precise) with BibTeX. See my BibTeX database page for a detailed explanation.

Others suggest using Biber (a BibTeX replacement for users of BibLaTeX), but I have not spent time on that yet.

You can find the code at

The function depends on the Pochhammer function which is available at

- How to pass arguments to the script
- Which version of Rscript should you use
- A facility to provide a bash-line help

I made a simple attempt, and here is an example Rscript showing how it works

- https://github.com/feng-li/flutils/blob/master/bin/embedAllFont
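For readers who only want the general pattern, the three points above can be sketched as follows. This is a hypothetical minimal script, not the contents of embedAllFont; the function names and the usage text are invented for the illustration.

```r
#!/usr/bin/env Rscript
# The shebang above runs whichever Rscript comes first on the PATH.
# Hypothetical minimal command-line script; not the linked embedAllFont code.

usage <- function() {
  "Usage: myscript.R [-h|--help] FILE..."
}

# Split the raw arguments into a help flag and a list of file names
parse_args <- function(args) {
  list(help  = any(args %in% c("-h", "--help")),
       files = args[!grepl("^-", args)])
}

main <- function(args = commandArgs(trailingOnly = TRUE)) {
  opts <- parse_args(args)
  if (opts$help || length(opts$files) == 0) {
    cat(usage(), "\n")               # one-line help
    return(invisible(0L))
  }
  for (f in opts$files) {
    cat("Processing", f, "\n")       # placeholder for the real work
  }
  invisible(length(opts$files))
}

# Run only when executed non-interactively, e.g. via Rscript
if (!interactive()) main()
```

`commandArgs(trailingOnly = TRUE)` returns only the arguments that follow the script name, which is usually what a script wants to parse.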

- Efron, B. (1979). Bootstrap methods: another look at the jackknife. *The Annals of Statistics*, 7(1), 1-26.

To have more fun in the reading, we aim to complete the following two tasks before **Sunday, Dec 22**.

- Post at least one question here that came up during your reading.
- Comment on at least one question that other people asked.

Happy reading!

To use it more efficiently, I suggest you put the function somewhere, e.g. **~/workspace/R_utils/sourceDir.R**, and add the following lines to your .Rprofile file.

```
.my.env <- new.env()
sys.source("~/workspace/R_utils/sourceDir.R", envir=.my.env)
attach(.my.env)
```

The next time R is launched, the function will be loaded automatically into a personal environment **.my.env** and you can use it directly. The advantage is that the **rm(list=ls())** command will not remove the function sourceDir. Note that **rm(list=ls(all=TRUE))** only deletes the hidden **.my.env** object from the workspace; the attached copy stays on the search path. If you really want to remove the sourceDir function, use **detach(".my.env")** instead.

When you create a matrix in the usual way like this,

`> a <- matrix(rnorm(10),2,5)`

`> a`

`           [,1]      [,2]       [,3]       [,4]       [,5]`

`[1,]  1.3488918 0.6225795 -0.7444514  1.3130491  1.7877849`

`[2,] -0.2385392 0.5656759  0.9037435 -0.2217444 -0.2656875`

the dimension is dropped when you pick out a single row or column in this way,

`> b <- a[,1]`

`> b`

`[1] 1.3488918 -0.2385392`

`> dim(b)`

`NULL`

The solution is to use the argument **drop = FALSE**,

`> b <- a[,1,drop = FALSE]`

`> b`

`[,1]`

`[1,] 1.3488918`

`[2,] -0.2385392`

`> dim(b)`

`[1] 2 1`
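This matters most inside functions, where the number of selected columns is not known in advance. The following sketch uses a hypothetical helper, `col_means_of`, to show the pattern.

```r
set.seed(42)
a <- matrix(rnorm(10), 2, 5)

# Hypothetical helper: column means of a subset of columns.
# Without drop = FALSE, selecting a single column would return a plain
# vector and colMeans() would fail.
col_means_of <- function(m, cols) {
  sub <- m[, cols, drop = FALSE]   # stays a matrix even for one column
  colMeans(sub)
}

one  <- col_means_of(a, 1)
many <- col_means_of(a, c(1, 3))

dims_default <- dim(a[, 1])                 # NULL: dimension dropped
dims_kept    <- dim(a[, 1, drop = FALSE])   # 2 1: still a matrix
```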

**Method A (R users)** Before you make your graph in R, use **par(mar=c(bottom, left, top, right))** to specify the margins you want to keep. The default value is c(5, 4, 4, 2) + 0.1. Try this example to see the differences.

`par(mar=c(5,4,4,2)+0.1) # The default margin`

`plot(rnorm(100))`

`dev.copy2eps() # Save as eps`

`par(mar=c(4,4,0,0)+0.1) # Figure with very tight margins`

`plot(rnorm(100))`

`dev.copy2eps()`

**Method B (use epstool)** A very handy tool that can compute the optimal bounding box

`epstool --copy --bbox file.eps file_new.eps`

**Method C (use ps2epsi)** It automatically calculates the bounding box required for all encapsulated PostScript files, so most of the time it does a pretty good job

`ps2epsi <input.eps> <output.eps>`

**Method D (DIY for any eps)** Open your eps file with a text editor and you will find a line like this

`%%BoundingBox: 0 0 503 503`

near the top of the file. Adjust these values to proper integers, save the file, and test whether the margins are better.

When you want to crop an eps file and include it in LaTeX with the **\includegraphics** command, you should use **\includegraphics*** instead. If * is present, the graphic is ‘clipped’ to the size specified. If * is omitted, any part of the graphic that is outside the specified ‘bounding box’ will over-print the surrounding text. By the way, the **trim**, **bb**, and **viewport** options of **\includegraphics** can do the same job in a different manner without editing the eps file; see the help document for details.

if(a<-5)

Assume you want to write a condition that checks whether “**a is smaller than negative five**” and then does something. So you wrote

`if (a<-5)`

`{`

`sin(pi/3)`

`}`

but R will instead “**assign positive five to a**”, since “<-” in R is an assignment operator, and then test the assigned value. A non-zero number counts as TRUE, so the condition always holds. As a result, R will always do the calculations within the condition, and the old value of a is silently overwritten.

**Solutions**: use a better coding style, i.e. always put spaces around operators (both assignment operators and relational operators), e.g.,

`if (a < -5)`

`{`

`sin(pi/3)`

`}`

or use parentheses if you want to be sure of what you are doing.

`if (a<(-5))`

`{`

`sin(pi/3)`

`}`
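The difference is easy to verify in a short session. The variable names and the strings below are arbitrary and only serve the illustration.

```r
a <- -10
b <- -10

# With spaces: a genuine comparison, a is left untouched
spaced <- if (a < -5) "a is below -5" else "a is not below -5"

# Without spaces: parsed as an assignment b <- 5; the value 5 is then
# coerced to TRUE, so the first branch is always taken
unspaced <- if (b<-5) "branch taken" else "branch skipped"

a   # still -10
b   # now 5: the original value has been overwritten
```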

When you want to break a long expression into several lines in R, you don’t have to put a special notation at the end of each line; R checks whether your expression has finished. This is convenient but can also cause trouble. Assume you have a very long expression and you want to break it into two lines, e.g.

`myvalue <- sin(pi/3) + cos(pi/3) + 2*sin(pi/3)*cos(pi/3)`

The result should be 2.232051.

But you wrote

`myvalue <- sin(pi/3) + cos(pi/3)`

`+ 2*sin(pi/3)*cos(pi/3)`

R will think you have finished the expression at the end of the first line and started a new expression on the second line. You will find the result is 1.366025, since the second part is not included at all.

**Solutions:** You can either put a pair of parentheses in your expression like this

`myvalue <- (sin(pi/3) + cos(pi/3)`

`+ 2*sin(pi/3)*cos(pi/3))`

but too many parentheses make the code very hard to read. Alternatively, you can use the trick of always breaking the line after an arithmetic operator

`myvalue <- sin(pi/3) + cos(pi/3) +`

`2*sin(pi/3)*cos(pi/3)`
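The effect of the break position can be checked directly. The variable names in this sketch are arbitrary.

```r
# Break BEFORE the operator: the first line is already a complete
# expression, so the second line is evaluated on its own and discarded
broken <- sin(pi/3) + cos(pi/3)
+ 2*sin(pi/3)*cos(pi/3)

# Break AFTER the operator: the expression is incomplete at the end of
# the first line, so R keeps reading
trailing <- sin(pi/3) + cos(pi/3) +
  2*sin(pi/3)*cos(pi/3)

# Parentheses also force R to read on until the closing parenthesis
wrapped <- (sin(pi/3) + cos(pi/3)
            + 2*sin(pi/3)*cos(pi/3))

c(broken = broken, trailing = trailing, wrapped = wrapped)
```

Only `broken` misses the second term; the other two variants give the full sum.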

As described in the R help document, using ‘diag(x)’ can have unexpected effects if ‘x’ is a vector that could be of length one, as in this example

`> diag(7.4)`

`[,1] [,2] [,3] [,4] [,5] [,6] [,7]`

`[1,] 1 0 0 0 0 0 0`

`[2,] 0 1 0 0 0 0 0`

`[3,] 0 0 1 0 0 0 0`

`[4,] 0 0 0 1 0 0 0`

`[5,] 0 0 0 0 1 0 0`

`[6,] 0 0 0 0 0 1 0`

`[7,] 0 0 0 0 0 0 1`

**Solutions:** To avoid this, use “diag(x, nrow = length(x))” for consistent behavior **when “x” is a vector**

`> x = c(1,2,3)`

`> diag(x,length(x))`

`[,1] [,2] [,3]`

`[1,] 1 0 0`

`[2,] 0 2 0`

`[3,] 0 0 3`

`> x = 2.4`

`> diag(x,length(x))`

`[,1]`

`[1,] 2.4`
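If the same construction is needed in several places, it can be wrapped in a small helper. The name `diag_from_values` is hypothetical and not part of base R.

```r
# Hypothetical wrapper: always put the values of x on the diagonal,
# whatever the length of x
diag_from_values <- function(x) {
  diag(x, nrow = length(x))
}

surprise <- diag(7.4)                     # 7 x 7 identity matrix
scalar   <- diag_from_values(7.4)         # 1 x 1 matrix containing 7.4
vec      <- diag_from_values(c(1, 2, 3))  # 3 x 3 with 1, 2, 3 on the diagonal

dim(surprise)
dim(scalar)
```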

The first argument of the sample function behaves inconsistently when x has length 1 and is a number greater than or equal to 1: sampling then takes place from 1:x rather than from x itself. See this example

`sample(x = 3, size = 10, replace = TRUE) # same as sample(x = 1:3, size = 10, replace = TRUE)`

If you want to *sample “3” ten times with replacement*, i.e. obtain a vector of ten 3s, you have to check that condition explicitly.
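One way to make that check unnecessary is to sample positions instead of values, in the spirit of the `resample` example in the help page of `sample`. The helper below is a sketch with a hypothetical signature.

```r
# Hypothetical helper: sample the elements of x by position, so a
# length-one x is never expanded to 1:x
resample <- function(x, size, replace = FALSE) {
  x[sample.int(length(x), size = size, replace = replace)]
}

set.seed(1)
from_one_to_three <- sample(x = 3, size = 10, replace = TRUE)  # values in 1:3
always_three      <- resample(3, size = 10, replace = TRUE)    # ten 3s
```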


cat("\033[2J\033[H")

Note: This is not applicable to Rgui for Windows.

`> Mydata <- 1:10`

`> comment(Mydata) <- "This is a data from a sequence"`

`> Mydata`

` [1]  1  2  3  4  5  6  7  8  9 10`

`> comment(Mydata)`

`[1] "This is a data from a sequence"`

`> str(Mydata)`

` atomic [1:10] 1 2 3 4 5 6 7 8 9 10`

` - attr(*, "comment")= chr "This is a data from a sequence"`

But the final font size for the main title (or axis, lab, sub) is determined by the product of three variables: ps, cex, and cex.main (or cex.axis, cex.lab, cex.sub, respectively).

So if you want a 12-point font in the title, you may set the following options

par(ps = 12, cex = 1, cex.main = 1)
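The product rule can be written down as a one-line function, which is handy for working out the settings for a target size. The function name `effective_points` is made up for this sketch; it only does the arithmetic and does not touch any graphics device.

```r
# Effective font size in points = ps * cex * cex.<component>
# (hypothetical helper; pure arithmetic, no graphics device involved)
effective_points <- function(ps, cex, cex_component) {
  ps * cex * cex_component
}

title_12pt    <- effective_points(ps = 12, cex = 1, cex_component = 1)
title_default <- effective_points(ps = 12, cex = 1, cex_component = 1.2)
```

The second call uses 1.2, which is R's default value of cex.main, and shows why a title comes out larger than the base point size unless cex.main is set to 1.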

Please visit the git page for the updated code

- https://github.com/feng-li/flutils/blob/master/math/K.R
- https://github.com/feng-li/flutils/blob/master/math/K.X.R
