OCaml-R: Getting started

OCaml-R is composed of a few modules, packaged inside various findlib packages:
The R package provides the interface to the R interpreter, and nothing more. It provides functions init and terminate to adminstrate R sessions, for instance.
The R.interpreter package is somewhat peculiar: It does not provide any code, nor any usable module for the end user of OCaml-R. It nevertheless has two successive side-effects: It loads the R package, and therefore loads libR.so, and then, initialises cleanly the R intepreter. You therefore only need a #require "R.interpreter" in the toplevel to start up the R interpreter, and you only need to compile with ocamlfind ocamlc -package R.interpreter to ensure at compile-time that R will start up properly. Which is rather useful when you want to create a whole bunch of interdependent bindings of R libraries.
The R.math package provides an interface to the libRmath.so standalone mathematical library of R. For instance, on Debian systems, libRmath.so has been compiled as to be independent of libR.so. One should avoid using simultaneously the R and R.math packages.
The R.base and R.stats packages provide components of R's standard library. They are not provided with the R package because they require functions and constructions that are only available when the R interpreter is up and running. Nevertheless, type declarations do not require an active R interpreter and are therefore located in the R package.
To initialise R, you can use the init function, or more simply, rely on the R.interpreter package.
yziquel@seldon:~$ ocaml
Objective Caml version 3.11.1
# #use "topfind";;
- : unit = ()
Findlib has been successfully loaded. Additional directives:
#require "package";; to load a package
#list;; to list the available packages
#camlp4o;; to load camlp4 (standard syntax)
#camlp4r;; to load camlp4 (revised syntax)
#predicates "p,q,...";; to set these predicates
Topfind.reset();; to force that packages will be reloaded
#thread;; to enable threads
- : unit = ()
# #require "R.interpreter";;
/usr/lib/ocaml/unix.cma: loaded
/usr/lib/ocaml/str.cma: loaded
/usr/lib/ocaml/calendar: added to search path
/usr/lib/ocaml/calendar/calendarLib.cma: loaded
/usr/lib/ocaml/R: added to search path
/usr/lib/ocaml/R/r.cma: loaded
/usr/lib/ocaml/R/oCamlR.cmo: loaded
#
The R interpreter is now active. Moreover, when the program will exit, the terminate function will be called to exit cleanly from the R session.
To evaluate R code, for instance c(10.4, 5.6, 3.1, 6.4, 21.7), we can use the R.eval_string function in the following way:
# let q = R.eval_string "c(10.4, 5.6, 3.1, 6.4, 21.7)";; val q : '_a R.t = <abstr>
The polymorphic 'a R.t type represents an R value denoting an OCaml value of type 'a. We are now going to use R.floats_of_t to convert q into a list of floats:
# let w = R.floats_of_t q;; val w : float list = [10.4; 5.6; 3.1; 6.4; 21.7]
The type of q has then been unified to the type float list R.t:
# q;; - : float list R.t = <abstr>
The R.eval_string function operates in two stages: First, the provided character string is parsed by the R.parse function, which yields a value of type R.lang R.sxp. Such values are expressions that are executable within the R interpreter, and are evaluated via a call to R.eval_langsxp.
# let e = R.parse "c(10.4, 5.6, 3.1, 6.4, 21.7)";; val e : R.lang R.sxp = <abstr> # let r = R.eval_langsxp e;; val r : '_a R.t = <abstr> # let t = R.floats_of_t r;; val t : float list = [10.4; 5.6; 3.1; 6.4; 21.7]
We indeed get the same values for w and for t.
We now have three R values available from Objective Caml: q, e and r. We will now examine these values in a bit more detail.
q has type float list R.t. This means that it is an R value that corresponds to an OCaml value of type float list. As a matter of fact, there are two distinct typing schemes for R values in OCaml-R: The polymorphic type 'a R.t, and the monomorphic type R.sexp. The difference between these two types is that the first one contains type information about its OCaml equivalent, whereas the second one could apply to any R value wrapped into OCaml with OCaml-R.
We can convert a value of type 'a R.t into an R.sexp value by a subtyping mechanism. Reciprocally, there is an R.cast function that converts R.sexp values into values of type 'a R.t. For instance, the subtyping mechanism allows us to use q as an argument to R.Pretty.t_of_sexp:
# R.Pretty.t_of_sexp (q : float list R.t :> R.sexp);; - : R.Pretty.t = R.Pretty.FLOATS [10.4; 5.6; 3.1; 6.4; 21.7]
R.Pretty.t_of_sexp is a function that recursively analyses the internal structure of the R value passed as argument, and which outputs a condensed, legible, description of its internal structure.
We would also like to apply it to e, but e has type R.lang R.sxp. 'a R.sxp also covers R values wrapped into OCaml via OCaml-R, but it is an internal type that should not be visible to the user (the API is still not perfect in this regard). We are therefore going to be extremely "dirty" by doing precisely what the typing scheme of OCaml-R aims at forbidding (i.e. use Obj.magic):
# open R.Pretty;;
# R.Pretty.t_of_sexp (Obj.magic e : R.sexp);;
- : R.Pretty.t =
CALL (SYMBOL (Some ("c", SPECIAL 89)),
[(NULL, FLOATS [10.4]); (NULL, FLOATS [5.6]); (NULL, FLOATS [3.1]);
(NULL, FLOATS [6.4]); (NULL, FLOATS [21.7])])
This description means that e is an executable expression (a CALL), which consists of executing the function symbolised by "c", located at offset 89 in the table of R primitive functions, with arguments a list of anonymous arguments, each one denoting a floating point number.
One should keep in mind that, behind the scenes, R is a weakly and dynamically typed language, which partly explains the internal structure of its values. One of the essential aspects of this dynamic typing is called the "sexptype" of R values, which can be accessed through the R.sexptype function:
# R.sexptype (q : float list R.t :> R.sexp);; - : R.sexptype = R.RealSxp # R.sexptype (Obj.magic e : R.sexp);; - : R.sexptype = R.LangSxp
Now that we have seen how to start up the R interpreter, how to parse R code, execute R code, and that we have some understanding of the internal structure of R values, we are going to see how to interface an R function in order to make it available to the OCaml world. We are going to take, as an example, the sample function of the base library of R. The R help page informs us of the following, concerning function sample:
sample package:base R Documentation
Random Samples and Permutations
Description:
'sample' takes a sample of the specified size from the elements of
'x' using either with or without replacement.
Usage:
sample(x, size, replace = FALSE, prob = NULL)
sample.int(n, size, replace = FALSE, prob = NULL)
Arguments:
x: Either a (numeric, complex, character or logical) vector of
more than one element from which to choose, or a positive
integer.
n: a non-negative integer, the number of items to choose from.
size: positive integer giving the number of items to choose.
replace: Should sampling be with replacement?
prob: A vector of probability weights for obtaining the elements of
the vector being sampled.We will therefore consider x to be of type 'a list R.t, we will consider size as an OCaml integer, and replace and prob are optional arguments of respective type bool option and float list option. In this context, exporting sample to OCaml can be done in the following way:
# let stub_sample = R.symbol "sample";;
val stub_sample : R.sexp = <abstr>
# let sample (x : 'a list R.t) size ?replace ?prob () : 'a list R.t =
R.eval stub_sample [
R.arg (fun x -> x) x ;
R.arg R.int size ;
R.opt R.bool "replace" replace ;
R.opt R.floats "prob" prob ];;
val sample :
'a list R.t ->
int -> ?replace:bool -> ?prob:float list -> unit -> 'a list R.t = <fun>
A few words of explanation: R.symbol "sample" is a way to load in memory an object whose symbol, or symbolic name, is the string "sample". Roughly stub_sample is the R value that contains the function called "sample". We will then use the R.eval stub_sample [...] construction to make an R function call, calling stub_sample on the arguments between brackets.
The calling conventions on R side are somewhat peculiar: Its a complex mixture, where arguments are named, and where, in the scope of evaluation, we try to match the name of the arguments to names that the R function expects. It is possible that such names do not match, or that there is no name for a given argument, that matching of names is only partial, et ceterę...
To conceal all this complexity to the end-user of OCaml-R, we use two auxiliary functions, R.arg and R.opt. R.arg concerns the OCaml arguments that are required, while R.opt concerns OCaml arguments that are optional. The function given as argument to R.opt or R.arg is a conversion function from OCaml side to R side. Then, for optional arguments, you have the name of the argument, and then the argument itself. It is a bit complicated at first sight, but not so much (and there is room for improvement.)
Once the function is exported to Objective Caml, we can use it to take a sample of 4 elements out of a list of 10 elements. For example:
# R.floats_of_t (sample (R.floats
[0.0; 0.1; 0.2; 0.3; 0.4; 0.5; 0.6; 0.7; 0.8; 0.9]) 4 ());;
- : float list = [0.; 0.7; 0.3; 0.4]
What has indeed been done:
What indeeds remain to be done:
OCaml-R: Getting started
