An Introduction to R: Examples for Actuaries
Nigel De Silva (nigel.desilva@thomasmiller.com)
2.3 R Data Structures
2.3.1 Data Classes
R is an object-orientated language: everything in R is an object, and each object has a class.
The simplest data objects are one-dimensional arrays called vectors, consisting of any number of elements. For example, the calculation:
> 2 + 2
[1] 4
results in a vector of the numeric class (as it contains a number) with just one element. Note that the command "2+2" is itself an object: captured unevaluated with quote, it has class call, and wrapped in expression, it has class expression.
The simplest elements produce vectors of the following classes:
- logical: The values T (or TRUE) and F (or FALSE).
- integer: Integer values such as 3L or -4L. The L suffix is needed because a bare 3 is stored as numeric.
- numeric: Floating-point real numbers (double-precision by default). Numerical values can be written as whole numbers (for example, 3., -4.), decimal fractions (4.52, -6.003), or in scientific notation (6.02e23, 8e-47).
- complex: Complex numbers of the form a + bi, where a and b are integers or numeric (for example, 3 + 1.23i).
- character: Character strings enclosed by matching double quotes (") or apostrophes ('), for example, "Actuary", 'idea'.
Two other elements which are particularly useful are:
- factors: These represent labelled observations. For example, sex is a factor, generally incorporating two levels: male and female. Factors are generally used to represent qualitative effects in models.
- ordered factors: A factor whose levels are ordered. For example, there may be three responses to a question about quality (high, medium or low); the levels have a natural order but are not necessarily equally spaced on a linear scale.
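The classes listed above can be checked with the class function. The following is a minimal sketch; the variable names sex and quality and their sample values are illustrative assumptions, not part of the original text.

```r
# Class of each simple element type
class(TRUE)              # "logical"
class(3L)                # "integer" (the L suffix requests an integer)
class(3)                 # "numeric"
class(3 + 1.23i)         # "complex"
class("Actuary")         # "character"

# Hypothetical sample data for the two factor types
sex <- factor(c("male", "female", "female"))
levels(sex)              # "female" "male" (alphabetical by default)

quality <- factor(c("low", "high", "medium"),
                  levels = c("low", "medium", "high"),
                  ordered = TRUE)
class(quality)           # "ordered" "factor"
quality[1] < quality[2]  # TRUE: low ranks below high
```

Comparisons such as "<" are only meaningful for ordered factors; for an unordered factor R returns NA with a warning.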
2.3.2 Vectors
The simplest type of data object in R is a vector, which is simply an ordered set of values. Some further examples of creating vectors are shown below:
> 11:20
[1] 11 12 13 14 15 16 17 18 19 20
This creates a vector containing the whole numbers 11 to 20; R stores the result of ":" with class integer. The ":" is a shorthand for the explicit command seq(from=11, to=20, by=1), which gives the same values. Vectors can be assigned a name (case sensitive) via the assignment operator ("<-"), for example:
> x <- 11:20
> y <- c(54, 16, 23, 34, 87) # "c" means "combine"
> z <- c("apple", "bear", "candle")
Note: The "#" can be used to make comments in your code. R ignores anything after it on the same line.
To display a vector, use its name. To extract subsets of vectors, use their numerical indices with the subscript operator "[" as in the following examples.
> z
[1] "apple" "bear" "candle"
> x[4]
[1] 14
> y[c(1,3,5)]
[1] 54 23 87
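Subscripts need not be positive positions. As a short sketch, the vector y from above (redefined here so the block runs by itself) can also be subset with negative indices, which drop elements, or with a logical condition:

```r
y <- c(54, 16, 23, 34, 87)

y[-1]        # drop the first element: 16 23 34 87
y[2:4]       # a range of positions:   16 23 34
y[y > 30]    # logical subscript:      54 34 87
```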
A simple vector is completely defined by the number of its elements and their mode. The class of such a vector follows from the type of its elements (an integer vector reports class integer, although its mode is numeric):
> class(c(T,T,F,T))
[1] "logical"
> class(y)
[1] "numeric"
The number of elements in a vector is called the length of the vector and can be obtained for any vector using the length function:
> length(x)
[1] 10
Vectors may have named elements.
> temp <- c(11, 12, 17)
> names(temp) <- c("London", "Madrid", "New York")
> temp
  London   Madrid New York
      11       12       17
Operations can be performed on the entire vector as a whole without looping through each element. This is important for writing efficient code as we will see later. For example, a conversion to Fahrenheit can be achieved by:
> 9/5 * temp + 32
  London   Madrid New York
    51.8     53.6     62.6
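To make the contrast with looping concrete, the sketch below computes the same conversion twice: once as a single vectorised expression and once with an explicit for loop. The names fahrenheit and fahrenheit_loop are introduced here for illustration only.

```r
temp <- c(London = 11, Madrid = 12, "New York" = 17)

# Vectorised: one expression converts every element
fahrenheit <- 9/5 * temp + 32

# Equivalent explicit loop, shown only for comparison
fahrenheit_loop <- numeric(length(temp))
for (i in seq_along(temp)) {
  fahrenheit_loop[i] <- 9/5 * temp[i] + 32
}
names(fahrenheit_loop) <- names(temp)

all.equal(fahrenheit, fahrenheit_loop)  # TRUE
```

The vectorised form is shorter and leaves the element-by-element work to R's internal code.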
2.3.3 Matrices
An extension of the vector is the matrix class. We can create a matrix from the vector x as shown below:
> matrix(x, nrow=5)
[,1] [,2]
[1,] 11 16
[2,] 12 17
[3,] 13 18
[4,] 14 19
[5,] 15 20
Or alternatively, by:
> dim(x) <- c(5,2)
> x
[,1] [,2]
[1,] 11 16
[2,] 12 17
[3,] 13 18
[4,] 14 19
[5,] 15 20
We can join matrices via cbind or rbind. As for vectors, we can extract an element using subscripts, or perform operations on all elements:
> x[3,2]
[1] 18
> x[3, ] # Omitting the column index prints the entire row
[1] 13 18
> x-10
[,1] [,2]
[1,] 1 6
[2,] 2 7
[3,] 3 8
[4,] 4 9
[5,] 5 10
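The cbind and rbind functions mentioned above are not shown in the examples, so here is a brief sketch. It rebuilds the 5 x 2 matrix x and attaches one extra column and one extra row; the names wide and tall are illustrative assumptions.

```r
x <- matrix(11:20, nrow = 5)

# cbind adds a column; the result is 5 x 3
wide <- cbind(x, 21:25)
dim(wide)   # 5 3

# rbind adds a row; the result is 6 x 2
tall <- rbind(x, c(0, 0))
dim(tall)   # 6 2
```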
Alternatively, matrix operations are possible, for example:
> t(x) # Transpose
[,1] [,2] [,3] [,4] [,5]
[1,] 11 12 13 14 15
[2,] 16 17 18 19 20
> t(x) %*% x # Multiplication
[,1] [,2]
[1,] 855 1180
[2,] 1180 1630
Many more operations are possible, for example solving linear equations, eigenvalues and eigenvectors, decompositions, etc.
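As a sketch of two of these operations, the block below solves a small linear system with solve and computes eigenvalues with eigen. The matrix A and vector b are hypothetical values chosen for illustration.

```r
# Hypothetical 2 x 2 system A %*% v = b
A <- matrix(c(2, 1,
              1, 3), nrow = 2, byrow = TRUE)
b <- c(3, 5)

v <- solve(A, b)   # solves the linear system
A %*% v            # reproduces b (up to rounding)

solve(A)           # with one argument, solve returns the inverse
e <- eigen(A)      # a list with components values and vectors
e$values
```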
2.3.4 Arrays
Arrays generalise matrices to more than two dimensions and can be created in similar ways. For example:
> array(1:12, dim=c(2,3,2))
, , 1
[,1] [,2] [,3]
[1,] 1 3 5
[2,] 2 4 6
, , 2
[,1] [,2] [,3]
[1,] 7 9 11
[2,] 8 10 12
As with matrices, various operations are possible for arrays.
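A brief sketch of such operations on the array created above: subscripts now take one index per dimension, and apply can summarise along any chosen dimension. The name a is introduced here for illustration.

```r
a <- array(1:12, dim = c(2, 3, 2))

a[2, 3, 1]    # single element: 6
a[, , 2]      # the second 2 x 3 slice, returned as a matrix
dim(a)        # 2 3 2

# Sum over each slice along the third dimension
apply(a, 3, sum)   # 21 57
```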
2.3.5 Data Frames
Data frames are of fundamental importance to most modelling and graphical functions in R. Until now, all the data structures have been atomic, in that they contain data of just one mode, such as integers or characters. Data frames are two-dimensional tables in which each column can have a different mode.
We can create a data frame as follows:
> Temps <- data.frame(town = c("London", "Madrid", "New York"),
+ temp = c(11, 12, 17), stringsAsFactors = TRUE)
> Temps
      town temp
1   London   11
2   Madrid   12
3 New York   17
Note: The command is split over two lines. As the closing parenthesis for the data.frame function was not provided on the first line, R expected further input (shown by the "+" continuation prompt), in this case the rest of the call, beginning with the second column of temperatures.
The columns are named town and temp respectively. We can extract subsets of this data frame via subscripts (as for matrices):
> Temps[1,2]
[1] 11
> Temps[2,1]
[1] Madrid
Levels: London Madrid New York
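Notice that the "town" column is not actually a vector of characters, but a factor with three levels, listed below the result. Converting strings to factors was the default before R 4.0.0; from R 4.0.0 onwards, data.frame keeps strings as character vectors unless stringsAsFactors = TRUE is supplied.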
An alternative method of extraction is to consider that each column of the data frame is a vector with a name. It can be accessed via the "$" operator, and the result treated as a vector.
> Temps$temp
[1] 11 12 17
> Temps$temp[2]
[1] 12
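Because each column is a vector, the usual vector tools apply to data frames. The sketch below checks the class of each column, selects rows by a condition, and adds a derived column; the column name temp_f is a hypothetical name introduced for illustration.

```r
Temps <- data.frame(town = c("London", "Madrid", "New York"),
                    temp = c(11, 12, 17))

sapply(Temps, class)        # each column has its own class
Temps[Temps$temp > 11, ]    # rows where temp exceeds 11

# Add a derived column (temp_f is a hypothetical name)
Temps$temp_f <- 9/5 * Temps$temp + 32
str(Temps)                  # compact summary of the structure
```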
Packages often come with datasets. For example, the MASS package contains a dataset called Insurance. We can access the first five rows of this dataset via:
> library(MASS)
> Insurance[1:5,]
  District  Group   Age Holders Claims
1        1    <1l   <25     197     38
2        1    <1l 25-29     264     35
3        1    <1l 30-35     246     20
4        1    <1l   >35    1680    156
5        1 1-1.5l   <25     284     63
This is a dataset of motor claims in which District is a factor, Group and Age are ordered factors, and the last two columns are integers.
> class(Insurance$Age)
[1] "ordered" "factor"
> levels(Insurance$Age)
[1] "<25" "25-29" "30-35" ">35"
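The column classes described above can be inspected directly. This sketch assumes the MASS package is installed (it is distributed with standard R installations as a recommended package) and totals the claims in each age band using tapply.

```r
library(MASS)   # assumed to be installed; provides the Insurance dataset

str(Insurance)                  # structure of all five columns
is.ordered(Insurance$Age)       # TRUE: Age is an ordered factor

# Total number of claims in each age band
tapply(Insurance$Claims, Insurance$Age, sum)
```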
2.3.6 Lists
Lists make it possible to collect an arbitrary set of R objects together under a single name. You might for example collect together vectors of several different modes and lengths, scalars, matrices or more general arrays, functions, etc. Lists can be, and often are, a rag-tag of different objects.
As an illustration, consider the list object that R creates as output from the attributes function.
> attributes(Temps)
$names
[1] "town" "temp"
$row.names
[1] "1" "2" "3"
$class
[1] "data.frame"
In this case, the elements are all vectors. We can access a list’s elements via subscripts, this time using the "[[" operator, or via names, using the "$" operator. For example:
> attributes(Temps)[[1]]
[1] "town" "temp"
> attributes(Temps)$row.names
[1] "1" "2" "3"
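Lists can also be built directly with the list function. The following sketch uses a hypothetical list named policy, with made-up element names and values, to show the access operators side by side.

```r
# Hypothetical list mixing several kinds of object
policy <- list(id = 1001L,
               holder = "A. Smith",
               claims = c(250, 1200.5),
               summarise = mean)

policy$holder                     # "A. Smith"
policy[["claims"]]                # the numeric vector 250.0 1200.5
policy[[1]]                       # first element by position: 1001
policy["id"]                      # single "[" returns a list of length one
policy$summarise(policy$claims)   # 725.25
```

Note the difference between "[[" and "[": the former extracts the element itself, while the latter returns a shorter list containing it.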
2.3.7 Other Classes
There are many other object classes that you will come across in R. As in any other object orientated programming language, classes can be defined by the user, and contributed packages typically create special classes used by their functions.
Generally, you may never need to worry about classes. User-defined classes are typically lists, so their elements can be identified via the names function and then accessed as described above.
> names(attributes(Temps))
[1] "names" "row.names" "class"
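To illustrate how a user-defined class can sit on top of a list, the sketch below uses R's simplest class mechanism (setting the class attribute). The class name "policy", its elements, and the print method are hypothetical and introduced only for this example.

```r
# Hypothetical user-defined class "policy", built on a list
p <- list(holder = "A. Smith", premium = 420)
class(p) <- "policy"

# A print method, dispatched automatically for objects of class "policy"
print.policy <- function(x, ...) {
  cat("Policy for", x$holder, "with premium", x$premium, "\n")
  invisible(x)
}

names(p)                # "holder" "premium"
p$premium               # 420
print(p)                # uses print.policy
inherits(p, "policy")   # TRUE
```

The underlying list is unchanged by the class attribute, which is why names and "$" continue to work as for any other list.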