 

R Examples For Actuaries - R Data Structures


An Introduction to R: Examples for Actuaries

 

Nigel De Silva (nigel.desilva@thomasmiller.com)




2.3 R Data Structures

2.3.1 Data Classes

As an object-oriented language, R treats everything as an object, and each object has a class.

 

The simplest data objects are one-dimensional arrays called vectors, consisting of any number of elements. For example, the calculation:

  > 2 + 2
  [1] 4

results in a vector of the numeric class (as it contains a number) with just one element. Note that the command "2+2" is itself an object: when parsed, it becomes an object of the expression class.
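A short sketch of these ideas using only base R: class() and length() confirm that 2 + 2 produces a numeric vector of length one, and parse() turns the text of the command into an expression object without running it. The variable names result and cmd are just illustrative.

```r
result <- 2 + 2
class(result)      # "numeric"
length(result)     # 1: a single number is a vector of length one

# Parsing the text of a command, without running it, gives an expression object
cmd <- parse(text = "2 + 2")
class(cmd)         # "expression"
eval(cmd[[1]])     # evaluate the stored command: 4
```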

 

The simplest elements produce vectors of the following classes:

 

  • logical: The values T (or TRUE) and F (or FALSE).
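  • integer: Integer values, written with an L suffix, such as 3L or -4L. (Without the suffix, a whole number such as 3 is stored as a numeric value.)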
  • numeric: Floating-point real numbers (double-precision by default). Numerical values can be written as whole numbers (for example, 3., -4.), decimal fractions (4.52, -6.003), or in scientific notation (6.02e23, 8e-47).
  • complex: Complex numbers of the form a + bi, where a and b are integers or numeric (for example, 3 + 1.23i).
  • character: Character strings enclosed by matching double quotes (") or single quotes ('), for example, "Actuary", 'idea'.
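The class function reports which of these classes a value belongs to. A minimal base-R check follows; note in particular that a whole number typed without the L suffix is stored as a double and so has class "numeric".

```r
class(TRUE)         # "logical"
class(3L)           # "integer": the L suffix requests an integer
class(3)            # "numeric": without L, whole numbers are stored as doubles
class(6.02e23)      # "numeric"
class(3 + 1.23i)    # "complex"
class('idea')       # "character"
identical("Actuary", 'Actuary')   # TRUE: single and double quotes are equivalent
```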

 

Two other classes that are particularly useful are:

 

  • factors: These represent categorical observations, each taking one of a fixed set of labels (the levels). For example, sex is a factor with two levels: male and female. Factors are typically used to represent qualitative effects in models.
  • ordered factors: Factors whose levels have a natural order. For example, responses to a question about quality might be low, medium or high; the levels are ranked, but are not necessarily equally spaced on a linear scale.
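A minimal sketch of both kinds using base R's factor function. The sample observations are made up purely for illustration.

```r
# An (unordered) factor: levels are sorted alphabetically by default
sex <- factor(c("male", "female", "female", "male"))
levels(sex)        # "female" "male"
table(sex)         # counts of each level

# An ordered factor: the levels argument fixes the ranking low < medium < high
quality <- factor(c("high", "low", "medium", "low"),
                  levels = c("low", "medium", "high"), ordered = TRUE)
quality < "high"   # comparisons respect the ordering: FALSE TRUE TRUE TRUE
class(quality)     # "ordered" "factor"
```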


2.3.2 Vectors

The simplest type of data object in R is a vector, which is simply an ordered set of values. Some further examples of creating vectors are shown below:

  > 11:20
  [1] 11 12 13 14 15 16 17 18 19 20

This creates an integer vector containing the elements 11 to 20. The ":" operator is shorthand for generating the same sequence of values as seq(from=11, to=20, by=1). Vectors can be assigned a name (case sensitive) via the assignment operator ("<-"), for example:

  > x <- 11:20
  > y <- c(54, 16, 23, 34, 87)	# "c" means "combine"
  > z <- c("apple", "bear", "candle")

 

Note: The "#" can be used to make comments in your code. R ignores anything after it on the same line.
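The ":" operator only steps by 1. For other step sizes, or a fixed number of points, seq can be called directly, and rep repeats values. A short base-R sketch; note that 11:20 stores integers, whereas seq(from = 11, to = 20, by = 1) returns doubles holding the same values.

```r
x <- 11:20
class(x)                              # "integer"

s <- seq(from = 11, to = 20, by = 1)  # same values, stored as doubles
all.equal(as.numeric(x), s)           # TRUE

seq(0, 1, by = 0.25)                  # 0.00 0.25 0.50 0.75 1.00
seq(0, 100, length.out = 5)           # 0 25 50 75 100
rep(c(1, 2), times = 3)               # 1 2 1 2 1 2
```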

 

To display a vector, use its name. To extract subsets of vectors, use their numerical indices with the subscript operator "[" as in the following examples.

  > z
  [1] "apple"  "bear"   "candle"
  > x[4]
  [1] 14
  > y[c(1,3,5)]
  [1] 54 23 87
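Besides positive positions, "[" also accepts negative indices (to drop elements) and logical vectors (to keep only the elements meeting a condition). A short base-R sketch using the vector y from above:

```r
y <- c(54, 16, 23, 34, 87)
y[-1]            # negative index drops element 1: 16 23 34 87
y[y > 30]        # logical index keeps elements where the condition is TRUE: 54 34 87
which(y > 30)    # positions of those elements: 1 4 5
y[2:4]           # a range of positions: 16 23 34
```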

The number of elements and their mode completely define the data object as a vector. For a basic vector, the class reflects the type of its elements:

  > class(c(T,T,F,T))
  [1] "logical"
  > class(y)
  [1] "numeric"
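For integer vectors, class and mode are not the same thing: class() reports "integer", while mode() reports the broader "numeric". typeof() shows the internal storage type. A quick base-R check:

```r
v <- 11:20
class(v)     # "integer"
mode(v)      # "numeric"
typeof(v)    # "integer": the internal storage type

w <- c(54, 16, 23)
class(w)     # "numeric"
typeof(w)    # "double"
```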

The number of elements in a vector is called the length of the vector and can be obtained for any vector using the length function:

  > length(x)
  [1] 10

Vectors may have named elements.

  > temp <- c(11, 12, 17)
  > names(temp) <- c("London", "Madrid", "New York")
  > temp
    London   Madrid New York 
        11       12       17
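Named elements can also be given inline when the vector is created, and then extracted by name rather than by position. A minimal base-R sketch:

```r
temp <- c(London = 11, Madrid = 12, "New York" = 17)  # names given inline
temp["Madrid"]                # subset by name (the name is kept)
temp[c("London", "New York")]
unname(temp["Madrid"])        # drop the name to get the bare value: 12
names(temp)                   # "London" "Madrid" "New York"
```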

Operations can be applied to a whole vector at once, without looping through its elements. This vectorisation is important for writing efficient code, as we will see later. For example, the Celsius temperatures above can be converted to Fahrenheit with:

  > 9/5 * temp + 32 
    London   Madrid New York 
      51.8     53.6     62.6
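To show what vectorisation saves, here is a minimal sketch of the same conversion written as an explicit loop. Both forms give the same result, but the vectorised one is shorter and, on long vectors, typically much faster. The names fahr_loop and fahr_vec are illustrative only.

```r
temp <- c(London = 11, Madrid = 12, "New York" = 17)

# Explicit loop: fill a result vector one element at a time
fahr_loop <- numeric(length(temp))
names(fahr_loop) <- names(temp)
for (i in seq_along(temp)) {
  fahr_loop[i] <- 9/5 * temp[i] + 32
}

# Vectorised: one expression applied to the whole vector
fahr_vec <- 9/5 * temp + 32

all.equal(fahr_loop, fahr_vec)   # TRUE
```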


2.3.3 Matrices

An extension of the vector is the matrix class. We can create a matrix from the vector x as shown below:

  > matrix(x, nrow=5)
       [,1] [,2]
  [1,]   11   16
  [2,]   12   17
  [3,]   13   18
  [4,]   14   19
  [5,]   15   20

Or alternatively, by:

  > dim(x) <- c(5,2)
  > x
       [,1] [,2]
  [1,]   11   16
  [2,]   12   17
  [3,]   13   18
  [4,]   14   19
  [5,]   15   20
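Both approaches fill the matrix column by column. A brief base-R sketch showing byrow = TRUE for row-wise filling, and dim() for querying the dimensions:

```r
v <- 11:20
m_col <- matrix(v, nrow = 5)                # filled column by column (the default)
m_row <- matrix(v, nrow = 5, byrow = TRUE)  # filled row by row
m_col[1, ]                                  # 11 16
m_row[1, ]                                  # 11 12
dim(m_row)                                  # 5 2
is.matrix(m_col)                            # TRUE
```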

Matrices can be joined column-wise with cbind or row-wise with rbind. As with vectors, we can extract elements using subscripts, or perform operations on all elements at once:

  > x[3,2]
  [1] 18
  > x[3, ]  	# Omitting the column index prints the entire row
  [1] 13 18
  > x-10
       [,1] [,2]
  [1,]    1    6
  [2,]    2    7
  [3,]    3    8
  [4,]    4    9
  [5,]    5   10
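A short base-R sketch of cbind and rbind applied to the matrix x from above (rebuilt here so the block stands alone):

```r
x <- matrix(11:20, nrow = 5)
wide <- cbind(x, x - 10)    # join side by side: 5 rows, 4 columns
tall <- rbind(x, x - 10)    # stack on top: 10 rows, 2 columns
dim(wide)                   # 5 4
dim(tall)                   # 10 2
cbind(a = 1:3, b = 4:6)     # named arguments become column names
```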

In addition, true matrix-algebra operations are available, for example:

  > t(x)		# Transpose
       [,1] [,2] [,3] [,4] [,5]
  [1,]   11   12   13   14   15
  [2,]   16   17   18   19   20
  > t(x) %*% x   	# Matrix multiplication (not element-wise *)
       [,1] [,2]
  [1,]  855 1180
  [2,] 1180 1630

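Many more operations are available, for example solving systems of linear equations (solve), computing eigenvalues and eigenvectors (eigen), and matrix decompositions such as Cholesky (chol), QR (qr) and singular value (svd) decompositions.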
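A minimal sketch of three of these operations in base R, applied to the 2 x 2 matrix t(x) %*% x from above. The right-hand side b is an arbitrary illustrative choice. This matrix is symmetric and positive definite (its determinant is 855 × 1630 − 1180² = 1250 > 0), so its eigenvalues are real and positive and a Cholesky factor exists.

```r
x <- matrix(11:20, nrow = 5)
A <- t(x) %*% x             # the 2 x 2 matrix from the example above
b <- c(1, 2)                # arbitrary right-hand side

sol <- solve(A, b)          # solves the linear system A %*% sol = b
A %*% sol                   # recovers b, up to rounding error

e <- eigen(A)               # eigenvalues and eigenvectors
e$values
e$vectors

R <- chol(A)                # upper-triangular R with t(R) %*% R equal to A
all.equal(t(R) %*% R, A)    # TRUE
```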


2.3.4 Arrays

Arrays generalise matrices to any number of dimensions and can be created in similar ways. For example:

  > array(1:12, dim=c(2,3,2))
  , , 1

       [,1] [,2] [,3]
  [1,]    1    3    5
  [2,]    2    4    6

  , , 2

       [,1] [,2] [,3]
  [1,]    7    9   11
  [2,]    8   10   12

As with matrices, array elements can be extracted with subscripts (one per dimension), and arithmetic is applied element by element.
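A short base-R sketch of subscripting and summarising the three-dimensional array above. apply() with MARGIN = 3 applies a function to each slice along the third dimension.

```r
a <- array(1:12, dim = c(2, 3, 2))
dim(a)             # 2 3 2
a[2, 3, 1]         # row 2, column 3 of the first slice: 6
a[, , 2]           # the whole second slice, returned as a 2 x 3 matrix
a * 10             # arithmetic applies element by element
apply(a, 3, sum)   # sum of each slice: 21 57
```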


2.3.5 Data Frames

Data frames are of fundamental importance to most of R's modelling and graphical functions. Until now, all the data structures have been atomic, in that they contain data of just one mode, such as integers or characters. Data frames are two-dimensional tables in which each column can have a different mode (although all values within a single column share the same mode).

 

We can create a data frame as follows:

  > Temps <- data.frame(town = c("London", "Madrid", "New York"), 
  + temp = c(11, 12, 17), stringsAsFactors = TRUE)
  > Temps
        town temp
  1   London   11
  2   Madrid   12
  3 New York   17

Note: The command is split over two lines. Because the closing parenthesis of the data.frame call was not supplied on the first line, R treated the command as incomplete, showed the continuation prompt "+", and waited for the rest of it (in this case, the details of the second column of temperatures).

 

The columns are named town and temp respectively. We can extract subsets of this data frame via subscripts (as for matrices):

  > Temps[1,2]
  [1] 11
  > Temps[2,1]
  [1] Madrid
  Levels: London Madrid New York 

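Notice that the "town" column is not actually a vector of characters, but a factor with three levels, listed below the result. Converting strings to factors was the default behaviour of data.frame before R 4.0.0; from R 4.0.0 onwards, strings are kept as character vectors unless stringsAsFactors = TRUE is given.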
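Setting stringsAsFactors explicitly makes the result the same in every R version. A minimal base-R sketch comparing the two settings; the names as_factor and as_char are illustrative only.

```r
towns <- c("London", "Madrid", "New York")
temps <- c(11, 12, 17)

as_factor <- data.frame(town = towns, temp = temps, stringsAsFactors = TRUE)
as_char   <- data.frame(town = towns, temp = temps, stringsAsFactors = FALSE)

class(as_factor$town)             # "factor"
levels(as_factor$town)            # "London" "Madrid" "New York"
class(as_char$town)               # "character"
as.character(as_factor$town[2])   # convert a factor value back to text: "Madrid"
```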

 

Alternatively, each column of a data frame can be treated as a named vector: it can be extracted with the "$" operator and the result used like any other vector.

  > Temps$temp
  [1] 11 12 17
  > Temps$temp[2]
  [1] 12
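Because a column is an ordinary vector, it can drive row selection or be passed to vector functions. A short base-R sketch (Temps is rebuilt here so the block stands alone; warm is an illustrative name):

```r
Temps <- data.frame(town = c("London", "Madrid", "New York"),
                    temp = c(11, 12, 17))
warm <- Temps[Temps$temp > 11, ]   # rows where the condition is TRUE
warm
Temps[["temp"]]                    # same as Temps$temp
mean(Temps$temp)                   # ordinary vector functions work on columns
nrow(Temps); ncol(Temps)           # 3 rows, 2 columns
```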

Packages often come with datasets. For example, the MASS package contains a dataset called Insurance. We can view the first five rows of this dataset via:

  > library(MASS)
  > Insurance[1:5,]
    District  Group   Age Holders Claims
  1        1    <1l   <25     197     38
  2        1    <1l 25-29     264     35
  3        1    <1l 30-35     246     20
  4        1    <1l   >35    1680    156
  5        1 1-1.5l   <25     284     63 

This is a dataset of motor insurance claims, in which District is a factor, Group and Age are ordered factors, and the last two columns (Holders and Claims) are integers.

  > class(Insurance$Age)
  [1] "ordered" "factor" 
  > levels(Insurance$Age)
  [1] "<25"   "25-29" "30-35" ">35"
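As a brief illustration of how these columns can be used together, here is a sketch of crude claim frequencies (claims per policyholder) by age band. It assumes the MASS package is installed (it ships with standard R distributions as a recommended package); tapply sums each integer column within each level of the Age factor.

```r
library(MASS)

str(Insurance)   # column types: factor, ordered factors, integers

# Total claims and total policyholders within each age band
claims  <- tapply(Insurance$Claims,  Insurance$Age, sum)
holders <- tapply(Insurance$Holders, Insurance$Age, sum)

freq <- claims / holders   # crude claim frequency per age band
round(freq, 3)
```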


2.3.6 Lists

Lists make it possible to collect an arbitrary set of R objects together under a single name. You might, for example, collect together vectors of several different modes and lengths, scalars, matrices or more general arrays, functions, etc. Lists can be, and often are, a ragtag collection of different objects.

 

As an illustration, consider the list that R returns as output from the attributes function:

  > attributes(Temps)
  $names
  [1] "town" "temp"

  $row.names
  [1] "1" "2" "3"

  $class
  [1] "data.frame"

In this case, the elements are all vectors. We can access a list’s elements via subscripts, this time using the "[[" operator, or via names, using the "$" operator. For example:

  > attributes(Temps)[[1]]
  [1] "town" "temp"
  > attributes(Temps)$row.names
  [1] "1" "2" "3"
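A minimal sketch of building a list directly with base R's list function, and of the difference between "[" (which returns a sub-list) and "[[" (which returns the component itself). The policy record and its field names are hypothetical, chosen only to mix modes, lengths and a function.

```r
# Hypothetical policy record mixing modes and lengths
policy <- list(id      = "P001",
               premium = c(120, 135.5),
               active  = TRUE,
               load    = function(p) p * 1.05)

length(policy)         # 4 components
names(policy)          # "id" "premium" "active" "load"
policy[["premium"]]    # [[ returns the component itself: a numeric vector
policy["premium"]      # [ returns a sub-list containing that component
policy$load(100)       # a component can even be a function
```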


2.3.7 Other Classes

There are many other object classes that you will come across in R. As in other object-oriented programming languages, classes can be defined by the user, and contributed packages typically create special classes for use by their own functions.

 

In practice, you may rarely need to worry about classes. User-defined classes are typically lists, so their elements can be identified with the names function and then accessed as described above.

  > names(attributes(Temps))
  [1] "names"     "row.names" "class"
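A short sketch of both points in base R. A fitted linear model (here on the built-in cars dataset) is a list with class "lm", so names() and "$" work on it as on any list. The claim class and its print method below are hypothetical, showing how a user-defined S3 class is simply a list with a class attribute.

```r
# A fitted model is a list with a class attribute
fit <- lm(dist ~ speed, data = cars)   # cars is a built-in dataset
class(fit)                             # "lm"
is.list(fit)                           # TRUE
"coefficients" %in% names(fit)         # TRUE
fit$coefficients                       # intercept and slope

# A hypothetical user-defined S3 class built the same way
claim <- structure(list(id = "C-101", amount = 2500), class = "claim")
print.claim <- function(x, ...) {
  cat("Claim", x$id, "for amount", x$amount, "\n")
  invisible(x)
}
claim          # auto-printing dispatches to print.claim
names(claim)   # "id" "amount"
```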
