Strange behavior when using apply with rank and order on a data.frame with ordered factors

I've found some weird behavior with apply.

Assume I have an arbitrary data frame of ordered factors:

set.seed(4)
x <- ordered(sample(1:10, size = 4, replace = TRUE))
y <- ordered(sample(1:10, size = 4, replace = TRUE))
z <- ordered(sample(1:10, size = 4, replace = TRUE))

data1 <- data.frame(x,y,z)

Now I want to get the ranks for each variable. I could do this two ways:

With a for loop:

rankmat1 <- data1
for (i in seq_along(data1)) {
  rankmat1[, i] <- rank(data1[, i])
}

Or with apply:

rankmat2 <- apply(data1, 2, rank)

So, here are the original values:

data1 
  x  y  z
1 6  9 10
2 1  3  1
3 3  8  8
4 3 10  3

And here are the correct rankings:

rankmat1
    x y z
1 4.0 3 4
2 1.0 1 1
3 2.5 2 3
4 2.5 4 2

But why are these rankings from apply permuted differently?

rankmat2
       x y z
[1,] 4.0 4 2
[2,] 1.0 2 1
[3,] 2.5 3 4
[4,] 2.5 1 3

This happens with order too:

ordermat1 <- data1
for (i in seq_along(data1)) {
  ordermat1[, i] <- order(data1[, i])
}
ordermat2 <- apply(data1, 2, order)

ordermat1
  x y z
1 2 2 2
2 3 3 4
3 4 1 3
4 1 4 1

ordermat2
     x y z
[1,] 2 4 2
[2,] 3 2 1
[3,] 4 3 4
[4,] 1 1 3
Asked By: Mammoth

Answer #1:

As requested by the OP, here is a detailed explanation which may help other R users avoid these traps.

Trap 1

As joran has pointed out, apply coerces the data frame into a matrix, thereby replacing the ordered factors with character strings. So, the original data.frame

data1
  x  y  z
1 6  9 10
2 1  3  1
3 3  8  8
4 3 10  3

becomes

as.matrix(data1)
     x   y    z   
[1,] "6" "9"  "10"
[2,] "1" "3"  "1" 
[3,] "3" "8"  "8" 
[4,] "3" "10" "3" 
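
The coercion can be checked directly. This is a minimal sketch in which the data frame is typed in from the values printed above instead of being regenerated with set.seed(4), because sample() can return different draws in different R versions:

```r
# Rebuild the data frame from the printed values (not from the RNG)
data1 <- data.frame(
  x = ordered(c(6, 1, 3, 3)),
  y = ordered(c(9, 3, 8, 10)),
  z = ordered(c(10, 1, 8, 3))
)

class(data1$y)         # "ordered" "factor"
levels(data1$y)        # "3" "8" "9" "10": levels are in numeric order

m <- as.matrix(data1)  # the conversion apply() performs on a data frame
typeof(m)              # "character": the level ordering is lost
```

Once the columns are plain character vectors, rank() and order() have no access to the factor levels and fall back to comparing strings.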

Trap 2

Character strings are sorted lexicographically, character by character. Thus, sorting the y column as character returns

sort(c("9", "3", "8", "10"))
[1] "10" "3"  "8"  "9" 

instead of

sort(c(9, 3, 8, 10))
[1]  3  8  9 10

This explains why apply returns different results for rank (and for order) here.
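
The y column can be used to reproduce both results from the question. A small sketch, assuming the usual string collation in which "10" sorts before "3":

```r
y_chr <- c("9", "3", "8", "10")  # what apply() hands to rank() and order()
y_num <- c(9, 3, 8, 10)          # the values the factor levels stand for

rank(y_chr)   # 4 2 3 1, the y column of rankmat2
rank(y_num)   # 3 1 2 4, the y column of rankmat1

order(y_chr)  # 4 2 3 1, the y column of ordermat2
order(y_num)  # 2 3 1 4, the y column of ordermat1
```

The x column happens to agree in both versions because all of its values have a single digit, so string order and numeric order coincide.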

Solution

You can use lapply to compute the rank of each column of the data frame.

as.data.frame(lapply(data1, rank))
    x y z
1 4.0 3 4
2 1.0 1 1
3 2.5 2 3
4 2.5 4 2

lapply returns a list, and a data frame is a special kind of list, so as.data.frame can turn the result straight back into a data frame.
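
A common variant assigns the list back into the columns of a copy, which keeps the data frame's shape and names without calling as.data.frame. This is only a sketch; the name ranks is illustrative and the data frame is typed in from the printed values:

```r
data1 <- data.frame(
  x = ordered(c(6, 1, 3, 3)),
  y = ordered(c(9, 3, 8, 10)),
  z = ordered(c(10, 1, 8, 3))
)

ranks <- data1                  # copy, so data1 itself stays unchanged
ranks[] <- lapply(data1, rank)  # replace every column, keep the data.frame
str(ranks)                      # three numeric columns x, y, z
```

Because lapply passes each column to rank() as the ordered factor it is, the level order is respected and no character conversion takes place.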

Avoid sapply, because sapply takes the output of lapply and "simplifies" it to whatever it thinks is appropriate. Here,

sapply(data1, rank)
       x y z
[1,] 4.0 3 4
[2,] 1.0 1 1
[3,] 2.5 2 3
[4,] 2.5 4 2

returns a matrix (again!), which needs to be coerced back to a data frame. (See chapter 8.3.20 of The R Inferno by Patrick Burns. The text is a good read anyway.)

Alternative Solution

The OP has not indicated why ordered factors are needed. If factors, ordered or not, are not essential to the OP's underlying problem, then apply would have worked as expected with plain integer columns:

set.seed(4)
x2 <- sample(1:10, size = 4, replace = TRUE)
y2 <- sample(1:10, size = 4, replace = TRUE)
z2 <- sample(1:10, size = 4, replace = TRUE)
data2 <- data.frame(x2, y2, z2)
data2
  x2 y2 z2
1  6  9 10
2  1  3  1
3  3  8  8
4  3 10  3
apply(data2, 2, rank)
      x2 y2 z2
[1,] 4.0  3  4
[2,] 1.0  1  1
[3,] 2.5  2  3
[4,] 2.5  4  2

(Nevertheless, it is better to use lapply instead of apply with a data frame.)
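
If the data already arrived as factors, the columns can be converted back to numbers first. A sketch, assuming every level label is a number written as text; the names data_num and to_num are illustrative. Note that as.numeric() applied directly to a factor returns the internal level codes, not the labels:

```r
y <- ordered(c(9, 3, 8, 10))

as.numeric(y)                # 3 1 2 4: level codes, not the original values
as.numeric(as.character(y))  # 9 3 8 10: the original values

data1 <- data.frame(
  x = ordered(c(6, 1, 3, 3)),
  y = y,
  z = ordered(c(10, 1, 8, 3))
)

# Convert each factor column via its labels, then rank column-wise
to_num <- function(col) as.numeric(as.character(col))
data_num <- as.data.frame(lapply(data1, to_num))
r <- apply(data_num, 2, rank)  # numeric matrix, so no lexical sorting
r
```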

Trap 3

When I started to learn R, I was misled by the name of the function ordered(). It took me a while to understand that it does not sort anything but creates a special kind of factor. Likewise, it took me some time to figure out the difference between sort() and order() and when to use which function appropriately.
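
A short sketch of how these functions differ, using the y values from the question and a made-up factor f for illustration:

```r
v <- c(9, 3, 8, 10)

sort(v)      # 3 8 9 10: the values themselves, in increasing order
order(v)     # 2 3 1 4: the positions that would sort v
v[order(v)]  # 3 8 9 10: indexing by order() reproduces sort()
rank(v)      # 3 1 2 4: where each element lands in the sorted vector

# ordered() does not sort; it declares that the levels have an order
f <- ordered(c("low", "high", "mid"), levels = c("low", "mid", "high"))
f            # elements stay in their original positions
f < "high"   # TRUE FALSE TRUE: comparisons follow the level order
```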

Answered By: Uwe

Answer #2:

I am not sure of the exact reason why this happens with the apply function, but you could try sapply to solve the problem.

rankmat3 <- as.data.frame(sapply(data1, rank))
The result looks like this:
rankmat3
    x y z
1 4.0 3 4
2 1.0 1 1
3 2.5 2 3
4 2.5 4 2

Answered By: Alexc
The answers/resolutions are collected from Stack Overflow and are licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.


