I've found some weird behavior with apply.
Assume I have an arbitrary data frame of ordered factors:
set.seed(4)
x <- ordered(sample(1:10, size = 4, replace = TRUE))
y <- ordered(sample(1:10, size = 4, replace = TRUE))
z <- ordered(sample(1:10, size = 4, replace = TRUE))
data1 <- data.frame(x,y,z)
Now I want to get the ranks for each variable. I could do this two ways:
With a for loop:
rankmat1 <- data1
for (i in seq_len(ncol(data1))) {
  rankmat1[, i] <- rank(data1[, i])
}
Or with apply:
rankmat2 <- apply(data1, 2, rank)
So, here are the original levels:
data1
x y z
1 6 9 10
2 1 3 1
3 3 8 8
4 3 10 3
And here are the correct rankings:
rankmat1
x y z
1 4.0 3 4
2 1.0 1 1
3 2.5 2 3
4 2.5 4 2
But why are these rankings from apply permuted differently?
rankmat2
x y z
[1,] 4.0 4 2
[2,] 1.0 2 1
[3,] 2.5 3 4
[4,] 2.5 1 3
This happens with order too:
ordermat1 <- data1
for (i in seq_len(ncol(data1))) {
  ordermat1[, i] <- order(data1[, i])
}
ordermat2 <- apply(data1, 2, order)
ordermat1
x y z
1 2 2 2
2 3 3 4
3 4 1 3
4 1 4 1
ordermat2
x y z
[1,] 2 4 2
[2,] 3 2 1
[3,] 4 3 4
[4,] 1 1 3
As requested by the OP, here is a detailed explanation, which may help other R users avoid these traps.
As joran has pointed out, apply coerces the data frame to a matrix, thereby replacing the ordered factors with character strings. So the original data.frame
data1
x y z
1 6 9 10
2 1 3 1
3 3 8 8
4 3 10 3
becomes
as.matrix(data1)
x y z
[1,] "6" "9" "10"
[2,] "1" "3" "1"
[3,] "3" "8" "8"
[4,] "3" "10" "3"
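A quick way to see the coercion for yourself is to ask apply what class of object it actually hands to FUN. This is a minimal sketch: it hard-codes the values printed above instead of relying on set.seed(4) (sample() results changed in R 3.6.0, so the seed may not reproduce them), and the names col_classes and seen_by_apply are just illustrative.

```r
# Hard-coded copy of data1 as printed above (assumption: used instead of
# set.seed(4), whose sample() output depends on the R version)
data1 <- data.frame(
  x = ordered(c(6L, 1L, 3L, 3L)),
  y = ordered(c(9L, 3L, 8L, 10L)),
  z = ordered(c(10L, 1L, 8L, 3L))
)

# In the data frame itself, every column is an ordered factor
col_classes <- sapply(data1, function(col) class(col)[1])
print(col_classes)

# apply() calls as.matrix() first, so FUN receives plain character vectors
seen_by_apply <- apply(data1, 2, class)
print(seen_by_apply)
```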
Character strings are sorted lexically, i.e. character by character, so "10" sorts before "3". Thus, sorting the y column as character returns
sort(c("9", "3", "8", "10"))
[1] "10" "3" "8" "9"
instead of
sort(c(9, 3, 8, 10))
[1] 3 8 9 10
This explains why apply returns different results for rank (and order) here.
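The same lexical-versus-numeric difference shows up directly in rank() and order(). A small sketch using the y column's values (y_chr and y_num are only illustrative names):

```r
# Same y values, once as character strings and once as numbers
y_chr <- c("9", "3", "8", "10")
y_num <- c(9, 3, 8, 10)

# Strings compare character by character, so "10" < "3" < "8" < "9"
rank(y_chr)   # 4 2 3 1
rank(y_num)   # 3 1 2 4
order(y_chr)  # 4 2 3 1
order(y_num)  # 2 3 1 4

# An ordered factor ranks by its level order, which here follows the numbers
rank(ordered(y_num))  # 3 1 2 4
```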
You can use lapply to compute the rank of each column of the data frame.
as.data.frame(lapply(data1, rank))
x y z
1 4.0 3 4
2 1.0 1 1
3 2.5 2 3
4 2.5 4 2
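This works because a data frame is a special kind of list: lapply passes each column to rank unchanged (so rank still sees an ordered factor) and returns a list, which as.data.frame turns back into a data frame.
Avoid sapply here, because sapply takes the output of lapply and "simplifies" it to whatever it thinks is appropriate. Here,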
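Here is a self-contained sketch of the lapply approach that also confirms each column reaches rank() still as an ordered factor. As before, the data are hard-coded from the printed output rather than generated with set.seed(4), and passed_as_ordered and rankmat_lapply are illustrative names.

```r
data1 <- data.frame(
  x = ordered(c(6L, 1L, 3L, 3L)),
  y = ordered(c(9L, 3L, 8L, 10L)),
  z = ordered(c(10L, 1L, 8L, 3L))
)

# lapply() iterates over the columns without coercing them,
# so every element it passes on is still an ordered factor
passed_as_ordered <- unlist(lapply(data1, is.ordered))
print(passed_as_ordered)

# rank() on a factor ranks by the factor's integer codes, i.e. by level order
rankmat_lapply <- as.data.frame(lapply(data1, rank))
print(rankmat_lapply)
```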
sapply(data1, rank)
x y z
[1,] 4.0 3 4
[2,] 1.0 1 1
[3,] 2.5 2 3
[4,] 2.5 4 2
returns a matrix (again!), which needs to be coerced to a data frame. (See section 8.3.20 of The R Inferno by Patrick Burns. The text is a good read anyway.)
The OP has not said why he needs to work with ordered factors. If factors, ordered or not, are not essential to the OP's underlying problem, then apply would have worked as expected, because a data frame with only numeric columns is coerced to a numeric matrix:
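To check the sapply point, here is a sketch that inspects what sapply returns and confirms that wrapping it in as.data.frame() gives the same values as the lapply version. The data are hard-coded as before, and ranks_sapply, rankmat_sapply and rankmat_lapply are illustrative names.

```r
data1 <- data.frame(
  x = ordered(c(6L, 1L, 3L, 3L)),
  y = ordered(c(9L, 3L, 8L, 10L)),
  z = ordered(c(10L, 1L, 8L, 3L))
)

# sapply() simplifies the list of equal-length rank vectors into a matrix
ranks_sapply <- sapply(data1, rank)
print(is.matrix(ranks_sapply))
print(is.data.frame(ranks_sapply))

# Coercing back to a data frame recovers the lapply() result
rankmat_sapply <- as.data.frame(ranks_sapply)
rankmat_lapply <- as.data.frame(lapply(data1, rank))
print(isTRUE(all.equal(rankmat_sapply, rankmat_lapply)))
```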
set.seed(4)
x2 <- sample(1:10, size = 4, replace = TRUE)
y2 <- sample(1:10, size = 4, replace = TRUE)
z2 <- sample(1:10, size = 4, replace = TRUE)
data2 <- data.frame(x2, y2, z2)
data2
x2 y2 z2
1 6 9 10
2 1 3 1
3 3 8 8
4 3 10 3
apply(data2, 2, rank)
x2 y2 z2
[1,] 4.0 3 4
[2,] 1.0 1 1
[3,] 2.5 2 3
[4,] 2.5 4 2
(Nevertheless, it is better to use lapply instead of apply with a data frame.)
When I started to learn R, I was misled by the name of the function ordered(). It took me a while to understand that it creates a special kind of factor. Likewise, it took me some time to figure out the difference between sort() and order() and when to use each.
I am not sure of the exact reason this happens with the apply function, but you could try sapply to solve the problem:
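A small check of the numeric case, with data2 hard-coded from the printed values rather than generated with set.seed(4): as.matrix() keeps the columns numeric, so apply and lapply agree. The names m2, ranks_apply and ranks_lapply are illustrative.

```r
data2 <- data.frame(
  x2 = c(6L, 1L, 3L, 3L),
  y2 = c(9L, 3L, 8L, 10L),
  z2 = c(10L, 1L, 8L, 3L)
)

# No factor columns, so as.matrix() yields a numeric (integer) matrix
m2 <- as.matrix(data2)
print(is.numeric(m2))

ranks_apply  <- apply(data2, 2, rank)
ranks_lapply <- as.data.frame(lapply(data2, rank))

# Element-wise comparison of the two 4 x 3 results
print(all(ranks_apply == as.matrix(ranks_lapply)))
```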
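For anyone who trips over the same names, a tiny illustration of sort() versus order() and of what ordered() returns (v is simply the y values from above, used for illustration):

```r
v <- c(9, 3, 8, 10)

sort(v)      # 3 8 9 10 : the sorted values themselves
order(v)     # 2 3 1 4  : the indices that would sort v
v[order(v)]  # 3 8 9 10 : same as sort(v)

# ordered() returns a factor whose levels carry an order
f <- ordered(v)
is.factor(f)  # TRUE
levels(f)     # "3" "8" "9" "10"
f[1] > f[2]   # TRUE: comparisons follow the level order (9 > 3)
```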
rankmat3 <- as.data.frame(sapply(data1, rank))
The result is:
rankmat3
x y z
1 4.0 3 4
2 1.0 1 1
3 2.5 2 3
4 2.5 4 2