Behavior of <- NULL on lists versus data.frames for removing data
- by Ananda Mahto
Most R users eventually learn several ways to remove elements from their data. One of them is to assign NULL, for example to drop a column from a data.frame or an element from a list.
Eventually, a user comes across a situation where they want to drop several columns from a data.frame at once, and they hit upon <- list(NULL) as the solution (since using <- NULL will result in an error).
A data.frame is a special type of list, so it is natural to expect that the approaches for removing items from a list would work the same way for removing columns from a data.frame. However, they produce different results, as can be seen in the example below.
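As a quick check of the "special type of list" statement, here is a minimal sketch (the names `df_demo` and `plain` are made up for illustration). It only shows that a data.frame is stored as a list but carries its own class and its own `[<-` method, which is presumably where the differences come from. It does not explain the exact rules.

```r
## Hypothetical object names; same columns as the example data
df_demo <- head(mtcars)[1:4]

## A data.frame is stored as a list of columns...
is.list(df_demo)

## ...but it also has a class, so S3 generics can treat it differently
inherits(df_demo, "data.frame")

## Assignment with `[<-` has a dedicated data.frame method
is.function(utils::getS3method("[<-", "data.frame"))

## Removing the class leaves a plain list (names and row.names remain
## as attributes), which then follows the ordinary list rules
plain <- unclass(df_demo)
class(plain)
```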
## Make some small data--two data.frames and two lists
cars1 <- cars2 <- head(mtcars)[1:4]
cars3 <- cars4 <- as.list(cars2)
## Demonstration that the `list(NULL)` approach works
cars1[c("mpg", "cyl")] <- list(NULL)
cars1
#                   disp  hp
# Mazda RX4          160 110
# Mazda RX4 Wag      160 110
# Datsun 710         108  93
# Hornet 4 Drive     258 110
# Hornet Sportabout  360 175
# Valiant            225 105
## Demonstration that simply using `NULL` does not work
cars2[c("mpg", "cyl")] <- NULL
# Error in `[<-.data.frame`(`*tmp*`, c("mpg", "cyl"), value = NULL) :
# replacement has 0 items, need 12
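For comparison, here is a sketch of two other ways to drop several columns that do not rely on assigning NULL to a multi-column selection. The names `cars_df`, `drop_cols`, `kept` and `one_by_one` are hypothetical. This is just one way to sidestep the question, not a statement about which approach is preferred.

```r
## Hypothetical names; same data as above
cars_df <- head(mtcars)[1:4]
drop_cols <- c("mpg", "cyl")

## Option 1: select everything except the unwanted columns
kept <- cars_df[setdiff(names(cars_df), drop_cols)]
names(kept)
# [1] "disp" "hp"

## Option 2: remove one column at a time with `[[<-`,
## which accepts NULL for a single column
one_by_one <- cars_df
for (col in drop_cols) one_by_one[[col]] <- NULL
names(one_by_one)
# [1] "disp" "hp"
```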
Now apply the same two assignments to a list and compare the behavior.
## Does not fully drop the items, but sets them to `NULL`
cars3[c("mpg", "cyl")] <- list(NULL)
cars3
# $mpg
# NULL
#
# $cyl
# NULL
#
# $disp
# [1] 160 160 108 258 360 225
#
# $hp
# [1] 110 110 93 110 175 105
## *Does* drop the `list` items, whereas the same assignment
## produced an error with a `data.frame`
cars4[c("mpg", "cyl")] <- NULL
cars4
# $disp
# [1] 160 160 108 258 360 225
#
# $hp
# [1] 110 110 93 110 175 105
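The two list outcomes can also be told apart programmatically instead of by reading the printed output. The sketch below uses hypothetical names (`demo_list`, `set_null`, `dropped`, `cleaned`) and assumes plain lists only. `Filter(Negate(is.null), ...)` is one possible way to remove NULL placeholders after the fact.

```r
## Hypothetical names; same data as the list example above
demo_list <- as.list(head(mtcars)[1:4])

set_null <- demo_list
set_null[c("mpg", "cyl")] <- list(NULL)  # elements kept, values set to NULL

dropped <- demo_list
dropped[c("mpg", "cyl")] <- NULL         # elements removed

length(set_null)
# [1] 4
length(dropped)
# [1] 2

## Remove the NULL placeholders afterwards
cleaned <- Filter(Negate(is.null), set_null)
names(cleaned)
# [1] "disp" "hp"
```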
My main questions are: if a data.frame is a list, why does it behave so differently in this scenario? Is there a foolproof way of knowing when an element will be dropped, when the assignment will produce an error, and when the element will simply be set to NULL? Or do we have to rely on trial and error?