Select, drop, and rename dataframe columns in Scheme
This post is the second in a series on the dataframe library for Scheme (R6RS). In this post, I will contrast the dataframe
library with functions from the dplyr
R package for selecting, dropping, and renaming columns.
Set up
First, let's create a very simple dataframe in both languages.
df <- data.frame("a" = 1:3, "b" = 4:6, "c" = 7:9)
(define df (make-df* (a 1 2 3) (b 4 5 6) (c 7 8 9)))
Select
With dplyr::select
, we can select and re-order columns in a single statement using bare column names.
> dplyr::select(df, c, a)
c a
1 7 1
2 8 2
3 9 3
With dataframe-select*
, we can also select and re-order columns in a single statement using bare column names.
> (dataframe-display (dataframe-select* df c a))
dim: 3 rows x 2 cols
c a
<num> <num>
7. 1.
8. 2.
9. 3.
Drop
dplyr::select
also allows for dropping columns by prefixing column names with -
.
> dplyr::select(df, -b)
a c
1 1 7
2 2 8
3 3 9
In dataframe
, dropping columns requires a separate procedure, dataframe-drop*
.
> (dataframe-display (dataframe-drop* df b))
dim: 3 rows x 2 cols
a c
<num> <num>
1. 7.
2. 8.
3. 9.
Rename
With dplyr::select
, columns can be renamed during selection, but dplyr::rename
allows for renaming without selection.
> dplyr::select(df, Bee = b, c)
Bee c
1 4 7
2 5 8
3 6 9
> dplyr::rename(df, Bee = b, Sea = c)
a Bee Sea
1 1 4 7
2 2 5 8
3 3 6 9
dataframe-select*
does not allow for renaming during selection, but dataframe-rename
works similarly to dplyr::rename
. However, in the absence of the =
syntax (where I think it is intutive for the new name to be on the left), I decided that it was more natural to write (old-name new-name)
.
> (dataframe-display (dataframe-rename* df (b Bee) (c Sea)))
dim: 3 rows x 3 cols
a Bee Sea
<num> <num> <num>
1. 4. 7.
2. 5. 8.
3. 6. 9.
When renaming all of the columns, dataframe-rename-all
allows for specifying all new names as a list rather than (old-name new-name)
pairs.
> (dataframe-display (dataframe-rename-all df '(A B C)))
dim: 3 rows x 3 cols
A B C
<num> <num> <num>
1. 4. 7.
2. 5. 8.
3. 6. 9.
Final thoughts
I haven't included any code showing how the procedures from the dataframe
library are implemented becuse they are so simple. They are simple because they don't do much. For example, dplyr::select
includes functionality that requires three procedures: dataframe-select
, dataframe-drop
, and dataframe-rename
. However, with simple Scheme code, I was able to implement procedures that cover cases representing 90% of my usage of dplyr::select
and dplyr::rename
.