Select, drop, and rename dataframe columns in Scheme

2020 Mar 29 Updated 2024 Mar 12 @travishinkelman.com

This post is the second in a series on the dataframe library for Scheme (R6RS). In this post, I will contrast the dataframe library with functions from the dplyr R package for selecting, dropping, and renaming columns.

Set up

First, let's create a very simple dataframe in both languages.

df <- data.frame("a" = 1:3, "b" = 4:6, "c" = 7:9)

(define df (make-df* (a 1 2 3) (b 4 5 6) (c 7 8 9)))

Select

With dplyr::select, we can select and re-order columns in a single statement using bare column names.

> dplyr::select(df, c, a)
  c a
1 7 1
2 8 2
3 9 3

With dataframe-select*, we can also select and re-order columns in a single statement using bare column names.

> (dataframe-display (dataframe-select* df c a))

 dim: 3 rows x 2 cols
       c       a 
   <num>   <num> 
      7.      1. 
      8.      2. 
      9.      3. 

Drop

dplyr::select also allows for dropping columns by prefixing column names with -.

> dplyr::select(df, -b)
  a c
1 1 7
2 2 8
3 3 9

In dataframe, dropping columns requires a separate procedure, dataframe-drop*.

> (dataframe-display (dataframe-drop* df b))

 dim: 3 rows x 2 cols
       a       c 
   <num>   <num> 
      1.      7. 
      2.      8. 
      3.      9. 

Rename

With dplyr::select, columns can be renamed during selection, but dplyr::rename allows for renaming without selection.

> dplyr::select(df, Bee = b, c)
  Bee c
1   4 7
2   5 8
3   6 9

> dplyr::rename(df, Bee = b, Sea = c)
  a Bee Sea
1 1   4   7
2 2   5   8
3 3   6   9

dataframe-select* does not allow for renaming during selection, but dataframe-rename works similarly to dplyr::rename. However, in the absence of the = syntax (where I think it is intutive for the new name to be on the left), I decided that it was more natural to write (old-name new-name).

> (dataframe-display (dataframe-rename* df (b Bee) (c Sea)))

 dim: 3 rows x 3 cols
       a     Bee     Sea 
   <num>   <num>   <num> 
      1.      4.      7. 
      2.      5.      8. 
      3.      6.      9. 

When renaming all of the columns, dataframe-rename-all allows for specifying all new names as a list rather than (old-name new-name) pairs.

> (dataframe-display (dataframe-rename-all df '(A B C)))

 dim: 3 rows x 3 cols
       A       B       C 
   <num>   <num>   <num> 
      1.      4.      7. 
      2.      5.      8. 
      3.      6.      9. 

Final thoughts

I haven't included any code showing how the procedures from the dataframe library are implemented becuse they are so simple. They are simple because they don't do much. For example, dplyr::select includes functionality that requires three procedures: dataframe-select, dataframe-drop, and dataframe-rename. However, with simple Scheme code, I was able to implement procedures that cover cases representing 90% of my usage of dplyr::select and dplyr::rename.