Not too long ago, I was trying to find a simple way in R to select files whose names contain a year within a particular range. I tried writing the regular expressions by hand, but these always failed.
However, I eventually stumbled upon the ‘rebus’ package, which makes this extremely easy. Here, I have created a very simple demo of its ‘number_range’ function, which generates the regular expression for you. You will need to download the files, which are just empty .rtf files, and put them in a folder in your working directory. I simply named the folder “DummyFiles”.
Once ‘number_range’ has generated the regular expression, it can be used as the pattern in the ‘grep’ function and applied to the list of files:
# Specify a directory for the dummy rtf files.
DUMMY.DIR <- "./DummyFiles/"
# Get list of all file names in the dummy directory.
dummy.files <- list.files(DUMMY.DIR, recursive = TRUE, full.names = TRUE)
# Load the rebus package, which provides number_range().
library(rebus)
# Use the rebus package to generate a regular expression for a number range,
# which will allow for the extraction of specific years in the dummy data (i.e., 1948-1956).
rx <- number_range(1948, 1956)
# Now use the expression stored in rx to trim the list of files to 1948-1956.
dummy.files <- grep(rx, dummy.files, value = TRUE, perl = TRUE)
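If you want to sanity-check the result without rebus, here is a minimal sketch in base R. The file names in `example.files` are hypothetical (I am assuming each name contains a four-digit year), and the hand-written pattern in `manual.rx` is only an illustration of a regex for this range; it is not necessarily the exact string that ‘number_range’ returns. The sketch also shows a second approach that extracts the year and compares it as a number, which avoids building a range regex at all.

```r
# Hypothetical file names; the real files in DummyFiles may be named differently.
example.files <- c(
  "./DummyFiles/report_1947.rtf",
  "./DummyFiles/report_1948.rtf",
  "./DummyFiles/report_1952.rtf",
  "./DummyFiles/report_1956.rtf",
  "./DummyFiles/report_1957.rtf"
)

# Approach 1: a hand-written pattern for 1948-1956.
# "194" followed by 8 or 9, or "195" followed by 0 through 6.
manual.rx <- "19(4[89]|5[0-6])"
by.regex <- grep(manual.rx, example.files, value = TRUE, perl = TRUE)

# Approach 2: extract the first four-digit run from each base name
# and filter numerically. Names without a four-digit run stay NA.
base.names <- basename(example.files)
m <- regexpr("[0-9]{4}", base.names)
years <- rep(NA_integer_, length(base.names))
years[m > 0] <- as.integer(regmatches(base.names, m))
by.number <- example.files[!is.na(years) & years >= 1948 & years <= 1956]

# Both approaches should keep the same files.
print(by.regex)
print(identical(by.regex, by.number))
```

The numeric approach is easier to adapt when the year range changes, while the regex approach drops straight into ‘grep’ the same way `rx` does above.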