labelmachine is an R package that helps you assign meaningful labels to R
data sets. You manage your labels in yaml files, so-called
lama-dictionary files. This makes it easy to reuse
the same label translations across multiple projects that share a similar data
structure.
Labeling your data can be easy!
Let us assume you want to label the following data frame df:
df <- data.frame(
  pupil_id = rep(1:4, each = 3),
  subject = rep(c("eng", "mat", "gym"), 4),
  result = c(1, 2, 2, NA, 2, NA, 1, 0, 1, 2, 3, NA),
  stringsAsFactors = FALSE
)
df
#>    pupil_id subject result
#> 1         1     eng      1
#> 2         1     mat      2
#> 3         1     gym      2
#> 4         2     eng     NA
#> 5         2     mat      2
#> 6         2     gym     NA
#> 7         3     eng      1
#> 8         3     mat      0
#> 9         3     gym      1
#> 10        4     eng      2
#> 11        4     mat      3
#> 12        4     gym     NA
The column subject contains the codes of the subjects the pupils
were tested in, and the column result contains the test results.
In order to assign labels to the values in the columns of df, we need a
lama-dictionary, which holds the translations of the variables. With the
command new_lama_dictionary(), we can create such a lama-dictionary:
library(labelmachine)
dict <- new_lama_dictionary(
  sub = c(eng = "English", mat = "Mathematics", gym = "Gymnastics"),
  res = c(
    "1" = "Good",
    "2" = "Passed",
    "3" = "Not passed",
    "4" = "Not passed",
    NA_ = "Missed",
    "0" = NA
  )
)
dict
#>
#> --- lama_dictionary ---
#> Variable 'sub':
#>           eng           mat           gym
#>     "English" "Mathematics"  "Gymnastics"
#>
#> Variable 'res':
#>            1            2            3            4          NA_            0
#>       "Good"     "Passed" "Not passed" "Not passed"     "Missed"           NA
Each entry in dict is a translation for a variable (column) of the data
frame df. The translation sub can be used to assign labels to the values in
column subject of df, and the translation res can be used to assign labels to
the values in column result of df. The expression NA_ is used to escape the
missing value symbol NA. Hence, the assignment NA_ = "Missed" defines that
missing values should be labeled with the string "Missed". Conversely, the
assignment "0" = NA defines that the value 0 receives a missing label. For
further details on creating lama-dictionaries see Creating lama-dictionaries
and Altering lama-dictionaries.
With the command lama_translate, we can use the lama-dictionary dict to
translate the variables in the data frame df:
df_new <- lama_translate(
  df,
  dict,
  subject_lab = sub(subject),
  result_lab = res(result)
)
str(df_new)
#> 'data.frame':    12 obs. of  5 variables:
#>  $ pupil_id   : int  1 1 1 2 2 2 3 3 3 4 ...
#>  $ subject    : chr  "eng" "mat" "gym" "eng" ...
#>  $ result     : num  1 2 2 NA 2 NA 1 0 1 2 ...
#>  $ subject_lab: Factor w/ 3 levels "English","Mathematics",..: 1 2 3 1 2 3 1 2 3 1 ...
#>  $ result_lab : Factor w/ 4 levels "Good","Passed",..: 1 2 2 4 2 4 1 NA 1 2 ...
As we can see, the resulting data frame df_new now holds two extra columns,
subject_lab and result_lab, which contain the factor variables with the right
labels. The command lama_translate has multiple features, which are described
in more detail in Translating variables.
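To make the idea behind a translation more tangible, here is a rough base R sketch of the lookup that produces result_lab. It is only an illustration under the assumption that a translation behaves like a named character vector; it is not how labelmachine is implemented, and the helper names res_labels, keys and lvls are made up for this example.

```r
# Base R sketch of the lookup idea; this is NOT labelmachine's implementation.
result <- c(1, 2, 2, NA, 2, NA, 1, 0, 1, 2, 3, NA)

# Same key-label pairs as the translation res, as a plain named vector
# (res_labels is a hypothetical name used only in this sketch).
res_labels <- c(
  "1" = "Good", "2" = "Passed", "3" = "Not passed", "4" = "Not passed",
  NA_ = "Missed", "0" = NA
)

# Missing values are looked up under the escaped key "NA_".
keys <- ifelse(is.na(result), "NA_", as.character(result))

# Factor levels are the distinct non-missing labels, in dictionary order.
lvls <- unique(res_labels[!is.na(res_labels)])
result_lab <- factor(unname(res_labels[keys]), levels = lvls)

levels(result_lab)
as.integer(result_lab)
```

The levels and integer codes obtained this way agree with the result_lab column shown by str(df_new): missing results end up in the level "Missed", while the value 0 becomes NA.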
With the command lama_write, the lama-dictionary object can be saved to a
yaml file.
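A minimal sketch of saving a dictionary could look as follows. The file name my_dictionary.yaml and the use of a temporary directory are assumptions made for this example; any writable path should work. To keep the example self-contained, it rebuilds a small dictionary instead of reusing dict from above.

```r
library(labelmachine)

# A small dictionary, rebuilt here so that the example runs on its own.
dict <- new_lama_dictionary(
  sub = c(eng = "English", mat = "Mathematics", gym = "Gymnastics")
)

# Hypothetical target path inside the session's temporary directory.
path_to_file <- file.path(tempdir(), "my_dictionary.yaml")

# Save the lama-dictionary to the yaml file.
lama_write(dict, path_to_file)

# Inspect the plain text content of the written file.
cat(readLines(path_to_file), sep = "\n")
```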
The resulting yaml file is a plain text file in which each variable name maps to its key-label pairs; see dictionary.yaml.
Lama-dictionary files make it easy to share lama-dictionaries with
other projects holding similar data structures. With the command
lama_read, a lama-dictionary file can be read back into R.
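The following sketch shows a simple round trip: a small dictionary is written to a file and then read back. The file name shared_dictionary.yaml and the object names dict_small and dict_shared are hypothetical and only serve this illustration.

```r
library(labelmachine)

# Write a small dictionary to a hypothetical file first, so there is
# something to read.
dict_small <- new_lama_dictionary(
  sub = c(eng = "English", mat = "Mathematics", gym = "Gymnastics")
)
path_to_file <- file.path(tempdir(), "shared_dictionary.yaml")
lama_write(dict_small, path_to_file)

# Read the lama-dictionary file back into a lama-dictionary object.
dict_shared <- lama_read(path_to_file)
dict_shared
```

In another project, only the lama_read call is needed, pointing at the shared yaml file.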