Title: | Make Labeling of R Data Sets Easy |
---|---|
Description: | Assign meaningful labels to data frame columns. 'labelmachine' manages your label assignment rules in 'yaml' files and makes it easy to use the same labels in multiple projects. |
Authors: | Adrian Maldet [aut, cre] |
Maintainer: | Adrian Maldet <[email protected]> |
License: | GPL-3 |
Version: | 1.0.0 |
Built: | 2024-11-16 04:34:57 UTC |
Source: | https://github.com/a-maldet/labelmachine |
This function allows two types of arguments:
named list: A named list object holding the translations.
data.frame: A data.frame with one or more column pairs. Each column
pair consists of a column holding the original values, which should be replaced,
and a second character column holding the new labels, which should be
assigned to the original values. Use the arguments col_old and col_new
to define which columns hold the original values and which
columns hold the new labels. The names of the resulting translations
are defined by a character vector given in the argument translation.
Furthermore, each translation can have a different ordering, which can be
configured by a character vector given in the argument ordering.
as.lama_dictionary(.data, ...)

## S3 method for class 'list'
as.lama_dictionary(.data, ...)

## S3 method for class 'lama_dictionary'
as.lama_dictionary(.data, ...)

## Default S3 method:
as.lama_dictionary(.data = NULL, ...)

## S3 method for class 'data.frame'
as.lama_dictionary(.data, translation, col_old, col_new,
  ordering = rep("row", length(translation)), ...)
.data |
An object holding the translations.
|
... |
Various arguments, depending on the data type of |
translation |
A character vector holding the names of all translations |
col_old |
This argument is only used, if the argument given in |
col_new |
This argument is only used, if the argument given in |
ordering |
This argument is only used, if the argument given in
|
A new lama_dictionary class object holding the passed in translations.
A translation is a named character vector of non-zero length.
This named character vector defines which labels (of type character) should be
assigned to which values (which can be of type character, logical or numeric).
For example, the translation c("0" = "urban", "1" = "rural") assigns the label
"urban" to the value 0 and "rural" to the value 1, so the
variable x = c(0, 0, 1) is translated to x_new = c("urban", "urban", "rural").
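The mapping described above can be imitated in base R by indexing the named character vector with the values converted to character. This is only a sketch of the idea and not the implementation used by lama_translate(); the variable names are illustrative.

```r
# A translation: names are the original values, entries are the new labels
translation <- c("0" = "urban", "1" = "rural")

# A numeric variable that should be labeled
x <- c(0, 0, 1)

# Convert the values to character and use them as names for the lookup
x_new <- unname(translation[as.character(x)])
x_new
```

Because the lookup happens by name, the same translation also works for a character variable such as c("0", "0", "1").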
Therefore, a translation (named character vector) contains the following information:
The names of the character vector entries correspond to the original variable levels. Variables of type numeric or logical are automatically turned into a character vector (e.g. 0 and 1 are treated like "0" and "1").
The entries (character strings) of the character vector correspond to the new labels, which will be assigned to the original variable levels. It is also allowed to have missing labels (NAs). In this case, the original values are mapped onto missing values.
The function lama_translate() is used to apply a translation to a variable.
The resulting vector with the assigned labels can be of the following types:
character: An unordered vector holding the new character labels.
factor with character levels: An ordered vector holding the new character labels.
The original variable can be of the following types:
character vector: This is the simplest case. The character values will be replaced by the corresponding labels.
numeric or logical vector: Vectors of type numeric or logical will be turned into character vectors automatically before the translation process and then simply processed like in the character case. Therefore, it is sufficient to define the translation mapping for the character case, since it also covers the numeric and logical case.
factor vector with levels of any type: When translating factor variables, one can decide whether or not to keep the original ordering. As in the other cases, the levels of the factor variable will always be turned into character strings before the translation process.
It is also possible to handle missing values with lama_translate().
For this, the translation must contain information that tells how
to handle a missing value. In order to define such a translation,
the missing value (NA) can be escaped with the character string "NA_".
This can be useful in two situations:
All missing values should be labeled (e.g. the translation c("0" = "urban", "1" = "rural", NA_ = "missing") assigns the character string "missing" to all missing values of a variable).
Some original values should be mapped to NA (e.g. the translation c("0" = "urban", "1" = "rural", "2" = "NA_", "3" = "NA_") assigns NA (the missing character) to the original values 2 and 3). In this case the translation definition does not always have to use the escape mechanism; it is only needed when defining the translations inside a YAML file, since the YAML parser does not recognize missing values.
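The two uses of the "NA_" escape can be sketched in base R as follows. This is an illustration of the idea under the assumption that both uses are combined in one translation; it is not the code that lama_translate() runs internally.

```r
# "2" is mapped to a missing value, and missing input values get a label
translation <- c("0" = "urban", "1" = "rural", "2" = "NA_", NA_ = "missing")

x <- c(0, 1, 2, NA)

# Escape missing input values so that they can be looked up by name
keys <- as.character(x)
keys[is.na(keys)] <- "NA_"

labels <- unname(translation[keys])

# Turn escaped labels back into real missing values
labels[!is.na(labels) & labels == "NA_"] <- NA_character_
labels
```

In this sketch the value 2 ends up as NA, while the originally missing value receives the label "missing".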
Each lama_dictionary class object can contain multiple translations,
each with a unique name under which the translation can be found.
The function lama_translate() uses a lama_dictionary class object
to translate a normal vector or to translate one or more columns in a data.frame.
Sometimes it may be necessary to have different translations
for the same variable. In this case it is best to have multiple
translations with different names
(e.g. area_short = c("0" = "urb", "1" = "rur") and area = c("0" = "urban", "1" = "rural")).
## Example-1: Initialize a lama-dictionary from a list object
## holding the translations
obj <- list(
  country = c(uk = "United Kingdom", fr = "France", NA_ = "other countries"),
  language = c(en = "English", fr = "French")
)
dict <- as.lama_dictionary(obj)
dict

## Example-2: Initialize a lama-dictionary from a data frame
## holding the label assignment rules
df_map <- data.frame(
  c_old = c("uk", "fr", NA),
  c_new = c("United Kingdom", "France", "other countries"),
  l_old = c("en", "fr", NA),
  l_new = factor(c("English", "French", NA), levels = c("French", "English"))
)
dict <- as.lama_dictionary(
  df_map,
  translation = c("country", "language"),
  col_old = c("c_old", "l_old"),
  col_new = c("c_new", "l_new"),
  ordering = c("row", "new")
)
# 'country' is ordered as in the 'df_map'
# 'language' is ordered differently ("French" first)
dict
Check and translate function used by lama_translate_all() and lama_to_factor_all()
check_and_translate_all(.data, dictionary, prefix, suffix, fn_colname, keep_order, to_factor, is_translated, err_handler)
.data |
Either a data frame, a factor or a vector. |
dictionary |
A lama_dictionary object, holding the translations for various variables. |
prefix |
A character string, which is used as prefix for the new column names. |
suffix |
A character string, which is used as suffix for the new column names. |
fn_colname |
A function, which transforms a character string into a new character string. This function will be used to transform the old column names into new column names under which the labeled variables will then be stored. |
keep_order |
A logical of length one, defining if the original order (factor order or alphanumerical order) of the data frame variables should be preserved. |
to_factor |
A logical of length one, defining if the resulting labeled
variables should be factor variables ( |
is_translated |
A boolean vector of length one or the same length as the
number of translations. If the vector has length one, then the same
configuration is applied to all variable translations.
If |
err_handler |
An error handling function |
Check arguments and translate a data.frame
check_and_translate_df(.data, dictionary, args, keep_order, to_factor, is_translated, err_handler)
.data |
Either a data frame, a factor or an atomic vector. |
dictionary |
A lama_dictionary object, holding the translations for various variables. |
args |
The list of arguments given in ... when calling |
keep_order |
A boolean vector of length one or the same length as the
number of translations. If the vector has length one, then the same
configuration is applied to all variable translations. If the vector has
the same length as the number of arguments in |
to_factor |
A boolean vector of length one or the same length as the
number of translations. If the vector has length one, then the same
configuration is applied to all variable translations. If the vector has
the same length as the number of arguments in |
is_translated |
A boolean vector of length one or the same length as the
number of translations. If the vector has length one, then the same
configuration is applied to all variable translations.
If |
err_handler |
An error handling function |
Check arguments and translate a data.frame (standard eval)
check_and_translate_df_(.data, dictionary, translation, col, col_new, keep_order, to_factor, is_translated, err_handler)
.data |
Either a data frame, a factor or an atomic vector. |
dictionary |
A lama_dictionary object, holding the translations for various variables. |
translation |
A character vector holding the names of the variable
translations which
should be used for assigning new labels to the variable. These names must be
a subset of the translation names returned by |
col |
Only used if |
col_new |
Only used if |
keep_order |
A boolean vector of length one or the same length as the
number of translations. If the vector has length one, then the same
configuration is applied to all variable translations. If the vector has
the same length as the number of arguments in |
to_factor |
A boolean vector of length one or the same length as the
number of translations. If the vector has length one, then the same
configuration is applied to all variable translations. If the vector has
the same length as the number of arguments in |
is_translated |
A boolean vector of length one or the same length as the
number of translations. If the vector has length one, then the same
configuration is applied to all variable translations.
If |
err_handler |
An error handling function |
Check arguments and translate a vector
check_and_translate_vector(.data, dictionary, args, keep_order, to_factor, is_translated, err_handler)
.data |
Either a data frame, a factor or an atomic vector. |
dictionary |
A lama_dictionary object, holding the translations for various variables. |
args |
The list of arguments given in ... when calling |
keep_order |
A boolean vector of length one or the same length as the
number of translations. If the vector has length one, then the same
configuration is applied to all variable translations. If the vector has
the same length as the number of arguments in |
to_factor |
A boolean vector of length one or the same length as the
number of translations. If the vector has length one, then the same
configuration is applied to all variable translations. If the vector has
the same length as the number of arguments in |
is_translated |
A boolean vector of length one or the same length as the
number of translations. If the vector has length one, then the same
configuration is applied to all variable translations.
If |
err_handler |
An error handling function |
Check arguments and translate a character vector (standard eval)
check_and_translate_vector_(.data, dictionary, translation, keep_order, to_factor, is_translated, err_handler)
.data |
Either a data frame, a factor or an atomic vector. |
dictionary |
A lama_dictionary object, holding the translations for various variables. |
translation |
A character vector holding the names of the variable
translations which
should be used for assigning new labels to the variable. These names must be
a subset of the translation names returned by |
keep_order |
A boolean vector of length one or the same length as the
number of translations. If the vector has length one, then the same
configuration is applied to all variable translations. If the vector has
the same length as the number of arguments in |
to_factor |
A boolean vector of length one or the same length as the
number of translations. If the vector has length one, then the same
configuration is applied to all variable translations. If the vector has
the same length as the number of arguments in |
is_translated |
A boolean vector of length one or the same length as the
number of translations. If the vector has length one, then the same
configuration is applied to all variable translations.
If |
err_handler |
An error handling function |
Function that applies some general checks to the arguments of lama_translate() and lama_translate_()
check_arguments(.data, dictionary, col_new, keep_order, to_factor, err_handler)
.data |
Either a data frame, a factor or an atomic vector. |
dictionary |
A lama_dictionary object, holding the translations for various variables. |
col_new |
Only used if |
keep_order |
A boolean vector of length one or the same length as the
number of translations. If the vector has length one, then the same
configuration is applied to all variable translations. If the vector has
the same length as the number of arguments in |
to_factor |
A boolean vector of length one or the same length as the
number of translations. If the vector has length one, then the same
configuration is applied to all variable translations. If the vector has
the same length as the number of arguments in |
err_handler |
An error handling function |
Function that checks the passed in arguments for lama_rename() and lama_rename_()
check_rename(.data, old, new, err_handler)
.data |
A lama_dictionary object, holding the variable translations |
old |
A character vector holding the names of the variable translations, that should be renamed. |
new |
A character vector holding the new names of the variable translations. |
err_handler |
An error handling function |
Function that checks the passed in arguments for lama_select() and lama_select_()
check_select(.data, key, err_handler)
.data |
A lama_dictionary object, holding the variable translations |
key |
A character vector holding the names of the variable translations, that should be selected. |
err_handler |
An error handling function |
The functions composerr(), composerr_() and composerr_parent() modify error handlers by
appending character strings to the error messages of the error handling
functions:
composerr() uses non-standard evaluation.
composerr_() is the standard evaluation alternative of composerr().
composerr_parent() is a wrapper of composerr(), defining the parent
environment as the lookup environment of the err_handler.
This function looks up the prior error handling function in the parent
environment of the current environment and allows you to store
the modified error handling function under the same name as the
error handling function from the parent environment without running into
recursion issues.
This is especially useful when doing error handling
in nested environments (e.g. checking nested list objects) and you do not
want to use different names for the error handling functions in the
nested levels.
If you don't have a nested environment situation, it is better to use
composerr() or composerr_().
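The idea of extending the message of an existing error handler can be sketched in base R with a closure. The helper compose_handler() below is hypothetical and only mimics the text_1, text_2, sep_1 and sep_2 arguments; it is not the package's composerr() and it ignores the environment lookup done via env_prior.

```r
# Hypothetical helper, NOT the package's composerr(): it wraps an existing
# error handler and extends the message before passing it on.
compose_handler <- function(text_1 = NULL, err_prior = stop, text_2 = NULL,
                            sep_1 = ": ", sep_2 = ": ") {
  # Evaluate the prior handler now, so that later reassignments of the
  # variable it was stored in cannot cause infinite recursion
  force(err_prior)
  function(msg) {
    if (!is.null(text_1)) msg <- paste0(text_1, sep_1, msg)
    if (!is.null(text_2)) msg <- paste0(msg, sep_2, text_2)
    err_prior(msg)
  }
}

# Base handler: raise an error without the call in the message
err_base <- function(msg) stop(msg, call. = FALSE)

# Each level adds its own context in front of the message
err_fun <- compose_handler("Error in my_function()", err_base)
err_arg <- compose_handler("argument 'x' is invalid", err_fun)

# Capture the message instead of aborting the script
msg <- tryCatch(
  err_arg("must be numeric"),
  error = function(e) conditionMessage(e)
)
msg
```

The call to force() matters: without it, R's lazy evaluation would look up the prior handler only when the new handler is first called, which is the kind of recursion problem the text above refers to when handlers are stored under the same name.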
composerr_(text_1 = NULL, err_prior = NULL, text_2 = NULL,
  sep_1 = ": ", sep_2 = ": ", env_prior = parent.frame())

composerr(text_1 = NULL, err_prior = NULL, text_2 = NULL,
  sep_1 = ": ", sep_2 = ": ", env_prior = parent.frame())

composerr_parent(text_1 = NULL, err_prior = NULL, text_2 = NULL,
  sep_1 = ": ", sep_2 = ": ", env_prior = parent.frame())
text_1 |
A character string, which will be appended
at the beginning of the error message. The argument |
err_prior |
There are three valid types:
|
text_2 |
A character string, which will be appended
at the end of the error message. The argument |
sep_1 |
A character string that is used as separator for the
concatenation of |
sep_2 |
A character string that is used as separator for the
concatenation of |
env_prior |
An environment where the error handling function given in
|
A new error handling function that has an extended error message.
Check if a character vector contains NA replacement strings
contains_na_escape(x)
x |
A character vector that should be checked. |
TRUE if the vector contains NA replacement strings, FALSE otherwise.
In the lama_dictionary class object the data has the structure vars (named list) > translations (named character vector). This structure is transformed to the yaml file structure vars (named list) > translations (named list).
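The structural change described above can be illustrated in base R. The sketch assumes a plain named list standing in for the dictionary data and is not the package's internal code.

```r
# Stand-in for the dictionary data: a named list of named character vectors
dict_like <- list(
  country = c(uk = "United Kingdom", fr = "France"),
  language = c(en = "English", fr = "French")
)

# Turn each named character vector into a named list of character strings
yaml_like <- lapply(dict_like, as.list)
str(yaml_like)
```

The names of the translations and of their entries are kept; only the container type of each translation changes.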
dictionary_to_yaml(data)
data |
A list that has lama-dictionary structure. |
An object similar to a lama-dictionary object, but each translation is a named list holding character strings instead of a named character vector.
Replace "NA_" by NA
escape_to_na(x)
x |
A character vector that should be modified. |
A character vector, where the NA replacement strings are replaced by NAs.
Check if an object is a lama_dictionary class object
is.lama_dictionary(obj)
obj |
The object in question |
TRUE if the object is a lama_dictionary class object, FALSE otherwise.
validate_lama_dictionary(), as.lama_dictionary(), new_lama_dictionary(), lama_translate(), lama_to_factor(), lama_translate_all(), lama_to_factor_all(), lama_read(), lama_write(), lama_select(), lama_rename(), lama_mutate(), lama_merge()
# check if an object is a 'lama_dictionary' class object
dict <- new_lama_dictionary(country = c(uk = "United Kingdom", fr = "France"))
is.lama_dictionary(dict)
This function was suggested by 'Hadley Wickham' in a forum
is.syntactic(x)
x |
A character string that should be checked, if it contains a valid object name. |
TRUE if valid, FALSE otherwise.
http://r.789695.n4.nabble.com/Syntactically-valid-names-td3636819.html
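A common base R idiom for this kind of check compares a string with the result of make.names(). The function below is a sketch under the assumption that is.syntactic() works along these lines; the name is_syntactic_sketch() is made up for illustration.

```r
# Assumed approach: a name is syntactically valid if make.names()
# does not have to change it
is_syntactic_sketch <- function(x) x == make.names(x)

is_syntactic_sketch("country")    # a regular object name
is_syntactic_sketch("2nd place")  # starts with a digit and contains a space
is_syntactic_sketch("if")         # reserved word
```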
The functions lama_get() and lama_get_() take a
lama_dictionary and extract a specific translation.
The function lama_get() uses non-standard evaluation, whereas
lama_get_() is the standard evaluation alternative.
lama_get(.data, translation)

## S3 method for class 'lama_dictionary'
lama_get(.data, translation)

lama_get_(.data, translation)

## S3 method for class 'lama_dictionary'
lama_get_(.data, translation)
.data |
A lama_dictionary object |
translation |
Depending on which function was used:
|
The wanted translation (named character vector).
A translation is a named character vector of non-zero length.
This named character vector defines which labels (of type character) should be
assigned to which values (which can be of type character, logical or numeric).
For example, the translation c("0" = "urban", "1" = "rural") assigns the label
"urban" to the value 0 and "rural" to the value 1, so the
variable x = c(0, 0, 1) is translated to x_new = c("urban", "urban", "rural").
Therefore, a translation (named character vector) contains the following information:
The names of the character vector entries correspond to the original variable levels. Variables of type numeric or logical are automatically turned into a character vector (e.g. 0 and 1 are treated like "0" and "1").
The entries (character strings) of the character vector correspond to the new labels, which will be assigned to the original variable levels. It is also allowed to have missing labels (NAs). In this case, the original values are mapped onto missing values.
The function lama_translate() is used to apply a translation to a variable.
The resulting vector with the assigned labels can be of the following types:
character: An unordered vector holding the new character labels.
factor with character levels: An ordered vector holding the new character labels.
The original variable can be of the following types:
character vector: This is the simplest case. The character values will be replaced by the corresponding labels.
numeric or logical vector: Vectors of type numeric or logical will be turned into character vectors automatically before the translation process and then simply processed like in the character case. Therefore, it is sufficient to define the translation mapping for the character case, since it also covers the numeric and logical case.
factor vector with levels of any type: When translating factor variables, one can decide whether or not to keep the original ordering. As in the other cases, the levels of the factor variable will always be turned into character strings before the translation process.
It is also possible to handle missing values with lama_translate().
For this, the translation must contain information that tells how
to handle a missing value. In order to define such a translation,
the missing value (NA) can be escaped with the character string "NA_".
This can be useful in two situations:
All missing values should be labeled (e.g. the translation c("0" = "urban", "1" = "rural", NA_ = "missing") assigns the character string "missing" to all missing values of a variable).
Some original values should be mapped to NA (e.g. the translation c("0" = "urban", "1" = "rural", "2" = "NA_", "3" = "NA_") assigns NA (the missing character) to the original values 2 and 3). In this case the translation definition does not always have to use the escape mechanism; it is only needed when defining the translations inside a YAML file, since the YAML parser does not recognize missing values.
Each lama_dictionary class object can contain multiple translations,
each with a unique name under which the translation can be found.
The function lama_translate() uses a lama_dictionary class object
to translate a normal vector or to translate one or more columns in a data.frame.
Sometimes it may be necessary to have different translations
for the same variable. In this case it is best to have multiple
translations with different names
(e.g. area_short = c("0" = "urb", "1" = "rur") and area = c("0" = "urban", "1" = "rural")).
This function takes multiple lama_dictionary class
objects and merges them into
a single lama_dictionary class object.
If some class objects have entries with the same name, the
class objects passed in later overwrite the class objects passed in earlier
(e.g. in lama_merge(x, y, z), the dictionary z overwrites x and y,
and the dictionary y overwrites x).
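The overwrite rule can be pictured with plain named lists in base R. This sketch only mirrors the "later wins" behavior for entries with the same name; it uses ordinary lists instead of lama_dictionary objects and says nothing about the warnings or the entry order produced by lama_merge().

```r
# Plain named lists standing in for two dictionaries
x <- list(subject = c(en = "English"), result = c("1" = "Very good"))
y <- list(result = c("1" = "Super"), grade = c(a = "Primary School"))

# Entries of 'y' replace entries of 'x' with the same name,
# entries with new names are appended
merged <- x
merged[names(y)] <- y

names(merged)
merged$result
```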
lama_merge(..., show_warnings = TRUE)

## S3 method for class 'lama_dictionary'
lama_merge(..., show_warnings = TRUE)
... |
Two or more lama_dictionary class objects, which should be merged together. |
show_warnings |
A logical flag that defines, whether warnings should be
shown ( |
The merged lama_dictionary class object
lama_translate(), lama_to_factor(), lama_translate_all(), lama_to_factor_all(), new_lama_dictionary(), as.lama_dictionary(), lama_rename(), lama_select(), lama_mutate(), lama_read(), lama_write()
# initialize lama_dictionary
dict_1 <- new_lama_dictionary(
  subject = c(en = "English", ma = "Mathematics"),
  result = c("1" = "Very good", "2" = "Good", "3" = "Not so good")
)
dict_2 <- new_lama_dictionary(
  result = c("1" = "Super", "2" = "Fantastic", "3" = "Brilliant"),
  grade = c(a = "Primary School", b = "Secondary School")
)
dict_3 <- new_lama_dictionary(
  country = c(en = "England", "at" = "Austria", NA_ = "Some other country")
)
dict <- lama_merge(dict_1, dict_2, dict_3)
# The lama_dictionary now contains the translations
# 'subject', 'result', 'grade' and 'country'
# The translation 'result' from 'dict_1' was overwritten by the 'result' in 'dict_2'
dict
The functions lama_mutate() and lama_mutate_() alter a
lama_dictionary object. They can be used to alter,
delete or append translations in a
lama_dictionary object.
The function lama_mutate() uses named arguments to assign the translations
to the new names (similar to dplyr::mutate), whereas the function
lama_mutate_() takes a character string key holding the
name to which the translation should be assigned and a named character
vector translation holding the actual translation mapping.
lama_mutate(.data, ...)

## S3 method for class 'lama_dictionary'
lama_mutate(.data, ...)

lama_mutate_(.data, key, translation)

## S3 method for class 'lama_dictionary'
lama_mutate_(.data, key, translation)
.data |
A lama_dictionary object |
... |
One or more unquoted expressions separated by commas. Use named
arguments, e.g. |
key |
The name of the variable translation that should be altered. It can also be a variable translation name that does not exist yet. |
translation |
A named character vector holding the new variable
translation that should be assigned to the name given in argument |
An updated lama_dictionary class object.
lama_translate(), lama_to_factor(), lama_translate_all(), lama_to_factor_all(), new_lama_dictionary(), as.lama_dictionary(), lama_rename(), lama_select(), lama_merge(), lama_read(), lama_write()
# initialize lama_dictionary
dict <- new_lama_dictionary(
  subject = c(en = "English", ma = "Mathematics"),
  result = c("1" = "Very good", "2" = "Good", "3" = "Not so good")
)

## Example-1: mutate and append with 'lama_mutate'
# add a few subjects and a few grades
dict_new <- lama_mutate(
  dict,
  subject = c(bio = "Biology", subject, sp = "Sports"),
  result = c("0" = "Beyond expectations", result, "4" = "Failed", NA_ = "Missed")
)
# the subjects "Biology" and "Sports" were added
# and the results "Beyond expectations", "Failed" and "Missed"
dict_new

## Example-2: delete with 'lama_mutate'
dict_new <- lama_mutate(
  dict,
  subject = NULL
)
dict_new

## Example-3: Alter and append with 'lama_mutate_'
# generate the new translation (named character vector)
subj <- c(
  bio = "Biology",
  lama_get(dict, subject),
  sp = "Sports"
)
# save the translation under the name "subject"
dict_new <- lama_mutate_(
  dict,
  key = "subject",
  translation = subj
)
# the translation "subject" now also contains
# the subjects "Biology" and "Sports"
dict_new

## Example-4: Delete with 'lama_mutate_'
# remove the translation stored under the name "subject"
dict_new <- lama_mutate_(
  dict,
  key = "subject",
  translation = NULL
)
# the translation "subject" was deleted
dict_new
Read in a yaml file holding translations for one or multiple variables
lama_read(yaml_path)
yaml_path |
Path to yaml file holding the labels and translations for multiple variables |
A lama_dictionary class object holding the variable translations defined in the yaml file
path_to_file <- system.file("extdata", "dictionary_exams.yaml", package = "labelmachine")
dict <- lama_read(path_to_file)
The functions lama_rename() and lama_rename_()
are used to rename one or more variable translations inside of a
lama_dictionary class object.
The function lama_rename() uses non-standard evaluation,
whereas lama_rename_() is the standard evaluation alternative.
lama_rename(.data, ...) ## S3 method for class 'lama_dictionary' lama_rename(.data, ...) lama_rename_(.data, old, new) ## S3 method for class 'lama_dictionary' lama_rename_(.data, old, new)
.data |
A lama_dictionary object, holding the variable translations |
... |
One or more unquoted expressions separated by commas. Use named arguments, e.g. |
old |
A character vector holding the names of the variable translations that should be renamed. |
new |
A character vector holding the new names of the variable translations. |
The updated lama_dictionary class object.
lama_translate(), lama_to_factor(), lama_translate_all(), lama_to_factor_all(), new_lama_dictionary(), as.lama_dictionary(), lama_select(), lama_mutate(), lama_merge(), lama_read(), lama_write()
# initialize lama_dictionary
dict <- new_lama_dictionary(
  country = c(uk = "United Kingdom", fr = "France", NA_ = "other countries"),
  language = c(en = "English", fr = "French"),
  result = c("1" = "Very good", "2" = "Good", "3" = "Not so good")
)

## Example-1: Usage of 'lama_rename'
# rename translations 'result' and 'language' to 'res' and 'lang'
dict_new <- lama_rename(dict, res = result, lang = language)
dict_new

## Example-2: Usage of 'lama_rename_'
# rename translations 'result' and 'language' to 'res' and 'lang'
dict_new <- lama_rename_(dict, c("result", "language"), c("res", "lang"))
dict_new
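The standard evaluation variant is convenient when the names are only known at run time. The following is a minimal sketch, assuming the labelmachine package is attached; the variables old_names and new_names and the "_label" suffix are purely illustrative and not part of the package.

```r
library(labelmachine)

dict <- new_lama_dictionary(
  language = c(en = "English", fr = "French"),
  result = c("1" = "Very good", "2" = "Good", "3" = "Not so good")
)

# names held in ordinary character vectors (hypothetical naming scheme)
old_names <- c("result", "language")
new_names <- paste0(old_names, "_label")

# 'lama_rename_' takes the old and the new names as character vectors
dict_new <- lama_rename_(dict, old_names, new_names)
dict_new
```

Because both arguments are plain character vectors, the same call can be used inside loops or helper functions where unquoted names are not available.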
The functions lama_select() and lama_select_() pick one or more variable translations from a lama_dictionary class object and create a new lama_dictionary class object. The function lama_select() uses non-standard evaluation, whereas lama_select_() is the standard evaluation alternative.
lama_select(.data, ...)

## S3 method for class 'lama_dictionary'
lama_select(.data, ...)

lama_select_(.data, key)

## S3 method for class 'lama_dictionary'
lama_select_(.data, key)
.data |
A lama_dictionary object, holding the variable translations |
... |
One or more unquoted translation names separated by commas. |
key |
A character vector holding the names of the variable translations that should be picked. |
A new lama_dictionary class object, holding the picked variable translations.
lama_translate(), lama_to_factor(), lama_translate_all(), lama_to_factor_all(), new_lama_dictionary(), as.lama_dictionary(), lama_rename(), lama_mutate(), lama_merge(), lama_read(), lama_write()
# initialize lama_dictionary
dict <- new_lama_dictionary(
  country = c(uk = "United Kingdom", fr = "France", NA_ = "other countries"),
  language = c(en = "English", fr = "French"),
  result = c("1" = "Very good", "2" = "Good", "3" = "Not so good")
)

## Example-1: Usage of 'lama_select'
# pick the translations 'result' and 'language'
# and add them to a new lama_dictionary
dict_sub <- lama_select(dict, result, language)
dict_sub

## Example-2: Usage of 'lama_select_'
# pick the translations 'result' and 'language'
# and add them to a new lama_dictionary
dict_sub <- lama_select_(dict, c("result", "language"))
dict_sub
The functions lama_translate() and lama_translate_() take a factor, a vector or a data.frame and convert one or more of its categorical variables (not necessarily factor variables) into factor variables with new labels. The function lama_translate() uses non-standard evaluation, whereas lama_translate_() is the standard evaluation alternative.
The functions lama_to_factor() and lama_to_factor_() are very similar to lama_translate() and lama_translate_(), but instead of assigning new label strings to values, they assume that the variables are character vectors or factors that already hold the labels and only need to be turned into factors with the order given in the translations:
lama_translate() and lama_translate_(): Assign new labels to a variable and turn it into a factor variable with the order given in the corresponding translation (keep_order = FALSE) or in the same order as the original variable (keep_order = TRUE).
lama_to_factor() and lama_to_factor_(): The variable is a character vector or a factor already holding the right label strings. The variable is turned into a factor variable with the order given in the corresponding translation (keep_order = FALSE) or in the same order as the original variable (keep_order = TRUE).
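The effect of keep_order can be pictured in base R. The sketch below does not call the package and is not its implementation; it only reproduces the two level orders described above for a factor whose levels are ordered differently from the translation. All object names are illustrative.

```r
# translation: names are the original values, entries are the labels
translation <- c(en = "English", ma = "Mathematics")

# original factor with its own level order ("ma" before "en")
x <- factor(c("ma", "en", "ma"), levels = c("ma", "en"))

# assign labels by looking up each value in the translation
labels <- unname(translation[as.character(x)])

# order taken from the translation (corresponds to keep_order = FALSE)
by_translation <- factor(labels, levels = unname(translation))

# order taken from the original variable (corresponds to keep_order = TRUE)
by_original <- factor(labels, levels = unname(translation[levels(x)]))

levels(by_translation)  # "English" "Mathematics"
levels(by_original)     # "Mathematics" "English"
```

The labels themselves are identical in both cases; only the order of the factor levels differs, which matters for plots and tables that follow the level order.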
lama_translate(.data, dictionary, ..., keep_order = FALSE, to_factor = TRUE)

## S3 method for class 'data.frame'
lama_translate(.data, dictionary, ..., keep_order = FALSE, to_factor = TRUE)

## Default S3 method:
lama_translate(.data, dictionary, ..., keep_order = FALSE, to_factor = TRUE)

lama_translate_(.data, dictionary, translation, col = translation,
  col_new = col, keep_order = FALSE, to_factor = TRUE, ...)

## S3 method for class 'data.frame'
lama_translate_(.data, dictionary, translation, col = translation,
  col_new = col, keep_order = FALSE, to_factor = TRUE, ...)

## Default S3 method:
lama_translate_(.data, dictionary, translation, ..., keep_order = FALSE,
  to_factor = TRUE)

lama_to_factor(.data, dictionary, ..., keep_order = FALSE)

## S3 method for class 'data.frame'
lama_to_factor(.data, dictionary, ..., keep_order = FALSE)

## Default S3 method:
lama_to_factor(.data, dictionary, ..., keep_order = FALSE)

lama_to_factor_(.data, dictionary, translation, col = translation,
  col_new = col, keep_order = FALSE, ...)

## S3 method for class 'data.frame'
lama_to_factor_(.data, dictionary, translation, col = translation,
  col_new = col, keep_order = FALSE, ...)

## Default S3 method:
lama_to_factor_(.data, dictionary, translation, ..., keep_order = FALSE)
.data |
Either a data frame, a factor or an atomic vector. |
dictionary |
A lama_dictionary object, holding the translations for various variables. |
... |
Only used by |
keep_order |
A boolean vector of length one or the same length as the
number of translations. If the vector has length one, then the same
configuration is applied to all variable translations. If the vector has
the same length as the number of arguments in |
to_factor |
A boolean vector of length one or the same length as the
number of translations. If the vector has length one, then the same
configuration is applied to all variable translations. If the vector has
the same length as the number of arguments in |
translation |
A character vector holding the names of the variable translations which should be used for assigning new labels to the variable. These names must be
a subset of the translation names returned by |
col |
Only used if |
col_new |
Only used if |
The functions lama_translate(), lama_translate_(), lama_to_factor() and lama_to_factor_() require different arguments, depending on the data type passed into argument .data. If .data is of type character, logical, numeric or factor, then the arguments col and col_new are omitted, since those are only necessary in the case of data frames.
An extended data.frame that has a factor variable holding the assigned labels.
lama_translate_all(), lama_to_factor_all(), new_lama_dictionary(), as.lama_dictionary(), lama_rename(), lama_select(), lama_mutate(), lama_merge(), lama_read(), lama_write()
# initialize lama_dictionary
dict <- new_lama_dictionary(
  subject = c(en = "English", ma = "Mathematics"),
  result = c("1" = "Very good", "2" = "Good", "3" = "Not so good")
)
# the data frame which should be translated
df <- data.frame(
  pupil = c(1, 1, 2, 2, 3),
  subject = c("en", "ma", "ma", "en", "en"),
  res = c(1, 2, 3, 2, 2)
)

## Example-1: Usage of 'lama_translate' for data frames
## Full length assignment
# (apply translation 'subject' to column 'subject' and save it to column 'sub_new')
# (apply translation 'result' to column 'res' and save it to column 'res_new')
df_new <- lama_translate(
  df,
  dict,
  sub_new = subject(subject),
  res_new = result(res)
)
str(df_new)

## Example-2: Usage of 'lama_translate' for data frames
## Abbreviation overwriting original columns
# (apply translation 'subject' to column 'subject' and save it to column 'subject')
# (apply translation 'result' to column 'res' and save it to column 'res')
df_new_overwritten <- lama_translate(
  df,
  dict,
  subject(subject),
  result(res)
)
str(df_new_overwritten)

## Example-3: Usage of 'lama_translate' for data frames
## Abbreviation if `translation_name == column_name`
# (apply translation 'subject' to column 'subject' and save it to column 'subject_new')
# (apply translation 'result' to column 'res' and save it to column 'res_new')
df_new <- lama_translate(
  df,
  dict,
  subject_new = subject,
  res_new = result(res)
)
str(df_new)

## Example-4: Usage of 'lama_translate' for data frames labeling as character vectors
# (apply translation 'subject' to column 'subject' and
# save it as a character vector to column 'subject_new')
df_new <- lama_translate(
  df,
  dict,
  subject_new = subject,
  to_factor = FALSE
)
str(df_new)

## Example-5: Usage of 'lama_translate' for atomic vectors
sub <- c("ma", "en", "ma")
sub_new <- lama_translate(
  sub,
  dict,
  subject
)
str(sub_new)

## Example-6: Usage of 'lama_translate' for factors
sub <- factor(c("ma", "en", "ma"), levels = c("ma", "en"))
sub_new <- lama_translate(
  sub,
  dict,
  subject,
  keep_order = TRUE
)
str(sub_new)

## Example-7: Usage of 'lama_translate_' for data frames
# (apply translation 'subject' to column 'subject' and save it to column 'subject_new')
# (apply translation 'result' to column 'res' and save it to column 'res_new')
df_new <- lama_translate_(
  df,
  dict,
  translation = c("subject", "result"),
  col = c("subject", "res"),
  col_new = c("subject_new", "res_new")
)
str(df_new)

## Example-8: Usage of 'lama_translate_' for data frames and store as character vector
# (apply translation 'subject' to column 'subject' and save it to column 'subject_new')
# (apply translation 'result' to column 'res' and save it to column 'res_new')
df_new <- lama_translate_(
  df,
  dict,
  translation = c("subject", "result"),
  col = c("subject", "res"),
  col_new = c("subject_new", "res_new"),
  to_factor = c(FALSE, FALSE)
)
str(df_new)

## Example-9: Usage of 'lama_translate_' for atomic vectors
res <- c(1, 2, 1, 3, 1, 2)
res_new <- lama_translate_(
  res,
  dict,
  "result"
)
str(res_new)

## Example-10: Usage of 'lama_translate_' for factors
sub <- factor(c("ma", "en", "ma"), levels = c("ma", "en"))
sub_new <- lama_translate_(
  sub,
  dict,
  "subject",
  keep_order = TRUE
)
str(sub_new)

# the data frame which holds the right labels, but no factors
df_translated <- data.frame(
  pupil = c(1, 1, 2, 2, 3),
  subject = c("English", "Mathematics", "Mathematics", "English", "English"),
  res = c("Very good", "Good", "Not so good", "Good", "Good")
)

## Example-11: Usage of 'lama_to_factor' for data frames
## Full length assignment
# (apply order of translation 'subject' to column 'subject' and save it to column 'sub_new')
# (apply order of translation 'result' to column 'res' and save it to column 'res_new')
df_new <- lama_to_factor(
  df_translated,
  dict,
  sub_new = subject(subject),
  res_new = result(res)
)
str(df_new)

## Example-12: Usage of 'lama_to_factor' for data frames
## Abbreviation overwriting original columns
# (apply order of translation 'subject' to column 'subject' and save it to column 'subject')
# (apply order of translation 'result' to column 'res' and save it to column 'res')
df_new_overwritten <- lama_to_factor(
  df_translated,
  dict,
  subject(subject),
  result(res)
)
str(df_new_overwritten)

## Example-13: Usage of 'lama_to_factor' for data frames
## Abbreviation if `translation_name == column_name`
# (apply order of translation 'subject' to column 'subject' and save it to column 'subject_new')
# (apply order of translation 'result' to column 'res' and save it to column 'res_new')
df_new <- lama_to_factor(
  df_translated,
  dict,
  subject_new = subject,
  res_new = result(res)
)
str(df_new)

## Example-14: Usage of 'lama_to_factor' for atomic vectors
var <- c("Mathematics", "English", "Mathematics")
var_new <- lama_to_factor(
  var,
  dict,
  subject
)
str(var_new)

## Example-15: Usage of 'lama_to_factor_' for data frames
# (apply order of translation 'subject' to column 'subject' and save it to column 'subject_new')
# (apply order of translation 'result' to column 'res' and save it to column 'res_new')
df_new <- lama_to_factor_(
  df_translated,
  dict,
  translation = c("subject", "result"),
  col = c("subject", "res"),
  col_new = c("subject_new", "res_new")
)
str(df_new)

## Example-16: Usage of 'lama_to_factor_' for atomic vectors
var <- c("Very good", "Good", "Good")
var_new <- lama_to_factor_(
  var,
  dict,
  "result"
)
str(var_new)
The functions lama_translate_all() and lama_to_factor_all() convert all variables of a data frame .data that have a translation in the given lama_dictionary into factor variables with new labels. These functions are special versions of the functions lama_translate() and lama_to_factor(). The difference to lama_translate() and lama_to_factor() is that, when using lama_translate_all() and lama_to_factor_all(), the translations used in dictionary must have exactly the same names as the corresponding columns in the data frame .data.
lama_translate_all(.data, dictionary, prefix = "", suffix = "",
  fn_colname = function(x) x, keep_order = FALSE, to_factor = TRUE)

## S3 method for class 'data.frame'
lama_translate_all(.data, dictionary, prefix = "", suffix = "",
  fn_colname = function(x) x, keep_order = FALSE, to_factor = TRUE)

lama_to_factor_all(.data, dictionary, prefix = "", suffix = "",
  fn_colname = function(x) x, keep_order = FALSE)

## S3 method for class 'data.frame'
lama_to_factor_all(.data, dictionary, prefix = "", suffix = "",
  fn_colname = function(x) x, keep_order = FALSE)
.data |
A data frame. |
dictionary |
A lama_dictionary object, holding the translations for various variables. |
prefix |
A character string, which is used as prefix for the new column names. |
suffix |
A character string, which is used as suffix for the new column names. |
fn_colname |
A function that transforms a character string into a new character string. This function is used to transform the old column names into the new column names under which the labeled variables will be stored. |
keep_order |
A logical of length one, defining if the original order (factor order or alphanumerical order) of the data frame variables should be preserved. |
to_factor |
A logical of length one, defining if the resulting labeled
variables should be factor variables ( |
The difference between lama_translate_all() and lama_to_factor_all() is the following:
lama_translate_all(): Assign new labels to the variables and turn them into factor variables with the order given in the corresponding translations (keep_order = FALSE) or in the same order as the original variable (keep_order = TRUE).
lama_to_factor_all(): The variables are character vectors or factors already holding the right label strings. The variables are turned into factor variables with the order given in the corresponding translation (keep_order = FALSE) or in the same order as the original variable (keep_order = TRUE).
An extended data.frame that has a factor variable holding the assigned labels.
lama_translate(), lama_to_factor(), new_lama_dictionary(), as.lama_dictionary(), lama_rename(), lama_select(), lama_mutate(), lama_merge(), lama_read(), lama_write()
## initialize lama_dictionary
dict <- new_lama_dictionary(
  subject = c(en = "English", ma = "Mathematics"),
  result = c("1" = "Very good", "2" = "Good", "3" = "Not so good")
)
## data frame which should be translated
df <- data.frame(
  pupil = c(1, 1, 2, 2, 3),
  subject = c("en", "ma", "ma", "en", "en"),
  result = c(1, 2, 3, 2, 2)
)

## Example-1: 'lama_translate_all'
df_new <- lama_translate_all(
  df,
  dict,
  prefix = "pre_",
  fn_colname = toupper,
  suffix = "_suf"
)
str(df_new)

## Example-2: 'lama_translate_all' with 'to_factor = FALSE'
# The resulting variables are plain character vectors
df_new <- lama_translate_all(df, dict, suffix = "_new", to_factor = FALSE)
str(df_new)

## Example-3: 'lama_to_factor_all'
# The variables 'subject' and 'result' are turned into factor variables
# The ordering is taken from the translations 'subject' and 'result'
df_2 <- data.frame(
  pupil = c(1, 1, 2, 2, 3),
  subject = c("English", "Mathematics", "Mathematics", "English", "English"),
  result = c("Very good", "Good", "Good", "Very good", "Good")
)
df_2_new <- lama_to_factor_all(
  df_2,
  dict,
  prefix = "pre_",
  fn_colname = toupper,
  suffix = "_suf"
)
str(df_2_new)
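To see which columns a call adds, the column names can be compared before and after. This is a small sketch, assuming the labelmachine package is attached; the prefix "lab_" and suffix "_txt" are arbitrary illustrative values, and no particular composition of the new names is asserted here.

```r
library(labelmachine)

dict <- new_lama_dictionary(
  subject = c(en = "English", ma = "Mathematics"),
  result = c("1" = "Very good", "2" = "Good", "3" = "Not so good")
)
df <- data.frame(
  pupil = c(1, 1, 2, 2, 3),
  subject = c("en", "ma", "ma", "en", "en"),
  result = c(1, 2, 3, 2, 2)
)

df_new <- lama_translate_all(df, dict, prefix = "lab_", suffix = "_txt")

# columns that were added by the call
setdiff(names(df_new), names(df))
```

The column 'pupil' has no translation in the dictionary, so it is expected to be left as it is.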
Write a yaml file holding translations for one or multiple variables
lama_write(x, yaml_path)
x |
A lama_dictionary class object holding the variable translations |
yaml_path |
File path, where the yaml file should be saved |
dict <- new_lama_dictionary(results = c(p = "Passed", f = "Failed"))
path_to_file <- file.path(tempdir(), "my_dictionary.yaml")
lama_write(dict, path_to_file)
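A written file can be read back with lama_read(), which makes it easy to check what ended up on disk. The following round trip is a sketch, assuming the labelmachine package is attached; the object name dict_back is illustrative.

```r
library(labelmachine)

dict <- new_lama_dictionary(results = c(p = "Passed", f = "Failed"))

# write to a temporary file and read it back
path_to_file <- file.path(tempdir(), "my_dictionary.yaml")
lama_write(dict, path_to_file)
dict_back <- lama_read(path_to_file)

# inspect the raw file and the re-imported dictionary
cat(readLines(path_to_file), sep = "\n")
dict_back
```

Keeping the yaml file under version control and reading it with lama_read() in each project is one way to share the same labels across projects.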
lapply and sapply with index
Improve the base::lapply() and base::sapply() functions by allowing an extra index argument .I to be passed into the function given in FUN. If the function given in FUN has an argument .I, then for each entry of X passed into FUN the corresponding index is passed into argument .I. If the function given in FUN has no argument .I, then lapplI and sapplI are exactly the same as base::lapply() and base::sapply(). Besides this extra feature, there is no difference to base::lapply() and base::sapply().
lapplI(X, FUN, ...)

sapplI(X, FUN, ..., simplify = TRUE, USE.NAMES = TRUE)
X |
a vector (atomic or list) or an |
FUN |
Here comes the great difference to |
... |
optional arguments to |
simplify |
logical or character string; should the result be
simplified to a vector, matrix or higher dimensional array if
possible? For |
USE.NAMES |
logical; if |
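The index mechanism described for lapplI can be sketched in base R. The helper lapply_with_index below is hypothetical and is not the package's implementation; it only illustrates the idea of passing the position to FUN when FUN declares an argument named .I.

```r
# Hypothetical helper, not the package's implementation:
# pass the position of each element to FUN when FUN declares an argument '.I'
lapply_with_index <- function(X, FUN, ...) {
  if (".I" %in% names(formals(FUN))) {
    # iterate over positions so that the index is available
    out <- lapply(seq_along(X), function(i) FUN(X[[i]], .I = i, ...))
    names(out) <- names(X)
    out
  } else {
    # no '.I' argument: behave like base::lapply()
    lapply(X, FUN, ...)
  }
}

res <- lapply_with_index(
  c(a = "x", b = "y"),
  function(value, .I) paste0(.I, ": ", value)
)
res$a  # "1: x"
res$b  # "2: y"
```

Checking the formals of FUN keeps ordinary functions working unchanged, so the helper can be used as a drop-in for base::lapply() in this sketch.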
In order to replace NA values in yaml files and in translations, the following character string is used.
NA_lama_
An object of class character of length 1.
Replace NA by "NA_"
na_to_escape(x)
x |
A character vector that should be modified. |
A character vector, where the NAs are replaced.
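The replacement itself is a one-liner in base R. The function escape_na below is a hypothetical stand-in written for illustration, not the package's own code.

```r
# Hypothetical stand-in for the escaping step, written in base R
escape_na <- function(x) {
  x[is.na(x)] <- "NA_"
  x
}

escape_na(c("urban", NA, "rural"))  # "urban" "NA_" "rural"
```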
Create a named list with lapply from a character vector
named_lapply(.names, FUN, ...)
.names |
A character vector holding the names of the list |
FUN |
Here comes the great difference to |
... |
optional arguments to |
A named list
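The basic idea can be sketched in base R as follows. The helper named_lapply_sketch is hypothetical; the package function may pass additional arguments to FUN that are not reproduced here.

```r
# Hypothetical base R equivalent: apply FUN to each name and
# use the names themselves as the names of the resulting list
named_lapply_sketch <- function(.names, FUN, ...) {
  stats::setNames(lapply(.names, FUN, ...), .names)
}

res <- named_lapply_sketch(c("subject", "result"), toupper)
res$subject  # "SUBJECT"
```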
Generates an S3 class object, which holds the variable translations. There are three valid ways to use new_lama_dictionary in order to create a lama_dictionary class object:
No arguments are passed into ...: In this case new_lama_dictionary returns an empty lama_dictionary class object (e.g. dict <- new_lama_dictionary()).
The first argument is a list: In this case only the first argument of new_lama_dictionary is used. It is not necessary to pass in a named argument. The passed in object must be a named list object, which contains all translations that should be added to the new lama_dictionary class object. Each item of the named list object must be a named character vector defining a translation (e.g. new_lama_dictionary(list(area = c("0" = "urban", "1" = "rural"), density = c(l = "Low", h = "High"))) generates a lama_dictionary class object holding the translations "area" and "density").
The first argument is a character vector: In this case, it is allowed to pass in more than one argument. All given arguments must be named arguments holding named character vectors defining translations (e.g. new_lama_dictionary(area = c("0" = "urban", "1" = "rural"), density = c(l = "Low", h = "High")) generates a lama_dictionary class object holding the translations "area" and "density"). The names of the passed in arguments will be used as the names under which the given translations will be added to the new lama_dictionary class object.
new_lama_dictionary(...)

## S3 method for class 'list'
new_lama_dictionary(.data = NULL, ...)

## S3 method for class 'character'
new_lama_dictionary(...)

## Default S3 method:
new_lama_dictionary(...)
... |
None, one or more named/unnamed arguments. Depending on the type of the first argument passed into
|
.data |
A named list object, where each list entry corresponds to a
translation that should be added to the lama_dictionary object
(e.g. |
A new lama_dictionary class object holding the passed in translations.
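The three calling styles described above can be put side by side. This is a sketch, assuming the labelmachine package is attached; the object names dict_empty, dict_list and dict_args are illustrative.

```r
library(labelmachine)

# 1. no arguments: an empty dictionary
dict_empty <- new_lama_dictionary()

# 2. a single named list holding all translations
dict_list <- new_lama_dictionary(list(
  area = c("0" = "urban", "1" = "rural"),
  density = c(l = "Low", h = "High")
))

# 3. one named argument per translation
dict_args <- new_lama_dictionary(
  area = c("0" = "urban", "1" = "rural"),
  density = c(l = "Low", h = "High")
)

dict_list
dict_args
```

The list form is convenient when the translations are assembled programmatically, whereas the named-argument form reads more naturally when they are typed by hand.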
A translation is a named character vector of non-zero length. This named character vector defines which labels (of type character) should be assigned to which values (of type character, logical or numeric). For example, the translation c("0" = "urban", "1" = "rural") assigns the label "urban" to the value 0 and "rural" to the value 1, so that the variable x = c(0, 0, 1) is translated to x_new = c("urban", "urban", "rural"). Therefore, a translation (named character vector) contains the following information:
The names of the character vector entries correspond to the original variable levels. Variables of types numeric or logical are turned automatically into a character vector (e.g. 0 and 1 are treated like "0" and "1").
The entries (character strings) of the character vector correspond to the new labels, which will be assigned to the original variable levels. It is also allowed to have missing labels (NAs). In this case, the original values are mapped onto missing values.
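The lookup that a translation describes can be reproduced in base R. The sketch below does not use the package and is not its implementation; it only shows how the names of the vector act as keys for the original values.

```r
translation <- c("0" = "urban", "1" = "rural")
x <- c(0, 0, 1)

# values are converted to character and looked up by name
x_new <- unname(translation[as.character(x)])
x_new  # "urban" "urban" "rural"
```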
The function lama_translate() is used in order to apply a translation on a variable. The resulting vector with the assigned labels can be of the following types:
character: An unordered vector holding the new character labels.
factor with character levels: An ordered vector holding the new character labels.
The original variable can be of the following types:
character vector: This is the simplest case. The character values will be replaced by the corresponding labels.
numeric or logical vector: Vectors of type numeric or logical will be turned into character vectors automatically before the translation process and then processed like in the character case. Therefore, it is sufficient to define the translation mapping for the character case, since it also covers the numeric and logical case.
factor vector with levels of any type: When translating factor variables, one can decide whether or not to keep the original ordering. Like in the other cases, the levels of the factor variable will always be turned into character strings before the translation process.
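The difference between the two orderings for a factor input can be illustrated with base R. This sketch assumes that "keeping the original ordering" means the result's levels follow the level order of the original factor, and that otherwise they follow the order of the translation. It does not call the package itself.

```r
area <- c("0" = "urban", "1" = "rural")
# Original factor whose levels are ordered "1" before "0"
x <- factor(c("1", "0", "1"), levels = c("1", "0"))

# Character result: factor levels are turned into strings, then looked up
labels_chr <- unname(area[as.character(x)])

# Factor result with levels ordered as in the translation
f_translation <- factor(labels_chr, levels = unname(area))

# Factor result with levels ordered as in the original factor
f_original <- factor(labels_chr, levels = unname(area[levels(x)]))

levels(f_translation)
levels(f_original)
```

Both factors hold the same labels. Only the order of their levels differs.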
It is also possible to handle missing values with lama_translate(). To do so, the translation must contain information that tells how to handle a missing value. In order to define such a translation, the missing value (NA) can be escaped with the character string "NA_". This can be useful in two situations:
All missing values should be labeled (e.g. the translation c("0" = "urban", "1" = "rural", NA_ = "missing") assigns the character string "missing" to all missing values of a variable).
Map some original values to NA (e.g. the translation c("0" = "urban", "1" = "rural", "2" = "NA_", "3" = "NA_") assigns NA (the missing character) to the original values 2 and 3). Actually, in this case the translation definition does not always have to use this escape mechanism, but only when defining the translations inside of a YAML file, since the YAML parser does not recognize missing values.
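Both uses of the "NA_" escape can be mimicked in base R. The helper apply_translation_na below is hypothetical and only sketches the described behaviour. It is not the code used by lama_translate().

```r
# Hypothetical helper (not labelmachine code): apply a translation that may
# use the "NA_" escape on either side of the mapping.
apply_translation_na <- function(x, translation) {
  keys <- as.character(x)
  # Missing input values are looked up under the escaped name "NA_"
  keys[is.na(keys)] <- "NA_"
  labels <- unname(translation[keys])
  # Labels given as the escaped string "NA_" become real missing values
  labels[!is.na(labels) & labels == "NA_"] <- NA_character_
  labels
}

# Situation 1: label all missing values
label_missing <- c("0" = "urban", "1" = "rural", NA_ = "missing")
res_labeled <- apply_translation_na(c(0, NA, 1), label_missing)
res_labeled

# Situation 2: map some original values to NA
drop_values <- c("0" = "urban", "1" = "rural", "2" = "NA_", "3" = "NA_")
res_dropped <- apply_translation_na(c(0, 2, 3, 1), drop_values)
res_dropped
```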
Each lama_dictionary class object can contain multiple translations, each with a unique name under which the translation can be found. The function lama_translate() uses a lama_dictionary class object to translate a normal vector or to translate one or more columns in a data.frame. Sometimes it may be necessary to have different translations for the same variable. In this case, it is best to have multiple translations with different names (e.g. area_short = c("0" = "urb", "1" = "rur") and area = c("0" = "urban", "1" = "rural")).
is.lama_dictionary(), as.lama_dictionary(), lama_translate(), lama_to_factor(), lama_translate_all(), lama_to_factor_all(), lama_read(), lama_write(), lama_select(), lama_rename(), lama_mutate(), lama_merge()
## Example-1: Initialize a lama-dictionary from a list object
## holding the translations
dict <- new_lama_dictionary(list(
  country = c(uk = "United Kingdom", fr = "France", NA_ = "other countries"),
  language = c(en = "English", fr = "French")
))
dict

## Example-2: Initialize the lama-dictionary directly
## by assigning each translation to a name
dict <- new_lama_dictionary(
  country = c(uk = "United Kingdom", fr = "France", NA_ = "other countries"),
  language = c(en = "English", fr = "French")
)
dict
Print a lama_dictionary class object
## S3 method for class 'lama_dictionary'
print(x, ...)
x |
The lama_dictionary class object that should be printed. |
... |
Unused arguments |
new_lama_dictionary(), as.lama_dictionary(), lama_translate(), lama_to_factor(), lama_translate_all(), lama_to_factor_all(), lama_read(), lama_write(), lama_rename(), lama_select(), lama_mutate(), lama_merge()
Function that actually performs the renaming of the translations
rename_translation(.data, old, new)
.data |
A lama_dictionary object, holding the variable translations |
old |
A character vector holding the names of the variable translations that should be renamed. |
new |
A character vector holding the new names of the variable translations. |
The updated lama_dictionary class object.
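The renaming step can be pictured on a plain named list of translations. The sketch below uses a hypothetical helper rename_entries and assumes that old and new are matched by position. It does not reproduce the internal checks of rename_translation().

```r
# Hypothetical helper (not labelmachine code): rename entries of a named list.
rename_entries <- function(translations, old, new) {
  # Assumption: old and new have the same length and are matched by position
  stopifnot(length(old) == length(new), all(old %in% names(translations)))
  idx <- match(old, names(translations))
  names(translations)[idx] <- new
  translations
}

translations <- list(
  country = c(uk = "United Kingdom", fr = "France"),
  language = c(en = "English", fr = "French")
)
renamed <- rename_entries(translations, old = "country", new = "nation")
names(renamed)
```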
Coerce a vector into a character string ('x1', 'x2', ...)
stringify(x)
x |
A vector that should be coerced. |
A character string holding the collapsed vector.
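A collapse of this kind can be written with paste0() in base R. The function stringify_sketch below is a hypothetical stand-in. Whether the package's stringify() also adds the surrounding parentheses is not stated here, so the sketch leaves them out.

```r
# Hypothetical stand-in for the described behaviour: quote each element
# and collapse the vector into a single comma-separated string.
stringify_sketch <- function(x) {
  paste0("'", x, "'", collapse = ", ")
}

collapsed <- stringify_sketch(c("x1", "x2", "x3"))
collapsed
```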
This function relabels several variables in a data.frame
translate_df(.data, dictionary, translation, col, col_new, keep_order, to_factor, is_translated, err_handler)
.data |
Either a data frame, a factor or an atomic vector. |
dictionary |
A lama_dictionary object, holding the translations for various variables. |
translation |
A character vector holding the names of the variable
translations which
should be used for assigning new labels to the variable. These names must be
a subset of the translation names returned by |
col |
Only used if |
col_new |
Only used if |
keep_order |
A boolean vector of length one or the same length as the
number of translations. If the vector has length one, then the same
configuration is applied to all variable translations. If the vector has
the same length as the number of arguments in |
to_factor |
A boolean vector of length one or the same length as the
number of translations. If the vector has length one, then the same
configuration is applied to all variable translations. If the vector has
the same length as the number of arguments in |
is_translated |
A boolean vector of length one or the same length as the
number of translations. If the vector has length one, then the same
configuration is applied to all variable translations.
If |
err_handler |
An error handling function |
A factor vector holding the assigned labels.
This function relabels a vector
translate_vector(val, translation, keep_order, to_factor, is_translated, err_handler)
val |
The vector that should be relabeled. Allowed are all vector types (also factor). |
translation |
Named character vector holding the label assignments. |
keep_order |
A logical flag. If the vector in |
to_factor |
A logical flag. If set to |
is_translated |
A logical flag. If |
err_handler |
An error handling function |
A factor vector holding the assigned labels
This function checks if the object structure is right. It does not check class type.
validate_lama_dictionary(obj, err_handler = composerr("The object has not a valid lama_dictionary structure"))
obj |
An object that should be tested |
err_handler |
An error handling function |
A translation is a named character vector of non-zero length. This named character vector defines which labels (of type character) should be assigned to which values (can be of type character, logical or numeric). For example, the translation c("0" = "urban", "1" = "rural") assigns the label "urban" to the value 0 and "rural" to the value 1, so the variable x = c(0, 0, 1) is translated to x_new = c("urban", "urban", "rural").
Therefore, a translation (named character vector) contains the following information:
The names of the character vector entries correspond to the original variable levels. Variables of types numeric or logical are turned automatically into a character vector (e.g. 0 and 1 are treated like "0" and "1").
The entries (character strings) of the character vector correspond to the new labels, which will be assigned to the original variable levels. It is also allowed to have missing labels (NAs). In this case, the original values are mapped onto missing values.
The function lama_translate() is used in order to apply a translation on a variable. The resulting vector with the assigned labels can be of the following types:
character: An unordered vector holding the new character labels.
factor with character levels: An ordered vector holding the new character labels.
The original variable can be of the following types:
character vector: This is the simplest case. The character values will be replaced by the corresponding labels.
numeric or logical vector: Vectors of type numeric or logical will be turned into character vectors automatically before the translation process and then processed like in the character case. Therefore, it is sufficient to define the translation mapping for the character case, since it also covers the numeric and logical case.
factor vector with levels of any type: When translating factor variables, one can decide whether or not to keep the original ordering. Like in the other cases, the levels of the factor variable will always be turned into character strings before the translation process.
It is also possible to handle missing values with lama_translate(). To do so, the translation must contain information that tells how to handle a missing value. In order to define such a translation, the missing value (NA) can be escaped with the character string "NA_". This can be useful in two situations:
All missing values should be labeled (e.g. the translation c("0" = "urban", "1" = "rural", NA_ = "missing") assigns the character string "missing" to all missing values of a variable).
Map some original values to NA (e.g. the translation c("0" = "urban", "1" = "rural", "2" = "NA_", "3" = "NA_") assigns NA (the missing character) to the original values 2 and 3). Actually, in this case the translation definition does not always have to use this escape mechanism, but only when defining the translations inside of a YAML file, since the YAML parser does not recognize missing values.
Each lama_dictionary class object can contain multiple translations, each with a unique name under which the translation can be found. The function lama_translate() uses a lama_dictionary class object to translate a normal vector or to translate one or more columns in a data.frame. Sometimes it may be necessary to have different translations for the same variable. In this case, it is best to have multiple translations with different names (e.g. area_short = c("0" = "urb", "1" = "rur") and area = c("0" = "urban", "1" = "rural")).
is.lama_dictionary(), as.lama_dictionary(), new_lama_dictionary(), lama_translate(), lama_to_factor(), lama_translate_all(), lama_to_factor_all(), lama_read(), lama_write(), lama_select(), lama_rename(), lama_mutate(), lama_merge()
This function checks if the object structure is that of a translation (named character vector).
validate_translation(obj, err_handler = composerr("The object has not a valid translation structure"))
obj |
An object that should be tested |
err_handler |
An error handling function |
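A structural check in the spirit of the description above (a translation is a named character vector of non-zero length) might look like the following base R sketch. The predicate is_translation_like is hypothetical. The real validate_translation() reports problems through err_handler and may apply further checks that are not covered here.

```r
# Hypothetical predicate (not labelmachine code): TRUE if obj looks like a
# translation, i.e. a non-empty character vector whose entries are all named.
is_translation_like <- function(obj) {
  is.character(obj) &&
    length(obj) > 0 &&
    !is.null(names(obj)) &&
    !anyNA(names(obj)) &&
    all(nzchar(names(obj)))
}

is_translation_like(c("0" = "urban", "1" = "rural"))  # named character vector
is_translation_like(c("urban", "rural"))              # names are missing
is_translation_like(list("0" = "urban"))              # a list, not a character vector
```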
When a YAML file is read in, the data has the structure vars (named list) > translations (named list). This structure is transformed to the lama_dictionary class input structure vars (named list) > translations (named character vector).
yaml_to_dictionary(data)
data |
An object similar to a lama-dictionary object, except that each translation is a named list holding character strings instead of a named character vector. |
A list that has lama-dictionary structure.
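The structural change described above can be sketched with lapply() and unlist() in base R. The input list below is made-up example data standing in for parsed YAML, and to_dictionary_structure is a hypothetical name. Handling of the "NA_" escape or of empty entries is not covered.

```r
# Made-up input mimicking parsed YAML: each translation is a named list
yaml_like <- list(
  country = list(uk = "United Kingdom", fr = "France"),
  language = list(en = "English", fr = "French")
)

# Hypothetical helper (not labelmachine code): turn each inner named list
# into a named character vector; unlist() keeps the entry names.
to_dictionary_structure <- function(data) {
  lapply(data, function(translation) unlist(translation))
}

dict_like <- to_dictionary_structure(yaml_like)
dict_like$country
```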