Title: | Viral Load and CD4 Lymphocytes Regression Models |
---|---|
Description: | Provides a comprehensive framework for building, evaluating, and visualizing regression models for analyzing viral load and CD4 (Cluster of Differentiation 4) lymphocytes data. It leverages the principles of the tidymodels ecosystem of Max Kuhn and Hadley Wickham (2020) <https://www.tidymodels.org> to offer a user-friendly experience in model development. This package includes functions for data preprocessing, feature engineering, model training, tuning, and evaluation, along with visualization tools to enhance the interpretation of model results. It is specifically designed for researchers in biostatistics, computational biology, and HIV research who aim to perform reproducible and rigorous analyses to gain insights into disease dynamics. The main focus is on improving the understanding of the relationships between viral load, CD4 lymphocytes, and other relevant covariates to contribute to HIV research and the visibility of vulnerable seropositive populations. |
Authors: | Juan Pablo Acuña González [aut, cre] |
Maintainer: | Juan Pablo Acuña González <[email protected]> |
License: | MIT + file LICENSE |
Version: | 1.3.2 |
Built: | 2024-12-27 19:20:18 UTC |
Source: | https://github.com/juanv66x/viralmodels |
Returns performance metrics for a selected model
viralmodel( traindata, semilla, target, viralvars, logbase, pliegues, repeticiones, rejilla, modelo )
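traindata |
A data frame containing the predictors and the target variable. |
semilla |
A numeric value: the seed for random number generation, to ensure reproducibility. |
target |
A character value: the column name of the target variable to predict. |
viralvars |
Vector of variable names related to viral data. |
logbase |
The base for logarithmic transformations. |
pliegues |
A numeric value: the number of folds for cross-validation. |
repeticiones |
A numeric value: the number of times the cross-validation is repeated. |
rejilla |
A numeric value: the number of grid search iterations for tuning hyperparameters. |
modelo |
A character value naming the model to evaluate, such as "simple_rf" in the example. |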
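The `viralvars` and `logbase` arguments together suggest that the named viral load columns are log-transformed before modelling. The sketch below illustrates that idea in base R on a small made-up data frame. Whether the package transforms exactly these columns in exactly this way is an assumption, and the data values are invented for illustration.

```r
# Hypothetical toy data: two viral load columns and one CD4 column.
viral_df <- data.frame(
  vl_2019 = c(50, 1000, 100000),
  vl_2022 = c(10, 100, 10000),
  cd_2022 = c(350, 500, 820)
)

viralvars <- c("vl_2019", "vl_2022")
logbase <- 10

# Log-transform only the columns named in viralvars, using the chosen base.
viral_df[viralvars] <- lapply(viral_df[viralvars], log, base = logbase)

viral_df
```

A base of 10 keeps the transformed values on the familiar "log copies" scale used for viral load, while the CD4 column is left untouched.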
A table with the hyperparameters of a single model
    library(tidyverse)
    library(baguette)
    library(kernlab)
    library(kknn)
    library(ranger)
    library(rules)
    library(glmnet)
    library(viralmodels)

    # Define the function to impute values in the undetectable range
    set.seed(123)
    impute_undetectable <- function(column) {
      ifelse(column <= 40, rexp(sum(column <= 40), rate = 1/13) + 1, column)
    }

    # Apply the function to all vl columns using dplyr's across()
    library(viraldomain)
    data("viral", package = "viraldomain")
    viral_imputed <- viral |>
      mutate(across(starts_with("vl"), ~impute_undetectable(.x)))

    # Model settings
    traindata <- viral_imputed
    semilla <- 1501
    target <- "cd_2022"
    viralvars <- c("vl_2019", "vl_2021", "vl_2022")
    logbase <- 10
    pliegues <- 2
    repeticiones <- 1
    rejilla <- 1
    modelo <- "simple_rf"

    set.seed(123)
    viralmodel(traindata, semilla, target, viralvars, logbase, pliegues,
               repeticiones, rejilla, modelo)
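The example above replaces viral load values at or below 40 with draws from a shifted exponential distribution. The following base R sketch shows the same idea on a made-up vector, using index assignment instead of `ifelse()` so that exactly one draw is made per undetectable entry. It is a variant for illustration, not the package's own helper, and the toy values are invented.

```r
# Toy viral load values (made up); 40 is the threshold used in the example.
set.seed(123)
vl <- c(20, 40, 150, 3200, 35)

# Flag the entries in the undetectable range.
undetectable <- vl <= 40

# Draw one exponential value (mean 13) per undetectable entry, shifted by 1
# so that every imputed value is greater than 1.
vl_imputed <- vl
vl_imputed[undetectable] <- rexp(sum(undetectable), rate = 1 / 13) + 1

vl_imputed
```

Detectable values pass through unchanged, and only the flagged positions receive random draws.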
This function predicts viral load or CD4 count values based on multiple machine learning models using cross-validation. It allows users to specify two types of predictions: normal predictions on the full dataset or observation-by-observation (obs-by-obs) predictions.
viralpreds( target, pliegues, repeticiones, rejilla, semilla, data, prediction_type = "full" )
target |
A character string specifying the column name of the target variable to predict. |
pliegues |
An integer specifying the number of folds for cross-validation. |
repeticiones |
An integer specifying the number of times the cross-validation should be repeated. |
rejilla |
An integer specifying the number of grid search iterations for tuning hyperparameters. |
semilla |
An integer specifying the seed for random number generation to ensure reproducibility. |
data |
A data frame containing the predictors and the target variable. |
prediction_type |
A character string specifying the type of predictions to perform.
Use |
A list containing two elements: predictions (a vector of predicted values for the target variable) and RMSE (the root mean square error of the best model).
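To make the RMSE element concrete, here is a minimal base R sketch of how a root mean square error is computed and how a list with these two elements could be inspected. The observed and predicted values are invented, and the element names `predictions` and `RMSE` are taken from the description above; the exact structure of the object returned by `viralpreds()` is an assumption.

```r
# Made-up observed and predicted CD4 counts, for illustration only.
observed  <- c(500, 620, 480, 710)
predicted <- c(520, 600, 450, 700)

# Root mean square error: square the errors, average them, take the root.
rmse <- sqrt(mean((observed - predicted)^2))

# A list shaped like the documented return value (assumed element names).
result <- list(predictions = predicted, RMSE = rmse)

result$predictions
result$RMSE  # sqrt(450), about 21.21
```

Because RMSE is expressed in the units of the target, a value near 21 here would mean predictions are typically off by roughly 21 cells per unit volume.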
    library(tidyverse)
    library(baguette)
    library(kernlab)
    library(kknn)
    library(ranger)
    library(rules)
    library(glmnet)
    library(viralmodels)

    # Define the function to impute values in the undetectable range
    set.seed(123)
    impute_undetectable <- function(column) {
      ifelse(column <= 40, rexp(sum(column <= 40), rate = 1/13) + 1, column)
    }

    # Apply the function to all vl columns using dplyr's across()
    library(viraldomain)
    data("viral", package = "viraldomain")
    viral_imputed <- viral |>
      mutate(across(starts_with("vl"), ~impute_undetectable(.x)))

    # Prediction settings
    traindata <- viral_imputed
    target <- "cd_2022"
    viralvars <- c("vl_2019", "vl_2021", "vl_2022")
    logbase <- 10
    pliegues <- 5
    repeticiones <- 2
    rejilla <- 2
    semilla <- 123

    viralpreds(target, pliegues, repeticiones, rejilla, semilla, traindata)
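The `pliegues` and `repeticiones` arguments describe repeated k-fold cross-validation. The base R sketch below shows what that resampling scheme means by assigning fold labels to a hypothetical set of 20 rows. It is not how the package builds its resamples internally, which is not documented here; the row count is an assumption chosen so the folds divide evenly.

```r
set.seed(123)

n_rows <- 20        # hypothetical number of observations
pliegues <- 5       # folds per repetition
repeticiones <- 2   # number of repetitions

# For each repetition, shuffle a balanced vector of fold labels.
fold_ids <- replicate(
  repeticiones,
  sample(rep(seq_len(pliegues), length.out = n_rows)),
  simplify = FALSE
)

# Each model is fitted once per fold in every repetition.
n_resamples <- pliegues * repeticiones

table(fold_ids[[1]])
n_resamples
```

With these settings every candidate model is fitted on 10 resamples, which is why larger values of either argument increase run time noticeably.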
Trains and optimizes a series of regression models for viral load or CD4 counts
viraltab( traindata, semilla, target, viralvars, logbase, pliegues, repeticiones, rejilla, rank_output = TRUE )
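traindata |
A data frame containing the predictors and the target variable. |
semilla |
A numeric value: the seed for random number generation, to ensure reproducibility. |
target |
A character value: the column name of the target variable to predict. |
viralvars |
Vector of variable names related to viral data. |
logbase |
The base for logarithmic transformations. |
pliegues |
A numeric value: the number of folds for cross-validation. |
repeticiones |
A numeric value: the number of times the cross-validation is repeated. |
rejilla |
A numeric value: the number of grid search iterations for tuning hyperparameters. |
rank_output |
Logical value. If TRUE, returns ranked output; if FALSE, returns unranked output. |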
A table of competing models
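As a rough picture of what a ranked table of competing models looks like, the base R sketch below orders a few hypothetical models by RMSE and attaches a rank. The model names, metric values, and column names are all invented for illustration and do not reflect the actual columns returned by `viraltab()`.

```r
# Hypothetical cross-validated results for three competing models.
models <- data.frame(
  model = c("model_a", "model_b", "model_c"),
  rmse  = c(210.4, 185.9, 198.2)
)

# Lower RMSE is better, so sort ascending and number the rows.
ranked <- models[order(models$rmse), ]
ranked$rank <- seq_len(nrow(ranked))

ranked
```

Sorting by a single error metric is the simplest possible ranking; it says nothing about the uncertainty of each estimate, which matters when the gaps between models are small.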
    library(dplyr)
    library(baguette)
    library(kernlab)
    library(kknn)
    library(ranger)
    library(rules)
    library(glmnet)
    library(viralmodels)

    # Define the function to impute values in the undetectable range
    impute_undetectable <- function(column) {
      set.seed(123)
      ifelse(column <= 40, rexp(sum(column <= 40), rate = 1/13) + 1, column)
    }

    library(viraldomain)
    data("viral", package = "viraldomain")
    viral_imputed <- viral |>
      mutate(across(starts_with("vl"), ~impute_undetectable(.x)))

    # Model settings
    traindata <- viral_imputed
    semilla <- 1501
    target <- "cd_2022"
    viralvars <- c("vl_2019", "vl_2021", "vl_2022")
    logbase <- 10
    pliegues <- 2
    repeticiones <- 1
    rejilla <- 1

    set.seed(123)
    viraltab(traindata, semilla, target, viralvars, logbase, pliegues,
             repeticiones, rejilla, rank_output = TRUE)
Plots the rankings of a series of regression models for viral load or CD4 counts
viralvis( traindata, semilla, target, viralvars, logbase, pliegues, repeticiones, rejilla )
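traindata |
A data frame containing the predictors and the target variable. |
semilla |
A numeric value: the seed for random number generation, to ensure reproducibility. |
target |
A character value: the column name of the target variable to predict. |
viralvars |
Vector of variable names related to viral data. |
logbase |
The base for logarithmic transformations. |
pliegues |
A numeric value: the number of folds for cross-validation. |
repeticiones |
A numeric value: the number of times the cross-validation is repeated. |
rejilla |
A numeric value: the number of grid search iterations for tuning hyperparameters. |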
A plot of the model rankings
    library(tidyverse)
    library(baguette)
    library(kernlab)
    library(kknn)
    library(ranger)
    library(rules)
    library(glmnet)
    library(viralmodels)

    # Define the function to impute values in the undetectable range
    set.seed(123)
    impute_undetectable <- function(column) {
      ifelse(column <= 40, rexp(sum(column <= 40), rate = 1/13) + 1, column)
    }

    # Apply the function to all vl columns using dplyr's across()
    library(viraldomain)
    data("viral", package = "viraldomain")
    viral_imputed <- viral |>
      mutate(across(starts_with("vl"), ~impute_undetectable(.x)))

    # Model settings
    traindata <- viral_imputed
    semilla <- 1501
    target <- "cd_2022"
    viralvars <- c("vl_2019", "vl_2021", "vl_2022")
    logbase <- 10
    pliegues <- 2
    repeticiones <- 1
    rejilla <- 1

    set.seed(123)
    viralvis(traindata, semilla, target, viralvars, logbase, pliegues,
             repeticiones, rejilla)