Data analysis | Elena Dudukina

Direction of Bias From Nondifferential Misclassification

In this post, I simulate 3 scenarios of nondifferential misclassification of exposure, as demonstrated in Yland et al., 2022: (1) specificity of exposure 90% and sensitivity 60%; (2) specificity of exposure 90% and sensitivity 70%; (3) specificity of exposure 90% and sensitivity 80%. Each of these scenarios is simulated in 10 thousand datasets of varying sizes: n=100, n=1000, n=10000.
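As a rough illustration of the first scenario (sensitivity 60%, specificity 90%), here is a minimal base R sketch. The helper name `simulate_rr` and the values for exposure prevalence, baseline risk, and true risk ratio are my own assumptions for illustration, not taken from the post or from Yland et al., and the sketch uses 1000 replicates rather than 10 thousand to keep it quick.

```r
# Hypothetical helper: simulate one dataset and return the risk ratio
# estimated from the misclassified exposure.
# p_exp, risk0 and true_rr are assumed values, not taken from the post.
simulate_rr <- function(n, sens, spec, p_exp = 0.3, risk0 = 0.1, true_rr = 2) {
  exposed <- rbinom(n, 1, p_exp)
  outcome <- rbinom(n, 1, ifelse(exposed == 1, risk0 * true_rr, risk0))
  # Nondifferential: classification errors depend only on true exposure,
  # never on the outcome.
  observed <- ifelse(exposed == 1,
                     rbinom(n, 1, sens),      # truly exposed: detected with prob = sens
                     rbinom(n, 1, 1 - spec))  # truly unexposed: false positive with prob = 1 - spec
  mean(outcome[observed == 1]) / mean(outcome[observed == 0])
}

set.seed(2022)

# One very large dataset: the estimate should sit between the null (1) and the truth (2)
rr_large <- simulate_rr(1e5, sens = 0.6, spec = 0.9)

# Many small datasets: individual estimates scatter widely around the expected value
rr_small <- replicate(1000, simulate_rr(100, sens = 0.6, spec = 0.9))

# Share of small-sample estimates that land above the true risk ratio
share_above_truth <- mean(rr_small > 2, na.rm = TRUE)
```

Comparing `rr_large` with the spread of `rr_small` shows why the sample size matters here: the bias is toward the null in expectation, but a single small dataset can still yield an estimate further from the null than the truth.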

Competing events survival analyses

This is an account of R code by Paloma Rojas-Saunero, PhD, available here, which I “translated” to tidyverse code, adding some clarifications and figures. This is a longer read but, hopefully, also easier to follow for those who use the tidyverse.

Chp7 Splines - Chp 8 KNN

Book club "Hands-On Machine Learning with R"

Using `WeightIt` R package for causal inference analyses

I recently discovered the WeightIt R package and was very happy with its functionality and performance. I “delegated” my code computing IPTW to WeightIt and it was faster while producing the same results, as expected.
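For readers unfamiliar with IPTW, the kind of hand-written computation that can be delegated to a package might look roughly like the base R sketch below. This is not the post's code and it does not call WeightIt; the simulated data and all variable names are assumptions made for illustration.

```r
set.seed(1)

# Assumed toy data: one confounder that drives treatment assignment
n <- 5000
confounder <- rnorm(n)
treated <- rbinom(n, 1, plogis(-0.5 + 0.8 * confounder))

# Propensity score: estimated probability of treatment given the confounder
ps_model <- glm(treated ~ confounder, family = binomial())
ps <- fitted(ps_model)

# Inverse probability of treatment weights (targeting the average treatment effect)
iptw <- ifelse(treated == 1, 1 / ps, 1 / (1 - ps))

# Balance check: difference in confounder means between groups, before and after weighting
diff_unweighted <- mean(confounder[treated == 1]) - mean(confounder[treated == 0])
diff_weighted <- weighted.mean(confounder[treated == 1], iptw[treated == 1]) -
  weighted.mean(confounder[treated == 0], iptw[treated == 0])
```

After weighting, the confounder means of the two groups should be much closer together than before, which is the property a weighting package is expected to reproduce.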

Iterative visualizations with ggplot2: no more copy-pasting

Are you tired of copy-pasting some chunks of your code over and over again? I am, too. Let’s dig into how we can improve our workflow with a bit of tidy evaluation and writing our own functions to avoid copy-pasting.

Finding your R

A story of how I started using R, struggled, and ultimately found my way and motivation to keep learning and using Rstats

Data simulation and propensity score estimation

In this post, I will play around with simulated data. The things I’ll be doing: (1) simulating my own dataset with null associations between two different exposures (x1 and x2) and outcomes y1 and y2 for each exposure (4 exposure-outcome pairs); (2) computing propensity scores (PS) for each exposure and trimming non-overlapping areas of the PS distribution between exposed and unexposed; (3) running several logistic regression models: crude, conventionally adjusted, and adjusted with standardized mortality ratio (SMR) weighting using PS; (4) calculating how biased the estimates are compared with the true (null) effect. Data simulation: first, I simulate the data for 10 confounders c1-c10, 2 exposures x1 and x2 (with 7% and 20% prevalences, respectively), 2 outcomes (y1 and y2), 2 exposure predictors c11-c12, and 2 predictors of the outcome c13-c14.
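A much reduced sketch of this workflow in base R, assuming only two confounders, one exposure, and one outcome, could look as follows. The coefficients, the sample size, and the trimming rule (keeping the region where the PS ranges of exposed and unexposed overlap) are illustrative assumptions, not the post's actual choices.

```r
set.seed(42)
n <- 10000

# Assumed toy data: two confounders, a rare exposure, and an outcome
# that depends on the confounders but NOT on the exposure (true null effect)
c1 <- rnorm(n)
c2 <- rbinom(n, 1, 0.4)
x1 <- rbinom(n, 1, plogis(-3.1 + 0.8 * c1 + 0.4 * c2))
y1 <- rbinom(n, 1, plogis(-2.0 + 0.8 * c1 + 0.4 * c2))
dat <- data.frame(c1, c2, x1, y1)

# Propensity score for the exposure
dat$ps <- fitted(glm(x1 ~ c1 + c2, family = binomial(), data = dat))

# Trim to the region of PS overlap between exposed and unexposed
ps_low <- max(tapply(dat$ps, dat$x1, min))
ps_high <- min(tapply(dat$ps, dat$x1, max))
trimmed <- dat[dat$ps >= ps_low & dat$ps <= ps_high, ]

# SMR weights: exposed get weight 1, unexposed get PS / (1 - PS)
trimmed$smr_w <- ifelse(trimmed$x1 == 1, 1, trimmed$ps / (1 - trimmed$ps))

# Crude, conventionally adjusted, and SMR-weighted logistic models
crude <- glm(y1 ~ x1, family = binomial(), data = trimmed)
adjusted <- glm(y1 ~ x1 + c1 + c2, family = binomial(), data = trimmed)
# quasibinomial avoids warnings about non-integer weights
weighted <- glm(y1 ~ x1, family = quasibinomial(), data = trimmed, weights = smr_w)

# Bias on the log odds ratio scale relative to the true (null) effect of 0
bias <- c(crude = unname(coef(crude)["x1"]),
          adjusted = unname(coef(adjusted)["x1"]),
          smr_weighted = unname(coef(weighted)["x1"]))
```

Because the true effect is null, each entry of `bias` is simply the estimated log odds ratio; the crude estimate is expected to be pulled away from zero by confounding, while the adjusted and weighted estimates should be closer to it up to sampling error.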

Bootstrapping and plotting 95% confidence bands: 'Causal Inference: What If' Causal Survival Analysis. Parametric g-formula

In this post, I explore parametric g-formula fitting in the causal survival analysis context. I use the machinery of the tidyverse throughout the post and finish with plotting the 95% confidence band around the g-formula fitted survival curve for smokers vs non-smokers (see Chapter 17, Hernán MA, Robins JM (2020)).

Bootstrapping and plotting 95% confidence bands: 'Causal Inference: What If' Causal Survival Analysis

In this post, I look inside Chapter 17 on Causal Survival Analysis of the “Causal Inference: What If” book by M. Hernán and J. Robins. I explore IPTW fitting following the chapter’s narrative and use the machinery of the tidyverse throughout.

Using R & tidyverse with publicly available Nordic databases

Case study of using R & tidyverse to wrangle and graphically describe publicly available Nordic datasets