---
title: "statar"
author: "Matthieu Gomez"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Data.frames function}
  %\VignetteEngine{knitr::rmarkdown}
  %\usepackage[utf8]{inputenc}
---

## sum_up = summarize

`sum_up` prints detailed summary statistics (it corresponds to Stata's `summarize`).

```R
library(dplyr)   # provides tibble(), %>%, group_by(), starts_with()
library(statar)

N <- 100
df <- tibble(
  id = 1:N,
  v1 = sample(5, N, TRUE),
  v2 = sample(1e6, N, TRUE)
)
# summary statistics for every variable
sum_up(df)
# detailed statistics for the variables whose names start with "v"
df %>% sum_up(starts_with("v"), d = TRUE)
# summary statistics within each group of v1
df %>% group_by(v1) %>% sum_up()
```

## tab = tabulate

`tab` prints distinct rows with their count. Compared to the dplyr function `count`, it also reports the frequency, percent, and cumulative percent.

```R
N <- 1e2; K <- 10
df <- tibble(
  id = sample(c(NA, 1:5), N/K, TRUE),
  v1 = sample(1:5, N/K, TRUE)
)
# one-way tabulation, missing values included
tab(df, id)
# one-way tabulation, missing values dropped
tab(df, id, na.rm = TRUE)
# two-way tabulation
tab(df, id, v1)
```

## join = merge

`join` is a wrapper for the dplyr merge functions, with three added options:

- The option `check` checks that there are no duplicates in the master or using data sets (as in Stata).

  ```r
  # Stata equivalent: merge m:1 v1
  join(x, y, kind = "full", check = m~1)
  ```

- The option `gen` specifies the name of a new variable that identifies matched and non-matched rows (as in Stata).

  ```r
  # Stata equivalent: merge m:1 v1, gen(_merge)
  join(x, y, kind = "full", gen = "_merge")
  ```

- The option `update` replaces missing values in the master data set with the corresponding values from the using data set.
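
To make the idea behind `update` concrete, here is a minimal sketch that emulates the described behavior with plain dplyr (`left_join` followed by `coalesce`). It is not statar's implementation. The data sets `x` and `y`, the key `id`, the variable `v`, and the `.using` suffix are all hypothetical names chosen for illustration.

```r
library(dplyr)

# Hypothetical master (x) and using (y) data sets sharing the key `id`
x <- tibble(id = 1:3, v = c(10, NA, 30))
y <- tibble(id = 1:3, v = c(11, 22, NA))

# Emulation of the described `update` behavior, not statar's own code:
# keep every master value and fill only the master's NAs from the using data
updated <- x %>%
  left_join(y, by = "id", suffix = c("", ".using")) %>%
  mutate(v = coalesce(v, v.using)) %>%
  select(-v.using)

updated
```

In this sketch only the second row changes: the master value is missing there, so it takes the using value. Non-missing master values are never overwritten.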