f_boxplot() now accepts numeric vectors in addition to data.frames and formulas. A single vector like f_boxplot(my_vec) produces one box labelled with the vector name on the y-axis; multiple unnamed vectors like f_boxplot(hp, cyl) produce side-by-side boxes, matching base R's boxplot() convention. A new color argument controls the palette: the default "rainbow" preserves existing behaviour; "bw" gives publication-style white boxes with black lines, outliers and mean marker; a single colour name like "steelblue" applies one hue to all boxes (with a light-tinted fill and darkened outline derived in HSV space); and a vector of colours is recycled for custom per-group palettes. A new boxwidth argument sets the relative width of each box (passed as boxwex to boxplot()).
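The single-colour behaviour can be illustrated with base R colour utilities. This is only a sketch of the idea: the helper name derive_box_colours() and the scaling factors fill_sat and line_val are assumptions made for illustration, not the values or code used inside f_boxplot().

```r
# Hypothetical helper (not part of rfriend): derive a light fill and a
# darker outline from a single colour by adjusting it in HSV space.
derive_box_colours <- function(colour, fill_sat = 0.25, line_val = 0.6) {
  rgb_vals <- grDevices::col2rgb(colour)      # 3 x 1 matrix, values 0-255
  hsv_vals <- grDevices::rgb2hsv(rgb_vals)    # rows "h", "s", "v" in [0, 1]
  h <- hsv_vals["h", 1]
  s <- hsv_vals["s", 1]
  v <- hsv_vals["v", 1]
  list(
    fill    = grDevices::hsv(h, s * fill_sat, 1),  # same hue, low saturation, full brightness
    outline = grDevices::hsv(h, s, v * line_val)   # same hue, reduced brightness
  )
}

cols <- derive_box_colours("steelblue")
cols
```

The two colours could then be handed to base R's boxplot() as col = cols$fill and border = cols$outline, with boxwex controlling the box width.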
f_scan() now accepts loose numeric vectors in the same spirit as f_boxplot(). A single vector like f_scan(disp1) produces a one-group diagnostic dashboard with the vector name carried through as the column label. A formula built from bare vectors works identically to the data.frame form, so f_scan(disp1 + hp1 ~ cyl1) assembles the data.frame internally from the variable names in the formula. A positional shorthand is also supported: f_scan(disp1, cyl1) is equivalent to f_scan(disp1 ~ cyl1), treating the first vector as the response and any additional vectors as grouping variables. The length of each grouping vector is checked against the response, and a mismatch raises a clear error.
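The positional shorthand amounts to collecting loose vectors into a data.frame while keeping their names. A rough base R sketch of that step follows; vectors_to_df() is a hypothetical helper written for this note, and the real f_scan() internals may differ.

```r
# Hypothetical helper (not the rfriend implementation): build a data.frame
# from a response vector plus optional grouping vectors, keeping the names
# the caller typed and checking lengths against the response.
vectors_to_df <- function(response, ...) {
  groups <- list(...)
  # Recover the argument expressions as written by the caller.
  exprs <- as.list(substitute(list(response, ...)))[-1]
  nms   <- unname(vapply(exprs, function(e) deparse(e)[1], character(1)))

  lens <- lengths(groups)
  bad  <- lens != length(response)
  if (any(bad)) {
    stop("Length mismatch with '", nms[1], "' (", length(response), "): ",
         paste0(nms[-1][bad], " (", lens[bad], ")", collapse = ", "))
  }

  df <- as.data.frame(c(list(response), groups), col.names = nms)
  names(df) <- nms
  df
}

disp1 <- mtcars$disp
cyl1  <- mtcars$cyl
df <- vectors_to_df(disp1, cyl1)
head(df)

# The equivalent formula can be rebuilt from the recovered names.
reformulate(names(df)[-1], response = names(df)[1])
```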
f_summary() gains a show_ci argument (default FALSE) that adds
CI_lower and CI_upper columns, the bounds of a confidence interval for
the mean. The interval is a parametric t-interval, computed as
mean +/- qt(1 - (1 - conf_level)/2, df = n - 1) * se, matching the interval
reported by t.test(). A companion conf_level argument (default 0.95)
sets the confidence level. Groups with fewer than two non-missing
observations return NA bounds.
Removed an internal package startup/shutdown file, zzz.R, that printed a spurious "Package unloaded from:" message on unload. Package loading and unloading are now silent on the rfriend side.
Improved the boxplot explanation in the introduction section ("Understanding Boxplots: A Visual Guide") of the output files from f_boxplot().
f_boxplot() with a formula and explicit data (e.g. f_boxplot(hp ~ cyl, mtcars)) now plots only the response variable named on the LHS of the formula. Previously the LHS was ignored and a plot was generated for every numeric column in data.
f_boxplot() with a formula referencing bare vectors (e.g. f_boxplot(hp1 ~ cyl1)) no longer errors with "argument 'data' is missing, with no default", and the output filename is derived from the formula variables.
check_lhs_is_names() (internal LHS guard) no longer emits a misleading "Expressions on the LHS of the formula are ignored: NULL" warning when called with formula = NULL or with a one-sided formula. This affected any rfriend function accepting a data.frame without a formula (f_boxplot(mtcars), f_summary(mtcars), etc.).
f_summary(), f_scan() and f_outliers() now accept a bare data.frame without requiring columns. When columns is omitted, all numeric columns in data are used (excluding any named in group_vars and, for f_outliers(), id_var). This matches the behaviour added to f_boxplot() in the same release and mirrors base R's summary(mtcars).
f_scan() no longer crashes with "Column All Data not found" on the second response variable when called without group_vars. The dummy grouping column was being added only on the first iteration of a multi-column loop.
The print methods for f_summary() and f_outliers() now show a header naming each response variable when several are summarised. Previously, multi-column calls produced a stack of unlabelled tables.
f_summary() now computes the standard error (se) using the number of non-missing observations rather than the full vector length. Previously a column containing NA values produced a standard error that was biased towards zero, because the NA entries were counted in the denominator sqrt(n). The new confidence interval relies on the same corrected count.
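The effect of the corrected count, and the t-interval built on it, can be reproduced in base R. The data vector below is made up for illustration; the formulas are the ones stated in these notes.

```r
# Made-up example data with missing values.
x <- c(4.1, 5.3, NA, 6.0, 4.8, NA, 5.5)
conf_level <- 0.95

n      <- sum(!is.na(x))                         # non-missing observations
se     <- sd(x, na.rm = TRUE) / sqrt(n)          # corrected standard error
se_old <- sd(x, na.rm = TRUE) / sqrt(length(x))  # previous behaviour: NA counted in n

# Parametric t-interval for the mean; NA bounds when fewer than two values.
t_crit <- qt(1 - (1 - conf_level) / 2, df = n - 1)
ci <- if (n >= 2) {
  mean(x, na.rm = TRUE) + c(-1, 1) * t_crit * se
} else {
  c(NA_real_, NA_real_)
}

ci
t.test(x, conf.level = conf_level)$conf.int  # same bounds; t.test() drops NA itself
c(corrected = se, previous = se_old)         # the previous value is the smaller one
```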
f_model_comparison() has been renamed to f_model_compare(). Please
update any scripts that used the previous name.
f_summary() no longer accepts unquoted column names. Columns must now be
supplied either via a formula (e.g.
f_summary(disp + hp ~ gear + cyl, data = mtcars)) or as quoted character
names passed to the columns argument (e.g. columns = c("disp", "hp")).
This change was required to support the new formula method.
The output_type argument of file-producing functions now defaults to
"default" instead of "off" (or "console"). The new "default" mode
returns an S3 object and lets R decide whether to print: the object is
auto-printed when the call is unassigned, and silent when the result is
assigned to a variable. Set output_type = "console" to force immediate
console printing regardless of assignment. Affects f_aov(),
f_kruskal_test(), f_glm(), f_chisq_test(), f_bestNormalize(),
f_boxcox(), and the new f_lmer(), f_t_test(), f_wilcox_test(),
f_scan() and f_stat_wizard().
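The "default" mode relies on R's ordinary auto-printing of visible return values. A minimal sketch of the mechanism, using a made-up class demo_result and function demo_fun() rather than any rfriend code:

```r
# Hypothetical class and function, for illustration only.
print.demo_result <- function(x, ...) {
  cat("Result:", x$value, "\n")
  invisible(x)
}

demo_fun <- function(x, output_type = "default") {
  res <- structure(list(value = mean(x)), class = "demo_result")
  if (output_type == "console") {
    print(res)              # forced printing, even when assigned
    return(invisible(res))  # invisible, so the object is not printed twice
  }
  res                       # visible return value: R decides whether to print
}

demo_fun(1:10)                                   # unassigned: auto-printed
r1 <- demo_fun(1:10)                             # assigned: silent
r2 <- demo_fun(1:10, output_type = "console")    # assigned, but still printed
```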
The default transformation in f_aov() is now "boxcox" (previously
"bestnormalize"). Box-Cox is faster, easier to back-transform and
sufficient for most ANOVA use cases.
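The ease of back-transformation comes from the closed-form inverse of the Box-Cox family. The sketch below uses the standard one-parameter form; whether f_boxcox() uses exactly this parameterisation is an assumption, and the function names are illustrative.

```r
# Standard one-parameter Box-Cox transform and its inverse (illustrative names).
boxcox_forward <- function(y, lambda) {
  if (abs(lambda) < 1e-8) log(y) else (y^lambda - 1) / lambda
}

boxcox_inverse <- function(z, lambda) {
  if (abs(lambda) < 1e-8) exp(z) else (lambda * z + 1)^(1 / lambda)
}

y <- c(1.2, 3.4, 5.6, 7.8)
z <- boxcox_forward(y, lambda = 0.5)
boxcox_inverse(z, lambda = 0.5)   # recovers y up to floating-point error
```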
f_lmer() fits linear mixed-effects models using lme4::lmer() with
p-values supplied by lmerTest, and produces a fully formatted report
containing the fixed-effects ANOVA table, random-effects variance
components and ICC, marginal and conditional R-squared (Nakagawa and
Schielzeth), AIC, BIC, log-likelihood, residual and BLUP Q-Q diagnostics,
prominent surfacing of singular-fit and convergence messages, and
emmeans pairwise post hoc on factor fixed effects with compact letter
display. Supports output_type of "console", "pdf", "word",
"excel" and "rmd", mirroring f_aov() and f_kruskal_test(). The
intro section explains LMM assumptions and walks the user through the
(1 | group) random-effects syntax in study-design terms. Denominator
degrees of freedom are selectable via ddf = "Satterthwaite" (default),
"Kenward-Roger" or "lme4".
f_t_test() wraps stats::t.test() with both a formula interface
(y1 + y2 ~ group, supporting multiple responses in sequence) and a
classic vector interface. Supports one-sample, two-sample and paired
tests, adds automated Shapiro-Wilk, Bartlett and Levene diagnostics,
optional Box-Cox or bestNormalize transformation of non-normal responses,
and formatted output to console, pdf, Word, Excel or R Markdown.
f_wilcox_test() wraps stats::wilcox.test() with the same formula and
vector interfaces as f_t_test(). The function explicitly labels and
reports the Hodges-Lehmann pseudo-median (one-sample and paired) or
location shift (two-sample), alongside descriptive sample medians, to
avoid the common "CI for the median" mislabelling found in textbooks and
software output.
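The distinction can be seen directly with stats::wilcox.test(). In the two-sample case the reported estimate is the median of all pairwise differences (the Hodges-Lehmann shift), which need not equal the difference between the two sample medians. The data below are made up for illustration.

```r
# Made-up samples without ties, so wilcox.test() uses the exact method.
x <- c(1.1, 2.3, 3.8, 5.2, 6.9)
y <- c(0.4, 1.7, 2.0, 4.1, 4.6)

wt <- wilcox.test(x, y, conf.int = TRUE)

hl_shift    <- median(outer(x, y, "-"))   # Hodges-Lehmann location shift
median_diff <- median(x) - median(y)      # difference of sample medians

wt$estimate    # "difference in location": the Hodges-Lehmann estimate
wt$conf.int    # confidence interval for the shift, not for a median
c(hodges_lehmann = hl_shift, median_difference = median_diff)
```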
f_scan() creates a 3-panel diagnostic dashboard (density, boxplot,
Q-Q) for one or more response columns, optionally split by up to three
grouping variables (colour, facet wrap, facet grid). It returns a
summary table and a Tukey-fence outlier table, and can optionally call
f_stat_wizard() to append a test recommendation for each response.
f_long() converts wide (Excel-style) data to long format in a single
call, selecting measurement columns, keeping ID columns and optionally
renaming categories. Returns an object of class f_long with dedicated
plot() and summary() methods. Extra arguments are forwarded to
tidyr::pivot_longer().
f_stat_wizard() (BETA) analyses your data structure from a formula and
recommends an appropriate statistical test. It detects response type
(binary, count, multinomial, ratio normal or non-normal), checks
normality of residuals and homogeneity of variance, and evaluates
whether a Box-Cox transformation would resolve non-normality. The
recommendation is returned as ready-to-run code using the appropriate
rfriend function as primary code, with a base R fallback. Supports
y ~ ., interaction terms and paired or repeated-measures designs via
id_col. With run = TRUE, the recommended function is executed
automatically.
f_outliers() scans numeric columns for outliers using Tukey's fences
(IQR multiplier configurable via coef), optionally within groups.
Returns a data frame containing only the outlier rows, adds a row_id
column for traceability, and optionally exports to Excel. A formula
interface is supported, e.g. col1 + col2 ~ group1 + group2.
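Tukey's fences flag values below Q1 - coef * IQR or above Q3 + coef * IQR. A compact base R sketch for a single column follows; the helper name tukey_outliers() is hypothetical, and using quantile()'s default quartile definition is an assumption about how f_outliers() computes the fences.

```r
# Hypothetical single-column helper illustrating Tukey's fences.
tukey_outliers <- function(df, column, coef = 1.5) {
  x   <- df[[column]]
  q   <- stats::quantile(x, c(0.25, 0.75), na.rm = TRUE)
  iqr <- q[[2]] - q[[1]]
  lower <- q[[1]] - coef * iqr
  upper <- q[[2]] + coef * iqr
  flagged <- !is.na(x) & (x < lower | x > upper)
  # Keep only the outlier rows and record their original position.
  data.frame(row_id = which(flagged), df[flagged, , drop = FALSE])
}

dat <- data.frame(value = c(10, 12, 11, 13, 12, 11, 50))
tukey_outliers(dat, "value")              # flags the value 50 in row 7
tukey_outliers(dat, "value", coef = 100)  # very wide fences: no rows returned
```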
f_remove_outliers() removes rows from a data frame based on the output
of f_outliers() or a custom vector of IDs or row numbers, using safe
anti-join semantics so the original data structure is preserved.
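An anti-join keeps every row whose key does not appear in the set of IDs to drop, leaving columns and row order untouched. A base R equivalent, with made-up data and IDs, might look like this:

```r
# Made-up data with a row_id key, and the IDs flagged for removal.
dat <- data.frame(row_id = 1:7, value = c(10, 12, 11, 13, 12, 11, 50))
ids_to_drop <- c(7L)

# Anti-join in base R: keep rows whose key is NOT among the flagged IDs.
cleaned <- dat[!dat$row_id %in% ids_to_drop, , drop = FALSE]
cleaned
```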
df_to_table() converts a data frame to a base R contingency table.
The label column is auto-detected (first character or factor column, or
meaningful rownames()) but can be specified explicitly. Used
internally by f_chisq_test() and exported for manual use.
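The conversion can be approximated in base R as follows. This is a simplified sketch of the idea (first character or factor column becomes the row labels), not the df_to_table() source, and the counts are made up.

```r
# Made-up counts in wide, Excel-style layout.
counts <- data.frame(
  treatment = c("A", "B"),
  improved  = c(12, 5),
  unchanged = c(8, 15)
)

# Use the first character or factor column as the label column.
is_label  <- vapply(counts, function(col) is.character(col) || is.factor(col), logical(1))
label_col <- names(counts)[is_label][1]

# Everything else becomes the body of the contingency table.
mat <- as.matrix(counts[setdiff(names(counts), label_col)])
rownames(mat) <- as.character(counts[[label_col]])
tab <- as.table(mat)

tab
chisq.test(tab)
```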
Formula interfaces have been added to f_summary(), f_boxplot(),
f_scan(), f_outliers() and f_stat_wizard() via S3 dispatch
(data.frame and formula methods). This makes iterative use very
concise. For example, f_summary(disp + hp ~ gear + cyl, data = mtcars)
summarises disp and hp grouped by gear and cyl.
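The dispatch pattern itself is plain S3. A stripped-down sketch with a hypothetical generic grouped_mean() shows how a formula method can translate the formula into column names and delegate to the data.frame method; the rfriend methods do considerably more.

```r
# Hypothetical generic with data.frame and formula methods.
grouped_mean <- function(x, ...) UseMethod("grouped_mean")

grouped_mean.data.frame <- function(x, columns, group_vars, ...) {
  stats::aggregate(x[columns], by = x[group_vars], FUN = mean)
}

grouped_mean.formula <- function(x, data, ...) {
  # Assumes a two-sided formula: responses on the LHS, grouping on the RHS.
  columns    <- all.vars(x[[2]])
  group_vars <- all.vars(x[[3]])
  grouped_mean.data.frame(data, columns = columns, group_vars = group_vars)
}

# Both calls produce the same table.
by_formula <- grouped_mean(disp + hp ~ gear + cyl, data = mtcars)
by_columns <- grouped_mean(mtcars, columns = c("disp", "hp"),
                           group_vars = c("gear", "cyl"))
by_formula
```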
f_summary() gained show_skew (Skewness, measure of asymmetry) and
show_kurtosis (Excess Kurtosis, measure of tail heaviness).
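For orientation, the simple moment-based versions of both measures are sketched below. Several estimators exist (with and without small-sample corrections), and which one f_summary() uses is not stated here, so treat the formulas as an assumption; moment_shape() is an illustrative name.

```r
# Moment-based skewness and excess kurtosis (no small-sample correction).
moment_shape <- function(x) {
  x  <- x[!is.na(x)]
  d  <- x - mean(x)
  m2 <- mean(d^2)
  m3 <- mean(d^3)
  m4 <- mean(d^4)
  c(skewness = m3 / m2^1.5,          # 0 for a symmetric sample
    excess_kurtosis = m4 / m2^2 - 3) # 0 for a normal distribution
}

moment_shape(c(-2, -1, 0, 1, 2))
moment_shape(mtcars$hp)
```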
f_aov() gained a force_aov argument to run ANOVA even when at least
one cell has n = 1 (saturated model). The default (FALSE) skips such
responses with a warning, because F-statistics and p-values are
undefined for saturated models.
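Why such models are skipped by default is easy to see with base R's aov(): with one observation per cell and a full interaction model, no residual degrees of freedom remain to estimate the error variance. The data below are made up.

```r
# Made-up 2 x 3 design with exactly one observation per cell.
dat <- expand.grid(a = factor(1:2), b = factor(1:3))
dat$y <- c(5.1, 6.3, 4.8, 7.0, 5.5, 6.1)

table(dat$a, dat$b)              # every cell has n = 1

fit <- aov(y ~ a * b, data = dat)
df.residual(fit)                 # 0: nothing left to estimate the error term
```

With zero residual degrees of freedom there is no error mean square, so no F-statistic or p-value can be formed.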
f_corplot() has been rewritten. The upper triangle now displays
Pearson r, Spearman rho and Kendall tau simultaneously for every pair.
Ordinal variables are supported via the new ordinal_vars argument:
their diagonal labels are italicised and Pearson r is greyed and
bracketed for any pair that involves them. New arguments
factor_select, factor_exclude, unique_num_treshold and
repeats_threshold give finer control over automatic factor detection.
f_aov() and f_glm() post hoc summary tables now display
back-transformed data where a transformation has been applied. A data
summary table has also been added to both functions.
f_boxplot() now integrates with f_outliers() and can append an
outlier table to the report (new arguments outliers, coef,
limit_columns).
f_chisq_test() now uses the new df_to_table() helper when a data
frame is supplied instead of a table, giving clearer messages about
which column was used as row labels.
New plot() methods for objects of class f_kruskal_test, f_lmer,
f_long, f_scan, f_t_test and f_wilcox_test.
New print() methods for f_lmer, f_outliers, f_scan,
f_stat_wizard, f_t_test and f_wilcox_test.
New summary() methods for f_long and f_scan.
New predict() method for f_boxcox, allowing forward transformation
of new values using a fitted f_boxcox object.
The intro text and summary text of f_aov(), f_kruskal_test() and
f_glm() have been reworked to be more user-friendly and consistent
across functions.
f_open_file() has been improved for Linux users.
Formatting of Word output has been updated and is now compatible with LibreOffice Writer (tested on version 24.2.7.2).
New imports: dplyr, gridExtra, lme4, lmerTest, magrittr,
png, rlang and tidyr.
MASS, nnet, pbkrtest, testthat (>= 3.0.0) and tibble have been
added to Suggests. The package now ships a testthat (edition 3) test
suite (Config/testthat/edition: 3).
New internal helpers for formula handling, left-hand-side checking, safe Shapiro-Wilk testing and session-state management.
BREAKING CHANGE: Replaced the output_file and output_dir arguments with a single save_as argument for all file-saving functions.
The save_as argument now controls the full save path (directory, filename and extension).
It accepts relative paths (e.g., "example/filename.pdf") or full paths (e.g., "c:/users/tom/docs/filename.pdf").
If a file extension (like .pdf or .word) is provided, save_as will override the output_type argument using this extension.
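The way a single path can carry directory, filename and output type is sketched below with base R path utilities. resolve_output() is a hypothetical helper, and returning the raw extension as the output type is a simplification; the actual mapping inside rfriend is not shown in these notes.

```r
# Hypothetical helper: split save_as into its parts and let a file
# extension, when present, take precedence over output_type.
resolve_output <- function(save_as, output_type = "console") {
  ext <- tolower(tools::file_ext(save_as))
  list(
    dir         = dirname(save_as),
    file        = basename(save_as),
    output_type = if (nzchar(ext)) ext else output_type
  )
}

resolve_output("example/filename.pdf")                 # extension wins
resolve_output("example/filename", output_type = "rmd") # no extension: output_type is kept
```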
Changed the default argument from output_type = "off" to output_type = "console" for f_aov(), f_kruskal_test(), f_glm(), and f_chisq_test(). This ensures results are printed to the console by default, aligning with user expectations.
The arguments show_assumptions_text in f_glm(), kruskal_assumptions_text in f_kruskal_test(), aov_assumptions_text in f_aov() and boxplot_explanation in f_boxplot() have all been replaced by a single intro_text argument, giving one short, uniform name across functions.
Added a force_transformation argument to f_aov() to allow transformations on specific response variables (e.g., force_transformation = c("col1", "col2")).
The transformation name (if used) is now added to the f_aov() summary table and included as a subscript in the aov call formula.
f_bestNormalize() now applies a transformation even if the input data is already normal. This is to ensure transformations can be applied when the original data is normal but model residuals are not.
Fixed an issue where assumption violation warnings from f_aov() were not visible in the final output reports.
Improved handling of NA values in several functions.
Other general minor bug fixes.