Package: dqcheckr 0.2.1

dqcheckr: Automated Data Quality Checks for Recurring Dataset Deliveries

Automates quality verification of recurring external dataset deliveries. For each new file arrival, it runs single-snapshot quality checks, compares the file to the previous delivery, writes a self-contained 'HTML' report, and records summary statistics in a local 'SQLite' database for long-term trend tracking. Supports 'CSV' and fixed-width formats. Custom organisation-specific checks can be supplied as plain R files.

Authors: Mick Mioduszewski [aut, cre]

dqcheckr_0.2.1.tar.gz
dqcheckr_0.2.1.zip (r-4.7) | dqcheckr_0.2.1.zip (r-4.6) | dqcheckr_0.2.1.zip (r-4.5)
dqcheckr_0.2.1.tgz (r-4.6-any) | dqcheckr_0.2.1.tgz (r-4.5-any)
dqcheckr_0.2.1.tar.gz (r-4.7-any) | dqcheckr_0.2.1.tar.gz (r-4.6-any)
dqcheckr_0.2.1.tgz (r-4.6-emscripten)
manual.pdf | manual.html
DESCRIPTION | NEWS
card.svg | card.png
dqcheckr/json (API)

# Install 'dqcheckr' in R:
install.packages('dqcheckr', repos = c('https://mickmioduszewski.r-universe.dev', 'https://cloud.r-project.org'))
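
This page lists the exported function names but not their arguments, so a cautious first step after installation is to inspect the signatures from R rather than guess them. The sketch below uses only base R helpers (`requireNamespace`, `getNamespaceExports`, `getExportedValue`, `args`). It assumes nothing about dqcheckr's parameters and simply prints whatever the installed version exposes.

```r
# List dqcheckr's exported functions and show the arguments of one of them.
# Guarded so the script also runs (and reports) when the package is absent.
pkg <- "dqcheckr"

if (requireNamespace(pkg, quietly = TRUE)) {
  exports <- sort(getNamespaceExports(pkg))
  print(exports)

  # Inspect the formal arguments of the main pipeline function before calling it.
  print(args(getExportedValue(pkg, "run_dq_check")))
} else {
  exports <- character(0)
  message("Package '", pkg, "' is not installed; run the install command above first.")
}
```

The same pattern works for any other export named on this page, for example `load_config` or `read_dataset`.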

Bug tracker: https://github.com/mickmioduszewski/dqcheckr/issues


4.30 score | 1 star | 405 downloads | 30 exports | 75 dependencies

Last updated from: 2eac68911b. Checks: 7 ERROR, 2 OK. Indexed: yes.

Target | Result | Time
linux-devel-x86_64 | ERROR | 185
source / vignettes | OK | 185
linux-release-x86_64 | ERROR | 186
macos-release-arm64 | ERROR | 152
macos-oldrel-arm64 | ERROR | 158
windows-devel | ERROR | 152
windows-release | ERROR | 166
windows-oldrel | ERROR | 196
wasm-release | OK | 138

Exports: check_allowed_values, check_col_count, check_distinct_counts, check_duplicate_rows, check_empty_column, check_inferred_types, check_key_uniqueness, check_min_row_count, check_missing_rate, check_non_numeric, check_numeric_bounds, check_numeric_stats, check_outliers, check_pattern, check_row_count, check_schema_contract, compare_snapshots, detect_files, dq_result, infer_col_type, list_snapshots, load_config, overall_status, read_dataset, read_recent_snapshots, resolve_col_type, run_comparison_checks, run_custom_checks, run_dq_check, run_qc_checks

Dependencies: base64enc, bit, bit64, blob, bslib, cachem, cli, clipr, cpp11, crayon, DBI, digest, dplyr, evaluate, farver, fastmap, fontawesome, fs, generics, ggplot2, glue, gridExtra, gtable, highr, hms, htmltools, isoband, jquerylib, jsonlite, kableExtra, knitr, labeling, later, lifecycle, magrittr, memoise, mime, pillar, pkgconfig, prettyunits, processx, progress, ps, purrr, quarto, R6, rappdirs, RColorBrewer, Rcpp, readr, rlang, rmarkdown, RSQLite, rstudioapi, S7, sass, scales, stringi, stringr, svglite, systemfonts, textshaping, tibble, tidyr, tidyselect, tinytex, tzdb, utf8, vctrs, viridisLite, vroom, withr, xfun, xml2, yaml

dqcheckr — Software Specification
Part A — Purpose and Intent | Why this exists | What we are building | Design principles | Part B — Business-Level Description | B.1 How it is used | B.2 What is checked | B.3 What is compared to the previous delivery | B.4 What is produced | Part C — Technical Specification | C.1 Package structure | C.2 Configuration | C.3 File version detection | C.4 Ingestion and whitespace trimming | C.5 Type inference | C.6 Quality checks | C.7 Custom checks | C.8 SQLite schema | C.9 Main entry points | C.10 Error handling | C.11 Dependencies | Part D — Unresolved and Deferred Issues | D.1 Needs design before implementation | D.2 Backlog — agreed | D.3 Known limitations

Last update: 2026-06-07
Started: 2026-06-01

Getting started with dqcheckr
How it works | Installation | Configuration | Global config — dqcheckr.yml | Per-dataset config — .yml | What is required vs optional | Fixed-width files | The quality checks | Single-snapshot checks (QC series) | Schema contract checks (SC series) | Version comparison checks (CP series) | Type inference | Running a check | Calling individual checks | Custom checks | The snapshot database | Worked example — Star Wars dataset | Error handling | Design principles

Last update: 2026-06-01
Started: 2026-06-01

Readme and manuals

Help Manual

Help page | Topics
QC-09: Check for values outside the allowed set | check_allowed_values
QC-05: Report column count | check_col_count
QC-08: Report distinct value counts for character columns | check_distinct_counts
QC-03: Check for fully-duplicate rows | check_duplicate_rows
QC-02: Check for entirely empty columns | check_empty_column
QC-06: Report inferred column types | check_inferred_types
QC-12: Check uniqueness of key column(s) | check_key_uniqueness
QC-14: Check row count bounds and optional file size | check_min_row_count
QC-01: Check missing rate per column | check_missing_rate
QC-11: Check non-numeric rate in numeric columns | check_non_numeric
QC-10: Check for out-of-range numeric values | check_numeric_bounds
QC-07: Report numeric summary statistics | check_numeric_stats
QC-15: Detect statistical outliers in numeric columns | check_outliers
QC-13: Check values against a regex pattern | check_pattern
QC-04: Report row count | check_row_count
SC-01 / SC-02: Check columns against the expected schema contract | check_schema_contract
Compare two snapshots from the SQLite database | compare_snapshots
Detect current and previous dataset files | detect_files
Construct a data quality result object | dq_result
Infer the logical type of a character column | infer_col_type
List snapshots available in the database | list_snapshots
Load and merge dataset configuration | load_config
Compute the worst status across a list of dq_result objects | overall_status
Read a dataset file into a data frame | read_dataset
Read recent snapshot history from the SQLite database | read_recent_snapshots
Resolve the effective type of a column, respecting config overrides | resolve_col_type
Run all version comparison checks between two dataset snapshots | run_comparison_checks
Run organisation-specific custom checks | run_custom_checks
Run a full data quality check pipeline | run_dq_check
Run all generic quality checks on a dataset | run_qc_checks
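
To make the kinds of checks listed above concrete, here is a minimal base R sketch of three ideas: a per-column missing rate, a count of fully duplicated rows, and the relative change in row count between two deliveries. The helper names (`missing_rate`, `duplicate_row_count`, `row_count_change`) and the toy data are hypothetical. They are not dqcheckr functions and may differ from how the package computes, thresholds, or reports these values.

```r
# Illustrative only: these helpers are hypothetical and are NOT dqcheckr functions.

# Share of missing values per column (the idea behind a missing-rate check).
missing_rate <- function(df) {
  vapply(df, function(col) mean(is.na(col)), numeric(1))
}

# Number of rows that exactly duplicate an earlier row.
duplicate_row_count <- function(df) {
  sum(duplicated(df))
}

# Relative change in row count between the previous and current delivery.
row_count_change <- function(previous, current) {
  (nrow(current) - nrow(previous)) / nrow(previous)
}

# Toy "previous" and "current" deliveries, invented for this example.
previous <- data.frame(id = 1:4, value = c(10, 20, 30, 40))
current  <- data.frame(id = c(1, 2, 2, 3, 5), value = c(10, 20, 20, NA, 50))

print(missing_rate(current))
print(duplicate_row_count(current))
print(row_count_change(previous, current))
```

In a real run these numbers would be compared against configured thresholds. For the package's actual behaviour, consult the help pages named in the table above.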