Using the CLI

Running a pipeline with the CLI

Phaser includes a command-line tool that can run an existing pipeline on a new source. For example, if you have cloned the phaser repository, you can run the EmployeeReviewPipeline in the tests/pipeline directory. Running from the command line produces the warnings and errors, each reported with a row number that persists throughout the pipeline.

% cd tests
% python -m phaser run employees ../phaser_output fixture_files/employees.csv
Running pipeline 'EmployeeReviewPipeline'

% cat ../phaser_output/errors_and_warnings.txt
-------------
Beginning errors and warnings for Validator
-------------
DROPPED_ROW in step drop_rows_with_no_id_and_not_employed, row 3: message: 'DropRowException raised (Employee Garak has no ID and inactive, dropping row)'
-------------
Beginning errors and warnings for Transformer
-------------
WARNING in step consistency_check, row 1: message: 'New field 'Full name' was added to the row_data and not declared a header'
WARNING in step consistency_check, row 1: message: 'New field 'salary' was added to the row_data and not declared a header'
WARNING in step consistency_check, row 1: message: 'New field 'Bonus percent' was added to the row_data and not declared a header'
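Because every entry in this report carries the phase, step, and row number, the file is easy to post-process. The following is a minimal sketch, not part of phaser itself: the line format is inferred only from the sample output above, and the names parse_report, ENTRY, and PHASE are hypothetical helpers introduced for illustration.

```python
import re
from collections import Counter

# Both patterns are inferred from the sample report above; phaser does not
# document this as a stable format, so treat them as assumptions.
PHASE = re.compile(r"^Beginning errors and warnings for (?P<phase>.+)$")
ENTRY = re.compile(
    r"^(?P<level>[A-Z_]+) in step (?P<step>\S+), row (?P<row>\d+): "
    r"message: '(?P<message>.*)'$"
)


def parse_report(text):
    """Return one dict per reported row, tagged with the phase it came from."""
    entries = []
    phase = None
    for line in text.splitlines():
        line = line.strip()
        header = PHASE.match(line)
        if header:
            phase = header.group("phase")
            continue
        entry = ENTRY.match(line)
        if entry:
            record = entry.groupdict()
            record["row"] = int(record["row"])
            record["phase"] = phase
            entries.append(record)
    return entries


SAMPLE = """\
-------------
Beginning errors and warnings for Validator
-------------
DROPPED_ROW in step drop_rows_with_no_id_and_not_employed, row 3: message: 'DropRowException raised (Employee Garak has no ID and inactive, dropping row)'
-------------
Beginning errors and warnings for Transformer
-------------
WARNING in step consistency_check, row 1: message: 'New field 'salary' was added to the row_data and not declared a header'
"""

entries = parse_report(SAMPLE)

# Count entries per (phase, level) pair and print a short summary.
counts = Counter((e["phase"], e["level"]) for e in entries)
for (phase, level), n in sorted(counts.items()):
    print(f"{phase}: {n} {level}")
```

To use it on a real run, read the report file from the working directory and pass its contents to parse_report in place of SAMPLE.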

Showing the ‘diffs’, or changes, made by each phase or the entire pipeline

After a pipeline has run and saved its output in a working directory (for example, with the run command above), the ‘diff’ tool shows exactly what changed in each Phase and over the whole Pipeline, in a table-aware format.

% cd tests
% python -m phaser diff employees ../phaser_output
Diff of source and Validator_output_employees.csv will be saved in ../phaser_output/diff_to_Validator.html
    0 rows added
    1 rows removed
    3 rows changed
    0 rows unchanged
Diff of Validator_output_employees.csv and Transformer_output_employees.csv will be saved in ../phaser_output/diff_to_Transformer.html
    0 rows added
    0 rows removed
    3 rows changed
    0 rows unchanged
Entire pipeline changes in ../phaser_output/diff_pipeline.html
    0 rows added
    1 rows removed
    3 rows changed
    0 rows unchanged

After printing these summary statistics, the diff tool automatically opens an HTML file that links to each of the HTML diff files.
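The summary statistics can also be checked in a script, for instance to flag a run that removed more rows than expected. The following is a minimal sketch that assumes the printed layout shown above (a heading line followed by indented "N rows ..." lines). The helper name parse_diff_summary is hypothetical and is not part of phaser.

```python
import re

# Matches the indented count lines from the sample output, e.g. "    1 rows removed".
COUNT = re.compile(r"^\s*(?P<n>\d+) rows (?P<kind>added|removed|changed|unchanged)$")


def parse_diff_summary(text):
    """Map each heading line to a dict of its row counts."""
    summary = {}
    current = None
    for line in text.splitlines():
        match = COUNT.match(line)
        if match:
            # Count lines only make sense once a heading has been seen.
            if current is not None:
                summary[current][match.group("kind")] = int(match.group("n"))
            continue
        if line.strip():
            current = line.strip()
            summary[current] = {}
    return summary


SAMPLE = """\
Entire pipeline changes in ../phaser_output/diff_pipeline.html
    0 rows added
    1 rows removed
    3 rows changed
    0 rows unchanged
"""

summary = parse_diff_summary(SAMPLE)
for heading, stats in summary.items():
    if stats.get("removed", 0) > 0:
        print(f"{stats['removed']} row(s) removed: {heading}")
```

In practice the text would come from the captured standard output of the diff command rather than from a hard-coded sample.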