Making the Data Quality Rules Discovery Process Easier
EigenRules
#1 software for auto-discovering data quality rules directly from your data. Point EigenRules at the data, wait a few minutes, and out come the auto-discovered DQ rules in plain English!
What it does:
- Uses AI/ML to understand the expected behavior of your data and gives you the essential data quality rules you MUST have in place to validate it.
- Complement the rules you already have with these auto-discovered rules.
- Or use it to jump-start onboarding of new data sets in less than 5 minutes.
Benefits
Give these auto-discovered rules to your Subject Matter Experts (SMEs) and they’ll be amazed!
- DataBuck will accelerate your SMEs’ work.
- Reduce time to market for onboarding new data sources or apps.
Ask What ETL Can Do for You and What EigenRules Can Do for You
- Streamlines the data quality rule discovery process.
- SMEs can piggyback on the auto-discovered DQ rules to accelerate their own rule discovery.
- If you already use an ETL tool and write rules by hand, find the gaps in your rule set.
- Augment what you already have with a thorough set of rules.
- Very quick “time to market”: for every data source, you can cut 3-4 weeks of work to just 15 minutes, with only one person needed to discover DQ rules, including multi-column relationships.
Examples of Types of Data Quality Rules Auto Discovered
Every data set has a few hundred essential data quality rules that must be checked to validate the data thoroughly. EigenRules discovers rules in all 6 data quality dimensions. Below are examples of actual rules discovered for a loan data set and printed out by the software in plain English. The user gave ZERO input about the meaning or relevance of the columns; EigenRules auto-discovers the relationships and rules that govern every microsegment of the data.
Uniqueness, Loan Number, Cannot be duplicate
Completeness, Loan Closing Date, Cannot be Null
Conformity, Loan Closing Date, Valid Format, yyyyMMdd
Validity, Inter-Column Relationships, IF `Property State`=GA AND `Loan Source`=4 AND `Product Type`=1 THEN `Investor Type`=3
Drift, `Income Documentation`, Acceptable Values 1, 3, 4, 6
Timeliness, First_Payment_Date must be within 90 days of the Loan_Closing_Date
Consistency, Difference Original_Credit_Score - Current_Credit_Score must be within Lower_Limit: -221.2, Upper_Limit: 207.2
Accuracy, IF `Property State`=GA AND `Loan Source`=4 AND `Product Type`=1 AND `Investor Type`=3 THEN `Unpaid Principal Balance` will have the following range: Lower_Limit: 0, Upper_Limit: 260,853
Accuracy, IF `Property State`=CT AND `Loan Source`=2 AND `Product Type`=6 AND `Investor Type`=7 THEN `Unpaid Principal Balance` will have the following range: Lower_Limit: 0, Upper_Limit: 1,929,964
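
To make the rules above concrete, here is a minimal Python sketch of how a few of them could be checked against loan records once they have been discovered. It is not EigenRules code and says nothing about how EigenRules discovers or applies rules. The sample records, the `parse_yyyymmdd` and `check_loans` helpers, and the reading of "within 90 days" as an absolute difference are all assumptions made for illustration.

```python
from datetime import datetime

# Hypothetical loan records. Column names follow the rule examples above;
# the values are invented for illustration only.
loans = [
    {"Loan Number": "A1", "Loan Closing Date": "20230105",
     "First_Payment_Date": "20230301", "Property State": "GA",
     "Loan Source": 4, "Product Type": 1, "Investor Type": 3},
    {"Loan Number": "A2", "Loan Closing Date": "2023-02-10",
     "First_Payment_Date": "20230401", "Property State": "GA",
     "Loan Source": 4, "Product Type": 1, "Investor Type": 5},
    {"Loan Number": "A1", "Loan Closing Date": None,
     "First_Payment_Date": "20230501", "Property State": "CT",
     "Loan Source": 2, "Product Type": 6, "Investor Type": 7},
    {"Loan Number": "A3", "Loan Closing Date": "20230101",
     "First_Payment_Date": "20230601", "Property State": "CT",
     "Loan Source": 2, "Product Type": 6, "Investor Type": 7},
]


def parse_yyyymmdd(value):
    """Return a date for a yyyyMMdd string, or None if it does not conform."""
    if not isinstance(value, str) or len(value) != 8 or not value.isdigit():
        return None
    try:
        return datetime.strptime(value, "%Y%m%d").date()
    except ValueError:  # e.g. "20230230" has the right shape but is not a date
        return None


def check_loans(records):
    """Return a list of (row_index, dimension, message) rule violations."""
    violations = []
    seen_numbers = set()
    for i, row in enumerate(records):
        # Uniqueness: Loan Number cannot be duplicated.
        number = row.get("Loan Number")
        if number in seen_numbers:
            violations.append((i, "Uniqueness", "duplicate Loan Number"))
        seen_numbers.add(number)

        # Completeness and Conformity: Loan Closing Date present and yyyyMMdd.
        closing_raw = row.get("Loan Closing Date")
        closing = parse_yyyymmdd(closing_raw)
        if closing_raw is None:
            violations.append((i, "Completeness", "Loan Closing Date is null"))
        elif closing is None:
            violations.append((i, "Conformity", "Loan Closing Date is not yyyyMMdd"))

        # Validity: inter-column relationship for one microsegment.
        if (row.get("Property State") == "GA" and row.get("Loan Source") == 4
                and row.get("Product Type") == 1
                and row.get("Investor Type") != 3):
            violations.append((i, "Validity", "Investor Type must be 3"))

        # Timeliness: first payment within 90 days of closing
        # (assumed here to mean an absolute difference of at most 90 days).
        first_payment = parse_yyyymmdd(row.get("First_Payment_Date"))
        if closing and first_payment and abs((first_payment - closing).days) > 90:
            violations.append((i, "Timeliness", "First_Payment_Date too far from closing"))
    return violations


for row_index, dimension, message in check_loans(loans):
    print(row_index, dimension, message)
```

Each check maps to one plain-English rule, so an SME can review the rule text and the check side by side. The range rules (Consistency, Accuracy) would follow the same pattern, with a lower and upper limit per microsegment.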