Defend Timelines from “Perfect Data”


There is a lot of truth and wisdom in the quote, “The perfect is the enemy of the good.” The quote definitely applies when trying to get the most out of data. However, in this context, it should really be “Perfect is the enemy of budget and time.”

Pragmatic data governance protects your budget and timelines against perfectionism. The end goal is, of course, to have all issues resolved, but being able to start governing the data before the end is vital. To do this, data governance has to be built around the concept of managing “imperfect” data and incrementally improving that data.

A key failure of data governance in almost every organization is wanting to reach agreement before making data available. A far more practical and useful approach is to bring in data and metadata and then identify and manage where issues reside. Here are some typical types of issues:

  • Differing definitions for attributes—probably the most common issue with reporting
  • Incomplete data
  • Erroneous and stale attributes

A caveat: Never assume that differences in data are errors. These differences are usually between multiple functioning applications, and in most cases, the data is suitable for each system and may actually exist for a very good reason. In some cases, immediately fixing differences in data may not be critical.

To manage data issues, cataloging your data is important. This can be done by collecting the metadata and reporting on issues. This catalog then becomes the place for tracking priorities, issues and resolutions. Even modest capability in this area can greatly improve an organization’s use of its already existing data assets.
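As a rough illustration of what a modest issue catalog could look like, here is a minimal Python sketch. The class names, fields, issue kinds and priority scale are all hypothetical and chosen only for the example; a real catalog would normally live in a metadata tool or database.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class DataIssue:
    source: str       # system or table where the issue was observed (hypothetical names)
    attribute: str    # attribute the issue concerns
    kind: str         # e.g. "differing definition", "incomplete", "stale"
    priority: int     # 1 = most urgent (assumed scale)
    resolved: bool = False


@dataclass
class IssueCatalog:
    issues: list = field(default_factory=list)

    def record(self, issue):
        """Add an issue to the catalog without blocking use of the data."""
        self.issues.append(issue)

    def open_issues(self):
        """Unresolved issues, most urgent first."""
        return sorted((i for i in self.issues if not i.resolved),
                      key=lambda i: i.priority)

    def counts_by_kind(self):
        """Number of unresolved issues per issue kind, for reporting."""
        return Counter(i.kind for i in self.issues if not i.resolved)


catalog = IssueCatalog()
catalog.record(DataIssue("crm", "customer_status", "differing definition", 1))
catalog.record(DataIssue("billing", "email", "incomplete", 2))
catalog.record(DataIssue("crm", "last_contact", "stale", 3, resolved=True))

for issue in catalog.open_issues():
    print(issue.priority, issue.source, issue.attribute, issue.kind)
```

The point of the sketch is that issues are recorded and prioritized alongside the data rather than resolved before the data is made available.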

Remember, the fastest way to perfect data for your use is to figure out how to make the data a little better on an incremental basis.

By Jeff Klagenberg

Top ten truths about data projects



Money is like data: if you invest it, manage it and protect it well, it can pay off immensely. But do any of these poorly and you’ll regret it.

Development methodologies keep changing… mostly in name.

The only thing more expensive than free software is free software implemented by the lowest bidder.

Master Data Management is a transitional state until you get to the fully integrated environment… And once you’re there you’ll need to add another system.

Big data is incredibly valuable, unless someone forgot to govern it.

Agile is great, but knowing your real requirements is better.

If data governance is painful, too slow or too costly, it’s being done wrong.

Choosing the lowest cost integrator is like choosing the cheapest plumber… Once they’re done it looks great!! The flood comes later…

Data is great, but like a teenager it has a tendency to just sit there; it can be really useful once it’s finally in motion.

Business logic and data handling are like two parts of epoxy; once they’re mixed you are stuck for a long time.


How much does your data weigh?

Business Improvement via data metrics

Measurement can be key to improving data. But there are too many potential measures when it comes to data. Every column, every row, every table, every relationship can be measured. And that does not even get into the possibilities of metadata or data quality. With all these possibilities, coming up with a measurement scheme can seem too costly. And without proper focus, it will be.

So what to focus on?

The four areas that need the most focus:

  1. Check if objectives are being met
  2. See how the expected “control points” are changing
  3. Make sure the processes put in place work as intended
  4. Watch to see when sizing and other assumptions will be violated

1. Objective Metrics: Check if objectives are being met

As part of Data Governance, it is important that the business visit this topic on a regular basis. Here are some examples of objectives I have discussed with clients recently:

  • Reducing the time it takes to onboard a new customer / product / location
  • Reducing bounced communications (e.g. mailings, emails, phone calls, …)
  • Improving customer response (e.g. conversion rate, click-throughs, …)
  • Improving compliance (e.g. Know The Customer, Physician Spend, Conflict Minerals, …)

There are many other examples I could give, but in all cases this is one of the key areas to measure. As much as possible, these items should be measured based on historic data so that a baseline can be created to obtain a before and after view.

Any new data governance initiative (e.g. an MDM or data cleansing implementation) needs to have identified the requirements it is expected to meet. As these requirements are developed, the corresponding metrics to measure success should also be created. Then the data governance team should review these metrics going forward and compare them to historic data.
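To make the before and after view concrete, here is a minimal Python sketch that compares a current measurement against a historic baseline. The function name and the onboarding figures are made up for illustration; which objective to measure, and over what period, is a business decision.

```python
from statistics import mean


def change_vs_baseline(historic, current):
    """Relative change of the current average against the historic baseline."""
    baseline = mean(historic)
    return (mean(current) - baseline) / baseline


# Hypothetical days-to-onboard a new customer, before and after an initiative
before = [10, 12, 14]
after = [8, 9, 10]

change = change_vs_baseline(before, after)
print(f"Onboarding time changed by {change:+.0%} against the baseline")
```

A negative value here means onboarding got faster; for an objective like conversion rate, a positive value would be the improvement.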

2. “Control Point” Metrics: See how the expected “control points” are changing

“Control points” just refers to the data elements that are expected to actually affect the objectives. For example, in the case of onboarding a customer, what are the data elements that would slow down the process? These could be invalid addresses, duplicate entries in the SFA tool, missing phone numbers, etc. Each of the potential causes would be a “control point” and should be measured.

Each new data project should include a design showing what data changes need to occur to meet the requirements / goals. As these designs are created, the metrics to measure should be identified. Note that these may be direct data, e.g. counting the customer records with and without a home phone. Others may be metadata, e.g. counting missing field descriptions for customer data sources. The data governance team should review control point metrics along with the business objective measurements.
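Both kinds of control point metric mentioned above come down to counting records with and without a usable value. The following Python sketch shows one way that could look; the record layouts and field names (home_phone, description) are assumptions for the example, not a prescribed schema.

```python
def control_point_counts(records, field_name):
    """Count records with and without a usable (non-blank) value for one field."""
    present = sum(1 for r in records if str(r.get(field_name) or "").strip())
    return {"with": present, "without": len(records) - present}


# Direct data: hypothetical customer records
customers = [
    {"id": 1, "home_phone": "555-0100"},
    {"id": 2, "home_phone": ""},
    {"id": 3},                       # field missing entirely
    {"id": 4, "home_phone": "  "},   # blank value counts as missing
]

# Metadata: hypothetical field descriptions for a customer data source
field_catalog = [
    {"field": "customer_id", "description": "Unique customer key"},
    {"field": "home_phone", "description": None},
]

phone_counts = control_point_counts(customers, "home_phone")
description_counts = control_point_counts(field_catalog, "description")
print(phone_counts, description_counts)
```

Tracking these counts over time, next to the objective metrics, is what lets the team see whether a control point is actually moving.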

3. Process Metrics: Make sure the processes put in place are working as intended

As new processes and systems are put in place, it is important to measure the activity of these systems. Like the control point metrics, the process metrics need to be based on the design work for data projects. These metrics will ensure the design is meeting functional and non-functional requirements. They are a key way of ensuring SLAs are met.

Process metrics are also likely to be specific to underlying technology choices. For example, users of the Informatica MDM Hub can use a product like the Hub Analyzer by GlobalSoft. Tools like this can be vital in tracking day-to-day operations and can help in tuning system configuration. Process metrics should be collected and reviewed as early in the development cycle as possible to create baselines. Process metrics should be reviewed by the operations team on a regular basis. The data governance team should track whether process metrics are varying unexpectedly.

4. Assumption Metrics: Watch to see when sizing and other assumptions will be violated

As part of the design process, key assumptions should be collected. These should also be turned into metrics to ensure that the assumptions are met. Collecting and reviewing these metrics will allow more proactive planning if trends show they will be violated at some point. A common example of this is sizing assumptions. These metrics should be reviewed by the operations team and the data governance team whenever projections show limits being exceeded.
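As a sketch of how a sizing assumption could be watched, the Python function below projects when a limit would be exceeded, assuming simple linear growth between the first and last observation. The function name, the record counts and the limit are hypothetical; real growth is rarely this regular, so treat the result as a planning prompt rather than a forecast.

```python
import math


def periods_until_limit(observations, limit):
    """Estimate how many periods remain until a sizing limit is exceeded.

    Assumes linear growth at the average rate seen across the observations.
    Returns 0 if the limit is already exceeded and None if there is no growth.
    """
    if len(observations) < 2:
        raise ValueError("need at least two observations to estimate a trend")
    current = observations[-1]
    if current > limit:
        return 0
    growth = (observations[-1] - observations[0]) / (len(observations) - 1)
    if growth <= 0:
        return None
    # First whole number of periods after which the projected value is above the limit
    return math.floor((limit - current) / growth) + 1


# Hypothetical monthly record counts (in millions) against a designed limit of 200
monthly_records = [100, 120, 140]
remaining = periods_until_limit(monthly_records, 200)
print(f"Limit projected to be exceeded in {remaining} months")
```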

Focusing on a few metrics in each of these four areas will allow a data governance team to make sure data initiatives are on track and to identify new opportunities.

I am not suggesting these are the only metrics. There should always be someone looking at new potential metrics that are not part of the initial design. For these, it is key to take a good “data science” approach and understand what actions the potential metrics suggest. If an action can’t be determined, more work needs to be done.

To help discover new metrics, it is best that key data assets be organized in such a way that metadata, data changes and other operations can be measured at points in time in the future. In other words, design data repositories, both “Big Data” and “small data”, to be measured as potential “control points” in the future.


How Dense is Your Information?

Big vs small data

Critical understanding to get the most out of Big Data

To appreciate what it takes to get the most out of Big Data, let’s look at what Big Data is and at “information density”. Information density is the amount of valuable information per byte of data.

What is Big Data?

Big data is typically data from sources that are collecting interactions of people or things.

  • Social data – This data can deliver business information such as reviews / sentiment, reputational risks, or personalization. Social data usually comes in high volume and has an informal structure that requires text analysis and/or natural language processing.
  • Sensor data (Internet of Things) – This data can deliver business information such as alerts for complex automated systems/networks, new services based on personal sensors, or controls for automated factories. Sensor data tends to be very high volume and highly structured. To obtain the business value, large sets or combinations of sets need to be analyzed.

What is “small data”, i.e. not Big Data?
Small data is the data used in typical business processes.

  • Transactions – Purchases, orders, registrations, etc. Transactions tend to be highly structured and of medium to high volume.
  • Master Data (Key entities) – Customers, Employees, Vendors, Products, Assets, Locations, etc. Master data tends to be structured and of low to medium volume. This data also provides the connection between many data sets.
  • Relationships – The relationships between business entities, for example the customers of a given product or the subsidiary companies of a given business partner.

Relative Information density
Because small data is built for specific business processes, byte for byte “small data” has more direct value to the business. This is why business applications have focused on this data. It may take less work, but governing this data is still not done well in many businesses.

Big data is less dense so more work is needed to obtain value, e.g. processing text to derive business context. Because big data is not typically focused on business processes it also has a higher noise to information ratio and needs more analysis/filtering to obtain business information.

But … Pound for pound there is more big data available
So working on Big Data can add tremendous value, even if it is more work. This is why businesses are so interested in Big Data.

The other But … The value of Big Data is strongest when tied to small data
To really understand which profiles of customers have which preferences requires tying together all the master data and transactions about the customer. Knowing the sentiment around specific products / vendors requires knowing the relationship between your customers and products.
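A small Python sketch can illustrate the idea of tying the two together. Here, sentiment scores derived from social posts (the Big Data side) are linked through relationship data to customer master data, so sentiment can be read per customer segment and product. All names, identifiers and scores are invented for the example, and the sentiment scoring itself is assumed to have been done elsewhere.

```python
from collections import defaultdict

# Hypothetical master data: customer id -> attributes
customers = {"c1": {"segment": "retail"}, "c2": {"segment": "wholesale"}}

# Hypothetical relationship data: social handle -> customer id
handles = {"@ann": "c1", "@bob": "c2"}

# Hypothetical sentiment scores (-1 to 1) already derived from social posts
posts = [
    {"handle": "@ann", "product": "p1", "sentiment": 0.5},
    {"handle": "@bob", "product": "p1", "sentiment": -0.5},
    {"handle": "@ann", "product": "p1", "sentiment": 1.0},
    {"handle": "@zed", "product": "p1", "sentiment": 0.25},  # no known customer
]


def sentiment_by_segment(posts, handles, customers):
    """Average sentiment per (segment, product), plus a count of unmatched posts."""
    scores = defaultdict(list)
    unmatched = 0
    for post in posts:
        customer_id = handles.get(post["handle"])
        if customer_id is None or customer_id not in customers:
            unmatched += 1  # noise until master data can identify the author
            continue
        key = (customers[customer_id]["segment"], post["product"])
        scores[key].append(post["sentiment"])
    averages = {key: sum(vals) / len(vals) for key, vals in scores.items()}
    return averages, unmatched


averages, unmatched = sentiment_by_segment(posts, handles, customers)
print(averages, unmatched)
```

The unmatched count is itself a useful measure: it shows how much of the Big Data cannot yet be connected to governed small data.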

So to obtain the biggest gains from Big Data, it is important to realize that more work and filtering need to be done on the Big Data, and that your Big Data needs to be integrated with well-governed small data.