Secrets of Operational MDM – Part 1: Choosing System Behaviors

OK, I’m probably a bit crazy, but despite all the angst, minutiae, and roadblocks, I really enjoy the work of getting the most out of data. And while a lot of focus is currently on analytics, implementing Master Data Management (MDM) to improve operational systems is a very useful first step. This yields near-term benefits for day-to-day operations AND improves downstream analytics.

In fact I enjoy this enough that I decided to write a three-article series on Operational MDM. In each article I will discuss a key architectural concept that makes implementations more successful. They are:

  1. Categorizing systems as consumers, creators and managers of data entities
  2. Understanding how each should contribute to the mastered entity
  3. Determining at what level consumers should integrate with the MDM system


Consider a very simplified example: if you buy something at a store that has a loyalty program, you walk up to a cash register, hand someone something to buy, and kick off a wonderful set of processes. The store needs to figure out who you are for the payment system and for the loyalty system. The inventory control system wants to know the product, as does the loyalty system (to figure out what to offer you in the future). All of these systems maintain transactional information, and most have independent ways for customer and product information to be managed.

Each of these systems provides a vital business function and is likely doing it just fine today. But often adding new functionality, or understanding all the transactions associated with an entity across these systems, is tedious and rife with errors. This is where Operational MDM helps. It needs to ensure each system’s version of an entity can be harmonized, while being flexible enough to minimize disruption to those systems. And, oh by the way, it needs to allow that other legacy system from next month’s merger to be on-boarded…

The first step to doing this is to look at all the systems that touch the master data and determine what they do with it. The main behaviors are:

  • Consumers – Systems that need to know an entity (e.g. customer, product, …) exists so they can identify the entity and capture transactional information about it
  • Managers – Systems that allow updating of the information about an entity
  • Creators – Systems that collect information about an entity that is currently unknown and then create the entity so that future transactions can be recorded

These are not mutually exclusive; for example, it is pretty common for a system that creates entities to also manage and consume them. Also capture the behaviors a system has today and consider what they should be after implementing MDM. Often, as part of designing and building an Operational MDM system, it is desirable to start moving systems that create, manage and consume entity data toward being primarily consumers.
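As an illustrative sketch, this categorization exercise can be captured in a simple inventory that records each system’s behaviors today and its target behaviors after MDM. The behavior names come from the list above, but the `SystemProfile` structure and the loyalty-system example are hypothetical:

```python
from dataclasses import dataclass, field

# The three behaviors discussed above; a system may have any combination.
BEHAVIORS = {"consumer", "manager", "creator"}

@dataclass
class SystemProfile:
    name: str
    entity: str                                # e.g. "customer", "product"
    today: set = field(default_factory=set)    # behaviors observed today
    target: set = field(default_factory=set)   # behaviors after MDM

    def behaviors_to_retire(self):
        """Behaviors the system should give up as part of the MDM rollout."""
        return self.today - self.target

# Hypothetical example: a loyalty system that currently creates and manages
# customer records should become primarily a consumer once MDM owns them.
loyalty = SystemProfile(
    name="loyalty",
    entity="customer",
    today={"creator", "manager", "consumer"},
    target={"consumer"},
)
print(sorted(loyalty.behaviors_to_retire()))  # ['creator', 'manager']
```

Even a lightweight inventory like this makes the “today vs. target” conversation concrete for each system being on-boarded.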

For Managers and Creators of entities it is also important to note which specific types or subtypes of entities they operate on, and which process steps make them do so. This becomes critical information when trying to improve the workflow and operational effectiveness. By clearly categorizing the integrated systems by their interaction with master data, you will be prepared to make best-practice integration decisions about how they will find and use entity data from your MDM implementation.

In the next posting I will focus on how each of these types of systems should contribute to the MDM.

Top ten truths about data projects



Money is like data: if you invest it, manage it and protect it well, it can pay off immensely. But do any of these poorly and you’ll regret it.

Development methodologies keep changing… mostly in name.

The only thing more expensive than free software is free software implemented by the lowest bidder.

Master Data Management is a transitional state until you get to the fully integrated environment… And once you’re there you’ll need to add another system.

Big data is incredibly valuable, unless someone forgot to govern it.

Agile is great, but knowing your real requirements is better.

If data governance is painful, too slow or too costly, it’s being done wrong.

Choosing the lowest-cost integrator is like choosing the cheapest plumber… Once they’re done it looks great!! The flood comes later…

Data is great, but like a teenager it has a tendency to just sit there; it only really becomes useful when it’s finally in motion.

Business logic and data handling are like two parts of epoxy; once they’re mixed you are stuck for a long time.


How much does your data weigh?

Business Improvement via data metrics

Measurement can be key to improving data. But there are too many potential measures when it comes to data: every column, every row, every table, every relationship can be measured. And that does not even get into the possibilities of metadata or data quality. With all these possibilities, coming up with a measurement scheme can seem too costly. And without proper focus, it will be.

So what to focus on?

The four areas that really need the most focus are:

  1. Check if objectives are being met
  2. See how the expected “control points” are changing
  3. Make sure the processes put in place work as intended
  4. Watch to see when sizing and other assumptions will be violated

1. Objective Metrics: Check if objectives are being met

As part of Data Governance it is important that the business visits this topic on a regular basis. Here are some examples of objectives I have discussed with clients recently:

  • Reducing the time it takes to onboard a new customer / product / location
  • Reduce bounced communications (e.g. mailings, emails, phone calls, …)
  • Improve Customer response (e.g. conversion rate, click throughs, ..)
  • Improve Compliance (e.g. Know Your Customer, Physician Spend, Conflict Minerals, …)

There are many other examples I could give, but in all cases this is one of the key areas to measure. As much as possible these items should be measured against historic data so that a baseline can be created to obtain a before-and-after view.

Any new data governance initiative (e.g. an MDM or data cleansing implementation) needs to have identified requirements it is expected to meet. As these requirements are developed, the corresponding metrics to measure success should also be created. Then the data governance team should review these metrics going forward, compared against historic data.

2. “Control Point” Metrics: See how the expected “control points” are changing

“Control points” just refers to the data elements that are expected to actually affect the objectives. For example, in the case of onboarding a customer, what are the data elements that would slow down the process? This could be an invalid address, duplicate entries in the SFA tool, a missing phone number, etc. Each of these potential causes would be a “control point” and should be measured.

Each new data project should include a design showing what data changes need to occur to meet the requirements / goals. As these designs are created, metrics should be identified to measure. Note these may be direct data, i.e. counting the customer records with and without a home phone. Others may be metadata, i.e. counting missing field descriptions for customer data sources. The data governance team should review control point metrics along with the business objective measurements.
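The home-phone example above could be measured with a sketch like the following. The record layout, field names and values are made up for illustration:

```python
# Illustrative control-point metric: count customer records with and
# without a home phone. The records and layout here are hypothetical.
customers = [
    {"id": 1, "name": "Acme Anvils", "home_phone": "555-0100"},
    {"id": 2, "name": "Roadrunner LLC", "home_phone": None},
    {"id": 3, "name": "Coyote Corp", "home_phone": ""},
]

def missing_phone_metric(records):
    """Return (missing_count, total, percent_missing) for home_phone."""
    missing = sum(1 for r in records if not r.get("home_phone"))
    total = len(records)
    return missing, total, round(100.0 * missing / total, 1)

missing, total, pct = missing_phone_metric(customers)
print(f"{missing} of {total} customer records lack a home phone ({pct}%)")
```

Tracked over time, a simple count like this shows whether the control point is moving in the direction the objective needs.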

3. Process Metrics: Make sure the processes put in place are working as intended

As new processes and systems are put in place it is important to measure the activity of these systems. Like the control point metrics, the process metrics need to be based on the design work for data projects. These metrics will ensure the design is meeting functional and non-functional requirements. They are a key way of ensuring SLAs are met.

Process metrics are also likely to be specific to underlying technology choices. For example, users of the Informatica MDM Hub can use a product like the Hub Analyzer by GlobalSoft. Tools like this can be vital in tracking day-to-day operations and help in tuning system configuration. Process metrics should be collected and reviewed as early in the development cycle as possible to create baselines. They should be reviewed by the operations team on a regular basis, and the data governance team should track whether process metrics are varying unexpectedly.

4. Assumption Metrics: Watch to see when sizing and other assumptions will be violated

As part of the design process, key assumptions should be collected. These should also be turned into metrics to ensure that the assumptions are met. Collecting and reviewing these metrics allows more proactive planning if trends show they will be violated at some point. A common example of this is sizing assumptions. These metrics should be reviewed by the operations team, and by the data governance team whenever projections show limits being exceeded.
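A minimal sketch of such a projection, assuming monthly row-count samples and a simple linear trend. All numbers, names and the one-million-row limit are hypothetical:

```python
# Illustrative assumption metric: project when row counts will exceed a
# sizing assumption, using a simple linear trend between first and last
# samples. Numbers are made up for illustration.
SIZING_LIMIT = 1_000_000  # assumed capacity from the design

# (month_index, observed_row_count) samples from monthly metric collection
observations = [(0, 400_000), (1, 450_000), (2, 500_000)]

def months_until_limit(samples, limit):
    """Fit a linear growth rate and project months until the limit is hit."""
    (x0, y0), (x1, y1) = samples[0], samples[-1]
    rate = (y1 - y0) / (x1 - x0)   # rows added per month
    if rate <= 0:
        return None                # no growth; the assumption holds
    return (limit - y1) / rate     # months remaining from the latest sample

print(months_until_limit(observations, SIZING_LIMIT))  # 10.0
```

A real implementation would use a proper regression over all samples, but even this crude projection gives the operations team an early warning horizon to plan against.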

Focusing on a few metrics in each of these four areas will allow a data governance team to make sure data initiatives are on track and to identify new opportunities.

I am not suggesting these are the only metrics. There should be someone always looking at new potential metrics that are not part of the initial design. For these it is key to take a good “data science” approach and understand what actions the potential metrics suggest. If an action can’t be determined, more work needs to be done.

To help discover new metrics it is best that key data assets be organized in such a way that metadata, data changes and other operations can be measured at points of time in the future. In other words, design data repositories, both “Big Data” and “small data”, to be measurable as potential “control points” in the future.


Zero Wait Information ≠ Real time


How to Choose the Right Data Movement:  Real-time or Batch?

We all want a “zero wait” infrastructure. This has spurred many organizations to push all data through a real-time infrastructure. It’s important to recognize that “zero wait” means that the information is in ready form when a user needs it, so if the user needs information that includes averages, sums, and/or comparisons, there is a natural need for a data set that has been fully processed (e.g., cleaned, combined, augmented, etc.). Building the data infrastructure with this in mind is very important.

The popular point of view is that real-time processing is the “modern” solution and that batch processing is the “archaic” way. However, real-time processing has also been around for a long time, and each mode of processing exists for a different purpose.

One trade-off between real-time and batch processing is high throughput versus low latency.  Choosing one process over the other can be somewhat counterintuitive for the broader team, so it is important to determine what the throughput and latency requirements are, independently of each other.  A great example of throughput versus latency is the question, “What is the best way to get from Boston to San Francisco?”  You might answer, “By plane.”  That would be true for transporting a small group of people at a time as that would result in the lowest latency, but would by plane be the best way to move a million people at once?  How would you get the highest throughput?
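The same trade-off can be sketched with back-of-envelope arithmetic. All figures below are illustrative assumptions, not benchmarks:

```python
# Back-of-envelope comparison of per-record latency vs. batch throughput.
# Every number here is an illustrative assumption.
records = 1_000_000

# Real-time path: low latency per record, processed one at a time.
realtime_latency_s = 0.050                 # assume 50 ms end-to-end per record
realtime_total_s = records * realtime_latency_s

# Batch path: high fixed startup cost, but very high throughput once running.
batch_startup_s = 600                      # assume 10 minutes to stage and launch
batch_throughput_rps = 50_000              # assume 50k records/second sustained
batch_total_s = batch_startup_s + records / batch_throughput_rps

print(f"real-time: {realtime_total_s / 3600:.1f} h to move all records")
print(f"batch:     {batch_total_s / 60:.1f} min to move all records")
print(f"but one record arrives in {realtime_latency_s * 1000:.0f} ms via real time")
```

Under these assumptions the batch path moves a million records in roughly ten minutes while the record-at-a-time path takes hours, yet only the real-time path can answer for a single record in milliseconds: the plane wins for one passenger, not for a million.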

Real-time processing is very good for collecting input continuously and responding immediately to a user, but it is not the solution for all data movement.  It’s not even necessarily the fastest mode of processing.  When deciding whether data should be moved in real time or in batch, it is important to define the nature of the business need and the method of data acquisition.