Secrets of Operational MDM – Part 3 : Consuming Systems

In my previous posts, Secrets of Operational MDM – Part 1 : Choosing System Behaviors and Secrets of Operational MDM – Part 2 : Contributing Systems, we categorized systems connected to our MDM environment as Consumers, Managers and/or Creators of master data. We also looked at how these systems should contribute to an MDM environment. Now let’s look at how these systems should consume data from MDM.

For consumption there is another very important system characteristic to consider: whether the consuming system can handle merging of records. While the ability to group, “merge” and “unmerge” records is one of the key benefits of MDM, it is not necessarily a strength or need of operational systems. The real power of an MDM tool is supporting integration on either the merged (consolidated) or the unmerged (unconsolidated) record.

So two additional categories to introduce are merge aware and non-merge aware systems.

A merge aware system can be informed that multiple entities it contains are actually duplicates and will correspondingly merge those entities in its own operational store. Be cautious here: errors in merging are possible, so it is important to understand and design for the unmerge case as well.

For many systems it is neither practical nor appropriate to be merge aware. Operational systems manage their own independent transactional data. If a duplicate mastered entity ends up being created and some transactions are recorded against both entities, it may be cumbersome, and may even violate some business rules, to force these two entities to merge. So in many cases it is vital that these systems can still find their own independent records.

To best support both merge aware and non-merge aware integration, it helps to have the following IDs:

    • Consolidated ID that uniquely identifies any existing merged entity
      • Note a given Consolidated ID will disappear if it is merged and reappear if unmerged
    • Global ID that uniquely identifies every record that is loaded in the MDM
      • Note a Global ID is managed independently of the contributing source and should never change
    • Source Key, a source-specific key that uniquely identifies records contributed to the MDM
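
As a rough illustration of how these three IDs relate, here is a minimal Python sketch. The class and method names (ToyHub, load, merge, unmerge) and the ID formats are hypothetical and not taken from any MDM product. It only shows the behavior described above: the Global ID stays fixed, while a Consolidated ID disappears on merge and reappears on unmerge.

```python
from itertools import count


class ToyHub:
    """Hypothetical in-memory hub for illustration; not any vendor's API."""

    def __init__(self):
        self._ids = count(1)
        self.global_id = {}        # (source, source_key) -> Global ID
        self.home_cid = {}         # Global ID -> Consolidated ID assigned at load
        self.consolidated_id = {}  # Global ID -> current Consolidated ID

    def load(self, source, source_key):
        """Register a contributed record and return its stable Global ID."""
        key = (source, source_key)
        if key not in self.global_id:
            n = next(self._ids)
            gid, cid = f"G{n}", f"C{n}"
            self.global_id[key] = gid
            self.home_cid[gid] = cid
            self.consolidated_id[gid] = cid
        return self.global_id[key]

    def merge(self, survivor_gid, duplicate_gid):
        """Move every record grouped with the duplicate under the survivor."""
        survivor_cid = self.consolidated_id[survivor_gid]
        losing_cid = self.consolidated_id[duplicate_gid]
        for gid in list(self.consolidated_id):
            if self.consolidated_id[gid] == losing_cid:
                self.consolidated_id[gid] = survivor_cid
        return survivor_cid

    def unmerge(self, gid):
        """Restore the Consolidated ID the record had before any merge."""
        self.consolidated_id[gid] = self.home_cid[gid]
        return self.home_cid[gid]


hub = ToyHub()
crm = hub.load("crm", "A-17")          # Source Key "A-17" from a CRM system
billing = hub.load("billing", "9001")  # Source Key "9001" from a billing system
hub.merge(crm, billing)                # billing's Consolidated ID disappears
hub.unmerge(billing)                   # ...and reappears after the unmerge
```

A non-merge aware system would store only the Global ID returned by load, which never changes across the merge and unmerge calls.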


Merge Aware: Systems that only consume master data and can accommodate merging of mastered entities simply need to store a Consolidated ID for each master entity. They should also register to receive notification of any merge / unmerge. Covering the unmerge case usually requires keeping the Global IDs as well.

Non-Merge Aware: In this case the consuming system still needs an ID, but since that ID will not be updated when a merge occurs, using a Global ID is the best practice.


Merge Aware: Systems that manage master data and can accommodate merging of mastered entities simply need to store the Consolidated ID and the Source Key for each record they contributed. These systems will also need to receive notification of any merges / unmerges. Since managers have their own IDs for their contributed records, these can be used to resolve unmerges from MDM.

Non-Merge Aware: In this case the managing system could just use its own Source Key.  However it is a good idea to use a Global ID which would be consistent across sources.  This is particularly helpful when data from these systems is brought into a data lake / warehouse.


In all cases any system that can create master data should first search the MDM environment before actually creating a record. This “search before create” reduces the number of potential duplicates and the amount of manual entry. This is true of both merge aware and non-merge aware systems.
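
A minimal sketch of search before create follows. It assumes the MDM search results are available as a list of dictionaries with "global_id" and "name" keys, and it uses a simple string similarity from Python's standard library as a stand-in for whatever matching the MDM tool actually provides. The function names, the record layout and the 0.85 threshold are all illustrative assumptions.

```python
from difflib import SequenceMatcher


def normalize(name):
    """Lowercase and collapse whitespace so trivial differences do not matter."""
    return " ".join(name.lower().split())


def search_before_create(candidate_name, mdm_records, threshold=0.85):
    """Return likely existing matches, best first; an empty list means 'safe to create'."""
    target = normalize(candidate_name)
    scored = []
    for record in mdm_records:
        score = SequenceMatcher(None, target, normalize(record["name"])).ratio()
        if score >= threshold:
            scored.append((score, record))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [record for _, record in scored]


existing = [
    {"global_id": "G1", "name": "Acme Corporation"},
    {"global_id": "G2", "name": "Globex Inc"},
]

matches = search_before_create("ACME   Corporation", existing)
if matches:
    print("Reuse existing record:", matches[0]["global_id"])
else:
    print("No match found, create a new record")
```

In practice the creating system would show the matches to the user and only create a new record when none of them fits.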

Other than the addition of search before create, creators should consume data from MDM environments the same way that managing systems do.

One of the major and costly errors I’ve seen in implementations is assuming that systems should always want to integrate at the consolidated level.  If this ends up being desired make certain to validate update and unmerge behavior.

In terms of making a consuming application merge aware or non-merge aware, the guiding principle needs to be the operational benefit. Getting insight based on merging master data entities is very important. But a great MDM environment goes beyond this and also allows systems to be operationally efficient and work on unconsolidated data as desired.

Secrets of Operational MDM – Part 2 : Contributing Systems

In my previous post, Secrets of Operational MDM – Part 1 : Choosing System Behaviors, we categorized systems connected to our MDM environment as Consumers, Managers and Creators of master data. We know that systems often have different sets of behaviors for different entities, and can have multiple behaviors for a single master data entity. For example, one system may consume and manage customer data but create and manage product data. These behaviors will guide the best way to integrate these systems. In particular, we will look at how each system can best contribute master data into the MDM environment. We’re not looking at the data model, but rather understanding which systems will contribute records to be mastered.

Consumers – Systems that only consume master data without the ability to create or edit master data, by design, do not contribute any records to your MDM tool. Since these systems do not provide any new master data information, they would not be used to contribute any master data entities. Bringing data into MDM that is consumed by these systems would just add to record volumes without contributing any new information. These systems often do have transactional data that is important, but that data can be integrated outside the MDM environment.

Managers – Some systems manage master data but do not actually create new master data entities. They will also consume master data, since they’d have nothing to manage otherwise :). From these systems you should bring into MDM all master data entities actually edited or augmented by these systems. If possible, avoid bringing in records that the managing system has simply consumed and not touched. This, incidentally, is an example of a system exhibiting multiple behaviors (consumption and management) for a single entity type.

Creators – Finally, there are systems that will actually create new master data entities. Usually these will also manage/edit entities, and will sometimes consume entities. As with managing systems, master data entities that are created or updated by these systems should be added to your MDM tool—but any entities simply consumed should not be contributed. For the creation of new entities in these systems, adding a search before create capability will avoid the creation of unnecessary duplicate records.

One additional note: editing data takes effort, so it is very unusual for operational systems to simply make “bad” edits to master data entities; data is typically only edited to support some specific need. Of course sometimes these needs are not shared, and other systems may see this as “bad” data. However, the ability to see all the operational versions of master data is almost always helpful. This is often truest for the systems that have edited/impacted master data the most, even if data from those systems isn’t “trusted”. Configuring survivorship/trust rules in MDM allows this data from managers and creators to be brought into MDM in order to get the most value from the data while preventing any undesired changes to the trusted master data.
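
To make the idea of survivorship/trust rules concrete, here is a small Python sketch. It assumes a very simple rule: for each attribute, take the non-empty value from the source with the highest trust score. Real MDM tools offer richer rules; the function name, the trust scores and the record layout here are illustrative assumptions only.

```python
def surviving_value(versions, attribute, trust):
    """Pick the attribute value from the most trusted source that has one."""
    candidates = [v for v in versions if v.get(attribute)]
    if not candidates:
        return None
    best = max(candidates, key=lambda v: trust.get(v["source"], 0))
    return best[attribute]


# Every operational version stays available, trusted or not.
versions = [
    {"source": "crm", "phone": "555-0100", "email": ""},
    {"source": "web", "phone": "555-0199", "email": "pat@example.com"},
]
trust = {"crm": 90, "web": 40}  # hypothetical trust scores per source

golden = {
    "phone": surviving_value(versions, "phone", trust),  # from the trusted CRM
    "email": surviving_value(versions, "email", trust),  # CRM has none, so web fills the gap
}
print(golden)
```

The less trusted web record still adds value by filling the missing email, but it cannot overwrite the phone number held by the more trusted source.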

The power of MDM is that it allows each operational system to keep data fit for a specific purpose, while enabling sharing the same entities across different systems. To do this, you really want each and every version of your master data entity to be available. The best news is that if you get this right, it will also improve business intelligence at the same time. Don’t forget insights are great, but actions are better. With proper configuration of operational MDM, insightful actions become possible.

The next and final installment of Secrets of Operational MDM will look specifically at how operational systems should consume data from your MDM environment.

Secrets of Operational MDM – Part 1 : Choosing System Behaviors

Ok, I’m probably a bit crazy, but despite all the angst, minutiae and roadblocks I really enjoy the work of getting the most out of data. And while a lot of focus is currently on analytics, implementing Master Data Management (MDM) to improve operational systems is a very useful first step. This yields near-term benefits for day-to-day operations AND improves downstream analytics.

In fact I enjoy this enough that I decided to write a three-article series on Operational MDM. In each article I will discuss a key architectural concept that makes implementations more successful. They are:

  1. Categorizing systems as consumers, creators and managers of data entities
  2. Understanding how each should contribute to the mastered entity
  3. Determining at what level consumers should integrate with the MDM system


A very simplified example: if you buy something in a store that has a loyalty program, you walk up to a cash register, hand someone something to buy, and start a wonderful set of processes. They need to figure out who you are for the payment system and for the loyalty system. The inventory control system wants to know the product, as does the loyalty system (to figure out offers for you in the future). All of these systems maintain transactional information, and most have independent ways for their customer and product information to be managed.

Each of these systems provides a vital business function and is likely doing it just fine today. But often adding new functionality or understanding all the transactions associated with an entity across these systems is tedious and rife with errors. This is where Operational MDM helps. It needs to ensure each system’s versions of an entity can be harmonized while being flexible enough to minimize disruption to those systems. And oh, by the way, it needs to allow that other legacy system from next month’s merger to be on-boarded…

The first step in doing this is to look at all the systems that touch the master data and determine what they do with it. The main behaviors are:

  • Consumers – Systems that need to understand that an entity (e.g. customer, product, …) exists in order to identify the entity and capture transactional information about it
  • Managers – Systems that allow updating of the information about an entity
  • Creators – Systems that collect information about an entity that is currently unknown and then create the entity so that future transactions can be created

These are not mutually exclusive, and for example it is pretty common that a system that creates an entity also manages and consumes them.  Also capture the behaviors a system has today and consider what they should be after implementing MDM.  Often as part of designing and building an Operational MDM system it is desirable to start moving systems that create, manage and consume entity data to primarily being consumers.

For Managers and Creators of entities it is also important to note what specific types or sub types of entities they operate on and what process step makes them do so. This becomes critical information when trying to improve the workflow and improve operational effectiveness. By clearly categorizing the integrated systems by their interaction with master data, you will be prepared to implement best practice integration decisions on how they will find and use entity data from your MDM implementation.

In the next posting I will focus on how each of these types of systems needs to contribute to the MDM.

Truth from Data? : Would we choose it if we knew...

Have you heard the news about Fake News?… Of course you have. And despite all the fervor around it, it not only continues to exist but thrives, because of one simple fact: it quite literally pays.

You see, the news items that get clicked on the most end up getting revenue. And the more attention they get, the more clicks they drive. Fake news is simply created to capture those clicks: different stories crafted with words, memes, images and people that cry out “click me!!”

Though it seems like an art form, there is also a lot of science behind it. For example, marketing research companies study which subject lines make email more likely to be read, analyzing the click behavior of many different populations, all aimed at finding the most effective communication to reach an audience. Unfortunately Fake News uses this science to make falsehoods pay off without caring about truth. Can data science make truth more valuable?

I’m not sure we can get there today but it is conceivable, and here’s how.

  1. Build a “Truth Grader” that reads ahead “news articles” that appear in your browser
  2. Then the grader breaks the article into opinion vs fact, capturing the fact / opinion ratio
  3. Next the grader tests the facts against trusted sources and determines an overall fact score
  4. Finally the Truth Grader displays a code based on the strength of the facts, with rollover details for the reader

Technically we’re actually not as far off as it may sound. Steps 1 and 4 are relatively straightforward, and Step 3 may be possible with a Watson-like interface, though determining “trusted sources” would likely be an area of much “discussion”. It’s Step 2 that would currently be the only real blocker. And of course this gets harder with video and pictures.

Of course the real question is would knowing that some juicy sounding story was probably fake stop us from clicking on it…  I am hopeful we’d teach the web we do value truth.  But at least we’d have the data to know.

SaaS 2.0 time for an Uncloud™ approach

SaaS has been around for a while, so now it’s important to start thinking about SaaS 2.0. SaaS 1.0 allows applications to be run and maintained by a SaaS provider in a remote data center on behalf of your organization. This gives your business users the agility to quickly obtain the benefits of these new software packages while reducing the need for your IT organization to build and maintain the expertise needed to manage the applications.

However, as organizations change, grow rapidly, merge, or divest, they require even more agility. They need to be able to seamlessly move data in and out of, and integrate across, these SaaS applications. This is exacerbated by the number of different SaaS solutions organizations adopt. Depending on the data movement, privacy requirements, and IT skills of an organization, where the actual software is deployed matters. Often it can make more sense to deploy inside your network, where you have more bandwidth and lower latency.

Currently there are advocates for a hybrid cloud approach, which marks the beginning of this. The hybrid approach is really about combining traditional on-premises and cloud solutions. However, by using available automation for deploying and maintaining systems, as well as current container technology like Docker, SaaS vendors can maintain their software within your network. This is really the start of an Uncloud™ approach that I see defining SaaS 2.0 going forward. With an Uncloud™ approach, SaaS 2.0 vendors will maintain and manage applications that may be in the cloud or in your data center, and applications can be migrated freely between the two locations as needed.

With an Uncloud™ approach you gain the advantage of having your different SaaS products in the same network, enhancing connectivity while still allowing vendors to manage the software. It lets you decide which SaaS vendors to use and separately decide where to keep your data.

Defend Timelines from “Perfect Data”


There is a lot of truth and wisdom in the quote, “The perfect is the enemy of the good.” The quote definitely applies when trying to get the most out of data. However, in this context, it should really be “Perfect is the enemy of budget and time.”

Pragmatic data governance protects your budget and timelines against perfectionism. The end goal is, of course, to have all issues resolved, but being able to start governing the data before the end is vital. To do this, data governance has to be built around the concept of managing “imperfect” data and incrementally improving that data.

A key failure of data governance in almost every organization is wanting to reach agreement before making data available. A far more practical and useful approach is to bring in data and metadata and then identify and manage where issues reside. Here are some typical types of issues:

  • Differing definitions for attributes—probably the most common issue with reporting
  • Incomplete data
  • Erroneous and stale attributes

A caveat: Never assume that differences in data are errors. These differences are usually between multiple functioning applications, and in most cases, the data is suitable for each system and may actually exist for a very good reason. In some cases, immediately fixing differences in data may not be critical.

To manage data issues, cataloging your data is important. This can be done by collecting the metadata and reporting on issues. This catalog then becomes the place for tracking priorities, issues and resolutions. Even modest capability in this area can greatly improve an organization’s use of its already existing data assets.

Remember, the fastest way to perfect data for your use is to figure out how to make the data a little better on an incremental basis.

By Jeff Klagenberg

Top Ten Thoughts for Product Development

There is nothing that a framework can offer that can’t be gained with Java and a compiler, unless you count speed to market and development time.

The magic triangle is made of Scalability, Performance & Security. Oh and Reliability, Cost, Availability, Supportability, etc. OK not so much a magic triangle as a faustian circle….

Quality is never job 1, figuring out how to deliver quality while maintaining speed to market will save your support margins though.

Building a cloud product means never having to say sorry, because by then your customer has moved on.

There is nothing more expensive than a cheap plumber….

It’s about all the data: big, small and in-between.

Once you’ve decided where to combine business logic and data figure out who owns the underlying technology you’re depending on; it’s always good to know your new overlords.

Like Crack the first taste of The Cloud is often free…  Unlike Crack there is not a good treatment option to get off The Cloud…

When you build an API make sure your own product team actually uses it.

It is always important to catch the waves of new technology that drive business, it is also important not to be left high and dry when those waves recede…

Top ten truths about data projects



Money is like data: if you invest it, manage it and protect it well, it can pay off immensely. But do any of these poorly and you’ll regret it.

Development methodologies keep changing… mostly in name.

The only thing more expensive than free software is free software implemented by the lowest bidder.

Master Data Management is a transitional state until you get to the fully integrated environment… And once you’re there you’ll need to add another system.

Big data is incredibly valuable, unless someone forgot to govern it.

Agile is great, but knowing your real requirements is better.

If data governance is painful, too slow or too costly, it’s being done wrong.

Choosing the lowest cost integrator is like choosing the cheapest plumber…  Once they’re done it looks great!! The flood comes later…

Data is great but like a teenager it has a tendency to just sit there; it really can be useful at least when it’s finally in motion.

Business logic and data handling are like two parts of epoxy; once they’re mixed you are stuck for a long time.


How much does your data weigh?

Business Improvement via data metrics

Measurement can be key to improving data. But, there are too many potential measures when it comes to data. Every column, every row, every table, every relationship can be measured. And that does not even get into the possibilities of metadata or data quality. With all these possibilities coming up with a measurement scheme can seem too costly. And without proper focus it will be.

So what to focus on?

The four areas that really need the most focus:

  1. Check if objectives are being met
  2. See how the expected “control points” are changing
  3. Make sure the processes put in place work as intended
  4. Watch to see when sizing and other assumptions will be violated

1. Objective Metrics: Check if objectives are being met

As part of Data Governance it is important the business visit this topic on a regular basis. Here are some examples of objectives I have discussed with clients recently

  • Reducing the time it takes to onboard a new customer / product / location
  • Reduce bounced communications (e.g. mailings, emails, phone calls, …)
  • Improve Customer response (e.g. conversion rate, click throughs, ..)
  • Improve Compliance (e.g. Know The Customer, Physician Spend, Conflict Minerals, …)

There are many other examples I could give, but in all cases this is one of the key areas to measure. As much as possible these items should be measured based on historic data so that a baseline can be created to obtain a before and after view.

Any new data governance initiatives (e.g. an MDM or data cleansing implementation) need to have identified requirements expected to be met. As these requirements are developed the corresponding metrics to measure success should also be created. Then the data governance team should review these metrics going forward compared to historic data.

2. “Control Point” Metrics: See how the expected “control points” are changing

“Control points” just refers to the data elements that are expected to actually affect the objectives. For example, in the case of onboarding a customer, what are the data elements that would slow down the process? This could be an invalid address, duplicate entries in the SFA tool, a missing phone number, etc. Each of the potential causes would be a “control point” and should be measured.

Each new data project should include a design showing what data changes need to occur to meet the requirements / goals. As these designs are created, metrics to measure those changes should be identified. Note these may be direct data, e.g. counting the customer records with and without a home phone. Others may be metadata, e.g. counting missing field descriptions for customer data sources. The data governance team should review control point metrics along with the business objective measurements.
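
As a small illustration of a direct-data control point metric, the Python sketch below counts customer records with and without a home phone. The field name home_phone and the record layout are assumptions made for the example.

```python
def completeness(records, field_name):
    """Count records with and without a non-empty value for field_name."""
    with_value = sum(1 for r in records if str(r.get(field_name) or "").strip())
    total = len(records)
    return {
        "with": with_value,
        "without": total - with_value,
        "pct_with": round(100 * with_value / total, 1) if total else 0.0,
    }


customers = [
    {"id": 1, "home_phone": "555-0100"},
    {"id": 2, "home_phone": ""},
    {"id": 3},                          # field missing entirely
    {"id": 4, "home_phone": "   "},     # whitespace only counts as missing
]

print(completeness(customers, "home_phone"))
```

Capturing this count on a regular schedule gives the data governance team a trend line to review next to the business objective it is expected to influence.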

3. Process Metrics: Make sure the processes put in place are working as intended

As new processes and systems are put in place, it is important to measure the activity of these systems. Like the control point metrics, the process metrics need to be based on the design work for data projects. These metrics will ensure the design is meeting functional and non-functional requirements. They are a key way of ensuring SLAs are met.

Process metrics are also likely to be specific to underlying technology choices. For example, users of the Informatica MDM Hub can use a product like the Hub Analyzer by GlobalSoft. Tools like this can be vital in tracking day-to-day operations and help in tuning system configuration. Process metrics should be collected and reviewed as early in the development cycle as possible to create baselines. Process metrics should be reviewed by the operations team on a regular basis. The data governance team should track whether process metrics are varying unexpectedly.

4. Assumption Metrics: Watch to see when sizing and other assumptions will be violated

As part of the design process, key assumptions should be collected. These should also be turned into metrics to ensure that the assumptions are met. Collecting and reviewing these metrics will allow more proactive planning if trends show they will be violated at some point. A common example of this is sizing assumptions. These metrics should be reviewed by the operations team and the data governance team whenever projections show limits being exceeded.
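
One simple way to watch a sizing assumption is to fit a straight line to periodic size observations and estimate when the assumed limit would be reached. The Python sketch below does this with a plain least-squares fit. The function name, the monthly cadence and the sample numbers are illustrative assumptions, and a linear trend is only one possible projection.

```python
def periods_until_limit(observations, limit):
    """Estimate how many more periods pass before a linear trend reaches limit.

    observations: sizes measured at evenly spaced periods, oldest first.
    Returns None when there are too few points or the trend is not growing.
    """
    n = len(observations)
    if n < 2:
        return None
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(observations) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, observations)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    if slope <= 0:
        return None
    intercept = mean_y - slope * mean_x
    crossing = (limit - intercept) / slope   # period index where the line hits the limit
    return max(0.0, crossing - (n - 1))      # measured from the latest observation


# Hypothetical monthly record counts (in thousands) against a design limit of 1,000.
monthly_sizes = [100, 200, 300, 400]
print(periods_until_limit(monthly_sizes, limit=1000))
```

If the projected number of periods drops below the lead time needed to add capacity, that is the signal for the operations and data governance teams to act.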

Focusing on a few metrics in each of these four areas will allow a data governance team to make sure data initiatives are on track and to identify new opportunities.

I am not suggesting these are the only metrics. There should always be someone looking at new potential metrics that are not part of the initial design. For these it is key to take a good “data science” approach and understand what actions the potential metrics suggest. If an action can’t be determined, more work needs to be done.

To help discover new metrics it is best that key data assets be organized in such a way that meta data, data changes and other operations can be measured at points of time in the future. In other words design data repositories, both “Big Data” and “small data”, to be measured as potential “control points” in the future.


How Dense is Your Information?

Big vs small data

Critical understanding to get the most out of Big Data

To appreciate what it takes to get the most out of Big Data let’s look at what Big Data is and “information density”. Information density is the amount of valuable information per byte of data.

What is Big Data?

Big data is typically data from sources that are collecting interactions of people or things.

  • Social data – This data can deliver business information such as: Reviews / Sentiment, Reputational Risks, or Personalization. Social data usually comes in high volume and has informal structure that requires text analysis and/or natural language processing.
  • Sensor data (Internet of things) – This data can deliver business information such as: alerts for complex automated systems/networks, new services based on personal sensors, or controls for automated factories. Sensor data tends to be very high volume and highly structured. To obtain the business value, large sets or combinations of sets need to be analyzed.

What is “small data”, i.e. not Big Data?
Small data is the data used in typical business processes.

  • Transactions – Purchases, orders, registrations, etc. Transactions tend to be highly structured and of medium to high volume.
  • Master Data (Key entities) – Customers, Employees, Vendors, Products, Assets, Locations, etc. Master data tends to be structured and of low to medium volume. This data also provides the connection between many data sets.
  • Relationships – The relationships between business entities, for example the customers of a given product or the subsidiary companies of a given business partner.

Relative Information density
Because small data is built for specific business processes, byte for byte “small data” has more direct value to the business. This is why business applications have focused on this data. Governing it may take less work, but it is still not done well in many businesses.

Big data is less dense so more work is needed to obtain value, e.g. processing text to derive business context. Because big data is not typically focused on business processes it also has a higher noise to information ratio and needs more analysis/filtering to obtain business information.

But … Pound for pound there is more big data available
So working on Big Data can add tremendous value, even if it is more work. This is why businesses are so interested in Big Data.

The other But … The value of Big Data is strongest when tied to small data
To really understand which profiles of customers have which preferences requires tying together all the master data and transactions about the customer. Knowing the sentiment around specific products / vendors requires knowing the relationship between your customers and products.

So to obtain the biggest gains from Big Data, it is important to realize that more work and filtering need to be done on the Big Data, and that your Big Data needs to be integrated with well-governed small data.