What is Data Match and how does it work?

Data matching is the process of comparing data, which may come from different data sets, to find the matches and differences between them. The problem it solves is determining whether two “entities” are, in fact, the same “entity”.

There are many ways to perform data matching. Often the process is driven by an algorithm or a programmed loop in which each record in one data set is identified and compared with each record in the other data set.
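A minimal Python sketch of that "compare each record with each record" loop follows. The `Record` type, the sample data and the choice of e-mail address as the matching key are illustrative assumptions, not part of any particular product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    id: str
    name: str
    email: str


def email_key(record):
    # Hypothetical matching key: the e-mail address, trimmed and lower-cased.
    return record.email.strip().lower()


def pairwise_matches(left, right, key):
    """Compare every record in `left` with every record in `right`: O(n * m) comparisons."""
    matches = []
    for a in left:
        for b in right:
            if key(a) == key(b):
                matches.append((a.id, b.id))
    return matches


def indexed_matches(left, right, key):
    """Same pairs, but `right` is indexed by key first, so each lookup is one dict access."""
    index = {}
    for b in right:
        index.setdefault(key(b), []).append(b)
    return [(a.id, b.id) for a in left for b in index.get(key(a), [])]


crm = [Record("c1", "Ana Pérez", "ana@example.com"),
       Record("c2", "John Smith", "john@example.com")]
billing = [Record("b7", "A. Perez", " ANA@example.com"),
           Record("b8", "Mary Jones", "mary@example.com")]

print(pairwise_matches(crm, billing, email_key))  # [('c1', 'b7')]
print(indexed_matches(crm, billing, email_key))   # [('c1', 'b7')]
```

The nested loop performs n × m comparisons, which becomes slow for large sources. Indexing one side by its key returns the same pairs with roughly one lookup per record, but it only works for exact keys; similarity-based matching needs other techniques.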

Why is data matching necessary?

Data matching serves many purposes. For example, it helps avoid duplicate records. It is also useful in many types of data mining. It can also identify links between two data sets and help validate which one is correct.
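Avoiding duplicates usually starts with normalizing the fields that identify a record, so that trivial differences in case, accents or spacing do not hide a duplicate. A minimal Python sketch; the field names and normalization rules are assumptions chosen for illustration.

```python
import unicodedata


def normalize(text):
    # Strip accents, lower-case and collapse whitespace: "José  PÉREZ" -> "jose perez".
    decomposed = unicodedata.normalize("NFKD", text)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(no_accents.lower().split())


def deduplicate(records, fields):
    """Keep the first record seen for each normalized key built from `fields`."""
    seen = set()
    unique = []
    for rec in records:
        key = tuple(normalize(rec[f]) for f in fields)
        if key not in seen:
            seen.add(key)
            unique.append(rec)
    return unique


customers = [
    {"name": "José Pérez", "city": "Madrid"},
    {"name": "jose  perez", "city": "MADRID"},  # same customer, different formatting
    {"name": "Ana Gómez", "city": "Sevilla"},
]
unique_customers = deduplicate(customers, ["name", "city"])
print(unique_customers)  # keeps the José Pérez / Madrid and Ana Gómez / Sevilla records
```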

What type of data? The most common types are transactions, products, payments, purchases, customers, prices, performance metrics, formulas, components and inventory, among many others. Matching them makes it easier to validate and then correct erroneous data, automate manual processes, analyze product markets, detect shortages and build sales intelligence, among other uses.

According to Width, companies with poor data workflows have lost around 12% of their revenue to data quality problems.

What is it used for?

Organizations in every industry and sector apply data matching in different areas. Some examples:

Finance and audits: It makes it possible to compare sales transactions, for example, with the collections received through different payment methods, and thus confirm that every sale is reflected in revenue.

E-commerce: Price comparison platforms are an everyday example. They use data matching to identify the same product across different stores, even when the product descriptions differ.

Customer data: Data matching helps clean databases, remove duplicate and dirty records, and determine whether two records belong to the same customer or whether some of a customer's data has changed.

Pharmaceutical companies and laboratories: They can compare records of different formulas or components to detect matching and discrepant elements.

Human resources: It is often used to compare work records with payroll and verify each month that the work performed matches what was agreed with each employee.
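To make the finance and audit case concrete, here is a minimal Python sketch that reconciles sales against collections. It assumes each collection settles exactly one sale, that amounts must match exactly, and that a collection may arrive up to a few days after the sale (`max_days` is a hypothetical tolerance parameter). Field names and sample data are invented for illustration.

```python
from datetime import date
from decimal import Decimal


def reconcile(sales, collections, max_days=3):
    """Pair each sale with one unused collection of the same amount received
    0 to `max_days` days after the sale. Returns (matched pairs,
    unmatched sale ids, unmatched collection ids)."""
    unused = list(collections)
    matched, unmatched_sales = [], []
    for sale in sales:
        candidates = [
            c for c in unused
            if c["amount"] == sale["amount"]
            and 0 <= (c["date"] - sale["date"]).days <= max_days
        ]
        if candidates:
            # Prefer the collection closest in time to the sale.
            best = min(candidates, key=lambda c: (c["date"] - sale["date"]).days)
            unused.remove(best)  # each collection can settle only one sale
            matched.append((sale["id"], best["id"]))
        else:
            unmatched_sales.append(sale["id"])
    return matched, unmatched_sales, [c["id"] for c in unused]


# Amounts use Decimal so that equality checks are exact (no float rounding).
sales = [
    {"id": "S1", "date": date(2024, 3, 1), "amount": Decimal("120.50")},
    {"id": "S2", "date": date(2024, 3, 1), "amount": Decimal("80.00")},
    {"id": "S3", "date": date(2024, 3, 2), "amount": Decimal("45.00")},
]
collections = [
    {"id": "CARD-9", "date": date(2024, 3, 3), "amount": Decimal("120.50")},
    {"id": "BANK-4", "date": date(2024, 3, 2), "amount": Decimal("80.00")},
    {"id": "CASH-1", "date": date(2024, 3, 10), "amount": Decimal("45.00")},
]

matched, open_sales, open_collections = reconcile(sales, collections)
print(matched)           # [('S1', 'CARD-9'), ('S2', 'BANK-4')]
print(open_sales)        # ['S3']
print(open_collections)  # ['CASH-1']
```

The unmatched lists are the useful output for an audit: S3 has no collection within the tolerance, so it is flagged for review instead of being silently accepted.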

In general, every industry today handles ever-growing volumes of data of all kinds, and working without verifying and controlling these operations can be a significant risk in many scenarios. Conversely, performing these validations saves considerable time and cost.


Because it can handle large amounts of data, data matching enables more accurate searches and more advanced data analysis, with more reliable and validated results.

It also facilitates data reconciliation, the identification of patterns and the detection of irregularities.

In addition, it is good practice when migrating data between different systems or platforms.

Automated data matching

Data matching can be automated with algorithm-based and machine learning tools such as Conciliac EDM.

In the past, data matching relied on rule-based approaches and approximate matching. Today, platforms such as Conciliac offer greater accuracy and more extensive data handling capabilities.

Machine learning adds a powerful architecture that draws on techniques such as natural language processing, text inference and multiple matching (one record against many), and it provides flexibility in several key areas.
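The one-record-against-many idea can be illustrated without machine learning. The sketch below is a brute-force subset search, not Conciliac's method. It looks for a group of records (for example, several card transactions) whose amounts add up to one record in the other source (for example, a single bank deposit). The function name, the `max_group` limit and the data are assumptions.

```python
from decimal import Decimal
from itertools import combinations


def match_one_to_many(target, candidates, max_group=3):
    """Return the ids of the smallest group of candidates whose amounts add up
    exactly to `target`, or None. Brute force: only practical for short lists."""
    for size in range(1, max_group + 1):
        for group in combinations(candidates, size):
            if sum(c["amount"] for c in group) == target:
                return [c["id"] for c in group]
    return None


deposit = Decimal("300.00")  # one bank deposit
card_sales = [               # several card transactions in the other source
    {"id": "T1", "amount": Decimal("120.00")},
    {"id": "T2", "amount": Decimal("50.00")},
    {"id": "T3", "amount": Decimal("180.00")},
    {"id": "T4", "amount": Decimal("75.00")},
]
print(match_one_to_many(deposit, card_sales))  # ['T1', 'T3']
```

The number of combinations grows quickly with the list size, so `max_group` caps the search here; a production tool would also need pruning, for example by restricting candidates to a date window.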

For example, it can be tuned for a specific use, redefining what counts as a match, as needed, for a wide range of data types such as long integers, product descriptions and SSNs.
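Tuning what counts as a match per data type can be sketched as a table of comparison functions, one per field. The field names, the comparators and the 0.5 word-overlap (Jaccard) threshold below are illustrative assumptions, not a product configuration.

```python
import re


def same_ssn(a, b):
    # Compare only the digits, so "123-45-6789" matches "123456789".
    return re.sub(r"\D", "", a) == re.sub(r"\D", "", b)


def same_long_int(a, b):
    # Compare as integers, so leading zeros and surrounding spaces don't matter.
    return int(a) == int(b)


def similar_description(a, b, threshold=0.5):
    # Jaccard similarity of the lower-cased word sets.
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta or not tb:
        return False
    return len(ta & tb) / len(ta | tb) >= threshold


# Hypothetical per-use configuration: which comparator defines a "match" for each field.
MATCH_RULES = {
    "ssn": same_ssn,
    "account": same_long_int,
    "description": similar_description,
}


def is_match(rec_a, rec_b, rules=MATCH_RULES):
    return all(compare(rec_a[field], rec_b[field]) for field, compare in rules.items())


a = {"ssn": "123-45-6789", "account": "000123456789012",
     "description": "Blue cotton T-shirt size M"}
b = {"ssn": "123456789", "account": " 123456789012",
     "description": "T-shirt blue cotton M"}
print(is_match(a, b))  # True
```

Changing the definition of a match for another use case then means swapping entries in `MATCH_RULES` rather than rewriting the matching loop.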

Machine learning also captures deeper relationships between data, beyond what counts as a match in a single instance, and achieves a higher level of accuracy. It also helps detect edge cases and false positives.

Another core benefit is that fewer adjustments are needed over the product lifecycle, and moving your model to new data is easier.

How Conciliac’s Data Match works

The Conciliac EDM integrated data management platform has a powerful Data Match module that automates these processes effectively and flexibly. Its basic operation is to identify individual records from two or more sources and find their matching pairs based on multiple criteria, from the most general to the most specific.
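One common way to apply several criteria in sequence is multi-pass matching: each pass pairs the records that earlier passes left unmatched, using its own key. This Python sketch is an assumption about how such a cascade can work, not a description of Conciliac's internals; the pass names, keys and data are hypothetical. The order of passes is a parameter, and this example happens to try the strictest key first so that looser keys only see the leftovers.

```python
def multi_pass_match(left, right, passes):
    """Run (name, key_function) passes in order. Each pass pairs still-unmatched
    records whose keys are equal; every record is matched at most once."""
    matched = []
    pending_left, pending_right = list(left), list(right)
    for pass_name, key in passes:
        # Index the still-unmatched right-hand records by this pass's key.
        index = {}
        for rec in pending_right:
            index.setdefault(key(rec), []).append(rec)
        next_left, used_right = [], set()
        for rec in pending_left:
            bucket = index.get(key(rec))
            if bucket:
                partner = bucket.pop(0)  # consume the partner: one-to-one
                used_right.add(partner["id"])
                matched.append((rec["id"], partner["id"], pass_name))
            else:
                next_left.append(rec)
        pending_left = next_left
        pending_right = [r for r in pending_right if r["id"] not in used_right]
    return matched, [r["id"] for r in pending_left], [r["id"] for r in pending_right]


PASSES = [
    ("reference+amount+date", lambda r: (r["ref"], r["amount"], r["date"])),
    ("amount+date", lambda r: (r["amount"], r["date"])),
    ("amount", lambda r: r["amount"]),
]

# Amounts in cents to keep comparisons exact.
ledger = [
    {"id": "L1", "ref": "INV-7", "amount": 12000, "date": "2024-03-01"},
    {"id": "L2", "ref": "INV-8", "amount": 5000, "date": "2024-03-02"},
    {"id": "L3", "ref": "INV-9", "amount": 990, "date": "2024-03-05"},
    {"id": "L4", "ref": "INV-10", "amount": 777, "date": "2024-03-06"},
]
bank = [
    {"id": "B1", "ref": "INV-7", "amount": 12000, "date": "2024-03-01"},
    {"id": "B2", "ref": "", "amount": 5000, "date": "2024-03-02"},
    {"id": "B3", "ref": "", "amount": 990, "date": "2024-03-07"},
]

pairs, open_ledger, open_bank = multi_pass_match(ledger, bank, PASSES)
for pair in pairs:
    print(pair)  # which pass produced each match
print(open_ledger, open_bank)  # ['L4'] []
```

Recording the pass name with each pair shows how confident each match is, which helps a reviewer decide which matches to double-check.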

Customers typically use this functionality to reconcile data such as bank statements, credit cards, stock, sales, customers, payroll, formulas and operations of all kinds. It is simple to use and requires no technical knowledge: it is enough to configure the platform with the same criteria a person would apply when matching by hand. With a couple of clicks, you classify the data types and tell Conciliac which fields are the important keys for matching, for example dates, amounts, descriptions, e-mail addresses and postal addresses, among others. All this analysis and configuration is “taught” to the platform the first time two sources are matched. The criteria are then stored as a set of rules that is applied automatically each time the same type of source is imported.
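The “teach once, reapply on every import” behavior can be sketched as a rule set persisted per source type. The JSON layout, the file name and the function names below are hypothetical; they only illustrate storing the chosen key fields and rebuilding a matching key from them later.

```python
import json
import tempfile
from pathlib import Path


def save_rules(path, source_type, keys):
    """Store the key fields chosen for a source type (hypothetical JSON layout)."""
    path = Path(path)
    rules = json.loads(path.read_text()) if path.exists() else {}
    rules[source_type] = {"keys": keys}
    path.write_text(json.dumps(rules, indent=2))


def load_key_function(path, source_type):
    """Rebuild a matching key from the rules stored for this source type."""
    rules = json.loads(Path(path).read_text())
    keys = rules[source_type]["keys"]
    return lambda record: tuple(str(record[k]).strip().lower() for k in keys)


with tempfile.TemporaryDirectory() as tmp:
    rules_file = Path(tmp) / "rules.json"
    # First matching session: the user chooses the key fields once.
    save_rules(rules_file, "bank_statement", ["date", "amount"])
    # A later import of the same source type reuses the stored rules.
    key = load_key_function(rules_file, "bank_statement")
    print(key({"date": "2024-03-01", "amount": "120.50 ", "memo": "Card"}))
    # ('2024-03-01', '120.50')
```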

The platform can also eliminate duplicates, apply filters, search and replace, apply coefficients or formulas to complex records, and perform other practical spreadsheet-style operations. It also has intelligent features such as:

-A powerful text inference algorithm that finds terms that are similar but not identical.

-Multiple records: you can take a record from one source and match it against several grouped records in another source.

-Equivalences: you can record equivalences and build a dictionary that is applied in subsequent runs.
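Two of these features, similar-term search and an equivalence dictionary, can be approximated with Python's standard `difflib` module. This is a sketch under stated assumptions: `difflib`'s ratio is a simple sequence-similarity score, not Conciliac's text inference algorithm, and the dictionary entries, the 0.8 cutoff and the supplier names are invented.

```python
import difflib

# Hypothetical equivalence dictionary recorded in earlier runs:
# each variant maps to its canonical term.
EQUIVALENCES = {
    "intl": "international",
    "corp": "corporation",
    "&": "and",
}


def canonical(text, equivalences=EQUIVALENCES):
    # Lower-case, drop periods, then replace each word by its recorded equivalent.
    words = text.lower().replace(".", "").split()
    return " ".join(equivalences.get(w, w) for w in words)


def best_match(term, candidates, cutoff=0.8):
    """Return the candidate most similar to `term` after applying the dictionary,
    or None if no similarity ratio (0 to 1) reaches `cutoff`."""
    lookup = {canonical(c): c for c in candidates}
    hits = difflib.get_close_matches(canonical(term), list(lookup), n=1, cutoff=cutoff)
    return lookup[hits[0]] if hits else None


suppliers = ["Acme International Corporation", "Globex Trading and Sons", "Initech Ltd"]
print(best_match("ACME Intl. Corp.", suppliers))      # dictionary turns this into an exact match
print(best_match("Globex Tradng & Sons", suppliers))  # typo tolerated by the similarity score
print(best_match("Umbrella Corp", suppliers))         # None: nothing similar enough
```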

These are just some of the main features of a tool that can automate the matching or reconciliation of thousands or even millions of records in minutes, a task that would take hours or days by hand, or become impractical altogether.

If you want to understand how a tool like this can contribute to your organization, please contact us and we will be happy to help.