Today I would like to compare two ETL tools with data quality features:

  • SAP Data Services (BODS) and
  • Syniti ADM

for data deduplication.

Note: These insights are based on my own experience with both tools. This is open to debate, so let’s have a discussion 😉

Let’s start with the BODS Match technology:

 

BODS Match:

 

In BODS, under data quality, you can find the Base Match transformation:

SAP BODS Match

We connect to the source, define a match key in this Match transformation, and get the duplicates as output.

Data Services Match Transformation

 

Syniti Match:

Syniti Match covers this traditional way of finding duplicates and also handles somewhat more advanced cases. For example, when Anthony is also recorded under the nickname Tony, Syniti tries to recognize this kind of variation and report it.

Syniti’s matching technology delivers better results more quickly when compared to conventional solutions.

 

| Conventional BODS Matching | Syniti Match |
|---|---|
| Requires match-ready data which needs significant preprocessing, such as standardized data with consistent schemas. | No preprocessing is needed. Bring your data as it is. |
| Requires significant manual effort from SMEs to assess and remediate data quality problems. | Automates manual interpretations, recognizing patterns, non-Latin characters, and spelling differences. |
| Users must understand the nuances of off-the-shelf algorithms. | Utilizes a proprietary phonetic algorithm, specifically built for contact and business data. |
| Requires data coders or data scientists to implement matching. | Create custom matching using a friendly interface with easy drag-and-drop functionality. |
| Processing is slow. | Processing occurs in minutes instead of hours or days. |
| Matchkeys are the basis for comparison, which reflects errors in the data. | Contextual scoring mirrors human-like perception and is much more accurate. |

 

For example, the following records do NOT produce a match in conventional tools, because the comparison is based on the match key and the three keys differ. Syniti’s Match produces a match for all three records.

 

MATCHKEY: First_Name (3) + Last_Name (3) + Street_Number (4) + ZIP(5)

 

| MATCHKEY | NAME | ADDRESS | CITY | STATE | ZIP |
|---|---|---|---|---|---|
| TAMMAY350078746 | Tamas Mayer | 3500 N Capital of Texas Hwy #230 | AUSTIN | TX | 78746 |
| TOMMOO350078746 | Tom Moore | 3500 N Capital of Texas Hwy #502 | AUSTIN | TX | 78746 |
| TMOO35078746 | Mr. T R Moore | 3500 N Capital of Texas Hwy #502 | AUSTIN | TX | 78746 |
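To make the match key idea concrete, here is a minimal Python sketch of the formula above. The function name `match_key` and the way I split the three rows into fields are my own assumptions, so the output is illustrative rather than an exact reproduction of the keys in the table.

```python
def match_key(first_name: str, last_name: str, street_number: str, zip_code: str) -> str:
    """Build First_Name(3) + Last_Name(3) + Street_Number(4) + ZIP(5), uppercased."""
    parts = [first_name[:3], last_name[:3], street_number[:4], zip_code[:5]]
    return "".join(parts).upper()


# The three example rows, assuming the names were already split into fields.
records = [
    ("Tamas", "Mayer", "3500", "78746"),
    ("Tom", "Moore", "3500", "78746"),
    ("T", "Moore", "3500", "78746"),  # "Mr. T R Moore"
]

keys = [match_key(*record) for record in records]
for key in keys:
    print(key)

# An exact comparison of keys only finds duplicates when the keys are identical.
print("all keys identical:", len(set(keys)) == 1)
```

Because every key is different, a comparison based purely on key equality reports no duplicates, even though a human would suspect that at least two of these rows describe the same person.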

Syniti’s Matching Technology

Match uses the following technologies as it processes your data to find and score records that are possible matches.

  1. Normalization

When data enters the matching engine, the first step is breaking it into multiple fields. To do this, Match:

  1. Splits up the name.
  2. Pulls the company name out of the address.
  3. Parses concatenated addresses, and so on.

 

diagram example of normalization
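As a rough illustration of this splitting step, the sketch below breaks a free-text name and a concatenated address into separate fields. It is only my own simplified approximation: the helper names, the prefix list, and the address pattern are assumptions, and a real matching engine handles far more cases.

```python
import re

PREFIXES = {"MR", "MRS", "MS", "DR"}  # illustrative list, not exhaustive


def split_name(full_name: str) -> dict:
    """Split a free-text name into prefix, first, middle and last parts."""
    tokens = [t.strip(".,") for t in full_name.split()]
    tokens = [t for t in tokens if t]
    prefix = ""
    if tokens and tokens[0].upper() in PREFIXES:
        prefix = tokens.pop(0)
    first = tokens[0] if tokens else ""
    last = tokens[-1] if len(tokens) > 1 else ""
    middle = " ".join(tokens[1:-1])
    return {"prefix": prefix, "first": first, "middle": middle, "last": last}


# Leading house number, street text, and an optional "#unit" at the end.
ADDRESS_RE = re.compile(
    r"^\s*(?P<number>\d+)\s+(?P<street>.*?)(?:\s+#\s*(?P<unit>\w+))?\s*$"
)


def split_address(address: str) -> dict:
    """Split a concatenated address into number, street and unit."""
    match = ADDRESS_RE.match(address)
    if not match:
        return {"number": "", "street": address.strip(), "unit": ""}
    return {key: (value or "") for key, value in match.groupdict().items()}


print(split_name("Mr. T R Moore"))
print(split_address("3500 N Capital of Texas Hwy #502"))
```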

  2. Pattern Recognition

Pattern Recognition is another facet of Normalization, where Match recognizes and either removes or translates the following:

  • Prefixes and suffixes, such as DR or JR.
  • Business words such as INC, LLC, or DBA.
  • Context, such as street, suite, flat, and so forth.
  • Abbreviations, such as Mfg for Manufacturing or ACCT for Accounting.
  • Nicknames, such as Tony vs Anthony.
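The remove-or-translate idea in the list above can be sketched with a few lookup tables. The word lists here are tiny samples I made up for illustration, and `normalize_tokens` is a hypothetical helper, not part of any Syniti API.

```python
# Small sample dictionaries; a real product would ship much larger ones.
PREFIXES_SUFFIXES = {"DR", "MR", "MRS", "MS", "JR", "SR"}
BUSINESS_WORDS = {"INC", "LLC", "DBA", "LTD"}
ABBREVIATIONS = {"MFG": "MANUFACTURING", "ACCT": "ACCOUNTING"}
NICKNAMES = {"TONY": "ANTHONY", "TOM": "THOMAS"}


def normalize_tokens(text: str) -> str:
    """Drop noise words and translate abbreviations and nicknames."""
    cleaned = []
    for raw in text.upper().split():
        token = raw.strip(".,")
        # Remove prefixes, suffixes and business words entirely.
        if not token or token in PREFIXES_SUFFIXES or token in BUSINESS_WORDS:
            continue
        # Translate abbreviations and nicknames to their full form.
        token = ABBREVIATIONS.get(token, token)
        token = NICKNAMES.get(token, token)
        cleaned.append(token)
    return " ".join(cleaned)


print(normalize_tokens("Dr. Tony Smith Jr."))  # ANTHONY SMITH
print(normalize_tokens("Acme Mfg, Inc."))      # ACME MANUFACTURING
```

After this step, "Tony Smith" and "Dr. Anthony Smith Jr." reduce to the same string, which is what lets the later comparison steps treat them as candidates.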

  3. Transliteration

Match converts global Unicode characters, such as Chinese, into English-Latin characters.

In this example, the Chinese character 昌 means prosperous and is pronounced chang, and the Chinese character 李 means plum and is pronounced li.

diagram example of transliteration
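A very small sketch of transliteration in Python could look like the following. The two-entry lookup table is purely my assumption (I use 昌 for chang and 李 for li to mirror the example), and stripping accents with the standard `unicodedata` module is a swapped-in simplification, not how Match itself works.

```python
import unicodedata

# Tiny illustrative lookup; a real engine would use a full transliteration table.
HANZI_TO_LATIN = {"昌": "chang", "李": "li"}


def transliterate(text: str) -> str:
    """Map known Chinese characters to Latin text and strip accents."""
    pieces = []
    for ch in text:
        if ch in HANZI_TO_LATIN:
            pieces.append(HANZI_TO_LATIN[ch] + " ")
        else:
            pieces.append(ch)
    # NFKD splits accented letters into a base letter plus combining marks,
    # which are then dropped.
    decomposed = unicodedata.normalize("NFKD", "".join(pieces))
    latin = "".join(c for c in decomposed if not unicodedata.combining(c))
    return " ".join(latin.split())


print(transliterate("李昌"))         # li chang
print(transliterate("José Müller"))  # Jose Muller
```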

  4. Phonetic Algorithm

Now that Match has isolated values into separate fields, like first name, last name, company, street, and city, you can generate phonetic translations on these fields to help circumvent errors.

For example, the name Naugton could be a misspelling or a typing error, and the record is likely the same as one with the name Naughton.
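Syniti's phonetic algorithm is proprietary, so I cannot show it here. To illustrate the general idea, the sketch below uses classic American Soundex instead, which is a different and much simpler technique. It encodes both spellings of the name to the same code.

```python
# Soundex digit for each consonant group.
CODES = {}
for digit, letters in {
    "1": "BFPV", "2": "CGJKQSXZ", "3": "DT", "4": "L", "5": "MN", "6": "R"
}.items():
    for letter in letters:
        CODES[letter] = digit


def soundex(name: str) -> str:
    """Return the four-character American Soundex code of a name."""
    letters = [c for c in name.upper() if "A" <= c <= "Z"]
    if not letters:
        return ""
    result = letters[0]
    previous = CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W do not separate letters with the same code
        code = CODES.get(ch, "")
        if code and code != previous:
            result += code
        previous = code  # vowels reset the previous code to ""
    return (result + "000")[:4]


print(soundex("Naugton"), soundex("Naughton"))  # N235 N235
```

Comparing the codes instead of the raw spelling means the two records are no longer kept apart by one missing letter.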

  5. Grouping

Many business databases have a massive quantity of records. To facilitate working at this scale, Match makes a pass at the data and identifies similar records, creating Candidate Groups.

This recognizes similar records based on multiple datapoints.

Match is not finding matches at this point but is simply identifying good candidates for further comparison. Match can then use these groups to locate records that match but have nothing exactly in common.

For example, Match could look for records with:

  • Last names that match phonetically and the same zip code.
  • Or, last names and street names that match phonetically.

diagram example of grouping
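The two example rules above can be sketched as a simple grouping pass. Everything here is an assumption for illustration: the record layout, the function names, and especially `rough_phonetic_key`, which is a crude stand-in for a real phonetic code.

```python
from collections import defaultdict


def rough_phonetic_key(word: str) -> str:
    """Crude stand-in for a phonetic code: keep the first letter, then
    drop vowels, H, W, Y and immediately repeated letters."""
    letters = [c for c in word.upper() if "A" <= c <= "Z"]
    if not letters:
        return ""
    key = letters[0]
    for ch in letters[1:]:
        if ch in "AEIOUHWY" or ch == key[-1]:
            continue
        key += ch
    return key


def candidate_groups(records: list) -> dict:
    """Return groups of record ids that share at least one grouping key."""
    groups = defaultdict(set)
    for rec in records:
        last = rough_phonetic_key(rec["last_name"])
        street = rough_phonetic_key(rec["street"])
        groups[("last+zip", last, rec["zip"])].add(rec["id"])
        groups[("last+street", last, street)].add(rec["id"])
    # Only groups with more than one record are worth comparing further.
    return {key: ids for key, ids in groups.items() if len(ids) > 1}


records = [
    {"id": 1, "last_name": "Naughton", "street": "Main Street", "zip": "78746"},
    {"id": 2, "last_name": "Naugton", "street": "Main St", "zip": "78746"},
    {"id": 3, "last_name": "Naughton", "street": "Mane Street", "zip": "73301"},
    {"id": 4, "last_name": "Mayer", "street": "Oak Avenue", "zip": "78746"},
]

for key, ids in candidate_groups(records).items():
    print(key, sorted(ids))
```

Note that records 1 and 2 end up together through the zip rule, while records 1 and 3 end up together through the street rule. Grouping does not decide that any of them are duplicates; it only limits which pairs are compared in the scoring step.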

  6. Contextual Scoring

Once Match has aligned data by Candidate Groups, it performs scoring. It compares two records at a time and grades them for similarities.

  • It compares and scores multiple fields individually, such as name, company, address, zip, phone, email, and so on.
  • It establishes an overall similarity score between the two records.
  • The higher the score, the more confident the system is that the two records match.
  • You specify the score threshold, and Match presents any pair of records that scores above the threshold as a match.

diagram example of contextual scoring
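To show the shape of this scoring step, here is a minimal sketch that scores fields individually and combines them into one overall score. The field weights, the threshold of 0.85, and the use of `difflib.SequenceMatcher` from the Python standard library are all my own assumptions; Syniti's actual scoring logic is not documented in this post.

```python
from difflib import SequenceMatcher

# Hypothetical field weights; a real configuration would be tuned per dataset.
WEIGHTS = {"name": 0.4, "address": 0.3, "zip": 0.2, "email": 0.1}


def field_score(a: str, b: str) -> float:
    """Similarity of two field values, between 0.0 and 1.0."""
    if not a or not b:
        return 0.0
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def record_score(rec_a: dict, rec_b: dict) -> float:
    """Weighted average of the individual field scores."""
    total = sum(WEIGHTS.values())
    weighted = sum(
        weight * field_score(rec_a.get(field, ""), rec_b.get(field, ""))
        for field, weight in WEIGHTS.items()
    )
    return weighted / total


def is_match(rec_a: dict, rec_b: dict, threshold: float = 0.85) -> bool:
    """Report a match when the overall score is above the threshold."""
    return record_score(rec_a, rec_b) > threshold


tom = {"name": "Tom Moore", "address": "3500 N Capital of Texas Hwy #502",
       "zip": "78746", "email": "tom@example.com"}
t_r = {"name": "T R Moore", "address": "3500 N Capital of Texas Hwy #502",
       "zip": "78746", "email": "tom@example.com"}
other = {"name": "Zed Quux", "address": "1 Elm", "zip": "10001",
         "email": "z@q.org"}

print(round(record_score(tom, t_r), 3), is_match(tom, t_r))
print(round(record_score(tom, other), 3), is_match(tom, other))
```

Raising the threshold reduces false matches at the cost of missing some true duplicates, which is why the threshold is left to the user.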

 

Conclusion:

Considering advanced features such as contextual scoring, grouping, and transliteration, I give a few more marks to Syniti Match.

 

In upcoming blog posts, we can discuss the processing in more detail.

That’s all for this blog post.

Thanks for reading, and please share your feedback.

Happy Learning, see you in my next blog 🙂

 

Sara Sampaio

Author Since: March 10, 2022
