Marketo Merge Leads in Bulk using the API

This post will walk you through how to use the Marketo merge leads REST API endpoint to merge leads in bulk with Python code that will allow you to set custom rules for how you want to prioritize conflicting values for the same fields on different leads.

If it is your first time using the Marketo REST API or you need a quick refresher then check out the Marketo API Quick-Start Guide to see how to make your first Marketo REST API requests in Postman before transitioning to making requests in code or in the Zapier automation tool.

You can find the main Python script and all the supporting functions linked in the paragraphs below in the Bulk Merge Github repository.

Take a look at the video walkthrough I gave my colleagues to get an in-depth look at how the Python script is set up and if you find it useful then please give it a like 🙂

Supporting Functions

Getting Access Token

The getToken function makes a get request to the token endpoint, passing the client id and client secret, to return the API access token, and the length of time that this token will be valid for.

Marketo Merge Leads

The mergeLead function makes a post request to the Marketo merge leads endpoint and takes in the winning lead id, an array of losing lead ids, and a boolean value to indicate whether or not the leads should be merged in your CRM tool e.g. Salesforce.

When the “mergeInCRM” parameter is set to false the winning lead and all losing lead ids can be merged in a single call. However, when this parameter is set to true Marketo is restricted to merging only 2 leads at a time so if there is a triple duplicate i.e. three leads with the same email address, then the winner must first be merged with one of the losers and once this is complete the new resultant merged lead must then be merged with the other loser. Therefore, to achieve this functionality a for loop is used to iterate through each of the loser ids and successively merge them with the winning id.

Rules for Winning Field Values

Choosing winning field values when merging leads in Marketo


Choosing winning field values when merging leads in Marketo

When manually merging leads in Marketo with discrepancies between the lead field values, the interface allows you to choose which values you want to win and therefore keep on the resultant merged lead. The script does the exact same thing, using custom rules specified by you to determine which values are prioritized for lead fields in a merge.

To do this it uses the following functions to compare the disparate values of the leads for the same field.

Get First Created Date

The createdAt function returns the earliest created at date and the list index at which it occurs

Get Maximum Lead Score

The leadScore function returns the maximum score and the list index at which it occurs

Get First Non-Null Value

The notNull function returns the first non-null value to appear in the list, excluding null-like values such as “empty”, “unknown”, “n/a”, and ‘none’. If no such values exist this function returns the value and index of the first list element.

Define Field Value Rankings

The priority function makes use of a dictionary, which contains a list for each field where the list is sorted from values of highest priority to lowest priority e.g. 'leadSource': ["Advertising", "Paid Search", "Organic"].

The list of prioritized values is then iterated through starting from the highest priority value e.g. “Advertising”, and each value from the prioritized list is compared to each lead value. If either of the lead values match the prioritized value then the value and index of this lead value are returned.

Else the for loop continues with the next highest priority value to see if this is present in the lead values. In this way, values for a lead field can be prioritized according to a custom user-specified order so that these values will be retained on the merged lead.

Get First False Value

The boolTest function was built for the “Unsubscribed” and “Account Blocked” boolean fields where the priority was to retain a TRUE value for either of these fields if they existed e.g. you do not want to email the merged lead if one of the constituent leads was unsubscribed. Therefore, this function returns the value and index of the first TRUE value in the list of lead values.

Call Appropriate Prioritization Function

The ruler function gets called from the main program within the script and then takes the list of lead values given for the field being evaluated and passes them into the rules dictionary. The rules dictionary maps a field to the function (one of the functions described above) that should be called to get the highest priority value for that field from the lead values that were passed in.

Append Dictionary

The appendDict function does what it says on the tin. It takes an input dictionary and for each value this dictionary has for a key it will append each value to the value(s) already in the master dictionary for the same key.

Updating Winning Lead Values

The createUpdateLead function makes a post request to the create/update lead endpoint to update a lead with the winning field values after the merge is complete. The winning lead id and winning field values are passed in JSON format to the payload of the request to make the update.

Although this endpoint allows for updating 300 leads at once in a single call, it was found to be safer to update each winning lead with the winning field values straight after the merge was complete in case the script were to fail before 300 leads were merged meaning that none of the winning leads would have been updated.

Marketo Merge Leads Main Script

Flow diagram of the Marketo Merge Leads main Python script


Flow diagram of the Marketo Merge Leads main Python script

Getting Lead Data & Access Token

Before running the main merging script, go to the “Possible Duplicates” system smart list in Marketo, sort by email address (the script assumes that the input data is sorted by email), and then export all the duplicates from a view that contains all the fields you are interested in setting up priority rules for when merging.

N.B. This script was originally configured to pull lead data using the API, which returns createdAt date in ISO 8601 (2016-11-03T19:17:57Z) standard and boolean values as either “TRUE” or “FALSE”. When exporting from Marketo as a CSV the date needs to be converted to ISO 8601 standard and the 1’s or 0’s representing boolean values need to be converted to “TRUE” or “FALSE”. Importing the CSV into a sheet tool will get this done quickly or you can modify the createdAt and boolTest functions in the priority script to work with the default exported values. To save on API calls and make the script faster to run it is recommended to pull in the lead information via CSV rather than relying on the API to pull in lead information from lead ids.

Once the data has been extracted from Marketo it can then be imported into the main script as a dictionary using the Pandas library in Python. Before looping through this dictionary of duplicates, it is necessary to get an access token from Marketo and ensure that there are more than 60secs left on the token life so that there is enough time to complete the merge and update the winning lead in a single iteration of the loop. This is achieved using nested while loops with the inner while loop checking that token life is greater than 60 secs before every run and if this is not true the outer while loop will wait until the access token has expired and then get a new access token with 3600secs of life.

Getting, Storing, & Prioritizing Lead Values

Since the input dictionary is sorted by email address the field values from successive rows with the same email are pulled into the field_dict dictionary which stores the lead values using the field names as the keys. Once this is done the comparison of the lead values for each field is achieved using a for loop to iterate through each one of the field keys in field_dict.

If the values for a particular field match then this value is put in the final_dict dictionary for that field, otherwise the field values are evaluated using the custom rules declared in the script using the ruler function (see the “Rules for Winning Field Values” section above)

The first non-null sfdcLeadId value is stored in final_dict and the corresponding index can be used to pull in the 'id' and 'mcUserId__c" values from the same lead who has this sfdcLeadId value and who will be the winner of the merge. If none of the leads exist in Salesforce then the id , mcUserId__c, and createdAt values of the lead who was created first are stored in final_dict.

Similarly, the "Lead_Source_Detail__c" and the three "utm_xxx__c" values are obtained from the lead who has the highest priority leadSource value.

Marketo Merge Leads & Update Winner

Once all the prioritized values have been stored in final_dict, they are logged in a timestamped text file along with the field_dict values before the script proceeds to merge the two leads by making a request to the Marketo merge leads endpoint.

The response of the merge is logged and then if the merge is successful the newly merged lead will be updated with the prioritized values for the fields stored in final_dict. Sometimes even though the script logic selects a winning id (see section above) Marketo has an overriding rule that if a lead and contact are being merged in SFDC then the contact id will always win.

Therefore in some cases, the supposed “winning” lead id specified in final_dict will be the losing id if this “winning” id was a lead and the other id corresponded to a contact. If this is the case "status":"skipped" will be returned in the response meaning it is necessary to update one of the other lead ids stored in field_dict with the prioritized values using a while loop to iterate over the remaining ids until the actual winner is found and updated.

You can avoid the need for this while loop and ensure that the winner selected in your Python logic always matches that of Marketo’s inbuilt logic. This can be done by modifying the main merging script to pull in the sfdcType field to ensure that an SFDC contact will always be selected as the winner if the other people to be merged are SFDC leads.

N.B. Out of the 13,000 leads I merged with this script there were a stubborn 1000 leads who would not be merged because of a vague 611 error returned from SFDC and a lot of these just had to be merged manually. If you are encountering this error and have too many leads to merge manually I can give some tips to try that showed some success in merging leads with a previous 611 error. Just leave a comment below 🙂

Rinse & Repeat

After merging, a 0.2-second delay is implemented so that Marketo’s REST API rate limit of 100 calls in 20 seconds is not exceeded. Finally, the remaining time on the token is calculated at the end of the inner while loop and if this time is greater than 60secs and there are still more leads to merge then the next iteration of this while loop will proceed.

If the remaining time is below 60secs, the script breaks out into the outer while loop and then waits for this time to elapse before requesting a new access token and diving back into the inner while loop. This process will be repeated until all the leads in the CSV file have been merged.

Take a look at the merging script in action by viewing a sample output file from the merging script.

Marketo Merge Leads Automatically Upon Creation

Great! You are now able to clean up that HUGE pile of duplicates that has been eyeballing you for the past year by taking this GitHub code and making it your own.

In the scenario, where you are not able to stop the cause of Marketo duplicates or you are in the process or working on a fix and you do not want these duplicates to build up then you can merge leads automatically upon creation using Zapier.

Take a look at the Marketo Merge Automatically with Zapier post to see how you can use the Marketo merge leads endpoint that you know and love so well to merge leads as soon as they are created in Marketo.

About The Author — Tyron Pretorius
Tyron Pretorius

Greetings fellow MOPs pros! As a previous mechanical engineer turned marketing operations lead, I now fancy myself as a full-stack marketer capable of designing websites, writing SEO-optimized content, and doing all the tracking and automation on the backend. That being said my bread and butter is marketing operations and I love to get down and dirty with the Marketo API, Python, Zapier, and Google Scripts.

Related free workshops

Google Sheets data being accessed in a Zap action in Zapier

How to use Zapier with Google Sheets

Integrating Google Sheets with Zapier will allow you to unlock a whole new array of automation possibilities for the data in your spreadsheets. This post will show you how to use Zapier with Google Sheets by walking through the 3 different options for importing data into Zapier using the Google Sheets app.

Become a member
Timestamp being used as the Google Sheets lookup value in Zapier

Zapier Google Sheets Lookup Value

When trying to pull a specific row from your Google Sheet into Zapier you can use a lookup value dynamically populated with a variable from a previous step. This is useful but the real power of using the Google Sheets lookup value in Zapier comes when you need to import large amounts of data into your zap.

Become a member
Button in Google Sheets to trigger a Google Script to send multiple rows to Zapier

Zapier Import Multiple Rows from Google Sheets

Anyone who has worked with Google Sheets in Zapier is likely to have run into the issue of needing to import more than the 20-row maximum imposed when using the Google Sheets app. This post will show you how to overcome this limitation and import multiple rows from Google Sheets into Zapier using some JavaScript and webhooks.

Become a member