Working with Big Data

Bigger and better data give companies both more panoramic and more granular views of their business environment. The ability to see what was previously invisible improves operations, customer experiences, and strategy. Companies also need to get creative about the potential of new and external sources of data. Social media generates terabytes of nontraditional, unstructured data in the form of conversations, photos, and video. Add to that the streams of data flowing in from sensors, monitored processes, and external sources ranging from local demographics to weather forecasts. Every time search and buying processes take place online, information about customers practically collects itself.

The data only needs to be analyzed and used strategically. Product suggestions based on recently viewed items, ads personalized to the consumer, and automated CRM processes are the norm by now. As more sensors, microchips, and actuators are deployed, and as more devices become connected and gain smart features, more data will be gathered. Smart use of that data improves the consumer experience, which in turn drives revenue growth, better inventory management, and deeper shopper intelligence. More intuitive websites will make it possible to create more exciting and convenient shopping experiences, and to practice more effective customer relationship management with new, accurate data.

D·engage’s Customer Data Platform supports the retrieval, collection, and use of these kinds of high-volume, high-velocity data flows from different external sources and stores them in a big data repository containing all available data about every customer, always ready for analysis and targeting. The repository is used for dynamic customer targeting: selecting the specific customers eligible to receive each real-time campaign. It also serves real-time customer activity tracking and trigger detection, capturing relevant customer activity as it happens and feeding an analysis engine that calculates when pre-defined triggers occur. The D·engage Big Data Repository consists of user-created big data tables, which can only be created through the GUI; Regular Table and Send List structures are not suitable for storing such high-volume data flows. The following sections describe how to handle big data flows in the D·engage platform.

Please note that, for security purposes, Big Data tables can only be created through the D·engage platform’s GUI. Thus, the API functionality provided is unidirectional and is used only to push data into the selected big data table.

How to Create a Big Data Table

Big Data Tables are created in the same way as a Regular Data Table or a Send List through the GUI. As shown in the figure below, following the Data Space – All Tables – New Table flow brings you to the table type selection, where “Big Data Table” should be selected. After entering a table name and description, the column settings screen is displayed.

Upon creation, each Big Data Table comes with two predefined columns, “key” and “event_date”. The user can manually add any custom column to the table, as well as pre-defined user-agent columns by clicking the “Add Client Information Columns” link. This automatically adds 8 pre-defined columns that describe the user agent’s device, OS, and connectivity properties.

Client info columns are added as shown in the next figure. The “Remove Client Information Columns” link can be used to remove these pre-defined columns.

Apart from those 8 pre-defined columns, users can add an unlimited number of columns of any type to the big data table.

How to upload data to a Big Data Table

There is a single API call defined in the system to push data into big data tables. This call does not require a regular authentication token issued by the platform; instead, it requires a unique account identifier (UAID) in each call. Contact your account manager to get the valid UAID for your account.

The request definition of the Push Event call is as follows:

accountId is the UAID.
eventTable is the name of the big data table, which must be created through the GUI prior to the execution of this call.
key is the parameter storing the unique contact_key information associated with a specific customer.
eventDetails stores the key/value pairs of the data to be uploaded to the big data table; these are the column name / column value pairs of the big data table.

An example Push Event request looks like this:

{
  "accountId": "82e7e586-5efa-ef76-7663-1413870c3b76",
  "eventTable": "web_events",
  "key": "contact_key_123",
  "eventDetails": {
    "event_name": "event1",
    "event_time": "2019-04-01 23:40:00",
    "event_severity": 0,
    "event_is_secure": "false"
  }
}
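Assuming the parameters above, a request can be assembled and sent as in the following Python sketch. The endpoint URL and helper names here are illustrative assumptions, not the platform's actual API surface; use the endpoint supplied with your account.

```python
import json
import urllib.request

# Hypothetical endpoint: the real Push Event URL is supplied by D·engage.
EVENT_API_URL = "https://api.example.com/push-event"

def build_push_event(account_id, event_table, contact_key, event_details):
    """Assemble the Push Event request body described above."""
    return {
        "accountId": account_id,        # the UAID of the account
        "eventTable": event_table,      # big data table created via the GUI
        "key": contact_key,             # unique contact_key of the customer
        "eventDetails": event_details,  # column name / column value pairs
    }

def push_event(payload):
    """POST the payload as JSON and return the HTTP status code."""
    req = urllib.request.Request(
        EVENT_API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status
```

Note that the call carries no authentication token; the accountId (UAID) field itself identifies the account, as described above.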

Data Upload Flow

When the PushEvent API call is executed, the request is handled by the eventAPI in the system. The validity of the UAID is checked by matching the accountId parameter value in the request against the valid UAIDs in the system. If no matching UAID is found, the call is terminated without further processing.

Given a valid UAID, the system then checks the eventTable parameter value. This value must be the name of a predefined big data table in the system. If the parameter value points to a Regular Data Table or a Send List rather than a Big Data Table, the request is dropped without further processing.
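The two gate checks above can be sketched as follows. The lookup sets (KNOWN_UAIDS, BIG_DATA_TABLES) are illustrative stand-ins for the platform's internal stores, not real D·engage structures.

```python
# Illustrative stand-ins for the platform's internal lookups.
KNOWN_UAIDS = {"82e7e586-5efa-ef76-7663-1413870c3b76"}
BIG_DATA_TABLES = {"web_events"}  # Regular Data Tables and Send Lists excluded

def accept_request(request):
    """Return True if the request passes the eventAPI gate checks."""
    # Check 1: accountId must match a valid UAID, else the call terminates.
    if request.get("accountId") not in KNOWN_UAIDS:
        return False
    # Check 2: eventTable must name a Big Data Table, else the request drops.
    if request.get("eventTable") not in BIG_DATA_TABLES:
        return False
    return True
```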

The valid request values are not uploaded directly to the tables; they are buffered in memory for up to 1 minute, or until the total number of requests per account reaches 1000, whichever happens first. The requests are then enriched with user-agent information and written to RabbitMQ queues.
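The buffering rule above (flush after 1 minute or 1000 requests per account, whichever comes first) can be sketched like this. The class and its flush hand-off are a simplified illustration of the described behavior, not the platform's implementation.

```python
import time

FLUSH_INTERVAL_SECONDS = 60
FLUSH_COUNT = 1000

class AccountBuffer:
    """Per-account in-memory buffer: flushes after 60 s or 1000 requests,
    whichever happens first."""

    def __init__(self, clock=time.monotonic):
        self.clock = clock   # injectable clock, useful for testing
        self.items = []
        self.started = None  # time of the first buffered request

    def add(self, request):
        """Buffer one request; return the flushed batch if a limit is hit."""
        if self.started is None:
            self.started = self.clock()
        self.items.append(request)
        if (len(self.items) >= FLUSH_COUNT
                or self.clock() - self.started >= FLUSH_INTERVAL_SECONDS):
            return self.flush()
        return None

    def flush(self):
        """Hand the batch off (in the platform: enrich each request with
        user-agent information, then publish to RabbitMQ)."""
        batch, self.items, self.started = self.items, [], None
        return batch
```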

The RabbitMQ queues are consumed by the EventProcessor service in the system. It asynchronously retrieves the request values from the queues, performs validity checks on the UAID and table names and syntax checks on the column values, and commits them into the big data tables.
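A simplified consumer loop for this last step might look like the following. An in-memory deque stands in for a RabbitMQ queue, and KNOWN_UAIDS and TABLE_COLUMNS are hypothetical stand-ins for the platform's internal account and schema stores.

```python
from collections import deque

# Hypothetical stand-ins for platform internals.
KNOWN_UAIDS = {"82e7e586-5efa-ef76-7663-1413870c3b76"}
TABLE_COLUMNS = {"web_events": {"event_name", "event_time",
                                "event_severity", "event_is_secure"}}

def process_queue(queue, storage):
    """Drain enriched requests, re-validate each, commit valid ones."""
    committed = 0
    while queue:
        req = queue.popleft()
        table = req.get("eventTable")
        # Re-check UAID and table name, as the EventProcessor does.
        if req.get("accountId") not in KNOWN_UAIDS or table not in TABLE_COLUMNS:
            continue
        # Syntax check: every eventDetails key must be a known column.
        details = req.get("eventDetails", {})
        if not set(details) <= TABLE_COLUMNS[table]:
            continue
        # Commit: append a row to the target big data table.
        storage.setdefault(table, []).append({"key": req["key"], **details})
        committed += 1
    return committed
```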