Data Transformations

Content

Purpose
Aggregating (grouping) data records
Splitting a data source in two parts


Purpose

The 'Data Transformation' functions in the data source panel can be used to transform an existing in-memory data source within Synop Analyzer into one or two new data sources with slightly different properties. The new data sources will be available in Synop Analyzer in addition to the original data source.

At the moment, the following data transformation functions are available: aggregating (grouping) data records, and splitting a data source in two parts.


Aggregating (grouping) data records

This transformation function creates a new data source in which the data records are aggregated into larger groups, each group defining one data record of the new data. Optionally, some of the data fields of the original data source can be suppressed during that transformation.

In the following paragraphs, we demonstrate this function using a concrete example. For that purpose, we open the file doc/sample_data/RETAIL_PURCHASES_BY_TIME.txt and read it into Synop Analyzer using the default settings. The file contains supermarket checkout data: 1000 purchased articles, sorted by the date and time of purchase.

We want to create a list of the most expensive purchased article of each week, together with the customer who purchased it. Clicking the button record grouping opens a pop-up window in which we can specify the parameters for a data aggregation:

[Screenshot: img/grouping_panel_467.png (record grouping dialog)]

In the screenshot shown above, we have already performed the following modifications in the panel:

Clicking the OK button starts the data transformation. A new tab appears on the left side of the Synop Analyzer workbench. It contains the transformed data source and offers the same buttons for data transformation and data analysis as the input data tab of the original data source. Clicking the button multivariate and then, in the new window, the button show data displays the data records of the new, aggregated data source. As expected, the new data contains only two records, one for each week covered by the original data. Surprisingly, in both weeks the most expensive purchased article was the same one, and it was purchased by the same customer.

[Screenshot: img/grouping_result_658.png (data records of the aggregated data source)]
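
The grouping step described in this section can be illustrated outside the tool with a minimal Python sketch. It is not Synop Analyzer's implementation: the record layout (timestamp, customer, article, price), the sample values, and the function name most_expensive_per_week are assumptions made for illustration, and they do not reflect the actual columns of RETAIL_PURCHASES_BY_TIME.txt. The sketch groups records by calendar week and keeps the highest-priced record of each group.

```python
from datetime import datetime

# Hypothetical records: (timestamp, customer, article, price).
# These values are invented for illustration only.
purchases = [
    ("2024-01-01 09:15", "C001", "bread", 2.50),
    ("2024-01-03 17:40", "C002", "wine", 12.90),
    ("2024-01-08 11:05", "C001", "cheese", 7.20),
    ("2024-01-10 08:30", "C003", "coffee", 9.80),
]


def most_expensive_per_week(records):
    """Group records by ISO calendar week and keep the highest-priced one."""
    best = {}
    for timestamp, customer, article, price in records:
        ts = datetime.strptime(timestamp, "%Y-%m-%d %H:%M")
        iso = ts.isocalendar()
        week = (iso[0], iso[1])  # group key: (ISO year, ISO week number)
        # Keep the record only if it is the first or the most expensive so far.
        if week not in best or price > best[week][3]:
            best[week] = (timestamp, customer, article, price)
    return best


result = most_expensive_per_week(purchases)
for week, record in sorted(result.items()):
    print(week, record)
```

Each group yields exactly one output record, which mirrors the behaviour described above: two weeks of input data produce two aggregated records.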

Splitting a data source in two parts

This transformation splits the data into two parts. Each data record of the original data is assigned to exactly one of the two new parts. The assignment is performed by a random number generator. The data can be split symmetrically (50:50) or asymmetrically.

Clicking the button split data opens the following pop-up dialog:

[Screenshot: img/split_data.png (split data dialog)]

In the first input field of the dialog, we define the size ratio of the two data parts. The predefined value of 0.5 creates two parts of equal size.
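
The effect of the size ratio can be sketched in Python as follows. This is an illustration under stated assumptions, not the algorithm Synop Analyzer uses: the function name split_records and the parameters ratio and seed are hypothetical, and the sketch draws one random number per record, so the resulting part sizes only approximate the requested ratio. The tool itself may enforce the ratio exactly.

```python
import random


def split_records(records, ratio=0.5, seed=None):
    """Assign each record to exactly one of two parts.

    ratio is the probability that a record lands in the first part,
    so ratio=0.5 gives two parts of roughly equal size.
    """
    rng = random.Random(seed)  # a fixed seed makes the split reproducible
    first, second = [], []
    for record in records:
        if rng.random() < ratio:
            first.append(record)
        else:
            second.append(record)
    return first, second


# Hypothetical input: 1000 records, represented here by their indices.
records = list(range(1000))
part_a, part_b = split_records(records, ratio=0.5, seed=42)
print(len(part_a), len(part_b))
```

Because every record goes to exactly one part, the two parts never overlap and together contain all original records. A ratio other than 0.5, for example 0.8, produces an asymmetric split.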

The second, third and fourth input fields specify the directory path, the names and, indirectly via the file name endings, the types of the files in which the resulting data parts are to be stored persistently on disk. Leave these fields empty if you do not want to store the data parts persistently. Hence, the following alternatives are possible:

Finally, the check box specifies whether the original data is maintained as a separate input data tab within Synop Analyzer, or whether the original data tab is replaced by the first resulting part after the split.