AIssisted™ Data Preparation Wizard
Accurate predictions with machine learning and AI are heavily dependent on the quality of source data. Incomplete or incorrect data will almost always lead to poor-quality prediction results, so before running predictions, it is wise to ensure data quality. The Data Preparation Wizard is designed to help do this in a few steps. Start by choosing the Data Preparation Wizard from the Start Page.
The Data Preparation Overview shows the databases and cubes that have a data preparation scenario. Choose a database, and its cubes appear below together with the information relevant to their scenarios.
The fields of the Data Preparation Overview are described below:
Field | Description |
Select Database | A dropdown of all databases stored in your Jedox instance. The database is your native work environment populated with custom data. We also provide demo data to allow you to familiarize yourself with data preparation functionality outside of your own work environment. |
CUBE (DB) | You may choose any cube for data preparation purposes, as long as it has at least three dimensions that can serve as the Time, Version, and Measure dimensions. |
SCENARIO | Shows the saved scenarios for each cube. This can be used to create different data preparation setups for the same cube. |
ALGORITHM | AIssisted™ Data Preparation uses a number of different algorithms. Each performs outlier detection and interpolation in a different way. Data preparation can also use extrapolation to add missing data points to the beginning or end of limited data sets. Unlike the other wizards, the Data Preparation Wizard cannot offer a Best algorithm choice, because there is no accuracy measure to compare against. The wizard provides the best default scenario, and users can change these algorithms according to their needs and specific source data. For more information, please see the Algorithm Presentation document in the Marketplace listing and the Documentation in the Report Designer/AIssisted™ Planning/Files area. |
LAST STARTED | Shows the last time each cube's scenario started. |
INPUT RANGE | Shows the start date and end date of the time period undergoing the data preparation. |
START (button) | Runs data preparation for your cube with existing settings. |
COPY (button) | Allows you to copy an existing scenario by selecting it from the list of the "Copy Scenario" dialog, or add a new scenario, which will then appear in the "Scenario" list for selection. Click the Save Changes button to confirm the task. |
EDIT (button) | Allows you to adjust the input data of the cube (e.g. change the preparation range, use a different algorithm, or choose different dimensions/elements). |
REMOVE (button) | Removes the AIssisted™ Data Preparation scenario from the cube. By default, this also deletes the versions with the suggested values. If you would like to keep these versions, uncheck the option in the confirmation prompt. |
PREVIEW (button) | Shows a preview of the populated values and compares the actual data with your suggested data. |
STATUS (button) | Displays the status of the Integrator job after you start it. To see any changes, click the refresh icon in the "Status" column. Once the job has finished, a different icon appears with the results. Click on the icon for more information. |
New Setup (button) | Adds an AIssisted™ Data Preparation scenario to a cube. |
Manual Settings (button) | Used to change prediction setups and scenarios without entering the wizard itself. Read more about it in Manual Settings for AIssisted™ Planning. |
Click New Setup to create a new data preparation scenario. The wizard will guide you through the necessary steps.
Step 1: select and validate cube
The selected database is shown along with a dropdown menu of scenarios and a list of available cubes stored within the database.
The "Cubes" list allows you to choose the particular cube on which you would like to perform data preparation analysis. The selection will indicate with a green or red message box whether the selected cube is validated for the analysis, i.e., whether it has the required dimensions. A validated cube must have a Time dimension, a Version dimension, and a Measure dimension. Click the Advanced Setting button to access the dimension type assignments.
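The validation rule above can be sketched in a few lines of Python. This is only an illustration, not the wizard's implementation: the `dimension_types` mapping, the `validate_cube` function, and the example dimension names are all hypothetical.

```python
# Hypothetical sketch: a cube is "validated" when its dimension type
# assignments include Time, Version, and Measure.
REQUIRED_TYPES = {"Time", "Version", "Measure"}


def validate_cube(dimension_types):
    """Return (is_valid, missing_types).

    dimension_types maps a dimension name to its assigned type.
    This structure is an assumption made for illustration only.
    """
    assigned = set(dimension_types.values())
    missing = sorted(REQUIRED_TYPES - assigned)
    return (not missing, missing)


sales_cube = {
    "Months": "Time",
    "Versions": "Version",
    "Measures": "Measure",
    "Regions": "Other",
}
print(validate_cube(sales_cube))                              # (True, [])
print(validate_cube({"Months": "Time", "Regions": "Other"}))  # (False, ['Measure', 'Version'])
```

A result of `(False, [...])` corresponds to the red message box: the listed dimension types still need to be assigned before the cube can be used.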
The "Scenario" Combobox gives you the option to save more than one scenario per cube. Select your scenario from the dropdown menu or click the + to open the "Create New Scenario" dialog. There you can add a new scenario, which will then appear in the "Scenario" list for selection. Click the Save Changes button to confirm the task.
Once you have selected a validated cube, click Confirm Selection and then click Next.
Step 2: select Data Source
You have selected your cube. Now you must narrow your selection to a specific data slice.
The components of this slice are the Time
, Version
, and Measure
dimensions, located in the upper portion of the wizard. Check the box next to "Use last month as end date" in order to choose the last finished month as the end of the input range instead of using a fixed range. This feature allows you to fully automate the process of data preparation for your monthly planning.
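To illustrate what "last finished month" means for a rolling input range, here is a minimal Python sketch. It assumes the last finished month is simply the calendar month before the current one; how the wizard determines it internally is not documented here, and `last_finished_month` is a hypothetical helper.

```python
from datetime import date, timedelta


def last_finished_month(today):
    """Return (year, month) of the most recent fully completed calendar month."""
    first_of_this_month = today.replace(day=1)
    # One day before the first of this month is always in the previous month.
    last_day_of_previous = first_of_this_month - timedelta(days=1)
    return last_day_of_previous.year, last_day_of_previous.month


print(last_finished_month(date(2024, 11, 4)))   # (2024, 10)
print(last_finished_month(date(2025, 1, 15)))   # (2024, 12)
```

With a fixed range the end date would have to be edited every month; a rolling end date like this one moves forward on its own, which is what makes scheduled runs practical.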
The Other Dimensions, found in the lower portion, will further fine-tune your selection.
The Set defaults button dynamically selects the dimensions and elements based on the layout of your data. You may also select your source material manually. Simply select the start and end Time dimensions from the Comboboxes, then click the adjustment icons to select the rest of your source material, which can include multiple elements per dimension, with the exception of the Time dimensions.
The Mode setting for other dimensions has two options: onlyNodes and onlyBases. This sets whether the data preparation will use data at the consolidated (node) level or at the base level.
Because using data at the base level may result in extreme quantities of data being processed and long wait times for results, the wizard limits the number of dimensions that can be set by onlyBases. If more than one dimension is chosen, an error will appear when trying to move on.
If you are sure you would like to use base-level data for more than one dimension, this can be set manually in the Manual Settings report. However, it is recommended that you first run the data preparation in onlyNodes mode.
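The onlyBases restriction can be pictured as a simple check on the mode chosen for each of the other dimensions. The following Python sketch is hypothetical: `check_modes`, the `modes` mapping, and the dimension names are illustrative and do not reflect the wizard's internal code.

```python
ALLOWED_MODES = {"onlyNodes", "onlyBases"}


def check_modes(modes, max_only_bases=1):
    """Return the dimensions set to onlyBases, or raise ValueError if too many.

    modes maps a dimension name to "onlyNodes" or "onlyBases".
    The limit of one dimension follows the wizard behavior described above.
    """
    for dimension, mode in modes.items():
        if mode not in ALLOWED_MODES:
            raise ValueError(f"{dimension}: unknown mode {mode!r}")
    base_dimensions = [d for d, m in modes.items() if m == "onlyBases"]
    if len(base_dimensions) > max_only_bases:
        raise ValueError(
            "onlyBases is set for more than one dimension: "
            + ", ".join(base_dimensions)
        )
    return base_dimensions


print(check_modes({"Regions": "onlyBases", "Products": "onlyNodes"}))  # ['Regions']

try:
    check_modes({"Regions": "onlyBases", "Products": "onlyBases"})
except ValueError as error:
    print(error)
```

The reason for the limit is combinatorial: each additional dimension taken at base level multiplies the number of time series that must be processed.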
The Source View button lets you view your source data before it goes through the data preparation process. You can check for missing values or possible outliers at this stage.
When you are finished, click Next.
Step 3: select Data Preparation Properties
Now you must select the Data Preparation Properties, namely the data interval, outlier and interpolation settings.
Select either a monthly or daily data interval, depending on whether the cube contains monthly or daily data. The "Outlier settings" options, which find out-of-place or extreme values, are as follows:
- Innovational Outlier Detection: where an unusual innovation in the generating process affects all later observations.
- Additive Outlier Detection: outliers which affect only a single observation.
- Level Shift Detection: detects level shifts in time series.
- Temporary Changes Detection: represents a spike that takes a few periods to disappear.
- Seasonal Level Shift Detection: considers seasonal level shifts while detecting outliers.
- Time Series Detection: basic time series outlier detection.
- Cook’s Distance: outlier detection using linear modeling.
- Extreme Value Detection: simple extreme value detection.
- Off: no outlier detection selected.
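To give a feel for what a simple extreme value check does, the sketch below flags values that lie far outside the interquartile range. This is a common textbook rule chosen for illustration; it is an assumption and not necessarily the method behind the wizard's Extreme Value Detection option. The function name and sample data are hypothetical.

```python
from statistics import quantiles


def iqr_outliers(values, k=1.5):
    """Return the indices of values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _median, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [i for i, v in enumerate(values) if v < low or v > high]


# Twelve months of illustrative data with one obvious spike in month 5.
monthly = [100, 104, 98, 101, 400, 99, 103, 97, 102, 100, 105, 96]
print(iqr_outliers(monthly))  # [4]
```

A flagged index would correspond to a single affected observation, which is closest in spirit to an additive outlier; level shifts and temporary changes need models that look at how later periods behave.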
The "Interpolation Settings" options, which replace missing values, are grouped from the highest to lowest result quality, with the highest-quality results taking the longest time to process and the quicker results being of a lower quality. There is also an Off option, which will not run interpolation.
For more information on these algorithm settings, please see the Algorithm Presentation document in the Marketplace listing and the Documentation in the Report Designer/AIssisted™ Planning/Files area.
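As a rough picture of what interpolation does with missing values, the following sketch fills interior gaps by drawing a straight line between the nearest known neighbours. Linear interpolation is used here only as the simplest possible example; it is an assumption and says nothing about which methods the wizard's interpolation options actually apply. `interpolate_linear` is a hypothetical helper.

```python
def interpolate_linear(values):
    """Fill interior None gaps linearly between the nearest known values.

    Leading and trailing gaps are left as None, since they have only
    one known neighbour.
    """
    filled = list(values)
    known = [i for i, v in enumerate(filled) if v is not None]
    for left, right in zip(known, known[1:]):
        step = (filled[right] - filled[left]) / (right - left)
        for i in range(left + 1, right):
            filled[i] = filled[left] + step * (i - left)
    return filled


print(interpolate_linear([None, 10, None, None, 16, 18, None]))
# [None, 10, 12.0, 14.0, 16, 18, None]
```

Higher-quality methods generally take seasonality and trend into account, which is one reason they take longer to process.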
Select the checkboxes to apply additional data preparation management tools. They are:
- Outlier detection: replaces zeros. In some cases, zeros mean missing values; in other cases, a zero is exactly that, i.e. zero items sold. If zeros mean missing values, check this box.
- Add Extrapolation (fill empty start and end ranges): if your data set does not contain enough data points, extrapolation can help by suggesting values. Check this box to employ this tool.
- Schedule Data Preparation: check this box to automate this data preparation scenario as a task in the Scheduler. The default setting is once per day, but can be changed in the Scheduler tab.
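The first two options can be illustrated together: zeros are first reinterpreted as missing values, and the empty start and end ranges are then filled by extending the trend of the nearest known points. This is a minimal sketch under stated assumptions. The straight-line extension and both function names are hypothetical and are not the wizard's actual extrapolation method.

```python
def zeros_to_missing(values):
    """Treat every zero as a missing value (None)."""
    return [None if v == 0 else v for v in values]


def extrapolate_edges(values):
    """Fill leading and trailing None values by extending a straight line
    through the two nearest known points. Interior gaps are left untouched.
    """
    filled = list(values)
    known = [i for i, v in enumerate(filled) if v is not None]
    if len(known) < 2:
        return filled  # not enough data to estimate a trend

    first, second = known[0], known[1]
    start_slope = (filled[second] - filled[first]) / (second - first)
    for i in range(first):
        filled[i] = filled[first] - start_slope * (first - i)

    previous, last = known[-2], known[-1]
    end_slope = (filled[last] - filled[previous]) / (last - previous)
    for i in range(last + 1, len(filled)):
        filled[i] = filled[last] + end_slope * (i - last)
    return filled


raw = [0, 0, 10, 12, 14, 0]
print(extrapolate_edges(zeros_to_missing(raw)))
# [6.0, 8.0, 10, 12, 14, 16.0]
```

If a zero genuinely means "zero items sold", skipping the `zeros_to_missing` step keeps those values intact, which mirrors leaving the checkbox unchecked.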
Once you have selected your Data Preparation Properties, click Next.
Summary Step:
You may now save and execute the Data Preparation or simply save your settings. Either way, your scenario for this data set will now appear in the Overview. Here it can be viewed, modified, and executed at any future point.
Once the data preparation has been executed, the resulting data automatically becomes available in the cube and can be used in reports and templates.
Updated November 4, 2024