1. Introduction


Welcome to Smarten Cloud, a Software as a Service (SaaS) product designed for business users with average technical skills. Using the Smarten Insights guided interface, business users can quickly prepare data and create models without writing any code. The intelligent Smarten Insights engine provides auto-recommendations to simplify and streamline the user experience.

Smarten Cloud supports you as a Citizen Data Scientist, helping you add value to the organization and advance your knowledge, skills, and career.

Follow the quick getting-started steps below to create a predictive model in just five minutes.

As an example, let us create a medical cost prediction model that predicts a person's medical cost from predictors such as BMI and sex.

2. Create a Dataset


You will need a sample data file to create a predictive model.

  • Download the medical cost prediction data file (.csv) from our welcome email, or copy the link below and paste it into your browser.
    URL: https://app.smarten.com/csv/medical_cost_prediction_sample_data.csv

  • After downloading the data file, log in to https://app.smarten.com

    Login Page

  • Click the option "I want to create dataset" from the home page.

    Home Page

  • Enter a name for the dataset, upload the downloaded data file, and click "Next".

    Upload File

  • Once the data in the file is validated, the system auto-detects the data type of each field and displays all fields.

    Change the data type recommended by the system if you wish, and click "Save".

    Data Preview

  • You can reopen the data type selection dialogue by clicking the "Column datatype selection" icon.

  • Click "OK". The dataset is now saved and ready to use.

  • You can view the data quality score of the dataset and other data insights, such as missing value analysis, column analysis, outliers, column associations, feature importance, and more.

    Let us look at some of the data insights screens.

    • Data quality score and overview

      Data Insights: Overview

    • Observations

      Data Insights: Observations

    • Column associations

      Data Insights: Column Association

  • You can perform various operations from the toolbar, result set menu, and context menu.

    • Toolbar menu: Save, Export, SQL query, Aggregate, Pivot data, Join, Append, Sampling, Outlier, and others.

    • Result set menu: Manage columns, Update data, Properties, and Information.

    • Context menu: Find & Replace, Filter, Fill, Sort, Transform, Split, and others.

      Dataset View
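
If you would like to check the data file yourself before uploading it, the data type auto-detection and missing value analysis described in this section can be approximated in a few lines of Python. This is only a minimal sketch and does not show how Smarten Cloud works internally. The sample rows, the column names (age, sex, bmi, charges), and the helper names infer_type and profile are assumptions made for illustration.

```python
import csv
import io

# Made-up rows for illustration only; the real sample file may use
# different columns and values.
SAMPLE_CSV = """age,sex,bmi,charges
19,female,27.9,16884.92
18,male,33.77,1725.55
28,male,,4449.46
"""


def infer_type(values):
    """Return 'integer', 'decimal', or 'text' based on the non-empty values."""
    present = [v for v in values if v != ""]
    if not present:
        return "text"
    # Try the strictest type first, then fall back to looser ones.
    for label, cast in (("integer", int), ("decimal", float)):
        try:
            for v in present:
                cast(v)
            return label
        except ValueError:
            continue
    return "text"


def profile(csv_text):
    """Map each column name to its guessed type and its count of empty cells."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    columns = list(rows[0].keys()) if rows else []
    report = {}
    for name in columns:
        values = [(row[name] or "").strip() for row in rows]
        report[name] = {
            "type": infer_type(values),
            "missing": sum(1 for v in values if v == ""),
        }
    return report


if __name__ == "__main__":
    for name, info in profile(SAMPLE_CSV).items():
        print(name, info)
```

To profile the downloaded file instead of the inline sample, you could read its text with open(...).read() and pass it to profile.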

3. Create Autoinsights


Now that we have the medical cost dataset available, let us use the Autoinsights option to generate and explore the most accurate predictive models for this dataset.

  • Click on the option "I want Smarten to generate Autoinsights" from the home page.

    Home Page

  • Select the dataset "Medical Costs" from the list of datasets and click "Next".

    Select Dataset

  • Choose how the target and predictors are selected from the dataset. With the manual option, you select the target and predictors yourself. With the auto option, the system selects them automatically, and you can change the auto-selected columns later if you wish.

    For this guide, let us select the auto option. Select the "No" option in the target selection dialogue, and click "OK" to let the system automatically select the target.

    Select Target And Predictors

  • The system will automatically generate various models for you as per the screen below.

    Autoinsights Models List

  • You can change the model parameters by opening the "Change model parameter" dialogue from the top-right corner, and then regenerate Autoinsights based on your target and column selection. For this example, we will not change them.

    Change Model Parameters

  • Click on the model heading to expand and see details of the model.

    Expanded View Of Autoinsight Model

  • Click the "Explore and Save" button of the "Predicting charges" model to open and explore the model.

    You can go back to the Autoinsights list by clicking the "Back to Autoinsights" link and explore another model.

    Explore & Save Autoinsight Model

    Explore the model in more detail with Interpretation, Model summary, Apply model, Key influencer analysis, Simulation, and other options on the left toolbar.

    Interpretation: You can view the interpretation of the algorithm applied for regression. The interpretation provides information about insights of the model in simple language.

    Model summary: You can view the model summary of the Smarten Cloud regression object.

    Data: You can view the data used for the Smarten Cloud regression object.

    Apply model: You can enter values for the input parameters and see the results of the model for regression.

    Key influencer analysis: The Key Influencers Analysis enables you to analyze the data, rank the factors that impact the metric of interest, display them as key influencers, and present the visualizations and interpretations in simple language.

    Simulation: You can apply the appropriate changes to the input parameters and obtain the results from the regression model.

  • Save the model by clicking the "Save" icon for future use. You can share the model with your team members by saving it in the "Repository" folder.

    Save & Share Smarten Insight
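
To make the "Apply model" and "Simulation" actions more concrete, the sketch below fits a simple least-squares line and then applies it to new input values. It is only an illustration of what a regression model does, under stated assumptions: the BMI and charges figures are made up, the function names fit_line, apply_model, and simulate are hypothetical, and Smarten Cloud selects its own algorithm, which may differ from the one-predictor linear model used here.

```python
def fit_line(xs, ys):
    """Fit y = intercept + slope * x by ordinary least squares."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def apply_model(model, x):
    """'Apply model': return the prediction for a single input value."""
    slope, intercept = model
    return intercept + slope * x


def simulate(model, xs):
    """'Simulation': see how the prediction moves as the input changes."""
    return [(x, apply_model(model, x)) for x in xs]


# Made-up training data for illustration only.
BMI = [20.0, 25.0, 30.0, 35.0]
CHARGES = [2000.0, 3000.0, 4000.0, 5000.0]

model = fit_line(BMI, CHARGES)

if __name__ == "__main__":
    print(apply_model(model, 28.0))  # 3600.0 for the made-up data above
    for bmi, charge in simulate(model, [24.0, 28.0, 32.0]):
        print(bmi, charge)
```

With several predictors, as in the medical cost dataset, the same idea extends to multiple regression, where each predictor gets its own coefficient.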

4. Use Assisted Predictive Modelling to create a regression model


Use the assisted, guided workflow of Smarten Cloud to generate predictive models for your dataset.

You will be guided through data preprocessing options, algorithm technique selection, and a few other options.

  • Click on the option "I want to try Assisted Predictive Modelling" from the home page.

    Using Assisted Predictive Modeling

  • Select the dataset "Medical Costs" from the list of datasets, and click "Next".

    Select Dataset

  • On the "Sampling and Filtering" screen, select "Yes" to run the model on sample data. If you select "No", the model is generated on the full data.

  • Select "No" for the apply filters option and click "Next". If you select "Yes", the system shows the columns on which you can apply filters to exclude the data you do not want.

    For this guide, let us select Auto options and click "Next".

    Sampling & Filtering Options

  • On the "Data Cleaning" screen, select "No" for handling outliers. If you select "Yes", the system gives you options to remove or replace the outliers.

    Select the "Remove" option to handle missing values. If you select "Replace", missing values are replaced by the median in measures and by the mode in dimensions.

    Let us select the default options and click "Next".

    Handle OutLiers & Missing Values

  • The next screen will show you the list of predictive techniques. Let us select the "Regression" option here.

    Choose The Algorithm Technique

    List of algorithm techniques, each followed by its description and examples:

    Forecasting

    Forecast values for the future based on past values with one or more variables affecting future values. Forecasting can be performed based upon either the time period or unique identifier.

    Example: Forecast product sales based on past sales, inflation, and GDP growth.

    Other use cases: product/service demand forecasting, inventory management, GDP forecasting, tourism forecasting.

    Classification

    Split data into groups based on preassigned categories or classes. Classification applies when predicting a dimension target variable with at least two categorical values.

    Example: An applicant for a new loan can be assigned likely/unlikely defaulter categories based on the preassigned defaulter/non-defaulter category for older applicants.

    Other use cases: credit card fraud, crime/no crime analysis, customer churn prediction, plant species classification.

    Clustering

    Split data into groups when preassigned categories or classes are not available (as compared with "classification," where preassigned categories or classes are available).

    Example: Segmenting online customers into heavy/moderate/low purchaser groups based on purchasing frequency, average purchase amount, income, age, etc.

    Other use cases: loan applicant risk segmentation, customer profile segmentation.

    Correlation

    Analyze how any two or more numeric variables are associated. Correlation can be performed only among measure variables.

    Example: Analyze whether or not there is a strong positive association between age and online purchasing frequency.

    Other use cases: identify association between product price and sales, between age and loan amount, etc.

    Regression

    Predict change in one measure variable based on change in one or more other variables. Answers such questions as the following: Which factors matter most? Which factors can we ignore? How do those factors interact with each other?

    Example: eCommerce companies can measure the impact of product price, product promotion, holidays, seasonality, etc., on product sales.

    Other use cases: yield management, predicting property price, medical cost prediction, house price prediction.

    Frequent pattern mining

    Finds frequent patterns from the data. Frequent Pattern Mining is applicable when your dataset contains dimension variables and a variable representing a unique identifier.

    Example: A retail store can place bakery products, such as muffins, bread, and eggs, together if these products have a high frequency of being purchased together.

    Other use cases: market basket analysis, cross-sell opportunity identification.

    Hypothesis testing

    Answers such questions as the following: Are two samples significantly different? Is the treatment effective? Are two dimensions related or independent of each other?

    Example: An eCommerce company can measure the regional influence on product category and gender influence on purchased product type.

    Other use cases: finding out if a medical treatment/promotional activity has been effective, if two river samples differ significantly in terms of pH level, etc.

    Descriptive statistics

    Provides basic statistics, such as mean, median, mode, standard deviation, variance, skewness, and kurtosis.

    Descriptive Statistics can be performed for the measure variables in the dataset.

    Explore our Citizen Data Scientist course to learn more about the basics of being a Citizen Data Scientist.

  • The system will prompt you to choose whether to use the auto-recommended target and predictors.

    Info Prompt

  • Select "Yes" in the dialogue to let the system auto-recommend the target and predictors for now.

    If you select "No", you need to select the target and predictor variables manually. You can also choose the algorithm manually instead of using the one recommended by the system, and choose whether to perform key influencer analysis.

  • On the select variables screen, keep the system-recommended target, predictors, and other options, and click "Next".

    Select Target & Predictors

  • The model is generated with the best-fit algorithm for you.

    Medical Cost Regression Model

    You can perform various actions described below using the left toolbar.

    Interpretation: You can view the interpretation of the algorithm applied for regression. The interpretation provides information about insights of the model in simple language.

    Model summary: You can view the model summary of the Smarten Cloud regression object.

    Data: You can view the data used for the Smarten Cloud regression object.

    Apply model: You can enter values for the input parameters and see the results of the model for regression.

    Key influencer analysis: The Key Influencers Analysis enables you to analyze the data, rank the factors that impact the metric of interest, display them as key influencers, and present the visualizations and interpretations in simple language.

    Simulation: You can apply the appropriate changes to the input parameters and obtain the results from the regression model.

  • Save the model by clicking the "Save" icon for future use. You can share the model with your team members by saving it in the "Repository" folder.

    Save & Share Model
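
The two missing value options on the "Data Cleaning" screen can be pictured with a short Python sketch. "Remove" is interpreted here as dropping every row that has a missing value, which is an assumption, and "Replace" follows the rule stated in this section: the median for measures and the mode for dimensions. The sample rows and the function names drop_missing and fill_missing are hypothetical, and the sketch is not Smarten Cloud's own implementation.

```python
from statistics import median, mode


def drop_missing(rows):
    """'Remove' (as assumed here): keep only rows with no missing (None) values."""
    return [r for r in rows if all(v is not None for v in r.values())]


def fill_missing(rows, measures, dimensions):
    """'Replace': fill None with the median (measures) or the mode (dimensions).

    Raises statistics.StatisticsError if a listed column has no values at all.
    """
    fill = {}
    for col in measures:
        fill[col] = median(r[col] for r in rows if r[col] is not None)
    for col in dimensions:
        fill[col] = mode(r[col] for r in rows if r[col] is not None)
    # Build new dictionaries so the input rows are left unchanged.
    return [
        {k: (fill[k] if v is None and k in fill else v) for k, v in r.items()}
        for r in rows
    ]


# Made-up rows for illustration only.
ROWS = [
    {"bmi": 22.0, "sex": "female"},
    {"bmi": None, "sex": "male"},
    {"bmi": 30.0, "sex": None},
    {"bmi": 26.0, "sex": "male"},
]

if __name__ == "__main__":
    print(len(drop_missing(ROWS)))  # 2 rows have no missing values
    cleaned = fill_missing(ROWS, measures=["bmi"], dimensions=["sex"])
    print(cleaned[1]["bmi"])  # 26.0, the median of 22.0, 26.0, and 30.0
    print(cleaned[2]["sex"])  # male, the most frequent value
```

Removing rows keeps only observed values but shrinks the dataset, while replacing keeps every row at the cost of introducing estimated values.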

5. Import a PMML file to validate and use a customer churn model created in Python


You can import PMML files generated on other platforms, such as R and Python, and use the ready-to-use Smarten Insights workflow to validate the model and use it for predictions.

  • Download the customer churn PMML file from the welcome email, or copy the link below and paste it into your browser.

    https://app.smarten.com/pmml/customer_churn.pmml

  • After downloading the file, log in to https://app.smarten.com

  • Click on the option "I want to import a PMML file" from the home page.

    Select PMML File Option On Home Page

  • Upload the downloaded file and click "Next".

    Upload PMML File

    The system imports the PMML file, validates it, and generates the Smarten Cloud model from the PMML definition.

  • The model will be loaded in Smarten and show important elements, such as Interpretation, Model Summary, and Model Information.

    PMML Model View

    You can perform various actions described below using the left toolbar.

    PMML model information: You can view the model information, such as algorithm type, created in, data dictionary, and others.

    Interpretation: Interpretation provides information about significant predictors and their influence on the target.

    Apply model: You can enter values for the input parameters and see the results of the model for a single record.

    Mass apply: You can apply the model to multiple records at a time by uploading a .csv file or selecting a dataset.

  • Save the model by clicking the "Save" icon for future use. Share the model with your team by saving it in the "Repository" folder.

    Save & Share
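
Because PMML is an XML format, you can peek at a file's data dictionary before importing it. The sketch below uses only Python's standard library to list the DataField entries of a PMML document. The inline fragment is hand-written for illustration: it is not a complete model, its field names are made up, and it is not taken from customer_churn.pmml. The helper names local and data_fields are also hypothetical.

```python
import xml.etree.ElementTree as ET

# Hand-written fragment for illustration only; a real PMML file also
# contains a model element, and its fields will differ.
SAMPLE_PMML = """<PMML xmlns="http://www.dmg.org/PMML-4_4" version="4.4">
  <Header description="hypothetical churn model"/>
  <DataDictionary numberOfFields="3">
    <DataField name="tenure_months" optype="continuous" dataType="double"/>
    <DataField name="plan" optype="categorical" dataType="string"/>
    <DataField name="churned" optype="categorical" dataType="string"/>
  </DataDictionary>
</PMML>
"""


def local(tag):
    """Strip the '{namespace}' prefix that ElementTree adds to tag names."""
    return tag.rsplit("}", 1)[-1]


def data_fields(pmml_text):
    """Return the PMML version and a list of (name, optype, dataType) tuples."""
    root = ET.fromstring(pmml_text)
    if local(root.tag) != "PMML":
        raise ValueError("root element is not <PMML>")
    fields = [
        (elem.get("name"), elem.get("optype"), elem.get("dataType"))
        for elem in root.iter()
        if local(elem.tag) == "DataField"
    ]
    return root.get("version"), fields


if __name__ == "__main__":
    version, fields = data_fields(SAMPLE_PMML)
    print("PMML version:", version)
    for name, optype, data_type in fields:
        print(name, optype, data_type)
```

Matching on the local tag name rather than a fixed namespace lets the same sketch read files written for different PMML versions.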

6. Other Navigation Options


Navigation menu

The navigation menu provides the following options.

Home icon: Go to the Home page from any page.

Folder icon: Open Model, My Folder, Repository, and Datasets.

Create icon: Create a new Dataset, Autoinsights, or Assisted Predictive Model, or import a PMML file.

Account icon: Go to My Account or log out.

Help icon: Start the help tour.

Navigation Menu Options

Home Page - Navigation Menu

Recent, Favourites, My folder, Repository and Datasets tabs

You can quickly search for and open models and datasets from these tabs.

Home Page - Recently Used Objects & Datasets

Use case examples

Open, explore, and learn about the possibilities with demo and use case libraries provided in this section.

Demos & Use Cases

My Account

Manage account-related activities in the "My Account" section, such as buying or upgrading a plan, adding team members, viewing transactions, and changing your password.

My Account Management

How Can I Get Started?

It’s easy! Start today, with a FREE ten (10) Day Trial. CLICK HERE to register.

Following the FREE TRIAL period, Subscription Pricing starts at $9.99 per month.

Contact Us now! We can help you succeed on your Citizen Data Scientist journey. Get started today!