This is the official documentation of uncovr.

1. Introduction

1.1 Background & Motivation

Adoption of Artificial Intelligence (AI) and Machine Learning (ML) algorithms is increasing across multiple domains. Cloud technologies, AutoML and low/no-code applications have made building and deploying AI/ML applications simpler. While most of these technologies have focussed on automating the build and deployment of data science applications, only a few have focussed on auditing the robustness of those applications. For example, there is less focus on interrogating an AI/ML model to understand how it performs with respect to challenges such as model fit (overfitting or underfitting), model bias and data drift. uncovr, a product by FOYI, aims to uncover a model's vulnerabilities and thereby help data scientists strengthen it.

1.2 Current state

The product is at a very early stage and currently has a single function, indepAndDep. This function generates independent variables and a dependent variable such that the relationship between them is known beforehand. The data is generated parametrically, i.e. random values are drawn based on the model parameters. Because the model formula is known beforehand, this data can be put through a machine learning process/pipeline to build an AI/ML model. The performance metrics of such a model, such as rmse (root mean squared error), can then be compared with the actual rmse to see whether the model is overfit or underfit. This product is constantly evolving and is open to feedback. We will keep publishing new functions and will share our product roadmap in due course. If you have any suggestions or feedback, please email support@foyi.co.nz.
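
The same idea can be tried locally in base R before calling the API. The sketch below is only an illustration under stated assumptions, not uncovr's implementation: it assumes uniformly distributed independent variables (using the bounds from the JSON example in section 2.2), normally distributed errors, and hypothetical values for the intercept, slopes and noise level. Because the true slopes are known, the fitted coefficients can be checked against them, and the rmse of the known errors gives a baseline for the model's rmse.

```r
set.seed(42)

n <- 1000  # number of observations, analogous to noOfpoints

# Bounds for each independent variable: c(lowerBound, upperBound)
bounds <- list(iv1 = c(-3, 10), iv2 = c(5, 25), iv3 = c(100, 125))

# Draw each independent variable uniformly within its bounds (an assumption)
sim <- as.data.frame(lapply(bounds, function(b) runif(n, min = b[1], max = b[2])))

# Hypothetical model parameters, fixed before the data is generated
intercept <- 2
slopes <- c(iv1 = 1.5, iv2 = -0.8, iv3 = 0.3)
error <- rnorm(n, mean = 0, sd = 1)

# depVar = intercept + (slope1 * iv1) + (slope2 * iv2) + (slope3 * iv3) + error
sim$depVar <- intercept + drop(as.matrix(sim[names(slopes)]) %*% slopes) + error

# Fit a linear model and compare it with the known truth
fit <- lm(depVar ~ iv1 + iv2 + iv3, data = sim)
estimatedSlopes <- coef(fit)[names(slopes)]
modelRmse <- sqrt(mean(residuals(fit)^2))
noiseRmse <- sqrt(mean(error^2))

print(round(estimatedSlopes - slopes, 3))  # differences should be close to zero
cat("model rmse:", round(modelRmse, 3), "| noise rmse:", round(noiseRmse, 3), "\n")
```

Because ordinary least squares minimises the residual sum of squares on the data it is fitted to, modelRmse comes out at or slightly below noiseRmse here.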

1.3 Approach

The general approach of uncovr is empirical and follows these cyclical steps:

  • Release new feature.
  • Observe the impact of that feature.
  • Draw conclusions based on the observations.
  • Plan and build new features based on the above conclusions.

2. Usage

uncovr is a SaaS (Software as a Service) product built on the Azure API Management service. To use it, first subscribe to the API, then call it either directly or through the R package conjurer. Each of these steps is explained in detail in the sub-sections below.

  1. API subscription
  2. Call the API directly
  3. Call the API using R Package: conjurer

2.1 API subscription

The first step in accessing the API is to obtain a subscription key. The steps to obtain it are as follows.

  1. Head over to the API developer portal at https://foyi.developer.azure-api.net.
  2. Click on the Sign up button on the home page.
  3. Enter your details, such as email and password.
  4. You will then receive an email asking you to verify your email address.
  5. Once you verify your email address, your account will be set up and you will receive a confirmation email.
  6. Once your account is set up, head over to the profile section of the developer portal. You can find it in the menu at the top right-hand corner of the web page.
  7. On your profile page, under the subscriptions section, click Show next to the Primary key. This is the subscription key you need to access the API. Congratulations, you can now access the API 🎉

2.2 Call the API directly

Once you have subscribed to the API, you can call it directly with a POST request made up of the following components.

  1. The URL of the endpoint is https://foyi.azure-api.net/uncovr/uncovr

  2. The parameter for the POST request is funcName, and its value is indepAndDep. An example of calling the function directly using Postman is displayed below.

  3. The header for the POST request is Ocp-Apim-Subscription-Key, and its value is your subscription key, obtained as explained in the previous section. An example of calling the function directly using Postman is displayed below.

  4. The body of the POST request is a JSON document. Its format is explained below with an example.

    { 
      "iv" : {
        "iv1" : {
          "noOfpoints"  : 100,
          "lowerBound"  : -3,
          "upperBound"  : 10,
          "dataType" : "continuous"
        },
        "iv2" : {
          "noOfpoints"  : 100,
          "lowerBound"  : 5,
          "upperBound"  : 25,
          "dataType" : "continuous"
        },
        "iv3" : {
          "noOfpoints"  : 100,
          "lowerBound"  : 100,
          "upperBound"  : 125,
          "dataType" : "continuous"
        }
      },
      "dv" : {
        "dataType" : "continuous"
      }
    }
    
  5. The first level of the JSON has two elements, namely iv, i.e. the independent variable(s), and dv, i.e. the dependent variable.

  6. The element iv has one entry per independent variable. The example above requests three independent variables and therefore has three elements, namely iv1, iv2 and iv3. Each of these elements has four key-value pairs, where the key is a parameter of the function and the value is that parameter's value.

    • noOfpoints is the number of observations, i.e. the number of rows of data requested. Currently, this function accepts any value between 100 and 10,000.
    • lowerBound and upperBound define the range from which random values are drawn for each independent variable.
    • dataType defines the type of data. In the current version of the product, it can take one of two values: continuous or discrete. If you would like to request a categorical variable, the current workaround is to specify discrete as the dataType and give a numeric range using lowerBound and upperBound. The resulting values can later be replaced with your own class values.

  7. The element dv corresponds to the dependent variable. Currently, it has only one parameter, dataType, and the only value it can take is continuous. An example of calling the function directly using Postman is displayed below.

  8. The output of the POST request has the following elements.

    • depVar is a list of values of the dependent variable.
    • Elements prefixed with slope are the slopes, i.e. the coefficients, of the independent variables. For example, if there are 3 independent variables in the POST request, there will be 3 corresponding slopes.
    • indepVars is a list of the requested independent variables. For example, if there are 3 independent variables in the POST request, there will be 3 corresponding elements in the list. Note that the length of each element of the list, i.e. of each independent variable, equals the noOfpoints parameter in the POST request.
    • error is a list of errors, one for every observation.
    • Besides the above data, model performance measures such as mae (mean absolute error), me (mean error) and rmse (root mean squared error) are also provided. An example output of calling the function directly using Postman is displayed below.

  9. The output can be interpreted as follows.

    • The dependent variable is a linear function of the intercept, the independent variables, the slopes and the error.
    • If there are 3 independent variables in the POST request, the dependent variable is computed as follows.
      depVar = intercept + (slope1 * iv1) + (slope2 * iv2) + (slope3 * iv3) + error
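
The request described above can also be assembled from R. The sketch below uses two hypothetical helpers that are not part of uncovr or conjurer. buildUncovrBody() writes the JSON body with base R, so it needs no extra packages, and checks the limits stated above (noOfpoints between 100 and 10,000, an iv dataType of continuous or discrete, a dv dataType of continuous); the lowerBound < upperBound check is an extra assumption of this sketch. callUncovr() sends the request with the httr package and assumes that funcName is passed as a query-string parameter.

```r
# Hypothetical helper: build the JSON body for an indepAndDep request.
# ivSpecs is a named list; each entry holds noOfpoints, lowerBound,
# upperBound and dataType for one independent variable.
buildUncovrBody <- function(ivSpecs, dvType = "continuous") {
  stopifnot(dvType == "continuous")  # the only dv dataType currently accepted
  ivEntries <- vapply(names(ivSpecs), function(name) {
    s <- ivSpecs[[name]]
    stopifnot(
      s$noOfpoints >= 100, s$noOfpoints <= 10000,
      s$dataType %in% c("continuous", "discrete"),
      s$lowerBound < s$upperBound  # sanity check added by this sketch
    )
    sprintf('"%s": {"noOfpoints": %d, "lowerBound": %s, "upperBound": %s, "dataType": "%s"}',
            name, as.integer(s$noOfpoints), format(s$lowerBound),
            format(s$upperBound), s$dataType)
  }, character(1))
  sprintf('{"iv": {%s}, "dv": {"dataType": "%s"}}',
          paste(ivEntries, collapse = ", "), dvType)
}

# Hypothetical helper: send the body to the endpoint (requires httr; not run here).
callUncovr <- function(body, key) {
  resp <- httr::POST(
    "https://foyi.azure-api.net/uncovr/uncovr",
    query = list(funcName = "indepAndDep"),  # assumed to be a query parameter
    httr::add_headers(`Ocp-Apim-Subscription-Key` = key),
    httr::content_type_json(),
    body = body
  )
  httr::stop_for_status(resp)          # raise an R error on an HTTP error status
  httr::content(resp, as = "parsed")   # parse the JSON response into an R list
}

# Same request as the JSON example above
body <- buildUncovrBody(list(
  iv1 = list(noOfpoints = 100, lowerBound = -3, upperBound = 10, dataType = "continuous"),
  iv2 = list(noOfpoints = 100, lowerBound = 5, upperBound = 25, dataType = "continuous"),
  iv3 = list(noOfpoints = 100, lowerBound = 100, upperBound = 125, dataType = "continuous")
))
cat(body, "\n")
# result <- callUncovr(body, key = "<input your subscription key here>")
```

Building the JSON by hand keeps the sketch dependency-free; with the jsonlite package installed, jsonlite::toJSON(..., auto_unbox = TRUE) is a sturdier way to serialise the same nested list.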
      
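The interpretation above can be checked in R. The sketch below uses a small, made-up response (two independent variables, three observations) in place of a real one. It assumes that the response has been parsed into an R list with the element names described above, that the i-th slope belongs to the i-th element of indepVars, and that the response carries an intercept element, as the formula implies. It also assumes that mae, me and rmse summarise the error values, so compare the computed figures with the ones the API returns.

```r
# Made-up response for illustration only (assumed element names);
# a real response holds noOfpoints values per variable.
response <- list(
  intercept = 1,
  slope1 = 2,
  slope2 = -0.5,
  indepVars = list(c(0, 1, 2), c(10, 20, 30)),
  error = c(0.1, -0.2, 0.1),
  depVar = c(-3.9, -7.2, -9.9)
)

# Rebuild depVar = intercept + sum(slope_i * iv_i) + error
reconstructDepVar <- function(response) {
  slopeNames <- grep("^slope[0-9]+$", names(response), value = TRUE)
  # Order numerically so that slope10 comes after slope9
  slopeNames <- slopeNames[order(as.integer(sub("^slope", "", slopeNames)))]
  ivs <- response$indepVars
  stopifnot(length(slopeNames) == length(ivs))
  contributions <- Map(function(s, iv) response[[s]] * unlist(iv), slopeNames, ivs)
  response$intercept + Reduce(`+`, contributions) + unlist(response$error)
}

# Summarise the errors with the same three measures the API reports
errorMetrics <- function(error) {
  e <- unlist(error)
  c(mae = mean(abs(e)), me = mean(e), rmse = sqrt(mean(e^2)))
}

rebuilt <- reconstructDepVar(response)
print(all.equal(rebuilt, response$depVar))  # TRUE when the formula holds
print(errorMetrics(response$error))
```
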
2.3 Call the API using R Package

uncovr can also be accessed using the open-source R package conjurer. The steps to access uncovr using conjurer are as follows.

  1. Install the conjurer package using the following code.
    install.packages("conjurer")
    
  2. The following code generates data for 3 independent variables.
       
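    library(conjurer)
    
    # Request 1000 observations for 3 independent variables from uncovr
    uncovrJson <- buildModelData(numOfObs = 1000, numOfVars = 3, key = "<input your subscription key here>")
    # Convert the response into a data frame
    df <- extractDf(uncovrJson = uncovrJson)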
    
  3. The first few rows of the data can be printed as follows.
    print(head(df))
    
  4. The model performance measures can be accessed as follows.
       
    meanAbsoluteError <- uncovrJson$mae
    meanError <- uncovrJson$me
    rootMeanSquaredError <- uncovrJson$rmse
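
These figures can be used to check a model for overfitting or underfitting, as described in section 1.2. The sketch below is a hypothetical helper (checkFit() is not part of conjurer): it splits a data frame into training and test rows, fits a linear model, and returns the training and test rmse next to the reported rmse. The column names passed in dvCol and ivCols are assumptions, so check names(df) for the names extractDf() actually uses. Simulated data stands in for df so that the sketch runs without a subscription key.

```r
# Hypothetical helper: compare training and test rmse of a linear model
# with the rmse reported by uncovr.
checkFit <- function(df, dvCol, ivCols, reportedRmse, trainShare = 0.7) {
  trainRows <- sample(nrow(df), size = floor(trainShare * nrow(df)))
  train <- df[trainRows, ]
  test <- df[-trainRows, ]
  fit <- lm(reformulate(ivCols, response = dvCol), data = train)
  rmse <- function(actual, predicted) sqrt(mean((actual - predicted)^2))
  c(trainRmse = rmse(train[[dvCol]], fitted(fit)),
    testRmse = rmse(test[[dvCol]], predict(fit, newdata = test)),
    reportedRmse = reportedRmse)
}

# Simulated stand-in for df; with real data the call would look like
# checkFit(df, dvCol = "<dependent variable column>",
#          ivCols = c("<independent variable columns>"), reportedRmse = uncovrJson$rmse)
set.seed(7)
nObs <- 1000
simData <- data.frame(iv1 = runif(nObs, -3, 10), iv2 = runif(nObs, 5, 25))
simData$depVar <- 2 + 1.5 * simData$iv1 - 0.8 * simData$iv2 + rnorm(nObs)  # noise sd = 1
result <- checkFit(simData, dvCol = "depVar", ivCols = c("iv1", "iv2"), reportedRmse = 1)
print(round(result, 3))
```

For a correctly specified model, both values should sit close to the reported rmse. A test rmse clearly above it suggests underfitting or a misspecified model, while a training rmse well below it combined with a higher test rmse suggests overfitting.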