Introduction

Meta learning is a complex yet sought-after technique that addresses a hurdle faced by common machine learning (ML) algorithms. It is distinct in its ability to solve problems by learning iteratively: not just by adjusting for errors between the predicted output and the actual output, but also by learning from the learning process itself.

What this essentially means is that a common ML algorithm has an underlying bias, defined by its assumptions, that it uses to fit and understand the data. It then makes predictions by bending those assumptions to whatever the current training data suggests. The resulting model may perform well in one specific domain, but in other domains the investment made in carrying out the learning process holds little or no value. Meta learning addresses this by iteratively adjusting for errors while taking both the initial parameters of the model and its adjusted/optimized parameters into consideration, so that the initialization itself improves over time. We break down how this is achieved in the methodology.
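To make this concrete, here is a minimal, hypothetical Python sketch of the idea. It follows the spirit of first-order meta learning methods such as Reptile, not the exact procedure described later in this article: an inner loop adapts a copy of the parameters to a sampled task, and an outer loop nudges the shared initialization toward the adapted values, so the initialization itself is learned. The toy regression task and all names are illustrative assumptions.

```python
import numpy as np

# Illustrative sketch only: a first-order meta-learning loop in the spirit of
# Reptile. Each "task" is a toy 1-D regression y = a * x with a task-specific
# slope a; the meta step moves the shared initial parameter toward the
# task-adapted parameter, i.e. it learns from the learning process itself.

rng = np.random.default_rng(0)
theta = 0.0                      # shared initial parameter (the starting point)
inner_lr, meta_lr = 0.05, 0.1    # inner-loop and outer-loop learning rates

for meta_step in range(500):
    a = rng.uniform(1.0, 3.0)            # sample a task (its true slope)
    x = rng.uniform(-1.0, 1.0, size=32)  # task-specific training data
    y = a * x

    # Inner loop: adapt a copy of theta to the sampled task by gradient descent.
    phi = theta
    for _ in range(5):
        grad = np.mean(2.0 * (phi * x - y) * x)  # d(MSE)/d(phi)
        phi -= inner_lr * grad

    # Outer loop: pull the initialization toward the adapted parameter.
    theta += meta_lr * (phi - theta)

print(f"learned initialization: {theta:.2f}")  # ends up near 2, the mean slope
```

Note the design choice: the outer update never sees a single task's predictions directly; it only sees how the parameters moved during adaptation, which is what makes the learned initialization useful across domains rather than in one.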

Meta learning has had a significant impact on deep learning research and its applications to both real and simulated data. Commonly used techniques that rely on the adversarial loss of GANs produce good results for characterizing realistic data, but meta learning yields more diversified classifications through 'domain randomization' on simulated data [1]. Curriculum learning, a data-optimization technique in meta learning, has gained more traction than any other data-level optimization technique. Hand-designed neural network architectures such as AlexNet and ResNet are being outperformed by state-of-the-art architectures discovered through meta learning, such as AmoebaNet and EfficientNet [1].

Methodology

A pseudocode implementation of meta learning helps in understanding the otherwise complex process followed by the algorithm in a simple manner. Throughout this section we refer to the process of fitting a model as creating a 'plan', developed by identifying patterns in the data [2].

Breaking down the 1st Step: this consists of creating a base implementation that forms our first plan. This step sets the initial parameters.

  1. Set n = 1.

Breaking down the 2nd Step: in this step, we define a loop that keeps executing until the entire pool of data has been covered and partitioned into different plans. The second tier, built on top of the base plan, now takes into account the knowledge acquired from the development of the plans themselves, thereby 'learning to learn'. The plans constructed on this tier are a hybrid of the initial parameters set by the base plan and the parameters recalculated after correction by the loss function. The calculation is performed on second (and progressively higher-order) derivatives as the loop creates more plans from the knowledge gained at each previous tier. The process can be interpreted as a feedback correction achieved through backpropagation, wherein higher-tier plans influence the plans on the lower tiers. (A concrete sketch of this loop follows the numbered steps below.)

    1. While (1) # infinite loop
    2. Initialize P(n) as the set that will contain the error calculations and model parameters of the nth-order plans. Until P(n) reaches Maximum Pool Size(n), repeatedly execute the three steps below:
      • Create a new nth-order plan and give it the default name P_new
      • Set the previously initialized P(n) to P(n) UNION {P_new}
      • Call the Test_and_Replace() function. It tests the output and decides which plan stays and which lower-level plan is replaced by the new, higher-order plan created in the current iteration. This step carries the main functionality that resides implicitly in the process; we dive into this function in the next section [2]
    3. Increment n by 1
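Below is a hedged, self-contained Python transcription of this loop. The plan representation, make_plan(), the fixed pool size, and the max_order cutoff are assumptions made for the sketch; Test_and_Replace() is stubbed here and sketched in the next section.

```python
import random

def make_plan(order):
    # Stand-in for fitting a model / forming a plan at the given tier.
    return {"order": order, "score": random.random()}

def test_and_replace(p_new, pools, n):
    # Stub here; a fuller sketch follows in the next section.
    pass

def meta_learning_loop(max_pool_size=4, max_order=3):
    n = 1                                     # Step 1: n = 1 (the base tier)
    pools = {}                                # P(n) for each tier n
    while True:                               # Step 2.1: loop over tiers
        pools[n] = []                         # Step 2.2: initialize P(n)
        while len(pools[n]) < max_pool_size:  # until the pool-size condition is met
            p_new = make_plan(n)              # create a new nth-order plan P_new
            pools[n].append(p_new)            # P(n) = P(n) UNION {P_new}
            test_and_replace(p_new, pools, n)
        n += 1                                # Step 3: increment n
        if n > max_order:                     # practical stand-in for the
            break                             # performance-based termination
    return pools

print({n: len(plans) for n, plans in meta_learning_loop().items()})
```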

Breaking down the Test_and_Replace() Function:

  1. If n = 1 (the loop's 1st iteration): add the base plan, which is P_new in this case, to the set P(n) and assign it a value of measure to be compared later.
  2. Else (until an exit condition is reached):
      • Based on probability, identify plans from P(n-1) and create a new plan P using the information/patterns derived from P_new
      • Apply the Test_and_Replace function recursively on P, considering it to be a member of P(n-1)
      • Compare the performance gain elicited by P with that of its lower-order derivatives in P(n-1), and assign a new value to P based on the comparative performance gain
  3. Decide whether P replaces a lower-order derivative plan in P(n)

The entire process defined in the methodology terminates when replacing a lower-order plan with a higher-order one no longer yields a notable improvement in performance.
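The pseudocode above leaves some details open, in particular which pool the replacement happens in, so the following self-contained Python sketch fixes one plausible reading: a derived plan P replaces the lower-order plan it was built from when its performance gain exceeds a threshold. evaluate(), derive_plan(), and min_gain are illustrative assumptions, not the article's exact procedure.

```python
import random

def evaluate(plan):
    # Stand-in for measuring a plan's performance on held-out data.
    return plan["score"]

def derive_plan(parent, p_new):
    # Stand-in for building a new plan from a lower-order plan plus the
    # information/pattern carried by P_new.
    score = (parent["score"] + p_new["score"]) / 2 + random.gauss(0.0, 0.05)
    return {"score": score}

def test_and_replace(p_new, pools, n, min_gain=0.01):
    if n == 1:
        # Base case (1st iteration): add the base plan to P(1) with its measure.
        p_new["score"] = evaluate(p_new)
        pools[1].append(p_new)
        return True
    parent = random.choice(pools[n - 1])       # probabilistic pick from P(n-1)
    p = derive_plan(parent, p_new)             # new plan P from parent and P_new
    if not test_and_replace(p, pools, n - 1):  # recurse, treating P as a member of P(n-1)
        return False
    gain = evaluate(p) - evaluate(parent)      # comparative performance gain
    p["score"] = gain                          # assign P its new value of measure
    if gain > min_gain:                        # decide: P replaces its lower-order parent
        if parent in pools[n - 1]:
            pools[n - 1].remove(parent)
        return True
    return False   # no notable improvement: the process's overall stopping signal

# Tiny usage example with hypothetical values.
pools = {1: [], 2: []}
test_and_replace({"score": 0.4}, pools, 1)         # first iteration: seed the base plan
print(test_and_replace({"score": 0.9}, pools, 2))  # test a 2nd-order plan against P(1)
```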


References

  1. "The Rise of Meta Learning", Towards Data Science: https://towardsdatascience.com/the-rise-of-meta-learning-9c61ffac8564
  2. J. Schmidhuber, "Evolutionary Principles in Self-Referential Learning" (diploma thesis, 1987): http://people.idsia.ch/~juergen/diploma1987ocr.pdf
  3. "Meta-Learning", Papers with Code: https://paperswithcode.com/task/meta-learning
Authored by Mayank Lal, Data Scientist at Absolutdata

