Abstract

Researchers often try to capture as much information as they can, whether by using existing architectures, creating new ones, going deeper, or employing different training methods. This paper compares ideas and methods that are used heavily in Machine Learning to determine what works best. These methods are prevalent across domains of Machine Learning such as Computer Vision and Natural Language Processing (NLP).

Transfer Learning is the Key

Throughout our work, we have tried to keep generalization in focus, because that is what matters in the end: any model should be robust and able to work outside your research environment. When a model lacks generalization, we often try to train it on datasets it has never encountered, and that is when things start to get much more complex. Each dataset comes with its own additional features, which we have to account for when adapting our model.

One common way to do this is transfer learning: transferring knowledge learned in one domain to another.

Given a specific task in a particular domain, for which we have labelled images, we train our model on that dataset. In practice, the dataset is usually the largest available in that domain so that we can leverage the extracted features effectively. In computer vision it is mostly ImageNet, which has 1,000 classes and more than 1 million images. A network trained on it is bound to extract features [2] that are difficult to obtain otherwise. Initial layers usually capture small, fine details, and as we go deeper, the ConvNet captures increasingly task-specific details; this makes ConvNets fantastic feature extractors.
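As a minimal sketch of this feature-extraction view (assuming PyTorch and torchvision, which the article does not name), the snippet below reads activations from an early block and a deep block of an ImageNet-pretrained ResNet; the layer names and output shapes are specific to this assumed setup.

```python
import torch
from torchvision import models

# ImageNet-pretrained backbone (torchvision; the exact model is our assumption).
model = models.resnet18(pretrained=True)
model.eval()

features = {}

def save_output(name):
    # Forward hooks receive (module, inputs, output); we stash the output.
    def hook(module, inputs, output):
        features[name] = output.detach()
    return hook

# Hook an early block (low-level features) and a deep block (high-level features).
model.layer1.register_forward_hook(save_output("early"))
model.layer4.register_forward_hook(save_output("deep"))

with torch.no_grad():
    _ = model(torch.randn(1, 3, 224, 224))  # dummy image batch

print(features["early"].shape)  # [1, 64, 56, 56]  -- fine, low-level feature maps
print(features["deep"].shape)   # [1, 512, 7, 7]   -- coarse, task-specific maps
```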

Normally we let the ConvNet learn features by training it on a large dataset and then modify it for the new task: the fully connected layers at the end can be replaced with whatever combination of linear layers the new classification task requires. This makes it easy to transfer the knowledge of our network to another task.
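A minimal sketch of that head replacement, again assuming PyTorch/torchvision and a hypothetical 10-class target task (neither is specified here), might look like this:

```python
import torch.nn as nn
import torch.optim as optim
from torchvision import models

num_classes = 10  # hypothetical number of classes in the new task

model = models.resnet50(pretrained=True)

# Freeze the pretrained feature extractor.
for param in model.parameters():
    param.requires_grad = False

# Replace the final fully connected layer with a small head for our task.
# Newly created layers have requires_grad=True by default.
model.fc = nn.Sequential(
    nn.Linear(model.fc.in_features, 256),
    nn.ReLU(),
    nn.Linear(256, num_classes),
)

# Only the new head's parameters are updated during training.
optimizer = optim.Adam(model.fc.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()
```

Only the new head is trained in this sketch; a common variant is to also unfreeze the deeper convolutional blocks and fine-tune them with a smaller learning rate.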

To read more about it, please refer to this original paper:

Using Transfer Learning to Introduce Generalization in Models

Transfer Learning in NLP is also out now.

References

  1. An Empirical Investigation of Catastrophic Forgetting in Gradient-Based Neural Networks
  2. Visualizing and Understanding Convolutional Networks
  3. Universal Language Model Fine-tuning for Text Classification
  4. Learning Without Forgetting
  5. Deep Residual Learning for Image Recognition
  6. Theoretical Impediments to Machine Learning With Seven Sparks from the Causal Revolution