Why Are Neural Networks Limited?
Updated: Oct 8, 2020
Why do we live in a world where NNs are limited at their core and throughout the entire development life cycle?
We all want better neural network algorithms for our AI applications. The catch is that even the finest NN is fundamentally limited by the dataset it was trained on.
For many years we were told that we have too much data to process and too few datasets to train our ML models and NNs. In the past two years the situation has changed: more and more datasets are being released for public use by the AI community, academic researchers, governments and commercial companies. Taken together, this means that according to Google's new "dataset search engine" there are more than 25 million datasets available for the AI community and data scientists to explore and evaluate.
So if we have so many datasets, why do we still need to build our own? And why are most of the labeling, tagging, classification and other data preparation tasks still done with a "human in the loop", who in many cases does most of the work and consumes most of the time and resources of an AI project?
In this post I would like to share one question that we at ProjAIX ask ourselves every day: "how good is the dataset?"
The problem with the millions of existing datasets is that many of them:
1. Were never used or evaluated in real-life use cases with real-world data
2. Were created for a specific use case that cannot be replicated or reused
3. Were released without enough information about the prediction model or the objectives tested
Beyond those reasons there is another challenge, especially in computer vision: image files differ in properties such as shooting angle, distance, lighting, height and more, and these differences affect both NN training and the resulting prediction model. So although millions of datasets are available for public use, in practice we don't know how good they are or how quickly a data scientist can adopt and use them.
Our methodology at ProjAIX led us to build a platform that tries to answer the most important questions that come up when searching for datasets to train NNs: "What is the quality of the dataset, and how can I confirm it?"
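To make the question "how good is the dataset?" a little more concrete, here is a minimal sketch of a few basic checks one could run before training. It is not the ProjAIX platform or its methodology. The function name `dataset_quality_report`, the `(data_bytes, label)` sample format and the chosen checks (missing labels, exact duplicates, class imbalance) are all assumptions made for illustration only.

```python
import hashlib
from collections import Counter


def dataset_quality_report(samples):
    """Summarize a few basic quality signals for a labeled dataset.

    `samples` is a hypothetical format: a list of (data_bytes, label)
    pairs, where label is None when the sample is unlabeled.
    """
    total = len(samples)

    # Count how many samples each class has, ignoring unlabeled ones.
    labels = [label for _, label in samples if label is not None]
    class_counts = Counter(labels)
    missing_labels = total - len(labels)

    # Detect exact duplicates by hashing the raw content of each sample.
    content_hashes = Counter(
        hashlib.sha256(data).hexdigest() for data, _ in samples
    )
    duplicates = sum(count - 1 for count in content_hashes.values())

    # Imbalance ratio: size of the largest class over the smallest class.
    if class_counts:
        imbalance = max(class_counts.values()) / min(class_counts.values())
    else:
        imbalance = None

    return {
        "total": total,
        "missing_labels": missing_labels,
        "duplicates": duplicates,
        "class_counts": dict(class_counts),
        "imbalance_ratio": imbalance,
    }


if __name__ == "__main__":
    toy = [(b"img-a", "cat"), (b"img-a", "cat"), (b"img-b", "dog"), (b"img-c", None)]
    print(dataset_quality_report(toy))
```

Checks like these only cover the surface. They say nothing about shooting angle, lighting or whether the data matches a real-world use case, which is exactly why the quality question is hard.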
If you have any comments or questions, please do not hesitate to contact us via email at firstname.lastname@example.org