Data Lake and Data Scientist

A Data Lake is a relatively new term for a large repository that stores a mix of structured, semi-structured and unstructured data, including but not limited to documents, images and videos.

Products commonly used on the market to build one include Apache Hadoop and Amazon S3.

The real difficulty is not creating a Data Lake but exploiting it efficiently. Tools such as Pig and Hive exist to give structure to this data, but the Data Scientist's job is to find the value hidden in it, in line with the company's business needs.
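To make "giving structure to raw data" concrete, here is a minimal Python sketch of schema-on-read. This is the general approach Hive takes when it projects a table definition onto files that are already sitting in storage. The sketch does not use Pig or Hive. The file names, the `read_sales` and `total_by_customer` helpers and the (customer, amount) schema are all hypothetical, chosen only for illustration.

```python
import csv
import io
import json
from typing import Iterator

# Hypothetical raw files landed in the lake as-is: no schema is enforced on write.
RAW_FILES = {
    "sales/2015-01.csv": "customer,amount\nacme,120.5\nglobex,80\n",
    "sales/2015-02.json": '[{"customer": "acme", "amount": 42}, {"customer": "initech"}]',
    "media/logo.png": "\x89PNG...",  # unstructured data this query simply ignores
}


def read_sales(files: dict) -> Iterator[tuple]:
    """Apply a (customer, amount) schema at read time, skipping data that does not fit."""
    for path, content in files.items():
        if path.endswith(".csv"):
            rows = csv.DictReader(io.StringIO(content))
        elif path.endswith(".json"):
            rows = json.loads(content)
        else:
            continue  # not a format this query knows how to read
        for row in rows:
            try:
                yield row["customer"], float(row["amount"])
            except (KeyError, TypeError, ValueError):
                continue  # record is missing a field or has a bad value


def total_by_customer(files: dict) -> dict:
    """Aggregate the structured view produced by read_sales."""
    totals = {}
    for customer, amount in read_sales(files):
        totals[customer] = totals.get(customer, 0.0) + amount
    return totals


print(total_by_customer(RAW_FILES))
```

The raw files are never rewritten. The structure lives entirely in the reading code, so another analysis could apply a different schema to the same files. The incomplete `initech` record is skipped rather than rejected at load time, which is the trade-off schema-on-read makes.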

Below is a very good introductory article on Data Lakes (in French):