Basics in Data Engineering

22 Aug 2018

While preparing for my data engineering interview, I realized that there are a lot of basics that people may not know well and that are worth revising. There is plenty of jargon, such as Hadoop, AWS SageMaker, PySpark, HDFS, Hive, Pig, data mart, and data warehouse. The panel may ask specific technical questions related to the role, so you need to prepare for them. This profession is not for the unprepared or the faint of heart. So let’s go through some of these topics so that you can talk about your skills with more confidence:

  1. Data Lake, Data Mart and Data Warehouse

  2. Hadoop, HDFS, Hive, Pig, MapReduce, Impala

  3. Data warehouses - HiveQL, Snowflake ANSI SQL, Redshift SQL

  4. Amazon EMR, Glue, Athena, Data Pipeline, Lambda, SageMaker

  5. DevOps, CI/CD