Hello 👋 I’m a Junior Data Engineer in Asia. In my opinion, many Asian companies are struggling to bring data architecture into their business. So, as a junior data engineer, I have some trouble figuring out how to develop my career. I tried to find out what kinds of problems a data engineer should solve by looking at AWS and Google’s interview questions. (The content of this article is taken from the attached link.)
Amazon Data Engineer Interview Questions
Amazon’s data engineers play a crucial role in the company’s day-to-day business operations.
Role of Data Engineers at Amazon
- Collect, store and manage huge quantities of data.
- Convert raw data into information that can be used to make decisions.
- Be at the forefront of data-driven decision-making by working closely with data scientists, product managers, and software engineers.
- Build and maintain database architectures.
- Coordinate with product managers, software engineers, and data scientists to work on common projects that involve leveraging datasets.
- Leverage SQL and programming to build algorithms.
- Perform data modeling and carry out ETL design, keeping with best practices.
Skills and Qualifications Required to Be a Data Engineer at Amazon
- 4+ years of experience in Python, SQL and ETL design
- Proven experience in data modeling and building data pipeline architectures
- 3+ years of experience in big data analytics, with workflow management engines (Airflow, AWS Step Functions, Google Cloud Composer…)
- Proven experience in working with cloud analytics platforms or MPP analytics platforms such as AWS Redshift, Google BigQuery, Teradata, or Netezza
- Proven experience in SQL Performance Tuning
- Proven experience in designing database pipeline architectures
- Experience in using big data analytics tools such as Spark, Impala, Hive, Presto
- Experience in E2E process optimization
- Experience with anomaly/outlier detection
- Algorithms and data structures
- Metric and visualization solution designs
- Spark, EMR
- Reporting tools like Tableau and Excel
- Data pipeline design
- DB performance tuning
- Statistics and modeling
- Customer Obsession: meet customer expectations
- Ownership: go beyond your job responsibilities and work on a challenging project
- Insist on the Highest Standards: improve quality of a project and motivate others
- Think Big: significant professional achievement, make a bold and challenging decision, great impact
- Bias for Action: take a calculated risk, and take the initiative to correct a problem
- Earn Trust: speak up in a difficult or uncomfortable environment, gain the trust of your team
- Dive Deep: complicated problem you’ve had to deal with, utilize in-depth data
- Have Backbone; Disagree and Commit: something you believe in that nobody else does
- Deliver Results: push something through to deliver results even when the team has given up on it
STAR: answer clearly in the form of Situation, Task, Action, Result.
You can also refer to the links below.
Google Search processes over 3.5 billion searches per day and 1.2 trillion searches per year, positioning Google at the center of data exchange. Google uses this data for advertising and gains 80% of its revenue from these services.
- Minimum Qualifications: 2+ years of software development, data engineering, business intelligence, data science, or related field with experience in manipulating, processing, and extracting value from datasets. 4+ years of experience in designing, building, and deploying cloud-based solution architectures.
- Collaborative Experience: skillfully communicating, organizing and analyzing
- Analytical Experience: experience designing data models and data warehouses, and using SQL and NoSQL database management systems.
- Preferred Qualifications: Master’s degree
- What is your 5-year professional plan?
- Describe a time you failed to reach a goal.
- Describe how you work effectively with others and achieve the desired result.
- Tell me about a project you’re proud of.
Product sense & Business Cases
You have to be interested in domain-specific data.
- What kind of spam will you have on YouTube, and how to deal with them?
- How would you explain cloud computing to a 6-year-old?
- How many cans of blue paint were sold in the United States last year?
Data Analysis & Coding
- Find the number of emails received by each user under each built-in email label. The email labels are ~
- Find the total AdWords earnings for each business type. Output the business types along with the total earnings.
- Write code to generate a random normal distribution and plot it.
- Why use feature selection? If two predictors are highly correlated, what is the effect on the coefficients in the logistic regression? What are the confidence intervals of coefficients?
- For sample size n, the margin of error is 3. How many more samples do we need to bring the margin of error down to 0.3?
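To get a feel for the last question, here is a minimal sketch of how I would reason about it. It assumes the usual margin of error for a sample mean, z * sigma / sqrt(n), which the original question does not state. Under that assumption the margin of error is proportional to 1/sqrt(n), so making it 10 times smaller needs 100 times the samples in total. The function names and the default z = 1.96 are my own choices for illustration.

```python
import math


def margin_of_error(sigma: float, n: int, z: float = 1.96) -> float:
    """Margin of error for a sample mean, assuming the form z * sigma / sqrt(n)."""
    return z * sigma / math.sqrt(n)


def extra_samples_needed(n: int, current_moe: float, target_moe: float) -> int:
    """Additional samples needed to shrink the margin of error.

    Because the margin of error scales with 1/sqrt(n), the required total
    sample size is n * (current_moe / target_moe) ** 2.
    """
    ratio = current_moe / target_moe
    # The small tolerance guards against floating-point noise before rounding up.
    required_total = math.ceil(n * ratio ** 2 - 1e-9)
    return required_total - n


if __name__ == "__main__":
    n = 100  # hypothetical starting sample size
    print(extra_samples_needed(n, current_moe=3, target_moe=0.3))  # 9900, i.e. 99 * n
```

So under this assumption the answer is 99n more samples (100n in total), whatever the starting n is.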