Data Analyst Interview Questions 2026

22 real-world questions covering SQL queries, Excel functions, statistical analysis, case studies, and data storytelling that hiring managers actually ask.

22 Questions with Answers

Write a SQL query to find the top 5 customers by total order value in the last 6 months.

Technical

Medium

Sample Answer

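SELECT c.customer_name, SUM(o.order_value) AS total_value FROM customers c JOIN orders o ON c.id = o.customer_id WHERE o.order_date >= CURRENT_DATE - INTERVAL '6 months' GROUP BY c.id, c.customer_name ORDER BY total_value DESC LIMIT 5; This is PostgreSQL syntax. In SQL Server, use SELECT TOP 5 and DATEADD(MONTH, -6, GETDATE()) in place of LIMIT 5 and the interval expression; the two dialects cannot be mixed in one query. The query joins customers with orders, filters to the last 6 months, aggregates by customer, and sorts descending. Grouping by c.id as well as the name keeps two customers who share a name from being merged. In production, you would add indexes on order_date and customer_id for performance. Mention that you would also consider using window functions like ROW_NUMBER() for more complex ranking scenarios.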
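
To practice this query without a database server, a minimal sketch using Python's built-in sqlite3 module is shown below. The table layout, the sample rows, and the SQLite date function `date('now', '-6 months')` are assumptions made for illustration; SQLite's date syntax differs from both PostgreSQL and SQL Server.

```python
import sqlite3

# In-memory database with hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, customer_name TEXT);
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customers(id),
        order_value REAL,
        order_date TEXT
    );
    INSERT INTO customers VALUES (1, 'Ana'), (2, 'Ben'), (3, 'Cy');
    INSERT INTO orders (customer_id, order_value, order_date) VALUES
        (1, 100.0, date('now', '-1 month')),
        (1,  50.0, date('now', '-2 months')),
        (2, 300.0, date('now', '-3 months')),
        (3, 999.0, date('now', '-12 months'));  -- outside the 6-month window
""")

# Same logic as the interview answer, written in SQLite's dialect.
top_customers = conn.execute("""
    SELECT c.customer_name, SUM(o.order_value) AS total_value
    FROM customers c
    JOIN orders o ON c.id = o.customer_id
    WHERE o.order_date >= date('now', '-6 months')
    GROUP BY c.id, c.customer_name
    ORDER BY total_value DESC
    LIMIT 5
""").fetchall()

print(top_customers)
```

Cy's large order is a year old, so the date filter drops it before aggregation and only Ben and Ana appear in the result.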

What is the difference between INNER JOIN, LEFT JOIN, RIGHT JOIN, and FULL OUTER JOIN?

Technical

Easy

Sample Answer

INNER JOIN returns only rows with matching values in both tables. LEFT JOIN returns all rows from the left table and matched rows from the right table (NULLs for non-matches). RIGHT JOIN is the mirror image: all rows from the right table and matched rows from the left. FULL OUTER JOIN returns all rows from both tables, with NULLs where there is no match. In practice, INNER JOIN and LEFT JOIN cover the large majority of use cases. LEFT JOIN is essential for finding records without matches, such as customers who have never placed an order. Always consider the performance implications of each join type on large datasets.
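
The "customers who have never placed an order" pattern can be sketched as follows, again with Python's sqlite3 module. The tables and rows are hypothetical and exist only to show how the two join types behave differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, customer_name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, order_value REAL);
    INSERT INTO customers VALUES (1, 'Ana'), (2, 'Ben'), (3, 'Cy');
    INSERT INTO orders VALUES (10, 1, 100.0), (11, 1, 50.0), (12, 2, 300.0);
""")

# INNER JOIN: only customers that have at least one matching order.
inner_rows = conn.execute("""
    SELECT c.customer_name, o.order_value
    FROM customers c
    INNER JOIN orders o ON o.customer_id = c.id
    ORDER BY o.id
""").fetchall()

# LEFT JOIN + IS NULL: customers with no matching order at all.
never_ordered = conn.execute("""
    SELECT c.customer_name
    FROM customers c
    LEFT JOIN orders o ON o.customer_id = c.id
    WHERE o.id IS NULL
""").fetchall()

print(inner_rows)
print(never_ordered)
```

Cy has no orders, so Cy is missing from the INNER JOIN result and is the only row returned by the LEFT JOIN filter.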

Explain the difference between WHERE and HAVING clauses in SQL.

Technical

Easy

Sample Answer

WHERE filters rows before grouping (operates on individual rows), while HAVING filters groups after aggregation (operates on aggregated results). For example, WHERE salary > 50000 filters individual employees, while HAVING COUNT(*) > 5 filters groups with more than 5 members. WHERE cannot use aggregate functions, but HAVING can. For performance, always filter as early as possible with WHERE to reduce the data that needs to be grouped. Use HAVING only when you need to filter on aggregated values like SUM, COUNT, AVG, or MAX.
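
A small sketch of both clauses in one query, run through Python's sqlite3 module. The employees table, its columns, and the thresholds are assumptions chosen so that each clause visibly removes something.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (dept TEXT, salary INTEGER);
    INSERT INTO employees VALUES
        ('eng', 90000), ('eng', 70000), ('eng', 40000),
        ('ops', 60000), ('ops', 45000),
        ('sales', 55000), ('sales', 52000);
""")

# WHERE drops individual rows first; HAVING then drops whole groups.
well_paid_groups = conn.execute("""
    SELECT dept, COUNT(*) AS headcount
    FROM employees
    WHERE salary > 50000        -- row-level filter, applied before grouping
    GROUP BY dept
    HAVING COUNT(*) > 1         -- group-level filter, applied after aggregation
    ORDER BY dept
""").fetchall()

print(well_paid_groups)
```

The ops department has one employee above 50000 after the WHERE step, so HAVING removes the whole group.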

How would you handle missing data in a dataset?

Technical

Medium

Sample Answer

First, understand why data is missing: is it Missing Completely At Random (MCAR), Missing At Random (MAR), or Missing Not At Random (MNAR)? Options include: deletion (listwise or pairwise) if the missing percentage is small and the data is MCAR; imputation with the mean or median for numerical data and the mode for categorical data; forward-fill or backward-fill for time-series data; and advanced methods like KNN imputation or MICE for complex patterns. Always document your approach and its impact on the analysis. Never silently drop missing data, as it can introduce bias. Create a missing data report showing the percentage and pattern of missingness across variables.
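
The missing data report and a simple median imputation can be sketched in plain Python. The records, the column names, and the helper functions `missing_report` and `impute_median` are hypothetical; in practice you would more likely reach for a dataframe library.

```python
from statistics import median

# Hypothetical records; None marks a missing value.
rows = [
    {"age": 34, "income": 52000, "city": "Austin"},
    {"age": None, "income": 61000, "city": "Boston"},
    {"age": 29, "income": None, "city": None},
    {"age": 41, "income": 58000, "city": "Austin"},
]

def missing_report(records):
    """Return the percentage of missing (None) values per column."""
    columns = records[0].keys()
    return {
        col: 100 * sum(r[col] is None for r in records) / len(records)
        for col in columns
    }

def impute_median(records, col):
    """Return copies of the records with None in `col` replaced by the column median."""
    fill = median(r[col] for r in records if r[col] is not None)
    return [{**r, col: fill if r[col] is None else r[col]} for r in records]

report = missing_report(rows)
imputed = impute_median(rows, "age")

print(report)
print([r["age"] for r in imputed])
```

The imputation returns new records instead of changing the originals, which makes it easy to compare results with and without imputation when documenting the impact.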

What is a pivot table and when would you use one?

Technical

Easy

Sample Answer

A pivot table summarizes large datasets by reorganizing data into rows, columns, and values with aggregation functions. Use pivot tables to quickly summarize sales by region and product, analyze trends across time periods, compare performance metrics across categories, or create cross-tabulations. In Excel, use Insert > PivotTable, drag fields to Rows, Columns, and Values areas. Best practices include: refreshing data regularly, using slicers for interactive filtering, grouping dates by month or quarter, and using calculated fields for custom metrics. Pivot tables are the single most important Excel skill for data analysts.
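
What a pivot table does under the hood (group by two fields, then aggregate a third) can be sketched in a few lines of Python. The sales rows are made up for illustration, and SUM is assumed as the aggregation.

```python
from collections import defaultdict

# Hypothetical transactions: (region, product, amount).
sales = [
    ("East", "Widget", 100),
    ("East", "Gadget", 150),
    ("West", "Widget", 200),
    ("East", "Widget", 50),
    ("West", "Gadget", 75),
]

# Rows = region, Columns = product, Values = SUM(amount).
pivot = defaultdict(lambda: defaultdict(int))
for region, product, amount in sales:
    pivot[region][product] += amount

# Print the summary as a small grid.
products = sorted({product for _, product, _ in sales})
print("Region".ljust(8) + "".join(p.rjust(8) for p in products))
for region in sorted(pivot):
    print(region.ljust(8) + "".join(str(pivot[region][p]).rjust(8) for p in products))
```

Dragging a field to Rows or Columns in Excel corresponds to choosing the dictionary keys here, and the Values area corresponds to the running sum.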

Explain the difference between VLOOKUP and INDEX-MATCH. Why is INDEX-MATCH preferred?

Technical

Medium

Sample Answer

VLOOKUP searches for a value in the first column of a range and returns a value from a specified column. INDEX-MATCH combines INDEX (returns a value at a given position) with MATCH (finds the position of a value). INDEX-MATCH is preferred because it can look left (VLOOKUP only looks right), handles column insertions without breaking, can be faster on large datasets, and allows both row and column matching. XLOOKUP in newer Excel versions combines the simplicity of VLOOKUP with the flexibility of INDEX-MATCH, supporting exact, approximate, and wildcard matches with built-in error handling.

Walk me through how you would approach a case study: customer churn is increasing by 15% quarter over quarter.

Situational

Hard

Sample Answer

Start by defining the metric precisely: how is churn calculated and over what period? Segment the data by customer demographics, product usage, acquisition channel, and tenure. Look for patterns: is churn concentrated in specific segments or spread evenly? Analyze the customer journey for drop-off points. Compare churned vs retained customers on engagement metrics. Investigate external factors like competitor actions or economic changes. Quantify the revenue impact. Then propose hypotheses and recommend data-driven actions: improve onboarding for high-churn segments, implement early warning systems using predictive models, or launch targeted retention campaigns. Always present your findings with clear visualizations.

What is the difference between correlation and causation?

Technical

Easy

Sample Answer

Correlation measures the strength and direction of the linear relationship between two variables, with a coefficient ranging from -1 to +1. Causation means one variable directly causes changes in another. Correlation does not imply causation because of confounding variables, reverse causality, or coincidence. For example, ice cream sales and drowning deaths are correlated, but both are caused by summer heat (confounding variable). To establish causation, you need controlled experiments (A/B tests), randomized trials, or careful causal inference methods like instrumental variables or regression discontinuity. In business analytics, always be cautious about claiming causation from observational data.
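
The ice cream example can be made concrete with a short Pearson correlation sketch. All numbers below are invented for illustration, and the `pearson` helper is a hand-written version of the standard formula.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

# Made-up monthly figures; both series rise with temperature (the confounder).
temperature = [10, 15, 20, 25, 30, 35]
ice_cream_sales = [26, 34, 46, 54, 66, 74]
drownings = [6, 9, 10, 14, 15, 19]

r = pearson(ice_cream_sales, drownings)
print(round(r, 3))
```

The coefficient comes out strongly positive even though neither series drives the other, which is why a high r on observational data is a prompt for further investigation and not a conclusion.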

How do you choose the right chart type for your data?

Technical

Easy

Sample Answer

Match the chart to the message: use bar charts for comparisons across categories, line charts for trends over time, scatter plots for relationships between two variables, pie charts only for parts-of-a-whole with few categories, histograms for distribution shapes, and box plots for comparing distributions. Avoid 3D charts, dual-axis charts (they often mislead), and truncated y-axes. Follow Edward Tufte's principle: maximize data-ink ratio. Every element should serve the data. Use color purposefully to highlight key findings, not to decorate. For dashboards, use consistent color schemes and provide context with benchmarks or targets.

Write a SQL query using window functions to calculate a running total of sales by month.

Technical

Hard

Sample Answer

SELECT month, sales, SUM(sales) OVER (ORDER BY month ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS running_total FROM monthly_sales. Window functions perform calculations across rows related to the current row without collapsing them. Key window functions include ROW_NUMBER() for ranking, RANK() and DENSE_RANK() for handling ties, LAG() and LEAD() for comparing with previous or next rows, and NTILE() for distributing rows into buckets. The PARTITION BY clause resets the window for each group, essential for calculating running totals per category or department.
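
A runnable sketch of the running total, extended with PARTITION BY so the total resets per group. The `region` column and the sample rows are assumptions added for illustration, and the example assumes the SQLite library bundled with your Python is version 3.25 or newer, the first release with window function support.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE monthly_sales (region TEXT, month TEXT, sales INTEGER);
    INSERT INTO monthly_sales VALUES
        ('East', '2025-01', 100), ('East', '2025-02', 150), ('East', '2025-03', 50),
        ('West', '2025-01', 200), ('West', '2025-02', 25);
""")

# Running total per region; rows are kept, not collapsed as with GROUP BY.
running = conn.execute("""
    SELECT region, month, sales,
           SUM(sales) OVER (
               PARTITION BY region
               ORDER BY month
               ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
           ) AS running_total
    FROM monthly_sales
    ORDER BY region, month
""").fetchall()

for row in running:
    print(row)
```

Every input row survives in the output, and the total restarts at the first West row because that row begins a new partition.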

How do you communicate technical findings to non-technical stakeholders?

Behavioral

Medium

Sample Answer

Lead with the business insight, not the methodology. Structure your presentation as: what is the key finding, why it matters to the business, and what action to take. Use simple language and avoid jargon. Support claims with clear visualizations rather than tables of numbers. Provide context by comparing to benchmarks, targets, or historical performance. Anticipate questions and prepare backup slides with detailed methodology. Practice the 'so what' test: for every data point, explain why the stakeholder should care. End with specific, actionable recommendations, not open-ended observations.

What is a p-value and how do you interpret it?

Technical

Medium

Sample Answer

A p-value is the probability of observing results at least as extreme as the actual results, assuming the null hypothesis is true. A p-value below the significance level (typically 0.05) means we reject the null hypothesis. However, a low p-value does not prove practical significance. A p-value of 0.001 in an A/B test might represent a statistically significant but practically meaningless 0.01% conversion lift. Always pair p-values with effect sizes and confidence intervals. Consider sample size: with large enough samples, even a tiny difference becomes statistically significant. The p-value is a tool for decision-making, not a proof of truth.
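
The effect of sample size on the p-value can be illustrated with a two-sided, two-proportion z-test. This is one common test for conversion rates, chosen here as an assumption; the conversion counts are invented, and `two_proportion_p_value` is a hypothetical helper.

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_p_value(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference in conversion rates.

    Returns (absolute lift, p-value), using the pooled standard error.
    """
    rate_a, rate_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_b - rate_a) / std_err
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return rate_b - rate_a, p_value

# Same lift (10.0% -> 10.3%), very different sample sizes.
lift_small, p_small = two_proportion_p_value(1_000, 10_000, 1_030, 10_000)
lift_large, p_large = two_proportion_p_value(100_000, 1_000_000, 103_000, 1_000_000)

print(f"n=10,000 per group:    lift={lift_small:.4f}, p={p_small:.3f}")
print(f"n=1,000,000 per group: lift={lift_large:.4f}, p={p_large:.2e}")
```

The effect size is identical in both runs; only the sample size changes, and that alone moves the result from not significant to highly significant.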

Describe a time when your data analysis led to a significant business decision.

Behavioral

Medium

Sample Answer

Use the STAR method with emphasis on impact. For example: 'I analyzed customer support ticket data and discovered that 40% of churn happened within the first 14 days. By segmenting further, I found that users who did not complete onboarding step 3 had a 3x higher churn rate. I presented this to the product team with a recommendation to redesign onboarding. After implementation, 14-day churn dropped by 25%, saving approximately $200K in annual recurring revenue.' Quantify the impact and show how you went from data to insight to action to measurable result.

Explain the difference between a primary key and a foreign key.

Technical

Easy

Sample Answer

A primary key uniquely identifies each record in a table and cannot be NULL. A foreign key is a column in one table that references the primary key of another table, establishing a relationship between them. For example, in an orders table, order_id is the primary key and customer_id is a foreign key referencing the customers table. Foreign keys enforce referential integrity, preventing orphaned records. In practice, always index foreign keys for join performance. Composite primary keys use multiple columns together as the unique identifier, common in junction tables for many-to-many relationships.
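
Referential integrity can be seen in action with a short sqlite3 sketch. The schema mirrors the orders and customers example above; the sample rows and the index name are assumptions. SQLite only enforces foreign keys when the `foreign_keys` pragma is switched on for the connection.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce foreign keys by default
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, customer_name TEXT NOT NULL);
    CREATE TABLE orders (
        order_id INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(id)
    );
    -- Index the foreign key so joins on customer_id stay fast.
    CREATE INDEX idx_orders_customer_id ON orders(customer_id);
    INSERT INTO customers VALUES (1, 'Ana');
    INSERT INTO orders VALUES (100, 1);
""")

# An order pointing at a customer that does not exist would be an orphan.
try:
    conn.execute("INSERT INTO orders VALUES (101, 999)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

print("orphan rejected:", orphan_rejected)
```

The database refuses the second order because customer 999 is not in the customers table, so no orphaned record is created.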

How would you design a dashboard for an e-commerce company's executive team?

Situational

Medium

Sample Answer

Focus on KPIs that drive decisions: revenue, conversion rate, average order value, customer acquisition cost, and customer lifetime value. Structure the dashboard hierarchically: top-level summary with trends, drill-down capabilities by segment. Include comparisons to previous period and targets. Use sparklines for trends, large numbers for current KPIs, and conditional formatting for alerts. Limit to 6-8 metrics per view to prevent information overload. Add filters for date range, region, and product category. Automate data refresh and set up email alerts for anomalies. Design for mobile viewing since executives often check dashboards on their phones.

What is an A/B test and how do you determine sample size?

Technical

Medium

Sample Answer

An A/B test compares two versions of something (webpage, email, feature) by randomly splitting users into control and treatment groups. To determine sample size, you need: baseline conversion rate, minimum detectable effect (the smallest improvement worth detecting), significance level (typically 0.05, which corresponds to 95% confidence), and statistical power (typically 80%). Use the formula or online calculators. A common mistake is stopping the test early when results look significant, leading to false positives. Run the test for the full predetermined duration. Account for novelty effect and seasonality. After the test, check for segment-level effects that the aggregate might mask.
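
One widely used approximation for the per-group sample size when comparing two proportions is n = (z_alpha/2 + z_beta)^2 * (p1(1 - p1) + p2(1 - p2)) / (p2 - p1)^2. The sketch below implements that formula as an assumption about which one is meant; online calculators may use slightly different variants, and `sample_size_per_group` is a hypothetical helper.

```python
from math import ceil
from statistics import NormalDist

def sample_size_per_group(baseline, mde, alpha=0.05, power=0.80):
    """Approximate users needed per group for a two-sided two-proportion test.

    baseline: control conversion rate, e.g. 0.10
    mde:      minimum detectable effect as an absolute lift, e.g. 0.02
    """
    p1, p2 = baseline, baseline + mde
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for significance
    z_beta = NormalDist().inv_cdf(power)           # critical value for power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_beta) ** 2 * variance / mde ** 2)

n_two_points = sample_size_per_group(0.10, 0.02)
n_one_point = sample_size_per_group(0.10, 0.01)

print("detect 10% -> 12%:", n_two_points, "users per group")
print("detect 10% -> 11%:", n_one_point, "users per group")
```

Because the effect is squared in the denominator, halving the minimum detectable effect roughly quadruples the required sample, which is why the smallest worthwhile lift should be agreed before the test starts.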

How do you prioritize multiple data requests from different teams?

Behavioral

Medium

Sample Answer

Use a prioritization framework based on business impact, urgency, and effort. Ask each requester: what decision will this analysis inform, what is the deadline, and what happens if it is delayed? High-impact, time-sensitive requests (board meeting analysis, regulatory reporting) take priority. For recurring requests, automate with self-service dashboards. Communicate your prioritization transparently, setting clear expectations on timelines. Batch similar requests to improve efficiency. If overloaded, escalate to your manager with data on your current workload and proposed prioritization, letting them make the business judgment call.

What is ETL and how does it differ from ELT?

Technical

Medium

Sample Answer

ETL (Extract, Transform, Load) extracts data from sources, transforms it in a staging area, then loads it into the destination warehouse. ELT (Extract, Load, Transform) loads raw data first, then transforms it using the warehouse's computing power. ELT is gaining popularity with modern cloud warehouses like Snowflake and BigQuery that have massive compute capability. ETL is better when you need to clean sensitive data before loading or when your warehouse has limited compute. In practice, most modern data stacks use ELT with tools like dbt for transformation, Fivetran or Airbyte for extraction, and a cloud warehouse for storage and compute.

What are your salary expectations?

HR

Easy

Sample Answer

Research market rates on Glassdoor, Levels.fyi, and Payscale for your location and experience level. Data analysts in the US typically earn between $55K-$95K for entry-level and $80K-$130K for senior roles. Frame your response as: 'Based on my experience with SQL, Python, and Tableau, and the market rate for this role in this location, I am targeting a range of X to Y. However, I am flexible and interested in understanding the full compensation package including benefits and growth opportunities.' Let the employer name a number first when possible.

Why do you want to transition to / continue in data analytics?

HR

Easy

Sample Answer

Focus on genuine motivation and alignment with the role. For example: 'I am drawn to data analytics because I enjoy the process of turning raw data into actionable insights that drive business decisions. In my previous role, I built a customer segmentation model that increased marketing ROI by 30%, and that experience of seeing data translate directly into business impact is what excites me. I am particularly interested in this role because your company works with large-scale datasets and I would have the opportunity to work on problems with meaningful impact.' Connect your skills and experience to what the company specifically needs.

You notice a sudden 30% drop in website traffic. How would you investigate?

Situational

Hard

Sample Answer

Systematically eliminate causes starting with the most likely: first, check if it is a tracking issue (analytics code deployed incorrectly, ad blockers). Then check for technical issues (site down, slow load times, SSL certificate expired). Next, review recent changes (deployments, URL structure changes causing broken links, robots.txt updates blocking crawlers). Check for external factors (Google algorithm update, seasonal trends, competitor campaign). Segment the drop by traffic source (organic, paid, direct, referral), device type, and geography to narrow the cause. Document your investigation timeline and findings. Always distinguish between real traffic drops and measurement errors before raising alarms.

What is data normalization and why is it important?

Technical

Medium

Sample Answer

Database normalization organizes data to reduce redundancy and improve integrity through a series of normal forms. First Normal Form (1NF) eliminates repeating groups. Second Normal Form (2NF) removes partial dependencies. Third Normal Form (3NF) removes transitive dependencies. Normalization reduces data anomalies (update, insert, delete anomalies) and saves storage. However, highly normalized databases require more joins, which can impact query performance. In analytics, denormalized dimensional models such as the star schema are often preferred for faster queries; the snowflake schema is a variant that normalizes the dimension tables again at the cost of extra joins. The right level of normalization depends on whether the use case prioritizes write integrity (OLTP) or read performance (OLAP).

Interview Preparation Tips

Practice SQL on platforms like LeetCode, HackerRank, or Mode Analytics — most data analyst interviews include live SQL coding.

Build a portfolio of data analysis projects on GitHub or a personal blog demonstrating your end-to-end analytical process.

Know your tools deeply: be prepared to demonstrate Excel pivot tables, VLOOKUP/INDEX-MATCH, and conditional formatting live.

Prepare a case study response framework: define the metric, segment the data, identify patterns, quantify impact, recommend actions.

Practice explaining technical concepts in simple language — data storytelling is as important as data analysis.

Review basic statistics concepts: mean vs median, standard deviation, p-values, confidence intervals, and A/B testing methodology.

Common Mistakes to Avoid

Presenting data without actionable insights or business context — the 'so what' factor is missing.

Writing inefficient SQL queries that would time out on production-sized datasets; always consider performance.

Not asking clarifying questions during case study interviews, leading to misaligned analysis.

Overcomplicating visualizations when a simple bar chart or table would communicate the insight more effectively.

Forgetting to check data quality and assumptions before diving into analysis.

Not preparing portfolio examples that demonstrate the full analytics lifecycle from question to recommendation.
