How Analysts Leverage Data Tools for Business Growth
Introduction
Lumilinks empowers businesses to create value with data. As a team of data specialists, we have many tools that can help you achieve this, but today we’re going to discuss some of the core tools used by a modern data analyst: Snowflake, SQL, and Power BI. These are the tools we use to automate visualisations that deliver valuable insights and enable strategic decision making.
In this article, I’ll provide an overview of the general data analysis workflow, demonstrating how to leverage these tools to extract, transform, and visualise data. We’ll explore the key steps involved, including data extraction from Snowflake using SQL, data cleaning and transformation, loading data into Power BI, creating data models and relationships, building visualisations, and analysing and interpreting results.
So, by the end of this article, you’ll have a solid understanding of how data analysts leverage key tools to bring about insights from data.
Step 1: Extracting Your Data from Snowflake
Think of Snowflake as your data warehouse, a vast repository of information.
This will generally be constructed by a Data Engineer who transforms raw data into tangible data table outputs.
One way to start is to import the tables directly into Power BI. While this method has its merits, let’s instead use SQL to extract only the data we want to work with.
It is like asking for a specific slice of a giant pizza. Here’s a basic example (query):
SELECT *
FROM pizza_data
WHERE has_pepperoni = 1;
This query selects all columns (the ‘*’ represents all columns) from a table named "pizza_data", returning only the rows where a certain condition is met: that the pizza has pepperoni! In practice, the condition could filter on order dates or look for specific product groups. You can customise the query to extract exactly what you need, for example by selecting specific columns and filtering on multiple conditions.
Now imagine you're at a pizza buffet, and the menu offers every possible type of pizza (this is your table, "pizza_data", with all of its columns). If you say, "Give me everything," the server hands you a plate with every detail about every kind of pizza slice available (this is your SELECT * query, pulling in all columns). But maybe you're only interested in slices with certain toppings, like pepperoni or mushroom, and you also want pizza that was freshly made within the last hour (this is where your conditions come in). You can customise your order by saying, "I’d like a slice with pepperoni, mushroom, or both, but only if it’s fresh." The query below selects just one column, pizza_name, which is what we are looking for.
In query terms, this is like selecting specific columns and filtering based on multiple conditions—just like asking for exactly the type of pizza you want from a whole buffet!
SELECT pizza_name
FROM pizza_data
WHERE (has_pepperoni = 1
OR has_mushroom = 1)
AND pizza_cooked_timestamp >=
CURRENT_TIMESTAMP - INTERVAL '1 HOUR';
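If you'd like to try this kind of query without a Snowflake account, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in. The sample rows are made up, a one-hour freshness window is assumed (to match the buffet example), and SQLite's datetime() modifier replaces Snowflake's INTERVAL arithmetic.

```python
import sqlite3

# Hypothetical sample rows: (name, has_pepperoni, has_mushroom, cooked at).
rows = [
    ("Pepperoni Feast", 1, 0, "2024-01-01 11:45:00"),
    ("Funghi",          0, 1, "2024-01-01 09:00:00"),
    ("Margherita",      0, 0, "2024-01-01 11:50:00"),
    ("Capricciosa",     1, 1, "2024-01-01 11:30:00"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE pizza_data ("
    "pizza_name TEXT, has_pepperoni INTEGER, has_mushroom INTEGER, "
    "pizza_cooked_timestamp TEXT)"
)
conn.executemany("INSERT INTO pizza_data VALUES (?, ?, ?, ?)", rows)

# A fixed "now" keeps the example reproducible. In Snowflake you would
# subtract an INTERVAL from CURRENT_TIMESTAMP instead.
now = "2024-01-01 12:00:00"
query = """
    SELECT pizza_name
    FROM pizza_data
    WHERE (has_pepperoni = 1 OR has_mushroom = 1)
      AND pizza_cooked_timestamp >= datetime(?, '-1 hour')
    ORDER BY pizza_name
"""
fresh_pizzas = [name for (name,) in conn.execute(query, (now,))]
print(fresh_pizzas)
```

The Funghi is filtered out because it is too old, and the Margherita because it has neither topping, so only the two fresh pepperoni or mushroom pizzas come back.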
Diving Deeper
Beyond these basic queries, SQL offers a vast array of advanced techniques to extract data effectively. For example, you can use subqueries to embed a query within another query, allowing for more complex filtering and aggregation. You can also combine data from multiple tables using several types of joins (INNER JOIN, LEFT JOIN, RIGHT JOIN, FULL OUTER JOIN) based on common columns. Here’s an example of an inner join that merges pizza_data and delivery_data. The query uses the DATEDIFF function to return the number of minutes between a pizza being ordered and being delivered.
SELECT p.pizza_id, p.pizza_name,
DATEDIFF('MINUTE',
p.pizza_order_timestamp,
d.pizza_delivered_timestamp) as Minutes_to_delivery
FROM pizza_data as p
INNER JOIN delivery_data as d
ON p.pizza_id = d.pizza_id;
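Subqueries, mentioned above, can be sketched in the same spirit. The example below again uses Python's sqlite3 module as a stand-in for Snowflake and assumes a hypothetical, precomputed minutes_to_delivery column on delivery_data. It returns only the pizzas that took longer than the average delivery time.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE pizza_data (pizza_id INTEGER, pizza_name TEXT);
    -- minutes_to_delivery is a hypothetical precomputed column.
    CREATE TABLE delivery_data (pizza_id INTEGER, minutes_to_delivery INTEGER);
    INSERT INTO pizza_data VALUES (1, 'Margherita'), (2, 'Pepperoni'), (3, 'Funghi');
    INSERT INTO delivery_data VALUES (1, 20), (2, 45), (3, 25);
""")

# The inner SELECT (the subquery) computes the average delivery time,
# and the outer query keeps only the rows above that average.
query = """
    SELECT p.pizza_name, d.minutes_to_delivery
    FROM pizza_data AS p
    INNER JOIN delivery_data AS d
        ON p.pizza_id = d.pizza_id
    WHERE d.minutes_to_delivery > (
        SELECT AVG(minutes_to_delivery) FROM delivery_data
    )
"""
slow_deliveries = conn.execute(query).fetchall()
print(slow_deliveries)
```

With these made-up rows the average is 30 minutes, so only the 45-minute Pepperoni delivery is returned.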
Additionally, window functions allow you to perform calculations over a set of rows within a result set, such as ranking rows, calculating running totals, or computing values within partitions. This is useful for analysing trends and patterns within your data. Finally, common table expressions (CTEs) can be used to define named, temporary result sets that can be referenced multiple times within a query, which mainly improves readability.
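To make window functions and CTEs a little more concrete, here is a small sketch, again run through Python's sqlite3 module rather than Snowflake (it assumes an SQLite build of 3.25 or later, which is when window functions were added). The pizza_orders table and its values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE pizza_orders (order_day TEXT, pizza_name TEXT, quantity INTEGER);
    INSERT INTO pizza_orders VALUES
        ('2024-01-01', 'Margherita', 3),
        ('2024-01-01', 'Pepperoni',  5),
        ('2024-01-02', 'Margherita', 2),
        ('2024-01-02', 'Pepperoni',  4),
        ('2024-01-03', 'Pepperoni',  6);
""")

query = """
    -- The CTE gives the daily aggregation a name we can reuse below.
    WITH daily_totals AS (
        SELECT order_day, SUM(quantity) AS pizzas_sold
        FROM pizza_orders
        GROUP BY order_day
    )
    SELECT order_day,
           pizzas_sold,
           -- Running total of pizzas sold, in date order.
           SUM(pizzas_sold) OVER (ORDER BY order_day) AS running_total,
           -- Rank days from best to worst; tied days share a rank.
           RANK() OVER (ORDER BY pizzas_sold DESC) AS sales_rank
    FROM daily_totals
    ORDER BY order_day
"""
results = conn.execute(query).fetchall()
for row in results:
    print(row)
```

Each output row holds the day, the pizzas sold that day, the running total so far, and the day's rank by sales. The second and third days tie on six pizzas, so they share rank 2.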
Step 2: Cleaning and Transforming Your Data
Now I’d like you to imagine you're preparing a pizza party, and all your ingredients (your raw data) arrive in bulk.
However, the cheese is scattered everywhere, some of the pepperoni slices are stuck together, and there are a few odd items in the mix, like pineapple (which not everyone wants!).
Before you can assemble the perfect pizza, you need to clean things up—separate the pepperoni slices, toss out any unwanted ingredients, and make sure each topping is in its proper place (removing duplicates and handling missing values).
You might even chop the veggies into smaller pieces to fit the pizza better (transforming the data into a more suitable format). Once everything is organised and prepared, you can create the perfect pizza (or dataset) ready for baking (analysis)!
Like the ingredients for your pizza party, raw data is often messy and needs cleaning before you can use it effectively. This might involve removing duplicates, handling missing values, or transforming data into a more suitable format. Again, SQL is your go-to tool. Here is a simple example of removing duplicates:
SELECT DISTINCT pizza_name
FROM pizza_data;
For more complex transformations, you might use functions like CASE WHEN, COALESCE, or CONCAT. For instance, to replace null values with a default value:
SELECT COALESCE(pizza_name, 'Margherita') AS cleaned_pizza_name
FROM pizza_data;
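CASE WHEN and string concatenation can be sketched alongside COALESCE in one query. This is an illustration run through Python's sqlite3 module, with hypothetical size_code and price columns and made-up price bands. It uses the || operator for concatenation because older SQLite builds have no CONCAT function; Snowflake accepts both.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- size_code and price are hypothetical columns for illustration.
    CREATE TABLE pizza_data (
        pizza_id INTEGER, pizza_name TEXT, size_code TEXT, price REAL
    );
    INSERT INTO pizza_data VALUES
        (1, 'Pepperoni', 'L', 12.5),
        (2, NULL,        'S', 7.0),
        (3, 'Funghi',    'M', 9.5);
""")

query = """
    SELECT pizza_id,
           -- Replace a missing name with a default value.
           COALESCE(pizza_name, 'Margherita') AS cleaned_pizza_name,
           -- Bucket a numeric column into readable categories.
           CASE
               WHEN price >= 12 THEN 'premium'
               WHEN price >= 9  THEN 'standard'
               ELSE 'value'
           END AS price_band,
           -- Build a display label from two columns.
           COALESCE(pizza_name, 'Margherita') || ' (' || size_code || ')' AS label
    FROM pizza_data
    ORDER BY pizza_id
"""
cleaned = conn.execute(query).fetchall()
for row in cleaned:
    print(row)
```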
Diving Deeper
Beyond these basic cleaning and transformation techniques, consider the following:
Handling Missing Values: Use functions like IFNULL or COALESCE to replace missing values with a default value. You might also consider removing rows with too many missing values, or imputing missing values using statistical methods such as the mean, the median, or interpolation.
Outlier Detection: Identify and handle outliers using statistical methods or visualisation techniques. Outliers can skew your analysis and distort results.
Data Normalisation: Ensure data consistency and avoid redundancy by normalising your data structure. This involves breaking down large tables into smaller, normalised tables to reduce data duplication and improve data integrity.
Data Enrichment: Add additional information to your data by joining it with external datasets. For example, you might enrich customer data with demographic information or product data with pricing history.
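Two of the ideas above, imputing missing values and detecting outliers, can be sketched in a few lines of plain Python. This is one possible approach rather than the only one: it assumes median imputation and the common 1.5 × IQR rule for outliers, and the delivery times are made up.

```python
from statistics import median, quantiles

# Hypothetical delivery times in minutes; None marks a missing value.
delivery_minutes = [22, 25, None, 24, 27, 23, 95, None, 26]

# Impute missing values with the median of the known values.
# The median is less sensitive to extreme values than the mean.
known = [m for m in delivery_minutes if m is not None]
fill_value = median(known)
imputed = [fill_value if m is None else m for m in delivery_minutes]

# Flag outliers with the 1.5 * IQR rule, using the known values only.
q1, _, q3 = quantiles(known, n=4)
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = [m for m in known if m < lower or m > upper]

print("fill value:", fill_value)
print("imputed:", imputed)
print("outliers:", outliers)
```

Here the 95-minute delivery is flagged as an outlier. Whether you remove it, cap it, or investigate it depends on what it means for the business.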
Step 3: Loading Your Data into Power BI
Now that your data is clean and ready to go, it’s time to bring it into Power BI for visualisation.
Power BI offers a user-friendly interface for connecting to data sources like Snowflake and importing your data.
This connection serves as a bridge between how the data will sit within Power BI and the Data Warehouse. This bridge will dictate how the data is accessed, refreshed, and analysed.
Sourcing data into Power BI this way means that data refreshes in Snowflake can be paired with scheduled refreshes in Power BI to create an automated solution. Let’s see how powerful the data is below, where Lumilinks, working with Acme products Demo platform, have been able to identify street locations and integrate Google Maps street views, all whilst assessing the profitability of the location.
Diving Deeper
When loading data into Power BI, consider the following options:
DirectQuery: Connect directly to Snowflake and query data on the fly, ensuring data freshness but potentially impacting performance. This is suitable when you need near real-time data.
Import: Load data into Power BI for offline analysis, providing faster query times but requiring periodic refreshes. This is suitable for smaller datasets or when you don't need real-time data.
Composite Models: Combine DirectQuery and Import modes to optimise performance for different use cases. This can be useful for large datasets with frequently accessed subsets.
Step 4: Creating Data Models and Relationships
Once your data is in Power BI, you will often need to create a data model to define relationships between different tables. This helps Power BI understand how the data is connected. For instance, if you have a "Customers" table and an "Orders" table, you might create a relationship based on a common column like "Customer ID". These relationships are crucial for bringing in more detail and granularity into visuals.
Diving Deeper
When creating data models, consider the following:
Star Schema: A common data modelling technique where a central fact table is surrounded by dimension tables. This is useful for analysing transactional data.
Role-Playing Dimensions: Using the same dimension table in multiple relationships to analyse data from different perspectives. For example, a single "Date" dimension table might relate to both the order date and the delivery date in a fact table, letting you analyse the same orders by when they were placed or by when they were delivered.
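A tiny star schema can make these two ideas easier to picture. The sketch below uses Python's sqlite3 module with hypothetical fact_orders, dim_customer, and dim_date tables. The date dimension plays two roles: it is joined once on the order date and once on the delivery date.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Dimension tables describe the "who" and "when".
    CREATE TABLE dim_date (date_key INTEGER PRIMARY KEY, month_name TEXT);
    CREATE TABLE dim_customer (customer_id INTEGER PRIMARY KEY, customer_name TEXT);
    -- The central fact table holds the measures and the keys.
    CREATE TABLE fact_orders (
        order_id INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES dim_customer(customer_id),
        order_date_key INTEGER REFERENCES dim_date(date_key),
        delivery_date_key INTEGER REFERENCES dim_date(date_key),
        amount REAL
    );
    INSERT INTO dim_date VALUES (20240131, 'January'), (20240201, 'February');
    INSERT INTO dim_customer VALUES (1, 'Ada'), (2, 'Grace');
    INSERT INTO fact_orders VALUES
        (100, 1, 20240131, 20240131, 20.0),
        (101, 2, 20240131, 20240201, 35.0),
        (102, 1, 20240201, 20240201, 15.0);
""")

def sales_by_month(date_column):
    """Total order amount per month, using the given date key column."""
    # date_column is chosen from a fixed set below, never from user input.
    query = f"""
        SELECT d.month_name, SUM(f.amount)
        FROM fact_orders AS f
        JOIN dim_date AS d ON f.{date_column} = d.date_key
        GROUP BY d.month_name
        ORDER BY d.month_name
    """
    return conn.execute(query).fetchall()

by_order_month = sales_by_month("order_date_key")
by_delivery_month = sales_by_month("delivery_date_key")
print(by_order_month)
print(by_delivery_month)
```

The same orders give different monthly totals depending on which role the date dimension plays. In Power BI itself, only one relationship between two tables can be active at a time, so the second date relationship is typically kept inactive and switched on inside a measure with the DAX function USERELATIONSHIP.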
Step 5: Creating Visualisations
This is the fun part!
Power BI offers a wide range of visualisations, from simple to complex. You can even import custom visuals from the AppSource library. Use these to explore your data and uncover insights. For example, you might create a line chart to visualise sales trends over time or a scatter plot to identify correlations between variables.
Diving Deeper
When creating visualisations, consider the following:
Choosing the Right Visualisation: Select visualisations that best convey your message and suit your data type. For example, use bar charts to compare categories, line charts for time series data, and scatter plots to show the relationship between two numerical variables. SQLBI has a great article on choosing the right visualisation if you want to explore this further.
Formatting and Styling: Customise the appearance of your visualisations to enhance readability and aesthetics. Use appropriate colours, fonts, and labels to make your visuals clear and engaging.
Interactive Dashboards: Create interactive dashboards with filters, slicers, and drill-down capabilities to explore data dynamically. This allows users to interact with your visualisations and uncover insights on their own.
Step 6: Analysing and Interpreting Results
The last step involves looking closely at your visualisations and understanding what they tell you. This might mean finding patterns, trends, or unexpected things that stand out.
Explore! Exploratory data analysis is an incredibly effective way to unearth more insights from a dataset. This might mean drilling into a particular dimension or field, such as product group, and then slicing it by other fields to surface anything interesting that may be worth further investigation.
To go deeper, you can use statistical tools to measure relationships between various parts of your data. For example, you could find out if there is a connection between sales and customer satisfaction.
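As a rough illustration of measuring such a relationship, the sketch below computes the Pearson correlation coefficient in plain Python. The satisfaction scores and sales figures are made up, and pearson is a hypothetical helper written for this example, not part of any of the tools discussed.

```python
from math import sqrt
from statistics import mean

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of the same length (at least 2)")
    mx, my = mean(xs), mean(ys)
    covariance = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mx) ** 2 for x in xs))
    spread_y = sqrt(sum((y - my) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for constant data")
    return covariance / (spread_x * spread_y)

# Hypothetical monthly figures.
satisfaction = [3.1, 3.8, 4.0, 4.4, 4.7]
monthly_sales = [120, 150, 155, 180, 200]

r = pearson(satisfaction, monthly_sales)
print(f"correlation: {r:.2f}")
```

A value close to 1 suggests the two measures rise together, a value close to -1 suggests one falls as the other rises, and a value near 0 suggests little linear relationship. Correlation on its own does not tell you that one causes the other.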
If your data is about things that change over time, like sales or website traffic, you can use time series analysis to predict future trends or identify seasonal patterns. In Power BI, features such as trend lines and forecasting are available on certain charts from the Analytics pane.
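To give a feel for what trend analysis involves, here is a deliberately simple sketch in plain Python: a moving average to smooth the series and a straight-line fit to project it forward. It is a stand-in for illustration only, not a description of how Power BI's built-in forecasting works, and the weekly sales figures are made up.

```python
from statistics import mean

def moving_average(values, window):
    """Trailing moving average; the result is shorter by window - 1 items."""
    return [
        mean(values[i - window + 1 : i + 1])
        for i in range(window - 1, len(values))
    ]

def linear_forecast(values, steps_ahead):
    """Fit a least-squares line through the series and extend it forward."""
    n = len(values)
    xs = list(range(n))
    mx, my = mean(xs), mean(values)
    slope = (
        sum((x - mx) * (y - my) for x, y in zip(xs, values))
        / sum((x - mx) ** 2 for x in xs)
    )
    intercept = my - slope * mx
    return [intercept + slope * (n - 1 + k) for k in range(1, steps_ahead + 1)]

# Hypothetical weekly sales with a steady upward trend.
weekly_sales = [100, 110, 120, 130, 140, 150]

smoothed = moving_average(weekly_sales, window=3)
forecast = linear_forecast(weekly_sales, steps_ahead=2)
print("smoothed:", smoothed)
print("forecast:", forecast)
```

Real sales data is rarely this tidy, so a straight line is only a first approximation. Seasonal patterns in particular need a method that models them explicitly.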
Once you have found these insights, it is important to tell a story with your data. Use plain language and visuals to explain what you have discovered. The goal is to turn your findings into actions that can help your business.
Remember, data analysis is an ongoing process. Keep watching your data and updating your insights as new information becomes available.
Diving Deeper
When analysing and interpreting results, consider the following:
Data Storytelling: Effectively communicate your findings through compelling narratives and visualisations. Use storytelling techniques to engage your audience and make your insights memorable. Here’s a great resource for data storytelling: Microsoft ‘What is data storytelling’.
Prioritise insights: Identify the most important insights that will have the greatest impact on your business objectives.
Develop recommendations: Based on your insights, develop specific recommendations for actions that can be taken.
Continuously refine your analysis: Depending on the scope of a project, as new data becomes available, refine your analysis and update your recommendations accordingly.
Challenges and Solutions
While this workflow might seem straightforward, you will encounter challenges along the way. Here are a few common ones and potential solutions:
Data Quality Issues: Ensure your data is clean and accurate by implementing data quality checks and cleansing techniques.
Performance Optimisation: For large datasets, optimise your queries and consider using techniques like caching and indexing to improve performance.
Data Governance and Security: Implement proper data governance policies and security measures to protect sensitive information.
Tool Integration: If you are using other tools in your workflow, ensure they integrate seamlessly with Snowflake, SQL, and Power BI.
Skill Gap: If you are new to these tools, invest in learning and practice to develop the necessary skills.
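As one way to picture the data quality checks mentioned in the list above, here is a minimal sketch using Python's sqlite3 module. Each check is a query that counts problem rows. The table, the price column, and the three rules are hypothetical, and real checks would reflect your own business rules.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE pizza_data (pizza_id INTEGER, pizza_name TEXT, price REAL);
    INSERT INTO pizza_data VALUES
        (1, 'Margherita', 8.0),
        (2, NULL,         9.5),
        (2, 'Pepperoni',  9.5),
        (3, 'Funghi',    -1.0);
""")

# Each check counts the rows (or keys) that break a rule; 0 means it passes.
checks = {
    "missing_names": "SELECT COUNT(*) FROM pizza_data WHERE pizza_name IS NULL",
    "duplicate_ids": """
        SELECT COUNT(*) FROM (
            SELECT pizza_id FROM pizza_data
            GROUP BY pizza_id HAVING COUNT(*) > 1
        ) AS dupes
    """,
    "negative_prices": "SELECT COUNT(*) FROM pizza_data WHERE price < 0",
}

report = {name: conn.execute(sql).fetchone()[0] for name, sql in checks.items()}
failed = [name for name, count in report.items() if count > 0]

print(report)
print("failed checks:", failed)
```

Running checks like these before each refresh gives you an early warning when something upstream has changed.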
Conclusion
We’ve just explored a general data analysis workflow, using Snowflake, SQL, and Power BI as our trusted tools, and hopefully you’ll have learned how to extract data from Snowflake using SQL, clean and transform data, load data into Power BI, create data models and relationships, build visualisations, and analyse and interpret results. You’re basically a pro.
Lumilinks can enhance your understanding of data analysis, from design and build through to run and enable. Lumi-Analyse captures your data visualisation requirements and an assessment of your data readiness, with a report, a score and recommendations. The next step is to launch your reporting capabilities and business critical insights and visualisations with Lumi-Dash, to unlock the real value in your data.
Remember - data analysis is an ongoing process. As your data evolves and your business needs change, it is important to continuously refine your analysis and adapt your approach. With the right tools and techniques, you can harness the power of data to achieve your goals and make some game-changing pizza.