This GitHub repo gives an example of how to use Microsoft Fabric for an end-to-end scenario: gaining insights into call center performance based on the sample dataset Call Center Data from Kaggle
Disclaimer: To replicate this analysis, you need to sign up for Microsoft Fabric
Microsoft Fabric Blog Announcement
Microsoft Fabric was announced in Public Preview at Microsoft's Build conference in May 2023. This SaaS product enables end-to-end analytics scenarios in a single product: from data integration, data pre-processing, and data engineering to data science, real-time analytics, and business intelligence, Fabric lets users with different backgrounds collaborate in a no-/low-code and pro-developer environment. By "switching" between workloads within Fabric, each user can perform a specific task that contributes to the overall analysis use case. For example, for data integration and ETL, the Data Factory experience comes into play, whereas for transforming data in a Spark notebook, a user would switch to the Data Engineering experience. OneLake is the foundation for all data used and artifacts created in Fabric, as it serves as the storage layer. All data is stored in the open-source Delta Lake format (built on Parquet), and all the engines (T-SQL, Spark, Analysis Services, KQL) have been revamped to process this file format natively. So no data duplication or conversion is necessary when different teams want to work on the same data with different skills.
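As a minimal illustration of this engine interoperability, the sketch below assumes a Fabric notebook attached to a Lakehouse that already contains a Delta table; the table name `sales` is a hypothetical placeholder:

```python
# Minimal sketch, assuming a Fabric notebook with a default Lakehouse attached
# (the `spark` session is predefined in Fabric notebooks).
# "sales" is a hypothetical table name used only for illustration.

df = spark.read.table("sales")  # Spark reads the Delta table natively
df.show(5)

# The same underlying Delta/Parquet files are queryable via the T-SQL endpoint
# or Power BI without copying or converting the data.
```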
We want to look more closely at such an end-to-end scenario and understand how several users with different skillsets get engaged. Suppose an organization sells B2C products and offers customer service support for whatever inquiry, support request, or complaint an end user might want to contact the company about. Customers can reach the customer service team in different ways: via social media, email, chat, or phone calls, so the company gathers a lot of data from various sources in different file formats. The company's goal is to improve customer service performance and thereby increase customer satisfaction. The first important step in analyzing support performance is to consolidate all the data gathered across customer interactions in one place. In our example, this one place is OneLake in Microsoft Fabric, a storage layer optimized for big data analytics workloads that supports structured, semi-structured, and unstructured file formats.
To keep things simple, this example only takes one CSV file into consideration: the sample dataset Call Center Data from Kaggle
- A data engineer or someone from the enterprise analytics team integrates data from various sources into one place, landing the data in a bronze/raw zone of a Lakehouse by leveraging the data integration features of Microsoft Fabric Data Factory (following the medallion architecture); a minimal PySpark sketch of this landing step follows this list.
- The data engineer continues working in the Lakehouse, creating a Spark notebook and performing some exploratory data analysis (EDA) on the data (in this example, the CSV file) to better understand its content and to decide whether it needs cleaning or missing-data handling. Pre-processing the data and saving it to a silver/curated layer inside the Lakehouse is a common step here; see the second sketch after this list.
- Next, a business analyst would like to explore the data visually and build a Power BI report. A quick look at the file stored in the silver layer reveals that some further transformation is necessary, for example to create new columns or measures that might be helpful in the report. Switching to the Data Factory workload again, the Dataflows Gen2 feature supports no-/low-code preparation of the data for BI and saving it into the gold/enriched layer of the Lakehouse; a PySpark equivalent is sketched in the third example after this list.
- Finally, with a cleaned and processed dataset, the business analyst, or any business user from any company domain, can start creating a Power BI report using the Power BI workload inside Fabric.
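The walkthrough performs the bronze landing with Data Factory; as a hedged notebook alternative, the same step can be sketched in PySpark. The file path `Files/raw_uploads/call_center.csv` and the table name `bronze_call_center` are assumptions for illustration:

```python
# Sketch of the bronze/raw landing step, assuming a Fabric notebook with the
# target Lakehouse attached. Path and file name are illustrative, not fixed.
raw_df = (
    spark.read
    .option("header", "true")
    .option("inferSchema", "true")
    .csv("Files/raw_uploads/call_center.csv")
)

# Delta tables disallow spaces in column names by default, so normalize the
# CSV headers to snake_case before saving.
for old_name in raw_df.columns:
    raw_df = raw_df.withColumnRenamed(
        old_name, old_name.strip().lower().replace(" ", "_")
    )

# Land the data unchanged (apart from the renames) in the bronze zone.
raw_df.write.format("delta").mode("overwrite").saveAsTable("bronze_call_center")
```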
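For the EDA and silver-layer step, a minimal notebook sketch could look as follows; the cleaning rules are deliberately simple, and the table names continue the hypothetical naming from the previous sketch:

```python
# EDA and cleaning sketch; assumes the bronze table from the previous step.
bronze_df = spark.read.table("bronze_call_center")

# Basic exploration: schema, row count, and summary statistics.
bronze_df.printSchema()
print("rows:", bronze_df.count())
bronze_df.describe().show()

# Simple, generic cleaning: drop fully empty rows and remove duplicates.
silver_df = bronze_df.dropna(how="all").dropDuplicates()

# Persist the curated result to the silver layer.
silver_df.write.format("delta").mode("overwrite").saveAsTable("silver_call_center")
```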
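The walkthrough performs the gold-layer preparation in Dataflows Gen2 (no-/low-code); the PySpark sketch below is only a code equivalent of that idea. The column names `channel` and `csat_score` are assumptions about the dataset, as is the output table name:

```python
# Hedged PySpark equivalent of the Dataflows Gen2 transformation step.
# Column names ("channel", "csat_score") are assumed for illustration.
from pyspark.sql import functions as F

silver_df = spark.read.table("silver_call_center")

# Example report-ready measure: call volume and average customer-satisfaction
# score per contact channel.
gold_df = silver_df.groupBy("channel").agg(
    F.count("*").alias("call_count"),
    F.avg("csat_score").alias("avg_csat"),
)

# Persist to the gold layer, ready to be picked up by a Power BI report.
gold_df.write.format("delta").mode("overwrite").saveAsTable(
    "gold_channel_performance"
)
```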