The official PASS Blog is where you’ll find the latest blog posts from PASS community members and the PASS Board. Contributors share their thoughts and discuss a wide variety of topics spanning PASS and the data community.

Azure Cognitive Services

In the first article of this two-part series, we briefly introduced the Azure Cognitive Services Text Analytics API. Using the example of a team health survey, we walked through the steps of:

  • Creating an Azure Cognitive Services resource,
  • Loading the raw data into Power BI,
  • Creating and invoking custom functions in Power BI to extract key phrases and generate sentiment scores from raw text, and
  • Saving the key phrases and sentiment scores as new columns in the data table loaded in Power BI.

At the end of the first article, your Power BI Desktop Data Pane should have a table with 6 fields. The fields “Period”, “Manager”, “Team” and “Response” come from the raw data file. The fields “KeyPhrases” and “SentimentScore” were added and populated by the steps we took in the first article.

Figure 1. Power BI Desktop Data Pane showing the table

In this second article, we will look at qualitative and quantitative analysis techniques for this data in Power BI. We will create a word cloud and several statistical charts to help analyze this data, extract business value, and use Power BI visualizations to narrate a meaningful story about it.

  1. The Word Cloud

A word cloud is an image composed of the words used in a particular body of text, where the size of each word indicates how frequently it appears. We will use our new KeyPhrases field to generate the word cloud, because it contains only the important words from each response. This helps ensure that the word sizing in the resulting cloud isn't skewed by words used heavily in a relatively small number of comments.
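To make the frequency-to-size idea concrete, here is a minimal Python sketch of what a word cloud does with the KeyPhrases column: count how often each word appears, then scale font sizes linearly between a minimum and a maximum. It assumes KeyPhrases holds one comma-separated string per response; the sample rows, the `word_frequencies` and `font_sizes` helpers, and the 10 to 48 point size range are hypothetical, and the Word Cloud visual's own tokenization and sizing rules may differ.

```python
import re
from collections import Counter

# Hypothetical KeyPhrases values, one comma-separated string per response.
key_phrases = [
    "team, communication, deadlines",
    "team, support",
    "communication, team",
]


def word_frequencies(rows):
    """Count how often each word occurs across all KeyPhrases strings."""
    counts = Counter()
    for row in rows:
        counts.update(re.findall(r"[a-z']+", row.lower()))
    return counts


def font_sizes(counts, min_size=10, max_size=48):
    """Scale each word's font size linearly with its frequency (hypothetical range)."""
    lo, hi = min(counts.values()), max(counts.values())
    span = (hi - lo) or 1  # avoid dividing by zero when every count is equal
    return {
        word: min_size + (count - lo) * (max_size - min_size) / span
        for word, count in counts.items()
    }


counts = word_frequencies(key_phrases)
sizes = font_sizes(counts)
for word, count in counts.most_common():
    print(f"{word}: {count} occurrence(s), size {sizes[word]:.0f}")
```

The most frequent word gets the largest size and the least frequent words share the smallest, which is the same visual cue the cloud gives you.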

If you don't already have the Word Cloud custom visual installed, import it from the Store. In the Visualizations panel to the right of the workspace, click the ellipsis (...) and choose Import From Store. Then search for "wordcloud" and click the Add button next to the Word Cloud visual, and Power BI installs it.

Figure 2. Import WordCloud Custom Visual from the store.

Then click the WordCloud icon in the Visualizations panel to add a new word cloud visual to the report page. Next, drag the KeyPhrases field from the Fields panel to the Category field in the Visualizations panel, and the word cloud appears inside the visual.

Figure 3. The basic word cloud.

As you can see, the basic word cloud can appear a bit busy and could benefit from some polishing.

Drag the Manager, Period and Team fields to Page Level filters, so that you can apply them to the visualization for further analysis of popular words by Manager, Period or Team.

Figure 4. Add Page Level filters.

Then switch to the Format tab of the Visualizations panel. In the Stop Words category, turn on Default Stop Words to eliminate short, common words like "of" from the cloud. You can also add your own list of words here, separated by commas (,). Then turn off Rotate Text.

Figure 5. Stop Words and Rotate Text.

Then, in the General section, set Minimum number of repetitions to display to 5.

This should make your word cloud look much cleaner.

Figure 6. The cleaned-up word cloud.
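The stop-word and minimum-repetition settings act as filters applied before the cloud is drawn. The following Python sketch imitates them under stated assumptions: `STOP_WORDS` is a small hypothetical list (the visual's built-in default list is longer and isn't reproduced here), custom stop words are passed as a comma-separated string like the one you type into the Format settings, and `min_repetitions=5` plays the role of Minimum number of repetitions to display.

```python
import re
from collections import Counter

# Small, hypothetical stand-in for the visual's default stop-word list.
STOP_WORDS = {"a", "an", "and", "in", "is", "of", "the", "to"}


def cloud_words(texts, custom_stop_words="", min_repetitions=5):
    """Return {word: count} for words that survive both filters."""
    # Custom stop words arrive as a comma-separated string.
    custom = {w.strip().lower() for w in custom_stop_words.split(",") if w.strip()}
    stop = STOP_WORDS | custom
    counts = Counter(
        word
        for text in texts
        for word in re.findall(r"[a-z']+", text.lower())
        if word not in stop
    )
    # Like "Minimum number of repetitions to display": drop rarely used words.
    return {word: n for word, n in counts.items() if n >= min_repetitions}


texts = ["lack of communication"] * 5 + ["great support"] * 2  # hypothetical
print(cloud_words(texts))
print(cloud_words(texts, custom_stop_words="lack"))
```

Raising the threshold removes one-off words, which is why the cleaned-up cloud looks much less busy.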

We will add a few more summary statistics to this Page.

  • Periods – Add a multi-row card visualization to the page, drag and drop the Period field onto it and select Don’t summarize. The card will show the list of all periods in the survey results.

Figure 7. Set up a multi-row card visualization for Period.

  • Count of Teams, Managers and Responses – Set up 3 new Card Visualizations.
    • Teams – Drag the Team field into Fields and set it to Count (Distinct)
    • Managers – Drag the Manager field into Fields and set it to Count (Distinct)
    • Responses – Drag the Response field into Fields and set it to Count
  • Using a Text Box, add a title at the top of the page.
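Count (Distinct) and Count differ in an important way: the first counts unique values, while the second counts every non-blank value. The Python sketch below reproduces the four summary cards on a few made-up survey rows; the `SurveyRow` class and the sample values are hypothetical, and blank responses are skipped to mirror how Count ignores blanks.

```python
from dataclasses import dataclass


@dataclass
class SurveyRow:
    period: str
    manager: str
    team: str
    response: str


# Made-up sample rows for illustration only.
rows = [
    SurveyRow("2018-Q3", "Manager A", "Team 1", "Great collaboration"),
    SurveyRow("2018-Q3", "Manager A", "Team 1", "Too many meetings"),
    SurveyRow("2018-Q4", "Manager B", "Team 2", "Clear goals"),
    SurveyRow("2018-Q4", "Manager B", "Team 3", ""),  # blank response
]


def summary_cards(rows):
    """Compute the values shown on the summary page cards."""
    return {
        # Multi-row card with Don't summarize: each distinct period once.
        "Periods": sorted({r.period for r in rows}),
        # Count (Distinct): number of unique values.
        "Teams": len({r.team for r in rows}),
        "Managers": len({r.manager for r in rows}),
        # Count: number of non-blank values.
        "Responses": sum(1 for r in rows if r.response),
    }


print(summary_cards(rows))
```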

At the end of this exercise, your Summary Page should look like the Figure below. The Page Level Filters of Period, Manager and Team are applicable to all the Visualizations on this Page, allowing you to perform analysis by any combination of these filters.

Figure 8. Summary Page.

  2. Average Sentiment Score by Team

A bar chart of average sentiment score by team lets us see how each team feels about its own health and compare that against the other teams. It is a quick, simple and effective way to compare one team’s health with its peers.

  • Pull the stacked column chart visualization onto the page
  • Drag and drop the Team field on Axis
  • Drag and drop the SentimentScore field on Value and pick Average
  • Add Manager, Team & Period fields to Report level filters to allow for filtering by any combination of these fields

Figure 7. Bar Chart Set up.

Then turn on the X axis title and add a chart title, and your bar chart is ready.

Figure 8. Bar Chart.
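Behind the chart, Average groups the rows by Team and takes the mean of SentimentScore within each group. Here is a minimal Python sketch of that aggregation, using hypothetical (team, score) pairs and sorting teams from highest to lowest average so the strongest and weakest teams stand out.

```python
from collections import defaultdict

# Hypothetical (Team, SentimentScore) pairs.
scores = [
    ("Team 1", 0.75),
    ("Team 1", 0.25),
    ("Team 2", 1.0),
    ("Team 2", 0.5),
    ("Team 3", 0.125),
]


def average_by_team(pairs):
    """Group scores by team and return {team: mean score}."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for team, score in pairs:
        totals[team] += score
        counts[team] += 1
    return {team: totals[team] / counts[team] for team in totals}


averages = average_by_team(scores)
# Highest average first, as you might sort the column chart.
for team, avg in sorted(averages.items(), key=lambda item: item[1], reverse=True):
    print(f"{team}: {avg:.2f}")
```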

  3. Histogram

A Histogram is a representation of the distribution of numerical data. In our use case, the sentiment score bucket will be on the x axis, and the frequency (count of responses) belonging to that bucket will be on the y axis. There are 2 ways to plot a histogram in Power BI – either use the custom histogram visualization or use a regular bar chart by binning the data beforehand. We will use the regular bar chart method here.

To bin the data, right-click SentimentScore in the Fields panel and select New Group. In the Groups dialog, change Bin Type to “Number of Bins”, set Bin Count to “10”, and click OK.

Figure 9. Create Sentiment Score Bins for Histogram.
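Conceptually, binning assigns each score to one of 10 equal-width buckets, and the histogram then counts the responses in each bucket. Power BI derives its bin size from the range of the data, so its boundaries may not fall exactly on tenths; this Python sketch assumes a fixed 0 to 1 range with bins 0.1 wide, labeled by their lower edge (which lines up with the 0.8 and 0.9 bin values used on the Details page). The `bin_label` and `histogram` helpers and the sample scores are hypothetical.

```python
import math
from collections import Counter


def bin_label(score, bin_count=10):
    """Return the lower edge of the equal-width bin (over 0 to 1) holding score."""
    # Round before flooring so a score such as 0.3 isn't pushed into the
    # 0.2 bin by binary floating-point error.
    index = math.floor(round(score * bin_count, 9))
    # Clamp so that a score of exactly 1.0 falls into the top bin.
    index = min(max(index, 0), bin_count - 1)
    return index / bin_count


def histogram(scores, bin_count=10):
    """Count how many scores fall into each bin, ordered by bin."""
    counts = Counter(bin_label(s, bin_count) for s in scores)
    return dict(sorted(counts.items()))


sample_scores = [0.05, 0.12, 0.3, 0.85, 0.91, 0.97, 1.0]  # hypothetical
for edge, count in histogram(sample_scores).items():
    print(f"{edge:.1f}-{edge + 0.1:.1f}: {'#' * count}")
```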

  • Pull the stacked column chart visualization onto the page
  • Drag and drop SentimentScore(bins) field into Axis
  • Drag and drop the Response field into Value and change it to Count
  • Show titles on X and Y axis
  • Show Grid lines
  • Add Manager, Team & Period fields to Report level filters to allow for filtering by any combination of these fields

Finally, add a title to the histogram and it's ready to use.

Figure 10. Set up for histogram.

Figure 11. Histogram.

The histogram reveals an interesting insight: the responses fall into 2 groupings. A large number of responses are strongly positive, and a smaller, but not insignificant, number of responses are strongly negative. The neutral territory in the middle of the chart is sparse.

  4. Box and Whiskers Plot

A Box and Whisker Plot (or Box Plot) is an efficient way of visually displaying the distribution of data through its quartiles. Box plots take up little space and are very useful for comparing distributions between groups. To build one, import the “Box and whiskers chart” custom visual from the marketplace and add it to a new page.

  • Since we want to compare the distribution of sentiment scores between teams, drag the Team field into Category
  • Drag the SentimentScore field into Values and select Average
  • Drag the Period field into Sampling

Figure 12. Box Plot setup

Figure 13. Box plot
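The box plot's main ingredients are the quartiles of each team's data points. With Team in Category, Period in Sampling and Average on the values, each data point is presumably one team's average score for one period, and the Python sketch below follows that assumption. Box plot tools differ in how they compute quartiles and draw whiskers, so this sketch uses `statistics.quantiles` with `method="inclusive"` and the common 1.5 × IQR rule, which may not match the custom visual's defaults. The `box_stats` and `team_boxes` helpers and the sample rows are hypothetical.

```python
import statistics
from collections import defaultdict


def box_stats(values):
    """Quartiles, 1.5 x IQR whiskers and mean for one box (needs 2+ values)."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [v for v in values if low_fence <= v <= high_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_low": min(inside),  # smallest point within the fences
        "whisker_high": max(inside),  # largest point within the fences
        "mean": statistics.fmean(values),
    }


def team_boxes(rows):
    """rows are (team, period, score); average per team and period, then box per team."""
    samples = defaultdict(list)
    for team, period, score in rows:
        samples[(team, period)].append(score)
    per_team = defaultdict(list)
    for (team, _period), team_scores in samples.items():
        per_team[team].append(statistics.fmean(team_scores))
    return {team: box_stats(avgs) for team, avgs in per_team.items()}


rows = [  # hypothetical responses
    ("Team 3", "2018-Q3", 0.75), ("Team 3", "2018-Q3", 0.25),
    ("Team 3", "2018-Q4", 0.5),
    ("Team 6", "2018-Q3", 1.0), ("Team 6", "2018-Q4", 0.0),
]
for team, stats in team_boxes(rows).items():
    print(team, {k: round(v, 3) for k, v in stats.items()})
```

A short box (small IQR) means the data points agree closely; a long lower whisker means some points sit well below the rest.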

A quick glance at the box plot reveals some interesting insights:

  • For team 3, the box is short, and the whiskers are short too. The distribution of their sentiment scores is narrow and tightly grouped around the average score of 0.7, which suggests that most team members agree about their team’s health.
  • For team 6, the box is taller, and the lower whisker is quite a bit longer. The distribution of their sentiment scores is wide and less tightly grouped around the average score of 0.59. This suggests that several team members feel very negatively about their team’s health compared to others on the same team, which could indicate a disconnect between team members.

  5. Details

Lastly, we want to make it easy for our analysts and users to review individual responses by applying any combination of filters, such as Sentiment Score Bin, Team, Manager and Period. Create a new page and name it Details.

  • Table:
    • Add the table visualization to the new page, then drag and drop the fields “Period”, “Team”, “Manager”, “SentimentScore” and “Response” onto the table
    • For the table, go into Totals and turn Totals off
  • Slicers – We will add these visualizations to the left side of the page (next to the table), where they will serve as filters for the table.
    • Period – Add a slicer visualization to the left side of the page, then drag and drop the Period field onto it
    • Team – Add a slicer visualization to the left side of the page, then drag and drop the Team field onto it
    • Manager – Add a slicer visualization to the left side of the page, then drag and drop the Manager field onto it
    • SentimentScore(bin) – Add a slicer visualization to the left side of the page, then drag and drop the SentimentScore(bin) field onto it. For this slicer, change its type to List
  • Text Box – Finally, add a text box at the top of the page with a brief description.

Now, say you want to review the most positive responses for Team 1 during the period 2018-Q3. First, select Team 1 and Period 2018-Q3 in their slicers. Then, in the SentimentScore(bin) slicer, select 0.8 and 0.9 from the list, which shows only responses with scores of 0.8 or higher.

Figure 14. Details page with Table and slicers.
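Each slicer narrows the table independently, and together they act as an AND of conditions. This Python sketch mimics the Team 1 / 2018-Q3 selection on hypothetical rows: `select_responses` is an illustrative helper where a `None` argument means nothing is selected in that slicer, and `min_score=0.8` stands in for picking the 0.8 and 0.9 bins (assuming those bins together cover 0.8 to 1.0).

```python
from dataclasses import dataclass


@dataclass
class Response:
    period: str
    team: str
    manager: str
    sentiment_score: float
    text: str


# Made-up rows for illustration only.
responses = [
    Response("2018-Q3", "Team 1", "Manager A", 0.95, "We support each other well"),
    Response("2018-Q3", "Team 1", "Manager A", 0.40, "Priorities keep shifting"),
    Response("2018-Q4", "Team 1", "Manager A", 0.90, "Much better planning"),
    Response("2018-Q3", "Team 2", "Manager B", 0.85, "Great retrospectives"),
]


def select_responses(rows, period=None, team=None, manager=None, min_score=None):
    """Keep rows matching every active slicer; None means no selection."""
    return [
        r
        for r in rows
        if (period is None or r.period == period)
        and (team is None or r.team == team)
        and (manager is None or r.manager == manager)
        and (min_score is None or r.sentiment_score >= min_score)
    ]


for r in select_responses(responses, period="2018-Q3", team="Team 1", min_score=0.8):
    print(f"{r.sentiment_score:.2f}  {r.text}")
```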

The visualizations we have created so far help us analyze the team health data:

  • The word cloud helps us identify topics and popular themes, and filter them by Period, Team and Manager
  • The bar chart allows us to compare average sentiment scores across teams, identifying which teams have the highest and lowest average scores and whether they changed over time
  • The histogram lets us visualize the distribution of responses across the range of sentiment scores, allowing us to identify clusters in the positive, neutral or negative ranges. The filters on Period, Team and Manager let us narrow the analysis down to the area of interest
  • The box plot lets us quickly compare how the distribution of scores varies across teams, and whether team members are closely aligned with each other
  • The table with slicers lets us review the actual response text, based on the selected Sentiment Score Bin, Team, Period and Manager

This data enables us to identify which teams are doing great, and which ones may need some help to improve their team’s health.

Sanil Mhatre
About the author

Sanil Mhatre is a Senior Data Engineer, currently focused on delivering analytical insights for a technology solutions and services company in St. Louis. He has a master's degree in information systems and enjoys working with various data processing technologies, analytics tools, and visualization platforms.

Sanil is a budding data scientist, an active member of PASS, and a frequent speaker at several local and regional technical events. In his spare time, he volunteers with numerous STEM mentorship programs, and blogs to keep up with developments in the fields of data science, Machine Learning, and AI.
