
Visualizing Categorical Data
Based on DataCamp
Introduction to categorical plots using seaborn
Categorical plot
import seaborn as sns
import matplotlib.pyplot as plt
sns.catplot(...)
plt.show()
The catplot function
Parameters
- x: name of the variable in data to plot on the x-axis.
- y: name of the variable in data to plot on the y-axis.
- data: a DataFrame
- kind: type of plot to create - one of ("strip", "swarm", "box", "violin", "boxen", "point", "bar", "count")
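As a minimal sketch of how these parameters fit together, the block below builds a small hypothetical DataFrame (toy_reviews and its values are invented for illustration; they are not the reviews data used in these notes) and passes column names to x and y. The Agg backend line is an assumption that lets the sketch run without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend, only so this sketch runs without a display
import pandas as pd
import seaborn as sns

# Hypothetical stand-in for the reviews DataFrame
toy_reviews = pd.DataFrame({
    "Traveler type": ["Solo", "Couples", "Solo", "Families", "Couples", "Families"],
    "Score": [4, 5, 3, 4, 4, 5],
})

# x and y are column names in data; kind selects the plot type
grid = sns.catplot(x="Traveler type", y="Score", data=toy_reviews, kind="strip")
print(type(grid).__name__)  # FacetGrid
```

catplot returns a seaborn FacetGrid, so the result can be stored and customized later. To display the figure interactively, drop the matplotlib.use("Agg") line and call plt.show() as in the rest of these notes.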
Creating a box plot
# Scale the font size by a factor of 1.25
sns.set(font_scale=1.25)
# Set the background to "darkgrid"
sns.set_style("darkgrid")
# Create a boxplot
sns.catplot(x="Traveler type", y="Helpful votes", data=reviews, kind="box")
plt.show()
Seaborn barplot
The hue parameter
- hue: name of a variable in data
- used to split the data into a second category
- used to color the graphic
 
    sns.catplot(
        x="Traveler type", 
        y="Score", 
        data=reviews,
        kind="bar",
        hue="Tennis court" # <--- New parameter
    )
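
With kind="bar", each bar's height is by default the mean of y within its group, and hue adds a second grouping level. The pandas-only sketch below reproduces those group means on a small hypothetical DataFrame (toy and its values are made up for illustration) so the numbers behind the bars can be checked directly.

```python
import pandas as pd

# Hypothetical data: two traveler types, split by whether the hotel has a tennis court
toy = pd.DataFrame({
    "Traveler type": ["Solo", "Solo", "Couples", "Couples", "Solo", "Couples"],
    "Tennis court": ["YES", "NO", "YES", "NO", "YES", "NO"],
    "Score": [4, 3, 5, 4, 2, 2],
})

# One mean per (x category, hue category) pair, matching one bar each
bar_heights = toy.groupby(["Traveler type", "Tennis court"])["Score"].mean()
print(bar_heights)
```

Each row of the printed Series corresponds to one colored bar in the equivalent catplot call.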
Creating a bar plot
# Print the frequency counts of "Period of stay"
print(reviews["Period of stay"].value_counts())
sns.set(font_scale=1.4)
sns.set_style("whitegrid")
# Create a bar plot of "Helpful votes" by "Period of stay"
sns.catplot(x="Period of stay", y="Helpful votes", data=reviews, kind="bar")
plt.show()
Ordering categories
# Set style
sns.set(font_scale=.9)
sns.set_style("whitegrid")
# Print the frequency counts for "User continent"
print(reviews["User continent"].value_counts())
# Convert "User continent" to a categorical variable
reviews["User continent"] = reviews["User continent"].astype("category")
# Reorder "User continent" using continent_categories and rerun the graphic
continent_categories = list(reviews["User continent"].value_counts().index)
reviews["User continent"] = reviews["User continent"].cat.reorder_categories(new_categories=continent_categories)
sns.catplot(x="User continent", y="Score", data=reviews, kind="bar")
plt.show()
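The reordering step can be checked in isolation. The sketch below uses a small hypothetical Series (continents and its values are invented for illustration) to show that a categorical column starts with alphabetical categories and that reorder_categories changes only the category order, which is the order seaborn uses along the axis.

```python
import pandas as pd

# Hypothetical categorical column
continents = pd.Series(
    ["Europe", "Asia", "Europe", "Africa", "Europe", "Asia"], dtype="category"
)
print(list(continents.cat.categories))  # ['Africa', 'Asia', 'Europe']

# value_counts() sorts by frequency, so its index gives a most-frequent-first order
by_frequency = list(continents.value_counts().index)
reordered = continents.cat.reorder_categories(new_categories=by_frequency)
print(list(reordered.cat.categories))  # ['Europe', 'Asia', 'Africa']
```

The values in the Series are unchanged; only the category order differs.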
Bar plot using hue
#1
# Add a second category to split the data on: "Free internet"
sns.set(font_scale=2)
sns.set_style("darkgrid")
sns.catplot(x="Casino", y="Score", data=reviews, kind="bar", hue="Free internet")
plt.show()
#2
# Switch the x and hue categories
sns.set(font_scale=2)
sns.set_style("darkgrid")
sns.catplot(x="Free internet", y="Score", data=reviews, kind="bar", hue="Casino")
plt.show()
#3
# Update x to be "User continent"
sns.set(font_scale=2)
sns.set_style("darkgrid")
sns.catplot(x="User continent", y="Score", data=reviews, kind="bar", hue="Casino")
plt.show()
#4
# Lower the font size so that all text fits on the screen.
sns.set(font_scale=1.0)
sns.set_style("darkgrid")
sns.catplot(x="User continent", y="Score", data=reviews, kind="bar", hue="Casino")
plt.show()
Point and count plots
Point plot
A point plot helps users compare values across categories by connecting the points with a line, and the y-axis is rescaled to focus on the points.
Creating a point plot
# Create a point plot with catplot using "Hotel stars" and "Nr. reviews"
sns.catplot(
  # Split the data across Hotel stars and summarize Nr. reviews
  x='Hotel stars',
  y="Nr. reviews",
  data=reviews,
  # Specify a point plot
  kind="point",
  hue="Pool",
  # Make sure the lines and points don't overlap
  dodge=True
)
plt.show()
Creating a count plot
sns.set(font_scale=1.4)
sns.set_style("darkgrid")
# Create a catplot that will count the frequency of "Score" across "Traveler type"
sns.catplot(
  x="Score",
  data=reviews,
  kind="count",
  hue="Traveler type",
)
plt.show()
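The bar heights in a count plot with hue are frequency counts for each pair of categories. As a rough cross-check, the sketch below computes the same kind of table with pd.crosstab on a small hypothetical DataFrame (toy and its values are made up for illustration).

```python
import pandas as pd

# Hypothetical data: review scores and traveler types
toy = pd.DataFrame({
    "Score": [5, 4, 5, 3, 4, 5],
    "Traveler type": ["Solo", "Solo", "Couples", "Couples", "Couples", "Solo"],
})

# Rows are x categories, columns are hue categories, cells are bar heights
counts = pd.crosstab(toy["Score"], toy["Traveler type"])
print(counts)
```

A zero in the table corresponds to a missing bar in the plot.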
Additional catplot() options
Difficulties with categorical plots
Trying to visualize multiple categories can be difficult. Instead of creating six different plots, one for each continent, we can do better with the following steps:
- Using the catplot() facetgrid:

    sns.catplot(
        x="Traveler type",
        kind="count",
        data=reviews,
        col="User continent",
        col_wrap=3,
        palette=sns.color_palette("Set1")
    )

- Common colors: "Set1", "Set2", "tab10", "Paired"
- Updating plots:
  - Setup: save the graphic as an object: ax
  - Plot title: ax.fig.suptitle("Super Title")
  - Axis labels: ax.set_axis_labels("x-axis-label", "y-axis-label")
  - Title height: plt.subplots_adjust(top=.9)
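
The steps above can be sketched end to end on a small hypothetical DataFrame (toy, its values, and the title and label strings are invented for illustration). The Agg backend line is an assumption so the sketch runs without a display. Newer seaborn versions also expose the figure as grid.figure; .fig is used here to match these notes.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend, only so this sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical data: three continents, two traveler types
toy = pd.DataFrame({
    "Traveler type": ["Solo", "Couples", "Solo", "Couples", "Solo", "Couples"],
    "User continent": ["Europe", "Europe", "Asia", "Asia", "Africa", "Africa"],
})

# One panel per continent, wrapping to a new row after every 2 panels
grid = sns.catplot(
    x="Traveler type", kind="count", data=toy,
    col="User continent", col_wrap=2,
)

title = grid.fig.suptitle("Traveler types by continent")  # figure-level title
grid.set_axis_labels("Traveler type", "Number of reviews")
plt.subplots_adjust(top=0.9)  # leave room above the panels for the title

n_panels = grid.axes.size
print(n_panels)  # 3
```

Saving the grid in a variable is what makes the later title and label calls possible; calling catplot without keeping the return value leaves only the pyplot-level functions available.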
 
One visualization per group
# Create a catplot for each "Period of stay" broken down by "Review weekday"
ax = sns.catplot(
  # Make sure Review weekday is along the x-axis
  x="Review weekday",
  # Specify Period of stay as the column to create individual graphics for
  col="Period of stay",
  # Specify that a count plot should be created
  kind="count",
  # Wrap the plots after every 2nd graphic.
  col_wrap=2,
  data=reviews
)
plt.show()
Updating categorical plots
# Adjust the color
ax = sns.catplot(
  x="Free internet", y="Score",
  hue="Traveler type", kind="bar",
  data=reviews,
  palette=sns.color_palette("Set2")
)
# Add a title
ax.fig.suptitle("Hotel Score by Traveler Type and Free Internet Access")
# Update the axis labels
ax.set_axis_labels("Free Internet", "Average Review Rating")
# Adjust the starting height of the graphic
plt.subplots_adjust(top=0.93)
plt.show()