Melissa Rosales
- Dec 16, 2019

Methods

Updated: Oct 5, 2020

by Melissa Rosales, Gabriella Mrozowski, Vivi Bonomie and Maya Huter

For our group's data visualization project, we wanted to explore the different healthcare facilities around Boston and the services they offer. Through the Boston Government website, Melissa was able to find a data set collected by the Division of Health Care Facility Licensure and Certification. Since this data set is from a government website, Melissa thought it would be cleaned and organized. She quickly learned that no data set is ever clean.

The original data set covered all of Massachusetts, not just Boston, and it listed many kinds of services. The group decided to narrow the data down to three things: 1) clinics located only in Boston, 2) alcohol/substance abuse services, and 3) mental health services. We also kept the clinic name, street, city/town, and zip code.
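For readers who would rather script this narrowing step than filter by hand, it could be sketched in Python roughly as follows. This is only an illustration and not the process the group used: the column names, the sample rows, and the short list of place names are all assumptions made up for the example.

```python
import csv
import io

# Hypothetical sample rows; the real DPH file uses different column names.
SAMPLE = """name,street,city,zip,mental_health,substance_abuse
Clinic A,1 Main St,Boston,02118,Y,N
Clinic B,2 Elm St,Worcester,01608,Y,Y
Clinic C,3 Oak St,Jamaica Plain,02130,N,Y
Clinic D,4 Pine St,Boston,02116,N,N
"""

# Assumed, incomplete list of city/town values that count as Boston.
BOSTON_PLACES = {"Boston", "Jamaica Plain", "Hyde Park"}


def filter_clinics(rows):
    """Keep rows that are in Boston and offer at least one service of interest."""
    kept = []
    for row in rows:
        in_boston = row["city"].strip() in BOSTON_PLACES
        offers_service = row["mental_health"] == "Y" or row["substance_abuse"] == "Y"
        if in_boston and offers_service:
            kept.append(row)
    return kept


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
filtered = filter_clinics(rows)
print([row["name"] for row in filtered])
```

In this made-up sample, the Worcester clinic is dropped for being outside Boston and Clinic D is dropped for offering neither service.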

After cleaning the data and creating a pivot table, Melissa noticed that the city/town entries did not always match the zip codes. Areas that should have been listed under Jamaica Plain or Hyde Park were also entered as Boston. Boston may be small, but it was important to group clinics that share an area in a consistent way, so Melissa decided to organize the data by neighborhood instead. This was when the process became more tedious, because zip codes do not always correspond to a single neighborhood. Melissa found an open-resource Google map with accurate and clear delineations of the neighborhoods in the city of Boston, and she manually entered each address to make sure the neighborhoods were right. At the same time, she checked whether each address corresponded to the correct clinic name.

She found a few inaccuracies. Some clinics didn’t exist anymore or the zip codes entered were wrong. Melissa also noticed some of these clinics only offered certain services like sports medicine and speech language, so she removed them. She also removed all dental clinics and Shields MRI X-ray Labs.

The whole point of this data research was to show how available these services and clinics are in Boston, and specifically in which neighborhoods. The data set gave addresses and zip codes, but that was not enough for the visualization we aimed to make. We wanted to create a map in Tableau, and Tableau can only register zip codes, countries, states, counties, latitudes, and longitudes. We had street addresses. Melissa found a free web program called Geocodio that geocodes US and Canadian addresses and matches them to exact longitudes and latitudes.

Since Melissa only needed the exact coordinates, she selected the longitudes and latitudes and pasted them into the cleaned data set. After interviewing Katherine Saunders, Manager of Data Analysis & Integrity, Bureau of Health Care Safety & Quality, Massachusetts Department of Public Health (DPH), Melissa discovered that the data set provided by DPH was incomplete because different departments have different responsibilities.

“DPH focuses on hospitals, nursing homes, and traditional clinic services. Department of Mental Health (DMH) really focuses on mental health services,” Saunders said. “Bureau of Substance Abuse Services (BSAS) lists sites for methadone clinics and treatment centers not licensed by DPH. A mental health service can be licensed by the DPH, DMH, and Bureau of Substance Abuse Services.”

So Melissa manually added more entries from BSAS, DMH, and a recent report from the Massachusetts Health Policy Commission. BSAS did not have an actual data set; instead, it had a search engine, so Melissa had to manually search for centers that fit the data set. DMH had several other documents, including a 2019 multicultural mental health resource directory, which Gabi used to add to the data set. From around 70 entries, the data set roughly doubled to 145 entries.

Once the data set was complete, Melissa created a calculated field that allowed Tableau to count how many clinics offered mental health or alcoholism/substance abuse services, and in which neighborhoods. The results showed that more clinics offered mental health services than substance abuse services.
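The counting that the calculated field performs can be pictured with a small Python sketch. This is not Tableau's calculated-field syntax and not Melissa's actual formula; the field names and the four sample clinics are hypothetical and only show the idea of tallying clinics per neighborhood for one service at a time.

```python
from collections import Counter

# Hypothetical clinics; the neighborhoods and flags are invented for illustration.
clinics = [
    {"neighborhood": "Dorchester", "mental_health": True, "substance_abuse": True},
    {"neighborhood": "Dorchester", "mental_health": True, "substance_abuse": False},
    {"neighborhood": "Roxbury", "mental_health": False, "substance_abuse": True},
    {"neighborhood": "Jamaica Plain", "mental_health": True, "substance_abuse": False},
]


def count_by_neighborhood(clinics, service):
    """Count clinics per neighborhood that offer the given service."""
    return Counter(c["neighborhood"] for c in clinics if c[service])


mental_health_counts = count_by_neighborhood(clinics, "mental_health")
substance_abuse_counts = count_by_neighborhood(clinics, "substance_abuse")

print(mental_health_counts)
print(substance_abuse_counts)
```

Comparing `sum(mental_health_counts.values())` with `sum(substance_abuse_counts.values())` gives the overall comparison between the two service types.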

Melissa also created double map layers to showcase where clinics or services are located in each neighborhood and the volume of clinics available. The results showed there were more clinics in areas like Dorchester. Through these findings, we found our story about the accessibility of services.

For Gabi’s part, she wanted to focus on clinics where substance abuse and mental health services overlapped, as well as on how those clinics intersected with multilingual options. The data set she focused on was the “DMH Multicultural Mental Health Resource Directory,” provided by the Massachusetts DMH. She specifically looked at the Metro Boston Area chapter of the directory.

The directory featured various neighborhoods, but it also included towns and cities around Boston. Gabi chose to exclude those non-Boston locations and include only the neighborhoods that belong to the city.

Gabi manually entered the clinics into the Master Excel sheet she had been organizing with Melissa. She took care to make sure the clinics weren’t already included in the Master Excel sheet, as duplicates would skew the results. Therefore, all new clinics were checked before being added. Along with the names, it was important to add information like the zip code, the neighborhood, the longitude and latitude, whether the clinic treated mental health and/or substance abuse, and the languages it offered.
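The duplicate check was done by hand, but the same idea can be sketched in Python. The matching rule below (compare a normalized clinic name plus the zip code) is an assumption chosen for illustration, and the clinic names are invented; a real check might need fuzzier matching.

```python
import re


def normalize(name):
    """Lowercase a clinic name, drop punctuation, and collapse extra spaces."""
    cleaned = re.sub(r"[^a-z0-9 ]", "", name.lower())
    return " ".join(cleaned.split())


def is_new_clinic(name, zip_code, existing):
    """Return True if (normalized name, zip) is not already in the master set."""
    return (normalize(name), zip_code) not in existing


# Hypothetical entry standing in for the Master Excel sheet.
existing = {(normalize("Example Health Center, Inc."), "02121")}

print(is_new_clinic("EXAMPLE HEALTH CENTER INC", "02121", existing))
print(is_new_clinic("Another Counseling Center", "02130", existing))
```

Normalizing before comparing means that differences in capitalization or punctuation alone do not let a duplicate slip through.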

The zip codes were easy to add, as it only took a Google search to find the most recent address. Determining the neighborhoods was more difficult, as there was no straightforward way to check this information. Melissa shared with Gabi a Google Map that let users enter an address and see which neighborhood it fell in. To find out which mental health and/or substance abuse treatment services each clinic offered, Gabi had to look up the clinics manually, review the services offered, and determine which categories they fell into.

For the language portion of the data set, Gabi determined from the DMH directory that the most commonly offered languages were Spanish and Chinese (Mandarin and Cantonese, which were always offered together). She decided to create a separate column for each, in order to map them easily later in Tableau.

Once all the data was entered into the Excel file, Gabi uploaded the file into Tableau and began to tinker with the mapping options. Her first chart focused on the clinics offering Spanish throughout Boston. Unfortunately, there was no dedicated “Spanish Language” icon, so the choice was made to use the flag of Spain, which was not an ideal solution.

The next map focused on clinics that offered Chinese language services.

The map after that focused on clinics offering languages other than Spanish and Chinese.

Gabi’s last map focused on the intersection of mental health and substance abuse services in clinics across Boston. The icon with the head and brain symbolized the mental health clinics. The icon with the medical cross symbolized health clinics in general. The colors represented whether the clinics offered substance abuse treatment. Gabi removed any general clinics (those that didn’t treat mental health) that also didn’t treat substance abuse, because they were not relevant to the data set. Each remaining clinic had to treat at least one of the two conditions. The purple head and brain icon represents clinics that offer both mental health and substance abuse treatment.

For Maya’s part, the data set looked at the median household income in each of Boston’s neighborhoods, to see whether a neighborhood’s income had anything to do with how many healthcare facilities are in the area. The data set came from bostonplans.org, and it started out looking like this once it was downloaded to Excel:

First, the dataset had to be cleaned. It was mostly clean already, so all Maya had to do was take out “United States” and “Massachusetts” as data points, because they were irrelevant to a neighborhood-level comparison. The rest of the data is useful, because both the aggregate income of the whole neighborhood and the per-capita income help in determining the wealth of a neighborhood. Once the data was clean and uploaded into Tableau it looked like this:

After the data was clean, Maya imported it into Tableau, but there was a problem: the data had no zip codes for the neighborhoods, so making a map would have been impossible. To solve this, we imported another data set from Lina that listed Boston neighborhood zip codes. From there, the data was combined by using the “add” tool in Tableau:

Next, Maya created a visualization showing both the per capita and aggregate income of each neighborhood, with the Boston Area zip codes. She dragged in the per capita income and aggregate income tab, the community tab and the zipcodes tab. She used both the size and the color features to show the income of each Boston neighborhood. The size feature helps show the income of each neighborhood relative to the other neighborhoods, and the color helps differentiate between neighborhoods. The final product looked like this:

When creating the final visualization, Maya decided to narrow the data down to just the per capita incomes of each neighborhood. She then made a visualization to show this information. However, this ended up not working because the zip codes used for this visualization did not align with the zip codes used in Melissa’s visualization. To solve this, Maya created a new data set of Boston zip codes that matched the ones Melissa used, and cleaned the data to make sure both Maya’s and Melissa’s data sets referenced the same neighborhoods with the same zip codes for each neighborhood. The new data set looked like this:

From there, Maya was able to redo her visualization using the same process but with the new zip codes. The new visualization was then turned into a map showing the per-capita income of each neighborhood, which was used to examine the relationship between neighborhood income and the number of mental healthcare facilities in that neighborhood. The conclusion was that lower-income neighborhoods actually had more facilities than higher-income ones, likely because those facilities are more needed in low-income areas with higher rates of substance abuse. The final visualization looked like this:

Vivi’s part of the project was to find a dataset that explained the number of deaths related to opioid use in Massachusetts. Originally, she struggled to find a dataset that tracked opioid-related deaths in Boston specifically, because most datasets tracked opioid deaths in Massachusetts by town/city and by year, but not by region within Boston. Vivi was able to find, however, a dataset from the National Center for Health Statistics that outlined drug-related deaths in the US by state. She settled on this dataset because it specifies which drug type caused each death, meaning she could compare opioid-related deaths to deaths from other substances. This dataset also gave her the freedom to compare opioid-related death rates across states.

This is what the dataset originally looked like. As you can see, it is overwhelmingly large and divided by state. This photo only depicts the first few columns, which show data for the state of Arizona.

Initially, Vivi wanted to see the statistics for drug-related deaths in the US. The graph above depicts the indicator (which drug caused the death) and the sum of the Data Value. The graph did not initially look like this. Vivi had to omit the total number of deaths because that number was so large that it took up the majority of the graph and made it harder to see the differences between the drug types. With the total excluded from the graph, we are able to clearly see the differences in drug-related deaths.
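The aggregation behind that graph (sum the Data Value per indicator, leaving out the overall total) can be sketched in Python. The indicator labels and the numbers below are invented placeholders, not values from the National Center for Health Statistics dataset, and the sketch is an assumption about the logic rather than what Tableau does internally.

```python
from collections import defaultdict

# Made-up records; real indicator names and values differ.
records = [
    {"state": "MA", "indicator": "Total deaths", "value": 100},
    {"state": "MA", "indicator": "Opioids", "value": 60},
    {"state": "MA", "indicator": "Cocaine", "value": 20},
    {"state": "AZ", "indicator": "Total deaths", "value": 80},
    {"state": "AZ", "indicator": "Opioids", "value": 30},
    {"state": "AZ", "indicator": "Cocaine", "value": 25},
]


def sum_by_indicator(records, exclude=("Total deaths",), state=None):
    """Sum values per indicator, skipping excluded indicators.

    If `state` is given, only that state's records are counted.
    """
    totals = defaultdict(int)
    for record in records:
        if record["indicator"] in exclude:
            continue
        if state is not None and record["state"] != state:
            continue
        totals[record["indicator"]] += record["value"]
    return dict(totals)


national = sum_by_indicator(records)
massachusetts = sum_by_indicator(records, state="MA")
print(national)
print(massachusetts)
```

Passing `state="MA"` mirrors the step of keeping only the Massachusetts rows so they can be compared against the national totals.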

This graph is the same as the one above. However, instead of showing data for the entire country, Vivi added State values and kept only the Massachusetts data. This allows her to compare the data from Massachusetts to that of the entire country.

In the graph below, Vivi made a map to demonstrate which states have more opioid-related deaths. She did this by omitting the other indicators so that the data reflected only opioid-related deaths by state. The larger the dot, the more deaths.

Based on this map, Vivi was able to demonstrate that Massachusetts has a relatively high count of opioid-related deaths. In order to compare it properly to other states, she will make the statistics relative to state population.
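One common way to make death counts comparable across states is a rate per 100,000 residents. The sketch below assumes that approach; the two states and their figures are invented to show how a state with fewer deaths can still have a higher rate.

```python
def deaths_per_100k(deaths, population):
    """Return deaths per 100,000 residents."""
    if population <= 0:
        raise ValueError("population must be positive")
    return deaths * 100_000 / population


# Invented figures for two hypothetical states.
state_a_rate = deaths_per_100k(deaths=50, population=1_000_000)
state_b_rate = deaths_per_100k(deaths=120, population=4_000_000)

print(state_a_rate, state_b_rate)
```

Here the hypothetical State B has more deaths in absolute terms, but State A has the higher rate once population is taken into account.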

Click here to download all our data sets and here for our data story.
