10 November 2015
After settling on the New York City birth rate dataset, I did three things. First, I blogged about it! Second, I requested the “raw” birth data from the City*. And third, I started exploring and downloading the data available online. This last step took a long time, both because the website was slow and because each set of metrics I was interested in required a separate query.
*As of 11/29, I am still waiting to hear back regarding my request.
The first step was to choose a year between 2000 and 2013. Then I could select two additional metrics. I chose “Mother’s Age (9 Levels)” and “Mother’s Borough of Residence.” (Since I was interested in teen pregnancy rates, I chose the 9 Levels option, because the 4 Levels option did not include data on mothers aged 10-14.) After downloading the resulting data to my computer, I had to go back to the opening page and re-enter my selections, changing only the year. I downloaded data for three years from the available range: 2000, 2006, and 2013. I hoped that comparing birth counts taken roughly six years apart would reveal changes that are more than incremental.
14 November 2015
I experimented with other selections on the City’s great website to see what stood out to me, and decided to download the “Neighborhood Poverty” data as well.
17 November 2015
For the last two and a half hours, I have been trying to figure out four things: how to stop Excel from converting my data into dates; how to merge multiple Excel files into one; who created a Twitter account under my email address, and how to disassociate it from me; and, last but not least, how to code “No Data”: should I leave the cell blank or type in zeros? After several YouTube videos and lots of clicking around the web, I stopped working on the project for the day.
18 November 2015
After a good night’s sleep, I sat down at my computer and started the process anew. After more clicking around and another YouTube video, I successfully merged the files. Then, following Digital Fellow Hannah Aizenman’s workshop on Data Debugging, I cleaned up my data and fixed any bugs or incorrect entries, deciding, for now, to leave the blanks blank. I also left out birth data for non-NYC residents. I used graph paper and crayons to sketch how I could visualize four metrics on one plane. After calling my partner to ask how, I created pivot tables. Charts, bar graphs, and circles followed. Then it was time to go to the Digital Fellows’ office hours.
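For anyone who would rather script these cleanup steps than click through Excel, here is a minimal Python sketch of the same workflow: merge the yearly downloads, drop non-NYC residents, keep “No Data” as a blank rather than a zero, and pivot the result. It is not what I actually did (I used Excel), and it assumes each download was saved as a CSV with hypothetical columns `age_group`, `borough`, and `births`, plus a hypothetical `Non-NYC` borough label. The City’s real export may be laid out differently. Reading the files with Python’s `csv` module keeps labels like “10-14” as plain text, so they never turn into dates.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Tuple


def read_rows(text: str) -> List[dict]:
    """Parse one exported CSV. Every field stays a string, so "10-14" is never read as a date."""
    return list(csv.DictReader(io.StringIO(text)))


def merge_years(exports: Dict[int, str], exclude_borough: str = "Non-NYC") -> List[dict]:
    """Stack several yearly exports into one list of rows.

    The year is taken from the dict key, since each download covers a single year.
    Rows for the (hypothetical) non-NYC label are dropped. A blank birth count
    becomes None ("No Data") instead of 0, so it cannot pose as a real zero.
    """
    merged = []
    for year, text in sorted(exports.items()):
        for row in read_rows(text):
            if row["borough"] == exclude_borough:
                continue
            raw = row["births"].strip()
            merged.append({
                "year": year,
                "age_group": row["age_group"],
                "borough": row["borough"],
                "births": int(raw) if raw else None,
            })
    return merged


def pivot(rows: List[dict]) -> Dict[Tuple[str, str], Dict[int, Optional[int]]]:
    """Pivot to {(borough, age_group): {year: births}}, similar to an Excel pivot table."""
    table: Dict[Tuple[str, str], Dict[int, Optional[int]]] = defaultdict(dict)
    for row in rows:
        table[(row["borough"], row["age_group"])][row["year"]] = row["births"]
    return dict(table)


def mean_known(values: Iterable[Optional[int]]) -> Optional[float]:
    """Average only cells that have data. None is skipped, not counted as 0."""
    known = [v for v in values if v is not None]
    return sum(known) / len(known) if known else None


# Made-up values for illustration only (not real NYC figures).
export_2000 = "age_group,borough,births\n10-14,Bronx,7\n15-17,Bronx,\n10-14,Non-NYC,2\n"
export_2006 = "age_group,borough,births\n10-14,Bronx,5\n15-17,Bronx,40\n"

table = pivot(merge_years({2000: export_2000, 2006: export_2006}))
print(table[("Bronx", "10-14")])  # {2000: 7, 2006: 5}
print(mean_known(table[("Bronx", "15-17")].values()))  # 40.0
```

The blank-versus-zero choice matters as soon as you aggregate: with the blank kept as `None`, the 15-17 average above is 40.0, whereas typing in a zero would pull it down to 20.0.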
It was a busy day for the Fellows! Patrick Sweeney advocated keeping my research and visualizations in Excel. Although I agreed that I am indeed already learning a new tool for this project (Excel), I insisted on learning a visualization tool as well. Patrick suggested creating a webpage on the CUNY Commons to present my data. I left with peace of mind, a great idea, and the knowledge of how to make it a reality.
21 November 2015
I downloaded a free 14-day trial of Tableau 9.1, uploaded my first spreadsheet, and started clicking around. After a while of clicking with little to show for it, I wondered whether this was the right tool for me. What would visualizing the data teach me about it? Would I notice anything of value? Is there anything more to it than “poverty leads to higher teen pregnancy rates”?
24 November 2015
On a crisp Tuesday afternoon, I met up with my colleagues Ashleigh Cassemere-Stanfield and Oksana Byeha for a work-till-you-drop data project session. In four hours, we discussed all our DH worries and made good progress toward completing our projects. I checked out the DiRT Directory, went through all the visualization software options, and eliminated the ones that were not relevant. Then I visited the websites of the tools I had shortlisted to see what my data might look like in each. I decided to use Weave (Web-based Analysis and Visualization Environment), which had a neat timeline option I could use to show how the birth rate changed over time. However, I had trouble downloading the tool. Ashleigh helped me figure it out, but my anti-virus software would not let me run the installer because it flagged the .exe file as malware. And just like that, I was back to Tableau, the runner-up in my selection. Today, however, I was more determined, and more successful, with Tableau.
27 November 2015
Here are my visualizations:
Births by age group, borough, and year.
From the first two visualizations, I see that the number of teen births went down in every borough between 2000 and 2013. Contrary to my expectation, Brooklyn, not the Bronx, has the highest number of teen births.
Here are the births to mothers aged 18-19, by borough and year.
And total births for 2000, 2006, and 2013 for all age categories, organized by borough.
Here we see that Brooklyn has the highest total number of births. Perhaps, then, it is to be expected that Brooklyn also has the highest number of teen births? However, the runner-up in total births is Queens, not the Bronx. So teen births do not simply mirror total births, which means that no, the teen birth rate is not the same in all boroughs. Manhattan’s total number of births is comparable to the Bronx’s. These visualizations show that each borough has its own distinct pattern of births across age groups.
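The comparison above is really a comparison of proportions: a borough can have the most teen births simply because it has the most births overall. A minimal sketch of that arithmetic divides each borough’s teen births by its total births. The boroughs are labeled A, B, and C and the counts are hypothetical placeholders, not figures from the City’s data. Note also that this is a share of births, not a true teen birth rate, which would divide by the number of teenage girls living in each borough and so would require population figures as well.

```python
from typing import Dict


def teen_share(teen_births: Dict[str, int], total_births: Dict[str, int]) -> Dict[str, float]:
    """Fraction of each borough's births that were to teenage mothers (boroughs with zero births are skipped)."""
    return {b: teen_births[b] / total_births[b] for b in total_births if total_births[b] > 0}


# Hypothetical counts for illustration only (not real NYC data).
teen = {"Borough A": 300, "Borough B": 250, "Borough C": 150}
total = {"Borough A": 10000, "Borough B": 5000, "Borough C": 7500}

shares = teen_share(teen, total)
for borough, share in sorted(shares.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{borough}: {share:.1%}")
# Borough B: 5.0%
# Borough A: 3.0%
# Borough C: 2.0%
```

In this made-up example, Borough A has the most teen births but Borough B has the highest teen share, which is exactly the kind of difference a count-only chart can hide.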
Below is the Neighborhood Poverty data in Tableau.
And these are total New York City births by neighborhood poverty level:
From these two visualizations, we see that, for the most part, teen births rose and fell along with total births in NYC. However, in neighborhoods where more than 30% of the population lives in poverty, teen births kept decreasing, while births across all age groups in those neighborhoods went up slightly after 2006.
Here is the main challenge I encountered: keeping the color scheme and the order of boroughs consistent across graphs. Most of the color schemes carried over automatically between visualizations, but sometimes they did not. Since I expected them to, I only noticed when I looked through everything this morning, after all was done. Lessons learned: take more screenshots along the way, keep things consistent, and pay attention to detail.
And here are some “fun” visualizations of the data:
And lots of circles!