Data visualizations make complex information easy to understand and more engaging. They are also useful for identifying patterns in data and are more memorable than tables or plain text. These features make visualization a vital tool at different stages of the data analytics lifecycle, including when conducting exploratory analysis and communicating insights.
If you plan to work with data, you must also know how to work with visualizations. We have compiled 11 data visualization projects and datasets to help you get started. These will enhance your practice and understanding of how to create and use visualizations to tell more compelling stories from data.
If you’re new to data analytics, you’ll need to first understand the basics of visualizations, i.e., when to use them, the different types, and how to identify trends visually. Working on data visualization projects with set outcomes is perfect for this.
This project uses a dataset detailing every winner of a Nobel Prize from 1901 to 2016. Based on the available features, you can create visualizations to answer questions such as:
This visualization project is a good starting point for beginners. You can learn how to use tools such as Matplotlib and Seaborn to create visualizations in Python. You can also use bar charts, line charts, scatter plots, and pie charts to visualize different aspects of the data and answer specific questions. This post shows one approach to handling this visualization project.
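As a rough sketch of what a first chart might look like, the snippet below counts prizes per category and draws a bar chart with Matplotlib. The rows are invented toy data, and the `category` column name is an assumption, so check the real dataset's headers before adapting it.

```python
from collections import Counter

# Toy rows standing in for the real dataset. The values and the
# "category" column name are illustrative assumptions only.
SAMPLE_ROWS = [
    {"category": "Physics"},
    {"category": "Physics"},
    {"category": "Physics"},
    {"category": "Chemistry"},
    {"category": "Chemistry"},
    {"category": "Peace"},
]


def count_by(rows, column):
    """Return (label, count) pairs sorted from most to least frequent."""
    return Counter(row[column] for row in rows).most_common()


def plot_counts(pairs, title):
    """Draw a bar chart of (label, count) pairs with Matplotlib."""
    # Imported here so the counting step also works without Matplotlib installed.
    import matplotlib.pyplot as plt

    labels, counts = zip(*pairs)
    fig, ax = plt.subplots()
    ax.bar(labels, counts)
    ax.set_xlabel("Category")
    ax.set_ylabel("Number of prizes")
    ax.set_title(title)
    fig.tight_layout()
    return fig
```

Calling `plot_counts(count_by(SAMPLE_ROWS, "category"), "Prizes per category (toy data)")` returns a figure that you can save with `fig.savefig(...)`. The same two-step pattern (aggregate first, then plot) works for most of the questions you might ask of this dataset.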
The COVID-19 pandemic is one of the most defining and most closely monitored events in recent years. A large amount of data was gathered detailing the numbers of casualties at national and local levels, infection rates in different countries, etc. This can be a detailed project, but as a beginner, you can start by creating visualizations to show information such as:
This is a project that requires careful data collection, cleaning, and validation. For visualization, you can create individual charts or a dashboard, as shown in this example. The different types of information that can be extracted from the data make this a good project for learning how to use Tableau and similar platforms. However, you can also use Python and Excel.
You can find COVID-19 datasets from the World Health Organization. A more challenging project would be combining this data with data on vaccination rates to answer additional questions, such as the effect of vaccinations on infection rates in different areas.
Population growth is one of the development indicators tracked by the World Bank. When combined with information such as GDP, population data can tell an interesting story about the development of a country. In this project, the goal is to visualize the growth of populations in different countries from 1960 to 2023 and to compare population growth and growth rates with GDP.
You can download the population and GDP data for this project from the World Bank website. Consider which visualizations would best showcase the information and create a dashboard highlighting key information, including countries with the fastest-growing GDPs and populations in different decades. Having at least one visualization in the form of a map would be ideal.
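Before building the dashboard, you will need growth rates rather than raw totals. Here is a minimal sketch of that calculation, assuming you have already reshaped one country's data into a `{year: value}` dictionary. The numbers are made up for illustration, and the function names are hypothetical.

```python
def growth_rates(series):
    """Return {year: percent change from the previous recorded year}."""
    years = sorted(series)
    rates = {}
    for prev, curr in zip(years, years[1:]):
        rates[curr] = (series[curr] - series[prev]) * 100 / series[prev]
    return rates


def plot_growth(population, gdp, country):
    """Plot population and GDP growth rates on the same axes."""
    # Imported here so growth_rates() can be used without Matplotlib installed.
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    for label, series in (("Population", population), ("GDP", gdp)):
        rates = growth_rates(series)
        ax.plot(list(rates), list(rates.values()), marker="o", label=label)
    ax.set_xlabel("Year")
    ax.set_ylabel("Growth since previous point (%)")
    ax.set_title(f"{country}: population vs. GDP growth")
    ax.legend()
    fig.tight_layout()
    return fig


# Invented values, one point per decade, for illustration only.
toy_population = {1960: 100, 1970: 120, 1980: 150}
toy_gdp = {1960: 50, 1970: 75, 1980: 90}
```

With these toy inputs, `growth_rates(toy_population)` gives 20% for 1970 and 25% for 1980. Plotting rates instead of totals puts population and GDP on a comparable scale, which makes the comparison easier to read.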
Although visualizations may be the final result, they are also useful in the early stages of data science and machine learning. In this project, your goal is to analyze the data visually and derive relevant insights about it before it’s used to train a machine learning model. Working on the machine learning portion is optional.
You can download the dataset from Kaggle. Your goal is to create visualizations to:
For a step-by-step tutorial, you can check out this post.
The goal of this project is to visualize the state of biodiversity in different national parks in the US. The dataset used here contains latitudes and longitudes, so you can create visualizations indicating the locations of different parks on a map of the US. Platforms such as Tableau make this easy.
The dataset also contains information such as the type of animal, whether a species is native, and its conservation status. This means you can create visualizations to show:
For datasets as rich as this, a lot of information can be visualized depending on your interests and end goals. Check out some examples in this post and see what other interesting visualizations you can come up with.
Once you have a solid understanding of data visualization principles, working with datasets and defining your own projects is more manageable. The nature of the data often dictates what is achievable, but this also helps you define relevant goals when using datasets.
This dataset contains information regarding electric vehicle and plug-in hybrid ownership in the State of Washington. Some of its features include the city of registration, the make and model, whether it’s electric or hybrid, and range.
These features make it possible to create visualizations to answer questions such as which models are more popular, which manufacturers offer vehicles with better range, and the density of electric vehicles in different counties in the state.
The rising cost of groceries has been a major concern in the US recently. The USDA maintains a dataset of fruit and vegetable prices that allows you to track the prices of common items, including apples, bananas, broccoli, and cucumbers.
Using this dataset requires some data wrangling skills because the data is in different files. You can create visualizations to track the prices of individual items over the years or compare how fast the prices of certain items are climbing.
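The wrangling step might look something like the sketch below, which merges one file per item into a single long table that is easy to plot. It assumes each file is a CSV with `year` and `price` columns. That layout and the prices are invented for the example, so expect to adjust the column names to match the actual downloads.

```python
import csv
import io


def combine_price_files(files):
    """Merge per-item CSV files into one list of {item, year, price} rows.

    `files` maps an item name to an open text file (or any iterable of lines).
    """
    rows = []
    for item, handle in files.items():
        for record in csv.DictReader(handle):
            rows.append(
                {
                    "item": item,
                    "year": int(record["year"]),
                    "price": float(record["price"]),
                }
            )
    # Sort so each item's prices appear in chronological order.
    return sorted(rows, key=lambda r: (r["item"], r["year"]))


# In-memory stand-ins for two separate files; the prices are made up.
apples = io.StringIO("year,price\n2020,1.50\n2022,1.80\n")
bananas = io.StringIO("year,price\n2020,0.55\n2022,0.60\n")
combined = combine_price_files({"apples": apples, "bananas": bananas})
```

Once the data is in this long format, drawing one line per item over time is a short loop in Matplotlib or a single call in Seaborn.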
Deaths from drug overdose are a major issue, and the CDC regularly compiles data on these events. This extensive dataset contains data from April 2015 to the present and has features such as state, number of deaths, types of drugs, month of death, and number of deaths caused by an overdose.
If you’re interested in working as a data professional in the medical field, this type of dataset can be used to create domain-relevant visualizations, e.g., to show which states have the highest percentages of overdose-related deaths, which drugs are associated with the highest number of overdoses in different states, and trends in overdose deaths in parts of the country.
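To rank states by the share of deaths linked to overdoses, you could start from something like the sketch below. It assumes the data has already been aggregated to one row per state. The field names `total_deaths` and `overdose_deaths` are hypothetical, and the figures are placeholders rather than real statistics.

```python
def overdose_share(rows):
    """Return (state, percent of deaths that were overdoses), highest first."""
    shares = [
        (r["state"], r["overdose_deaths"] * 100 / r["total_deaths"])
        for r in rows
    ]
    return sorted(shares, key=lambda pair: pair[1], reverse=True)


def plot_shares(shares):
    """Draw a horizontal bar chart with the highest share at the top."""
    # Imported here so overdose_share() can be used without Matplotlib installed.
    import matplotlib.pyplot as plt

    states, percents = zip(*shares)
    fig, ax = plt.subplots()
    ax.barh(states, percents)
    ax.invert_yaxis()  # first (largest) entry goes at the top
    ax.set_xlabel("Overdose deaths as % of all deaths")
    fig.tight_layout()
    return fig


# Placeholder rows; not real figures.
TOY_ROWS = [
    {"state": "State A", "total_deaths": 1000, "overdose_deaths": 20},
    {"state": "State B", "total_deaths": 500, "overdose_deaths": 25},
    {"state": "State C", "total_deaths": 2000, "overdose_deaths": 30},
]
```

Using percentages rather than raw counts keeps large states from dominating the chart simply because they have more people.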
This is a dataset on the sale of real estate in Connecticut from 2001 to 2018. The dataset has features such as the type of property (residential/commercial), the sale price, town, latitude and longitude, the assessed property value, etc.
Visualizations from this dataset can be used to map out the most affordable areas to buy commercial real estate in the state, pricing trends in different towns, areas with the most affordable single-family homes, etc.
Air pollution is a significant concern in large cities. This dataset contains air quality data taken from different parts of New York City. It contains features such as the period when measurements were taken, the types of pollutants recorded, and place names.
This data can be used to trace the rise or fall of a pollutant in a certain part of the city over the years, identify areas with the poorest air quality, specify the times of the year with the worst air quality, etc.
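Both questions come down to averaging readings over different groupings. The sketch below shows one way to do that with a small helper. The field names (`place`, `year`, `pollutant`, `value`) and the readings are assumptions made for the example and will not match the real file exactly.

```python
from collections import defaultdict
from statistics import mean


def mean_by(rows, keys, value):
    """Average `value` for each distinct combination of the `keys` fields."""
    groups = defaultdict(list)
    for row in rows:
        groups[tuple(row[k] for k in keys)].append(row[value])
    return {group: mean(values) for group, values in groups.items()}


# Invented readings for a single pollutant in two placeholder areas.
TOY_READINGS = [
    {"place": "Area A", "year": 2019, "pollutant": "NO2", "value": 20.0},
    {"place": "Area A", "year": 2020, "pollutant": "NO2", "value": 16.0},
    {"place": "Area B", "year": 2019, "pollutant": "NO2", "value": 30.0},
    {"place": "Area B", "year": 2020, "pollutant": "NO2", "value": 26.0},
]

# Trend over time: one average per (place, year) pair.
yearly = mean_by(TOY_READINGS, ("place", "year"), "value")

# Poorest air quality overall: the place with the highest average reading.
overall = mean_by(TOY_READINGS, ("place",), "value")
worst_area = max(overall, key=overall.get)
```

The `yearly` result maps naturally onto a line chart with one line per area, while `overall` suits a ranked bar chart. If the real data mixes several pollutants, filter to one pollutant (or add it to `keys`) before averaging, since their units may differ.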
If you plan to work in transportation, logistics, or economics, this dataset should be interesting. It contains data on the number of shipping containers handled by ports in countries worldwide from 2011 to 2022.
Visualizations can be used to see if ports in a certain country have gotten busier or slowed down, which ports are witnessing the highest growth in throughput, which regions or continents are handling more containers, etc.
If you plan to work in data science, machine learning, and adjacent fields, you’ll need to create visualizations regularly to understand the data you’re working with and communicate insights to stakeholders. Working on data visualization projects at an early stage will help you to:
Data visualization is an integral part of working with data. It makes it easy to see trends and share your insights with others. It is also a skill in its own right: you need to know when and how to use visualizations, how to interpret them, and how to work with the many tools available for creating them. Working on data visualization projects, either guided or from an original dataset, is a great way to get comfortable creating visualizations. It’s also a good way to build your portfolio.
Interview Query compiles these and many more resources to help you on your way to becoming a data professional. Whether you need ideas for machine learning, business analytics, or data science, we have lists of projects and datasets to help you in your quest. We also offer a wide range of useful resources when interviewing for roles in data science and similar fields, including interview questions and answers based on the business, company interview guides, a mock interview tool, coaching services from industry pros, and more. You can also check out related topics on our blog.
As you prepare for your career in data, we hope this list of data visualization datasets and projects will help you build the skills you need to succeed.