KickstarterProjectDetails

Visualisation Project


This project was made during my study period abroad and concerns the principles of good visualisation design. It is divided into three sections, each presenting a simple analysis that emerges from a visualisation built on a Kickstarter dataset.

More specifically, the dataset, which can be found here, contains information and details about the Kickstarter projects launched from 2009 to 2017. It is very detailed and includes the funding goal, the money pledged, and the launch and deadline dates.

All the visualisations have been made using the Vega-Lite framework.

Dataset Description

Of the two datasets available in the Kaggle repository, the one used was “ks-project-201801.csv” because, as its name suggests and as data exploration confirms, it covers data from 2009 to 2017, whereas the other one covers data from 2009 to 2016. This choice was also made because of the presence of the columns “usd_pledged_real” and “usd_goal_real”, whose meaning is explained below. The dataset contains information and details about Kickstarter projects: each row refers to a single project and holds information about that project. The following is a brief explanation of the attributes used for the analysis:

Because loading the whole dataset would slow down the page, the data was filtered with a Jupyter (Python) notebook that deletes the rows and columns not used in each specific visualisation. The filtered datasets and the code used to obtain them have been uploaded here.
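The filtering step can be sketched with the standard `csv` module alone. This is not the notebook's actual code: `filter_csv` is a hypothetical helper, the column names follow the raw Kaggle CSV as an assumption, and the demo rows are invented.

```python
import csv
import io


def filter_csv(src, dst, columns, keep_row):
    """Copy rows of `src` that satisfy `keep_row` to `dst`, keeping only `columns`.

    `src` and `dst` are text file objects (open real files with newline="").
    Returns the number of rows written.
    """
    reader = csv.DictReader(src)
    # extrasaction="ignore" silently drops every column not listed in `columns`
    writer = csv.DictWriter(dst, fieldnames=columns, extrasaction="ignore")
    writer.writeheader()
    kept = 0
    for row in reader:
        if keep_row(row):
            writer.writerow(row)
            kept += 1
    return kept


# Tiny in-memory demo with invented rows (not taken from the real dataset)
sample = io.StringIO(
    "ID,name,main_category,state,backers\n"
    "1,A,Music,successful,10\n"
    "2,B,Games,failed,3\n"
)
out = io.StringIO()
n = filter_csv(sample, out, ["ID", "main_category", "backers"],
               lambda row: row["state"] == "successful")
```

Writing one small CSV per visualisation keeps each page's download proportional to what that chart actually draws.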

Visualisation 1

Question

Among the successful projects, which Main Category has the greatest number of them? Does it also have the greatest number of Backers and the most Money Pledged?

Description

Considering only the successful projects, these three bar charts show, for each Main Category, the number of projects, the number of backers and the total amount of money pledged in USD, respectively. Each bar represents a value of the nominal attribute Main Category, and its length encodes the related quantitative attribute (the number of successful projects in the left chart, the number of backers in the centre chart and the money pledged in USD in the right chart). Each bar chart is sorted by its own quantitative attribute. Hovering over a bar reveals the exact value of the quantitative attribute for that category.
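A minimal sketch of how three sorted bar charts with tooltips could be declared, written as a Python dict and serialised to Vega-Lite JSON. It assumes Vega-Lite v5, the raw Kaggle column names (`main_category`, `backers`, `usd_pledged_real`) and a hypothetical file name `successful_projects.csv`; it is not the project's actual specification.

```python
import json

DATA_URL = "successful_projects.csv"  # hypothetical file name


def bar_chart(title, x_encoding):
    """One horizontal bar chart: Main Category on y, sorted by its own x value."""
    return {
        "title": title,
        "mark": "bar",
        "encoding": {
            # "-x" sorts the categories by the aggregated x value, descending
            "y": {"field": "main_category", "type": "nominal",
                  "sort": "-x", "title": "Main Category"},
            "x": x_encoding,
            # the tooltip repeats the aggregate so hovering shows the exact value
            "tooltip": [{"field": "main_category", "type": "nominal"}, x_encoding],
        },
    }


spec = {
    "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
    "data": {"url": DATA_URL},  # inherited by the three sub-views
    "hconcat": [
        bar_chart("Successful projects",
                  {"aggregate": "count", "type": "quantitative",
                   "title": "Number of projects"}),
        bar_chart("Backers",
                  {"aggregate": "sum", "field": "backers", "type": "quantitative",
                   "title": "Backers"}),
        bar_chart("Money pledged (USD)",
                  {"aggregate": "sum", "field": "usd_pledged_real",
                   "type": "quantitative", "title": "Pledged (USD)"}),
    ],
}
spec_json = json.dumps(spec, indent=2)  # paste into a Vega-Lite editor or embed
```

Using `hconcat` keeps three independent x scales, which is what allows each chart to be read on its own range.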

Insight

Contrary to the expectation that the “Technology” category would come first, given Kickstarter's nature as an online platform, the category with the highest number of successful projects is Music. This may mean that, although Kickstarter is an online platform (i.e. strongly related to the technology field) where people try to finance their projects, other kinds of projects have been more appreciated. However, in terms of number of backers, Music is not in first place. Combining this result with the Money Pledged (USD) bar chart, where Music is in 5th place, it can be assumed that music projects generally have lower goals than projects in other categories (i.e. they do not need a huge amount of money to be successful).
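The reasoning above compares rankings across the three charts; dividing the money pledged by the number of successful projects makes the "lower amounts per project" argument explicit. The sketch below assumes the raw Kaggle column names and uses invented rows, so its numbers say nothing about the real dataset. Since the filtered data for this chart has no goal column, the pledged amount per successful project serves only as a proxy for the goal (a successful project pledged at least its goal).

```python
from collections import defaultdict


def summarise(rows):
    """Per Main Category: successful projects, total backers, total and mean USD pledged."""
    stats = defaultdict(lambda: {"projects": 0, "backers": 0, "pledged": 0.0})
    for row in rows:
        s = stats[row["main_category"]]
        s["projects"] += 1
        s["backers"] += int(row["backers"])
        s["pledged"] += float(row["usd_pledged_real"])
    for s in stats.values():
        s["pledged_per_project"] = s["pledged"] / s["projects"]
    return dict(stats)


def rank(stats, key):
    """Categories ordered from highest to lowest value of `key`."""
    return sorted(stats, key=lambda c: stats[c][key], reverse=True)


# Invented rows for illustration only
rows = [
    {"main_category": "Music", "backers": "50", "usd_pledged_real": "3000"},
    {"main_category": "Music", "backers": "40", "usd_pledged_real": "2500"},
    {"main_category": "Technology", "backers": "900", "usd_pledged_real": "120000"},
]
stats = summarise(rows)
```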

Design considerations

Horizontal bar charts were chosen because length is a very effective channel for comparison, which makes comparing categories very easy. Adding a bit of redundancy, and exploiting the pre-attentive property of colour (differences in hue are perceived in the pre-attentive phase), the Main Category with the greatest number of successful projects is coloured red. The charts start at 0, as all bar charts should, so that differences between bars are represented faithfully and misunderstandings are avoided. The alternative considered was a grouped bar chart with one group per Main Category. Its advantage is compactness, since all the information would be encoded in the same chart. However, it would be very hard to decode because the quantitative attributes have very different scale ranges. This problem could be overcome with normalisation, but even then the visualisation remains hard to decode, and comparisons between categories in particular remain difficult. Vertical bar charts were also considered, but they were discarded to avoid angled labels, which are hard to read.
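The red highlight can be expressed as a conditional colour encoding. In this sketch `highlight_colour` is a hypothetical helper, the neutral colour `steelblue` is an assumption, and `main_category` is assumed to be the raw Kaggle column name; "Music" comes from the insight above.

```python
import json


def highlight_colour(top_category, accent="red", base="steelblue"):
    """Colour encoding: `top_category` in the accent hue, all other bars in `base`."""
    # json.dumps turns the name into a quoted, escaped string literal for the Vega expression
    test = f"datum.main_category == {json.dumps(top_category)}"
    return {"condition": {"test": test, "value": accent}, "value": base}


# Quantitative axis with an explicit zero baseline (already the Vega-Lite default for bars)
x_encoding = {"aggregate": "count", "type": "quantitative", "scale": {"zero": True}}
colour = highlight_colour("Music")
```

Stating `"zero": True` explicitly documents the design choice even though bars would start at 0 anyway.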

Data filtering and transformation

The dataset for this visualisation was filtered beforehand using the Python code uploaded to the GitHub repository. More precisely, this filtered version contains only the columns “ID”, “Main Category”, “Backers” and “USD Pledged Real” of the successful projects, because these are the only ones used. The same filter could also be applied directly in the Vega-Lite code, but the Python code was used to make the page load much faster.
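For comparison, here is a sketch of the same filter as a Vega-Lite `filter` transform, next to a Python counterpart that also drops the unused columns. The raw Kaggle column names are assumed, and the rows are invented. Placing `"transform": [SUCCESS_FILTER]` in the spec would make the browser do this work on the full CSV, which is exactly the slowdown the pre-filtering avoids.

```python
SUCCESS_FILTER = {"filter": "datum.state == 'successful'"}
COLUMNS = ["ID", "main_category", "backers", "usd_pledged_real"]


def prefilter(rows):
    """Python counterpart of SUCCESS_FILTER that also drops the unused columns."""
    return [{c: row[c] for c in COLUMNS} for row in rows if row["state"] == "successful"]


# Invented rows for illustration only
rows = [
    {"ID": "1", "name": "A", "main_category": "Music", "state": "successful",
     "backers": "10", "usd_pledged_real": "500.0"},
    {"ID": "2", "name": "B", "main_category": "Games", "state": "failed",
     "backers": "3", "usd_pledged_real": "20.0"},
]
filtered = prefilter(rows)
```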

Visualisation 2

Question

Considering the successful and failed projects, what is the relationship between the Goal (USD) and the Pledged (USD)?

Description

For the main category Technology, this scatterplot shows the relationship between the goal (in USD), i.e. the funding objective of a project, and the amount pledged (also in USD), i.e. the money sent by people to support that project. Only successful and failed projects are considered. Hovering over a point reveals information about that project, such as its name, category, goal and amount pledged. Failed projects are shown in blue and successful projects in orange.
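A sketch of a possible scatterplot specification with the tooltip described above, expressed as a Python dict. It assumes Vega-Lite v5, the raw Kaggle column names and a hypothetical file name; the colour range is left to the default scheme because the exact palette values are not given here.

```python
scatter = {
    "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
    "data": {"url": "technology_projects.csv"},  # hypothetical file name
    # keep only the two states under analysis
    "transform": [{"filter": {"field": "state", "oneOf": ["successful", "failed"]}}],
    "mark": {"type": "point", "filled": True},
    "encoding": {
        "x": {"field": "usd_goal_real", "type": "quantitative", "title": "Goal (USD)"},
        "y": {"field": "usd_pledged_real", "type": "quantitative", "title": "Pledged (USD)"},
        # a fixed domain keeps the state-to-colour mapping stable
        "color": {"field": "state", "type": "nominal",
                  "scale": {"domain": ["failed", "successful"]}},
        "tooltip": [
            {"field": "name", "type": "nominal"},
            {"field": "category", "type": "nominal"},
            {"field": "usd_goal_real", "type": "quantitative", "title": "Goal (USD)"},
            {"field": "usd_pledged_real", "type": "quantitative", "title": "Pledged (USD)"},
        ],
    },
}
```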

Insight

The first aspect that emerges from this graph is that projects with a high goal failed, receiving little or no money. Furthermore, successful projects generally received much more money than their goal. This is why the points are distributed along the axes rather than along the diagonal, as would have happened if the money pledged had been roughly equal to the goal. For a generic project, this may suggest that it either succeeds by a wide margin or fails miserably. There are also some interesting outliers, highlighted with the orange triangle: the four most successful projects, each of which raised more than 4,000,000 dollars:

  1. "Pono Music - Where Your Soul Rediscovers" with 6,225,354.98 dollars;
  2. "Bring Reading Rainbow Back for Every Child, Everywhere!" with 5,408,916.95 dollars;
  3. "ZeTime: World's first smartwatch with hands over touchscreen" with 5,333,792.84 dollars;
  4. "Pimax: The World's First 8K VR Headset" with 4,236,618.49 dollars.

The analysis of these four projects reinforces the previous observation that some successful projects earned much more than their goal.
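One way to pick out and mark such outliers is sketched below: a Python selection of successful projects above 4,000,000 USD, plus a Vega-Lite layer that could draw them as orange triangles over the scatterplot. The layer approach, the marker size and the column names are assumptions, not necessarily how the original chart was built.

```python
def top_outliers(rows, threshold=4_000_000):
    """Successful projects that raised more than `threshold` USD, largest first."""
    hits = [r for r in rows
            if r["state"] == "successful" and r["usd_pledged_real"] > threshold]
    return sorted(hits, key=lambda r: r["usd_pledged_real"], reverse=True)


# Hypothetical layer drawing the outliers as orange triangles on top of the scatterplot
OUTLIER_LAYER = {
    "transform": [{"filter": "datum.state == 'successful' && datum.usd_pledged_real > 4000000"}],
    "mark": {"type": "point", "shape": "triangle-up", "color": "orange",
             "filled": True, "size": 120},
    "encoding": {
        "x": {"field": "usd_goal_real", "type": "quantitative"},
        "y": {"field": "usd_pledged_real", "type": "quantitative"},
    },
}
```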

Design considerations

A scatterplot was used because it is the best way to show the relationship between two quantitative variables (i.e. the money pledged and the goal). The chosen colours are blue for failed projects and orange for successful projects, taken from the Colour Blind 10 palette so that everyone, including colour-blind readers, can decode the graph.
For this kind of analysis, the alternative considered was a binned heatmap, which is a very good way to show the data distribution while avoiding overlapping points. However, since colour in a heatmap encodes density, it could not also be used to distinguish successful from failed projects, so the scatterplot was preferred. Furthermore, the scatterplot allows a tooltip with additional details, such as the new measure “% of Goal Earned”, which represents the amount pledged as a percentage of the goal.
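The “% of Goal Earned” measure can be derived in the spec with a `calculate` transform. The sketch below shows that transform, a matching tooltip entry and the same formula in Python; it assumes the raw column names and a goal greater than zero, and the `.1f` number format is an arbitrary choice.

```python
# Vega-Lite calculate transform deriving the new measure (assumes usd_goal_real > 0)
GOAL_EARNED = {
    "calculate": "100 * datum.usd_pledged_real / datum.usd_goal_real",
    "as": "% of Goal Earned",
}
TOOLTIP_ENTRY = {"field": "% of Goal Earned", "type": "quantitative", "format": ".1f"}


def pct_goal_earned(pledged, goal):
    """Same formula in Python, e.g. for checking a few rows by hand."""
    return 100 * pledged / goal
```

A value above 100 means the project raised more than its goal, which is the pattern described in the insight.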

Data filtering and transformation

Failed projects whose goal was over 6,500,000 dollars, which are extreme outliers, have been removed using Vega-Lite filters, so that the graph is not stretched along the x-axis and hard to read. With the Python filtering code, only the following columns of the successful and failed projects have been kept: “ID”, “name”, “category”, “state”, “usd_pledged_real”, “usd_goal_real”, to avoid slow loading caused by the size of the original dataset. Furthermore, again using Python, only the data concerning the Technology main category has been kept, since the question is specifically about that category. The same filter could also be applied directly in the Vega-Lite code, but the Python code was used to make the page load much faster.
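A sketch of the outlier filter as a Vega-Lite expression, with an equivalent Python predicate. The exact expression used in the project is not shown here, so this is an assumption; the threshold comes from the text and the column names follow the raw dataset.

```python
MAX_FAILED_GOAL = 6_500_000

# Drop failed projects whose goal exceeds the threshold; everything else is kept
OUTLIER_FILTER = {
    "filter": f"!(datum.state == 'failed' && datum.usd_goal_real > {MAX_FAILED_GOAL})"
}


def keep_point(row):
    """Python counterpart of OUTLIER_FILTER."""
    return not (row["state"] == "failed" and row["usd_goal_real"] > MAX_FAILED_GOAL)
```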

Visualisation 3

Question

For the Robots, 3D Printing and Software categories, how did the number of projects launched vary across the years from 2014 to 2017? Is the pattern consistent across these three categories?

Description

This line chart shows how the number of projects launched in the main category Technology varied from 2014 to 2017, considering the Robots, 3D Printing and Software subcategories. No tooltip is used here, because it would add nothing useful in this context.
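A possible line-chart specification, written as a Python dict: the launch date is binned by year with `timeUnit`, the y value is a count aggregate and colour separates the subcategories. Vega-Lite v5, the column names `launched` and `category`, and the file name are assumptions; no colour scheme is set here.

```python
line_chart = {
    "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
    "data": {"url": "technology_subcategories.csv"},  # hypothetical file name
    "mark": {"type": "line", "point": True},
    "encoding": {
        # timeUnit "year" groups every launch date into its calendar year
        "x": {"timeUnit": "year", "field": "launched", "type": "ordinal",
              "title": "Year launched"},
        # count aggregate = number of projects per (year, category)
        "y": {"aggregate": "count", "type": "quantitative", "title": "Projects launched"},
        "color": {"field": "category", "type": "nominal", "title": "Category"},
    },
}
```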

Insight

This line chart shows the trend from 2014 to 2017 for the 3D Printing, Robots and Software categories. It reveals a great increase in the number of projects launched in 2015 in each category. However, the Software category attracted more interest than the other two, considering the number of projects launched over the whole period. The pattern that emerges from this analysis is therefore a general increase in the number of projects launched during 2015, followed by a slight decrease in the following years.
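The counts behind the chart, and the peak year for each category, can be reproduced in a few lines. This sketch assumes that `launched` strings begin with a four-digit year, as in the raw dataset; `launches_per_year` and `peak_year` are hypothetical helpers, and the test rows are invented.

```python
from collections import Counter

CATEGORIES = ("3D Printing", "Robots", "Software")


def launches_per_year(rows):
    """Count launched projects per (category, year), like the chart's count aggregate."""
    counts = Counter()
    for row in rows:
        if row["category"] in CATEGORIES:
            year = int(row["launched"][:4])  # assumes 'launched' starts with YYYY
            counts[(row["category"], year)] += 1
    return counts


def peak_year(counts, category):
    """Year with the most launches for `category`."""
    years = [y for (c, y) in counts if c == category]
    return max(years, key=lambda y: counts[(category, y)])
```

Running `peak_year` for each of the three categories is a quick way to check whether the 2015 peak is consistent across them.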

Design considerations

A line chart was used because it is the best way to spot and compare trends, as seen above. Colour is also used to encode the analysed categories, taken from the Colour Blind 10 palette so that everyone can decode the graph. The alternative considered was a grouped bar chart with one group per year. Its advantage is that it makes comparing the three categories within each year easier, but it makes identifying a trend much harder. The line chart was chosen because it is more expressive for spotting trends and is the best way to answer the specific question above.

Data filtering and transformation

As in the previous visualisations, Python code was used to filter the data and build a specific dataset for this visualisation. In particular, all columns other than “ID”, “category” and “launched” have been deleted, and only the rows for the 3D Printing, Robots and Software categories have been kept. The same filter could also be applied directly in the Vega-Lite code, but the Python code was used to make the page load much faster. Filters inside the Vega-Lite code then restrict the data to the years from 2014 to 2017.
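The year restriction could be written as a Vega expression filter using the `year()` function, as sketched below with a Python counterpart. The exact filter used in the project is not shown, so this form is an assumption. It also assumes that `launched` is a date string that JavaScript can parse, and that the Python side can read the year from its first four characters.

```python
FIRST_YEAR, LAST_YEAR = 2014, 2017

# Vega expression filter; year() extracts the calendar year from the 'launched' date
YEAR_FILTER = {
    "filter": f"year(datum.launched) >= {FIRST_YEAR} && year(datum.launched) <= {LAST_YEAR}"
}


def in_period(row):
    """Python counterpart, assuming 'launched' strings start with YYYY."""
    return FIRST_YEAR <= int(row["launched"][:4]) <= LAST_YEAR
```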