# Visualizing a cloud of skills with Python

As a consultant and freelance programmer, I need to know which skills are most in demand in my area of interest. Until now I scanned contract openings and tried to remember how often certain skills appeared in the specifications. But people are poor intuitive statisticians, as Daniel Kahneman’s research has shown. At least I am. It is hard for me to objectively aggregate statistical information in my head, and it’s kind of boring to keep notes and crunch the numbers manually. It’s much more fun to have a picture, which, as they say, is worth a thousand words. So I wanted to visualize the demand for skills as a tag cloud.

## Input

First I dump all skill specifications into a file, skills.txt. Skills are separated by commas or slashes. Leading and trailing dashes and spaces are insignificant.
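The parsing rules above can be sketched in a few lines of Python. This is only an illustration: the function names `parse_skills` and `count_skills` and the sample lines are made up for the example, and lowercasing the terms is my own assumption, so that "Java" and "java" are counted together.

```python
import re
from collections import Counter


def parse_skills(lines):
    """Split each line on commas or slashes and normalize the terms."""
    terms = []
    for line in lines:
        for raw in re.split(r"[,/]", line):
            # Drop insignificant leading/trailing dashes and whitespace.
            term = raw.strip(" \t\r\n-").lower()  # lowercasing is an assumption
            if term:
                terms.append(term)
    return terms


def count_skills(lines):
    """Count how often each normalized term occurs."""
    return Counter(parse_skills(lines))


# Hypothetical input lines, not taken from a real skills.txt.
sample = ["- Java, Scala/Spring -", "java / SQL"]
print(count_skills(sample).most_common())
```

Since an open file is an iterable of lines, the same function should work on the real input, for example `count_skills(open("skills.txt"))`.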

Here is a sample:

## Script

A cloud image is generated with a small Python script using the wordcloud package, which has to be installed either with pip (`pip install wordcloud`) or from source.

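Getting a cloud image with wordcloud is pretty straightforward. The generate_from_frequencies method expects the terms together with their frequencies. In the version I used, this is a list of tuples, where the first member of each tuple is a term and the second is the term’s frequency, for example: [('java', 15), ('scala', 12), ('spring', 12)]. Newer releases of wordcloud take a dictionary mapping terms to frequencies instead, for example: {'java': 15, 'scala': 12, 'spring': 12}.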
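Here is a minimal sketch of that call. The helper names `to_frequencies` and `build_cloud` and the image size are my own choices for illustration, and I am assuming the installed wordcloud release accepts a dictionary, so the (term, frequency) pairs are converted first.

```python
def to_frequencies(pairs):
    """Turn (term, frequency) pairs into a dict, summing repeated terms."""
    freqs = {}
    for term, count in pairs:
        freqs[term] = freqs.get(term, 0) + count
    return freqs


def build_cloud(pairs, relative_scaling=0.8):
    """Build a word cloud from (term, frequency) pairs."""
    # Imported here so the helper above works without the package installed.
    from wordcloud import WordCloud

    cloud = WordCloud(width=800, height=400,  # hypothetical image size
                      relative_scaling=relative_scaling)
    # Assumption: this wordcloud release expects a dict of term -> frequency.
    return cloud.generate_from_frequencies(to_frequencies(pairs))
```

Calling `build_cloud([('java', 15), ('scala', 12), ('spring', 12)])` should then return the cloud object for the three terms.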

There are two ways to assign weights to the terms: by rank and by frequency. The relative_scaling parameter defines the ratio in which these two strategies are mixed. With relative_scaling=0, only ranks are considered. With relative_scaling=1, only frequencies are considered. In the example above, java has rank 1, scala rank 2, and spring rank 3, whereas the frequencies are 15, 12, and 12.

When the frequency distribution is smooth, with many terms sharing the same frequency, the cloud looks better if the frequency-based strategy gets more influence, so that differences in font size are not huge. In this example relative_scaling is set to 0.8.
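To get a feeling for the numbers, here is a simplified model of the mixing. It is not the library's code, only my understanding of the idea: each term's size is derived from the previous one by a factor that blends the frequency ratio with 1. The function name `relative_sizes` is made up for this sketch.

```python
def relative_sizes(frequencies, relative_scaling):
    """Font sizes relative to the top term.

    `frequencies` must be sorted in descending order. This is a simplified
    model of the mixing, not wordcloud's actual implementation.
    """
    sizes = [1.0]
    for prev, cur in zip(frequencies, frequencies[1:]):
        # Blend the frequency ratio with 1 (meaning "no frequency effect").
        step = relative_scaling * (cur / prev) + (1 - relative_scaling)
        sizes.append(sizes[-1] * step)
    return sizes


for rs in (0.0, 0.8, 1.0):
    print(rs, [round(s, 2) for s in relative_sizes([15, 12, 12], rs)])
```

In this model, with the frequencies 15, 12, 12 and a scaling of 0.8, scala and spring come out at about 0.84 of java's size. With 1 they drop to 0.8, and with 0 the frequencies have no effect at all, so only the order of the terms matters.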

## The image

The generated image is saved to a file and displayed on screen. The lines that display the image can be removed if interactivity is not needed or if the matplotlib package is not installed.
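A sketch of this step could look as follows. The function name `save_and_show`, the `show` flag and the file name skills.png are assumptions made for the example. The cloud argument is expected to be a WordCloud object, whose `to_file` and `to_array` methods are used here.

```python
def save_and_show(cloud, path="skills.png", show=True):
    """Save the cloud image to a file and optionally display it."""
    cloud.to_file(path)
    if not show:
        return
    # matplotlib is only needed for the interactive part.
    import matplotlib.pyplot as plt

    plt.imshow(cloud.to_array(), interpolation="bilinear")
    plt.axis("off")  # hide the axes around the image
    plt.show()
```

Passing `show=False` skips the display, so matplotlib is never imported.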

## More things to do

• Ignore comment lines starting with #, so that dates and job details can be kept in comments.
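The comment handling might be sketched like this. It is not part of the script yet; the function name `strip_comments` and the sample lines are hypothetical.

```python
def strip_comments(lines):
    """Yield only the lines that are not comments.

    A comment is a line whose first non-blank character is '#'.
    A '#' later in the line (as in 'C#') is left alone.
    """
    for line in lines:
        if line.lstrip().startswith("#"):
            continue
        yield line


# Hypothetical input: the comment holds the date and job details.
sample = ["# 2016-03-01, backend developer", "java, C#, spring"]
print(list(strip_comments(sample)))
```

The filter would run before the lines are split into terms, so a skill such as C# stays intact.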