A popular feature in blogs and other article-based websites these days is a 'tag cloud' (or 'word cloud', depending on your outlook on life). The premise is simple enough: popular tags or words that appear within the website are displayed in a box somewhere on the page. Clicking on one of the words brings up, say, the ten most recent articles related to that word.
When creating this blog, I decided it was a feature I'd like, and after some faffing around with MySQL queries and a bit of PHP, I managed to come up with a reasonably good solution (if you haven't already spotted it, it's the 'popular topics' bit at the side of this page).
Although the version used in this blog is a little different, as it's heavily tied into the MVC framework I built the site upon, the tutorial below is based on the same basic ideas and should work reasonably well.
Step One
Create an array of the keywords you want to appear. You can store these in a database table (as this website does), or hardcode them into the tag cloud script:
// Let's use a hardcoded array as an Example
$Keywords_Array = array('apples', 'bananas', 'clementines', 'mangos');
Step Two
The next step is to search the database for each of these words. For readability, I put this into a separate function:
function get_word_occurrences($Word)
{
    // Let's assume you have a MySQLi DB Connection called $Conn
    // (e.g. from mysqli_connect()) already defined elsewhere in your script
    global $Conn;
    // Get the Length of the Word
    $Word_Length = strlen($Word);
    // Guard against an empty keyword (it would divide by zero)
    if ($Word_Length == 0)
    {
        return 0;
    }
    // MySQL's REPLACE() is case-sensitive, so compare a lowercase
    // keyword against the lowercased column (LCASE below)
    $Word = strtolower($Word);
    // Escape the keyword before putting it into the SQL string
    $Safe_Word = mysqli_real_escape_string($Conn, $Word);
    // Let's create the Query (replace fruit_txt with your column name
    // and fruit_blog_tbl with your table name)
    $Query = "SELECT ( LENGTH(`fruit_txt`) - LENGTH( REPLACE( LCASE(`fruit_txt`),
        '$Safe_Word', '' ) ) ) / $Word_Length AS `occurrences` FROM fruit_blog_tbl";
    // Execute the Query
    $Result = mysqli_query($Conn, $Query);
    // Create a Counter and set it to Zero
    $Final_Count = 0;
    // Loop through each Table Row and add up the Occurrences
    while ( $Count = mysqli_fetch_assoc($Result) )
    {
        $Final_Count += $Count['occurrences'];
    }
    // Return the Number of Occurrences for the Word
    return $Final_Count;
}
What's happening in the Function
The query in this function removes every instance of the keyword $Word from the specified database field by replacing it with an empty string. It then subtracts the length of this stripped string from the length of the original string, and divides the difference by the number of letters in the keyword. That gives the number of times the word appears in each row, and the while loop adds the per-row counts together. Here's how it works on a single string:
Example String ( length: 47 chars )
The cat sat in the hat after peeing on the mat.
Example String with 'at' Removed ( length: 39 chars )
The c s in the h after peeing on the m.
The Math
( Example String Length - Stripped String Length ) / Keyword Length = Number of Occurrences
( 47 - 39 ) / 2 = 4
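There are 4 instances of 'at' in the original string: in 'cat', 'sat', 'hat' and 'mat' (count 'em and see). Note that this counts the letters wherever they appear, not only as a whole word, so a keyword like 'apples' would also be counted inside 'pineapples'.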
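If you want to check the arithmetic without a database, here's a minimal PHP sketch that applies the same length-difference trick to a single string. The function name count_by_length_difference() is hypothetical and not part of the script above; it lowercases both strings (as LCASE() does in the query) and compares the result with PHP's built-in substr_count().

```php
<?php
// Hypothetical helper: mirrors the SQL length-difference trick on one string
function count_by_length_difference(string $Text, string $Word): int
{
    $Word_Length = strlen($Word);
    if ($Word_Length === 0) {
        return 0; // avoid dividing by zero for an empty keyword
    }
    // Lowercase both sides, like LCASE() in the query
    $Text = strtolower($Text);
    $Word = strtolower($Word);
    // Remove every occurrence of the word, then see how much shorter the text got
    $Stripped = str_replace($Word, '', $Text);
    return intdiv(strlen($Text) - strlen($Stripped), $Word_Length);
}

$Example = 'The cat sat in the hat after peeing on the mat.';
echo count_by_length_difference($Example, 'at'), "\n";
// Cross-check with PHP's own substring counter
echo substr_count(strtolower($Example), 'at'), "\n";
```

Both str_replace() and substr_count() work through the string left to right and count non-overlapping matches, so for plain ASCII text they should agree; the SQL version just does the same thing once per row.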
Step Three
Now that we know how often each of our keywords appears in the database table, we can start to build our tag cloud. The first step is to create a multi-dimensional array containing each word and the number of times it appears. At the same time, we can work out which word appears most frequently, which we then use to leave out words that appear too few times (or not at all).
// Create an Array with the Tag Cloud Info
$Tag_Cloud_Data = array();
// Start the Maximum at Zero so the first comparison works
$Max_Occurrences = 0;
// Loop through each word and get the Occurrences
foreach ($Keywords_Array as $Keyword)
{
    // Use the function from above to get the Occurrences
    $Occurrences = get_word_occurrences($Keyword);
    // Add the info to the Array
    $Tag_Cloud_Data[] = array( 'keyword' => $Keyword,
                               'occurrences' => $Occurrences );
    // Check if it's the most frequently appearing Word
    if ( $Occurrences > $Max_Occurrences )
    {
        $Max_Occurrences = $Occurrences;
    }
}
// Now build the HTML Code for the Tag Cloud
$HTML_Code = '';
foreach ($Tag_Cloud_Data as $Tag)
{
    // First, check that the Word actually appears
    if ($Tag['occurrences'] > 0)
    {
        // Get the Font Size based on how often the Word Appears
        $Font_Size = $Tag['occurrences'] / $Max_Occurrences;
        // Only include the Word if the Font is bigger than
        // 0.2 (otherwise the words are too small to read)
        if ($Font_Size > 0.2)
        {
            // Build the HTML String, escaping the Word and adding a
            // trailing space so neighbouring Words don't run together
            $HTML_Code .= '<span style="font-size: '.$Font_Size;
            $HTML_Code .= 'em">'.htmlspecialchars($Tag['keyword']).'</span> ';
        }
    }
}
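To try out the layout without a database, the second loop can be pulled out into a function that takes an array of keyword => count pairs. This is a sketch, not the blog's actual implementation: build_tag_cloud_html() and its $Min_Size parameter are hypothetical names, and it finds the largest count with max() instead of tracking it in the first loop.

```php
<?php
// Hypothetical helper: the HTML-building loop from Step Three,
// taking an array of keyword => occurrence count
function build_tag_cloud_html(array $Counts, float $Min_Size = 0.2): string
{
    // Nothing to draw if there are no keywords or none of them appear
    $Max_Occurrences = $Counts ? max($Counts) : 0;
    if ($Max_Occurrences <= 0) {
        return '';
    }
    $Spans = [];
    foreach ($Counts as $Keyword => $Occurrences) {
        // Size relative to the most frequent word (1.0 = largest)
        $Font_Size = $Occurrences / $Max_Occurrences;
        if ($Occurrences > 0 && $Font_Size > $Min_Size) {
            $Spans[] = '<span style="font-size: ' . round($Font_Size, 2) . 'em">'
                . htmlspecialchars((string) $Keyword) . '</span>';
        }
    }
    // Join with spaces so adjacent words don't run together
    return implode(' ', $Spans);
}

echo build_tag_cloud_html(['apples' => 6, 'bananas' => 10, 'clementines' => 3, 'mangos' => 4]), "\n";
```

Rounding to two decimal places is purely cosmetic; browsers accept longer decimals in em values just as well.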
Once we've looped through the keywords and determined the number of times they appear (as well as the maximum number of times any word appears), we can build the HTML code for the tag cloud. The 'em' unit makes each word's font-size relative to the font-size of its parent element. Because each word's count is divided by the maximum count, the most frequent word gets 1em (the full parent size) and every other word is scaled down in proportion to it.
To explain the math behind this a little better, let's say that the word 'bananas' appears most in our blog posts:
WORD          NO. OF OCCURRENCES
apples        6
bananas       10
clementines   3
mangos        4
As we said, the most frequent word is 'bananas', with 10 occurrences, so that is our maximum. We determine the font-size of each word by dividing its number of occurrences by this maximum:
apples: 6 / 10 = 0.6
bananas: 10 / 10 = 1.0
clementines: 3 / 10 = 0.3
mangos: 4 / 10 = 0.4
Therefore the font-size of the word 'apples' will be 0.6em (or 60% of the maximum size any word can be), bananas will be 1.0em (or 100% of the maximum size any word can be), and so on...
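The same division is easy to check in a few lines of PHP. This sketch uses the counts from the table above and assumes an 18px container font-size purely for illustration, so you can see the pixel size each word would end up at.

```php
<?php
// Occurrence counts from the example table
$Counts = ['apples' => 6, 'bananas' => 10, 'clementines' => 3, 'mangos' => 4];
$Max_Occurrences = max($Counts);
// Assumed container font-size in pixels (illustrative value only)
$Base_Px = 18;

$Sizes = [];
foreach ($Counts as $Word => $Occurrences) {
    // Relative size: 1.0 for the most frequent word
    $Em = $Occurrences / $Max_Occurrences;
    $Sizes[$Word] = ['em' => $Em, 'px' => $Em * $Base_Px];
    printf("%-12s %.1fem = %.1fpx\n", $Word, $Em, $Em * $Base_Px);
}
```

With those numbers, 'clementines' comes out at about 5.4px, which shows why the 0.2 cutoff in Step Three is worth tuning to your own base size.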
All that remains to be done now is to set the maximum font-size any word can be. This can be done in a separate CSS file, or inline, like so:
<div id="tag_cloud" style="font-size: 18px"> <?php echo $HTML_Code; ?> </div>
And that's it! I've tried to leave this script as open and easy to follow as possible so you can hack it to pieces and modify it for your needs. Any comments, questions or corrections, please email me.