Introduction
In this article, I’ll show how to get a random subset of data from a DataTable. This is useful when you already have queries that are filtered correctly but returns all the rows.
Analysis
I came across this situation when I wanted to display a random tag cloud. I already had the query to get the keywords ordered by number of clicks and I wanted to created a tag cloud. Tags that are the most popular should have more chance to get picked and should be displayed larger than less popular ones.
Implementation
In this code snippet, there is everything you need.
' Min size, in pixel for the tag
Private Const MIN_FONT_SIZE As Integer = 9
' Max size, in pixel for the tag
Private Const MAX_FONT_SIZE As Integer = 14
' Basic function that retreives Tags from a DataBase
Public Shared Function GetTags() As MediasTagsDataTable
' Simple call to the TableAdapter, to get the Tags ordered by number of clicks
Dim dt As MediasTagsDataTable = taMediasTags.GetDataValide
' If the query returned no result, return an empty DataTable
If dt Is Nothing OrElse dt.Rows.Count < 1 Then
Return New MediasTagsDataTable
End If
' Set the font-size of the group of data
' We are dividing our results into sub set, according to their number of clicks
' Example: 10 results -> [0,2] will get font size 9, [3,5] will get font size 10, [6,8] wil get 11, ...
' This is the number of elements in one group
Dim groupLenth As Integer = CType(Math.Floor(dt.Rows.Count / (MAX_FONT_SIZE - MIN_FONT_SIZE)), Integer)
' Counter of elements in the same group
Dim counter As Integer = 0
' Counter of groups
Dim groupCounter As Integer = 0
' Loop througt the list
For Each row As MediasTagsRow In dt
' Set the font-size in a custom column
row.c_FontSize = MIN_FONT_SIZE + groupCounter
' Increment the counter
counter += 1
' If the group counter is less than the counter
If groupLenth <= counter Then
' Start a new group
counter = 0
groupCounter += 1
End If
Next
' Return the new DataTable with font-size
Return dt
End Function
' Function that generate the random sub set
Public Shared Function GetRandomSampleTags(ByVal KeyCount As Integer) As MediasTagsDataTable
' Get the data
Dim dt As MediasTagsDataTable = GetTags()
' Create a new DataTable that will contains the random set
Dim rep As MediasTagsDataTable = New MediasTagsDataTable
' Count the number of row in the new DataTable
Dim count As Integer = 0
' Random number generator
Dim rand As New Random()
While count < KeyCount
Randomize()
' Pick a random row
Dim r As Integer = rand.Next(0, dt.Rows.Count - 1)
Dim tmpRow As MediasTagsRow = dt(r)
' Import it into the new DataTable
rep.ImportRow(tmpRow)
' Remove it from the old one, to be sure not to pick it again
dt.Rows.RemoveAt(r)
' Increment the counter
count += 1
End While
' Return the new sub set
Return rep
End Function
Pro’s
This method is good because it doesn’t require much work to get it work fast. It is a good concept when you are working with small tables, let says less than 100 records.
Con’s
If you have more than 100 records, out of memory exception may occur since we are coping and duplicating rows. I would consider using a stored procedure instead.