Subset generation by rules
- by Sazug
Let's say we have 5000 users in a database. Each user row has a sex column, a place-of-birth column, and a marital status column (married or not married).
How can I generate a random subset (say, 100 users) that satisfies all of these conditions:
40% should be male and 60% female
50% should be born in the USA, 20% in the UK, 20% in Canada, and 10% in Australia
70% should be married and 30% not married
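As a small worked step, each set of percentages above translates into fixed target counts for a subset of size n. The sketch below assumes the percentages are whole numbers summing to 100. It uses largest-remainder rounding, which the question does not specify, so the counts always add up to n even when n is not 100. `target_counts` is a hypothetical helper name.

```python
def target_counts(percent: dict[str, int], n: int) -> dict[str, int]:
    """Convert whole-number percentages (summing to 100) into integer
    counts that sum exactly to n, using largest-remainder rounding."""
    if sum(percent.values()) != 100:
        raise ValueError("percentages must sum to 100")
    # Exact integer arithmetic avoids floating-point rounding surprises.
    counts = {k: p * n // 100 for k, p in percent.items()}
    remainders = {k: p * n % 100 for k, p in percent.items()}
    leftover = n - sum(counts.values())
    # Give the remaining slots to the categories that were rounded down the most.
    for k in sorted(remainders, key=remainders.get, reverse=True)[:leftover]:
        counts[k] += 1
    return counts
```

For n = 100 the counts are simply the percentages (40 and 60; 50, 20, 20 and 10; 70 and 30). For a size like 7, the 40/60 split becomes 3 males and 4 females.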
These conditions are given independently of each other, so we cannot simply multiply them together to fix the count for every combination, like this:
(0.4 * 0.5 * 0.7) * 100 = 14 users who are male, born in the USA, and married
(0.4 * 0.5 * 0.3) * 100 = 6 users who are male, born in the USA, and not married
Is there an algorithm for generating such a subset?
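One possible approach is sketched below under several assumptions that the question does not fix. Users are held in memory as hypothetical `(id, sex, country, married)` tuples, only the per-attribute percentages matter, and the percentages have already been turned into whole counts. The idea is to build one shuffled list of labels per attribute (40 "M" and 60 "F", and so on) and pair the lists up position by position. The resulting 100 profiles match every percentage exactly by construction, while the number of users in each combination varies randomly instead of being fixed at values like 14 and 6. Users are then drawn at random from each combination. If some combination has too few users, the labels are reshuffled and the pairing is retried. `labels` and `constrained_sample` are hypothetical names.

```python
import random
from collections import Counter, defaultdict

# Hypothetical user record: (id, sex, country, married)
User = tuple[int, str, str, bool]


def labels(counts: dict, rng: random.Random) -> list:
    """Expand {label: count} into a randomly shuffled list of labels."""
    out = [label for label, c in counts.items() for _ in range(c)]
    rng.shuffle(out)
    return out


def constrained_sample(users: list[User], sex: dict, country: dict, married: dict,
                       rng: random.Random, max_tries: int = 1000) -> list[User]:
    """Draw sum(sex.values()) distinct users whose sex, country and marital
    status counts match the given targets exactly."""
    n = sum(sex.values())
    if not n == sum(country.values()) == sum(married.values()):
        raise ValueError("all target counts must sum to the same subset size")
    # Group users by their (sex, country, married) profile.
    by_profile = defaultdict(list)
    for u in users:
        by_profile[(u[1], u[2], u[3])].append(u)
    for _ in range(max_tries):
        # Pairing independently shuffled label lists yields n profiles whose
        # per-attribute counts equal the targets; only the joint mix is random.
        wanted = Counter(zip(labels(sex, rng), labels(country, rng), labels(married, rng)))
        if all(len(by_profile[p]) >= k for p, k in wanted.items()):
            # Sample without replacement inside each profile group.
            return [u for p, k in wanted.items() for u in rng.sample(by_profile[p], k)]
    raise ValueError("no feasible mix found; some combinations have too few users")


if __name__ == "__main__":
    rng = random.Random(1)
    countries = ["USA", "UK", "Canada", "Australia"]
    # Synthetic population for illustration only.
    users = [(i, rng.choice("MF"), rng.choice(countries), rng.random() < 0.5)
             for i in range(5000)]
    subset = constrained_sample(users, {"M": 40, "F": 60},
                                {"USA": 50, "UK": 20, "Canada": 20, "Australia": 10},
                                {True: 70, False: 30}, rng)
    print(Counter(u[1] for u in subset), Counter(u[2] for u in subset),
          Counter(u[3] for u in subset))
```

A caveat on this design: because the labels are paired at random, each combination lands near the product of its percentages on average, even though no single run forces exact values like 14 or 6. If the subset should instead mirror correlations present in the real population, a different technique would be needed, for example starting from a plain random sample and swapping users in and out until the percentages match.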