So if I have a matrix (list of lists) of unique words as my column headings, document ids as my row headings, and a 0 or 1 as the values if the word exists in that particular document.
What I'd like to know is how to determine all the possible combinations of words and documents where more than one word is in common with more than one document.
So something like:
[[Docid_3, Docid_5], ['word1', 'word17', 'word23']], [[Docid_3, Docid_9, Docid_334], ['word2', 'word7', 'word23', 'word68', 'word982']], and so on for each possible combination. Would love a solution that provides the complete set of combinations and one that yields only the combinations that are not a subset of another, so from the example, not [[Docid_3, Docid_5], ['word1', 'word17']] since it's a complete subset of the first example.
I feel like there is an elegant solution that just isn't coming to mind and the beer isn't helping.
Thanks.