Python Multiword Index

Posted by Manab Chetia on Stack Overflow
Published on 2012-09-24T20:13:23Z Indexed on 2012/09/24 21:37 UTC

index = {'Michael': [['mj.com', 1], ['Nine.com', 9], ['i.com', 34]],
         'Jackson': [['One.com', 4], ['mj.com', 2], ['Nine.com', 10], ['i.com', 45]],
         'Thriller': [['Seven.com', 7], ['Ten.com', 10], ['One.com', 5], ['mj.com', 3]]}

# In this dictionary (index), each entry has the form
# 'KEYWORD': [['LINK of a page containing KEYWORD', POSITION of KEYWORD on that page], ...]

For example, 'Michael' appears on 'mj.com', 'Nine.com', and 'i.com' at positions 1, 9, and 34 of those pages, respectively.
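To make the layout concrete, here is a minimal sketch that pulls one keyword's entries out of the index as a mapping from link to positions. It assumes lookups should ignore case (the examples below type 'MICHAEL' while the key is 'Michael'); the helper name `positions_by_link` is hypothetical, and storing positions in a set is an assumption in case a link ever appears twice for one keyword.

```python
# The question's index, with the closing bracket for 'Thriller' restored.
INDEX = {
    'Michael': [['mj.com', 1], ['Nine.com', 9], ['i.com', 34]],
    'Jackson': [['One.com', 4], ['mj.com', 2], ['Nine.com', 10], ['i.com', 45]],
    'Thriller': [['Seven.com', 7], ['Ten.com', 10], ['One.com', 5], ['mj.com', 3]],
}


def positions_by_link(index, keyword):
    """Hypothetical helper: map each link containing `keyword` to its positions.

    Matching is case-insensitive (an assumption based on the examples).
    Returns an empty dict when the keyword is not in the index.
    """
    # Find the index key that matches the keyword regardless of case.
    entries = []
    for key, value in index.items():
        if key.lower() == keyword.lower():
            entries = value
            break
    result = {}
    for link, position in entries:
        # A set allows for a keyword appearing more than once on a page.
        result.setdefault(link, set()).add(position)
    return result


print(positions_by_link(INDEX, 'MICHAEL'))
# → {'mj.com': {1}, 'Nine.com': {9}, 'i.com': {34}}
```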

Please help me write a Python procedure that takes index and a string of KEYWORDS as input and returns the links on which those keywords appear consecutively, in the order given.

When I enter 'MICHAEL', the result should be:

>>['mj.com', 'Nine.com', 'i.com']
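A single-keyword query only has to list the links stored under that keyword. A minimal sketch, assuming case-insensitive matching and that None is the answer when nothing matches (as the last example in this question asks); `links_for_keyword` is a hypothetical name.

```python
# The question's index, with the closing bracket for 'Thriller' restored.
INDEX = {
    'Michael': [['mj.com', 1], ['Nine.com', 9], ['i.com', 34]],
    'Jackson': [['One.com', 4], ['mj.com', 2], ['Nine.com', 10], ['i.com', 45]],
    'Thriller': [['Seven.com', 7], ['Ten.com', 10], ['One.com', 5], ['mj.com', 3]],
}


def links_for_keyword(index, keyword):
    """Hypothetical helper: links containing `keyword`, in index order, or None."""
    wanted = keyword.lower()  # case-insensitive match (assumption)
    for key, entries in index.items():
        if key.lower() == wanted:
            # Keep the index's order and drop any repeated links.
            links = []
            for link, _position in entries:
                if link not in links:
                    links.append(link)
            return links or None
    return None  # keyword not in the index


print(links_for_keyword(INDEX, 'MICHAEL'))  # → ['mj.com', 'Nine.com', 'i.com']
```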

When I enter 'MICHAEL JACKSON', the result should be:

>>['mj.com', 'Nine.com']

because 'Michael' and 'Jackson' appear consecutively on 'mj.com' and 'Nine.com', at positions (1, 2) and (9, 10) respectively. The result should not include 'i.com': it contains both keywords, but not consecutively (positions 34 and 45).
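For two keywords, "consecutive" can be read as: the second word sits at position p + 1 on the same page where the first word sits at position p. A sketch of that check under the same assumptions (case-insensitive matching, None for no match); `lookup_pair` is a hypothetical name.

```python
# The question's index, with the closing bracket for 'Thriller' restored.
INDEX = {
    'Michael': [['mj.com', 1], ['Nine.com', 9], ['i.com', 34]],
    'Jackson': [['One.com', 4], ['mj.com', 2], ['Nine.com', 10], ['i.com', 45]],
    'Thriller': [['Seven.com', 7], ['Ten.com', 10], ['One.com', 5], ['mj.com', 3]],
}


def lookup_pair(index, first, second):
    """Hypothetical helper: links where `second` directly follows `first`."""
    # Lower-case the keys so 'MICHAEL' finds 'Michael' (assumption).
    lowered = {key.lower(): entries for key, entries in index.items()}
    first_entries = lowered.get(first.lower(), [])
    # (link, position) pairs for the second word, for fast membership tests.
    second_hits = {(link, pos) for link, pos in lowered.get(second.lower(), [])}
    links = []
    for link, pos in first_entries:
        # Consecutive: the second word is at the next position on the same page.
        if (link, pos + 1) in second_hits and link not in links:
            links.append(link)
    return links or None


print(lookup_pair(INDEX, 'MICHAEL', 'JACKSON'))  # → ['mj.com', 'Nine.com']
```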

When I enter 'MICHAEL JACKSON THRILLER', the result should be:

['mj.com']

because the three words 'MICHAEL', 'JACKSON', and 'THRILLER' appear consecutively only on 'mj.com', at positions (1, 2, 3).

If I enter 'THRILLER JACKSON' or 'THRILLER FEDERER', the result should be None: 'JACKSON' never directly follows 'THRILLER', and 'FEDERER' is not in the index at all.
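Generalising the pair check: word i of the query must appear at position start + i on the same page, where start is a position of the first word. The sketch below (hypothetical name `multi_lookup`, case-insensitive matching assumed) returns None both when a word is missing from the index and when no page has the words in order.

```python
# The question's index, with the closing bracket for 'Thriller' restored.
INDEX = {
    'Michael': [['mj.com', 1], ['Nine.com', 9], ['i.com', 34]],
    'Jackson': [['One.com', 4], ['mj.com', 2], ['Nine.com', 10], ['i.com', 45]],
    'Thriller': [['Seven.com', 7], ['Ten.com', 10], ['One.com', 5], ['mj.com', 3]],
}


def multi_lookup(index, query):
    """Sketch: links where all words of `query` appear consecutively, or None."""
    words = query.lower().split()
    if not words:
        return None
    # Lower-case the keys so 'MICHAEL' finds 'Michael' (assumption).
    lowered = {key.lower(): entries for key, entries in index.items()}
    if any(word not in lowered for word in words):
        return None  # e.g. 'FEDERER' is not indexed at all
    # (link, position) pairs for each word, for fast membership tests.
    hits = [{(link, pos) for link, pos in lowered[word]} for word in words]
    links = []
    for link, start in lowered[words[0]]:
        # Word i must sit at position start + i on the same page.
        if all((link, start + i) in hits[i] for i in range(1, len(words))):
            if link not in links:
                links.append(link)
    return links or None


for q in ['MICHAEL', 'MICHAEL JACKSON', 'MICHAEL JACKSON THRILLER',
          'THRILLER JACKSON', 'THRILLER FEDERER']:
    print(q, '->', multi_lookup(INDEX, q))
```

Building a set of (link, position) pairs per word makes each consecutive-position check a constant-time membership test instead of a scan of that word's list.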

© Stack Overflow or respective owner
