Python Multiword Index
Posted by Manab Chetia on Stack Overflow
Published on 2012-09-24T20:13:23Z
index = {'Michael': [['mj.com', 1], ['Nine.com', 9], ['i.com', 34]],
         'Jackson': [['One.com', 4], ['mj.com', 2], ['Nine.com', 10], ['i.com', 45]],
         'Thriller': [['Seven.com', 7], ['Ten.com', 10], ['One.com', 5], ['mj.com', 3]]}
# In this dictionary (index), each entry has the form
# 'KEYWORD': [['LINK in which KEYWORD is present', POSITION of KEYWORD in the page at LINK], ...]
E.g. 'Michael' is present in mj.com, Nine.com and i.com, at positions 1, 9 and 34 of the respective pages.
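The index format above is easier to query if each keyword's list of [link, position] pairs becomes a dict keyed by link. A minimal sketch, assuming keywords should match case-insensitively (the question queries 'MICHAEL' against the key 'Michael') and that each link appears at most once per keyword, as in the sample data; `build_position_map` is a hypothetical helper name:

```python
# The index from the question (with the missing "]" restored).
index = {
    'Michael': [['mj.com', 1], ['Nine.com', 9], ['i.com', 34]],
    'Jackson': [['One.com', 4], ['mj.com', 2], ['Nine.com', 10], ['i.com', 45]],
    'Thriller': [['Seven.com', 7], ['Ten.com', 10], ['One.com', 5], ['mj.com', 3]],
}


def build_position_map(index):
    """Map lowercase keyword -> {link: position} for constant-time lookups.

    Assumes each link appears at most once per keyword; a later entry for
    the same link would overwrite an earlier one.
    """
    return {
        word.lower(): {link: pos for link, pos in entries}
        for word, entries in index.items()
    }


positions = build_position_map(index)
print(positions['michael']['Nine.com'])  # 9
```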
Please help me write a Python procedure that takes index and KEYWORDS as input.
When I enter 'MICHAEL', the result should be:
>>['mj.com', 'Nine.com', 'i.com']
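A single-keyword query only needs to find the keyword and return its links in index order. A sketch under the same case-insensitivity assumption; `lookup` is a hypothetical name, and returning None for an unknown keyword mirrors the behaviour the question asks for further down:

```python
index = {
    'Michael': [['mj.com', 1], ['Nine.com', 9], ['i.com', 34]],
    'Jackson': [['One.com', 4], ['mj.com', 2], ['Nine.com', 10], ['i.com', 45]],
    'Thriller': [['Seven.com', 7], ['Ten.com', 10], ['One.com', 5], ['mj.com', 3]],
}


def lookup(index, keyword):
    """Return the links containing `keyword` in index order, or None if absent.

    Keyword comparison ignores case (an assumption, see above).
    """
    for word, entries in index.items():
        if word.lower() == keyword.lower():
            return [link for link, _pos in entries]
    return None


print(lookup(index, 'MICHAEL'))  # ['mj.com', 'Nine.com', 'i.com']
```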
When I enter 'MICHAEL JACKSON', the result should be:
>>['mj.com', 'Nine.com']
because 'Michael' and 'Jackson' appear consecutively on 'mj.com' and 'Nine.com', i.e. at positions (1, 2) and (9, 10) respectively. The result should not include 'i.com': it contains both KEYWORDS, but they are not consecutive (positions 34 and 45).
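For two words the check is: for each (link, position) of the first word, the second word must occur on the same link at position + 1. A minimal sketch of that check; `lookup_pair` is a hypothetical name, keywords are again matched case-insensitively, and the second word's hits go into a set so each membership test is O(1):

```python
index = {
    'Michael': [['mj.com', 1], ['Nine.com', 9], ['i.com', 34]],
    'Jackson': [['One.com', 4], ['mj.com', 2], ['Nine.com', 10], ['i.com', 45]],
    'Thriller': [['Seven.com', 7], ['Ten.com', 10], ['One.com', 5], ['mj.com', 3]],
}


def lookup_pair(index, first, second):
    """Links on which `second` appears exactly one position after `first`."""
    lowered = {word.lower(): entries for word, entries in index.items()}
    first_entries = lowered.get(first.lower(), [])
    # (link, position) pairs where the second word occurs.
    second_hits = {(link, pos) for link, pos in lowered.get(second.lower(), [])}
    return [link for link, pos in first_entries if (link, pos + 1) in second_hits]


print(lookup_pair(index, 'MICHAEL', 'JACKSON'))  # ['mj.com', 'Nine.com']
```

'i.com' is dropped because 'Jackson' sits at 45 there, not 35.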
When I enter 'MICHAEL JACKSON THRILLER', the result should be:
>>['mj.com']
because the 3 words 'MICHAEL', 'JACKSON' and 'THRILLER' appear consecutively on 'mj.com', i.e. at positions (1, 2, 3).
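The same idea extends to any number of words: counting from 0, word k of the query must occur on the same page at start + k, where start is the first word's position. A sketch with `phrase_lookup` as a hypothetical name; it returns an empty list when nothing matches, and a one-word query reduces to the single-keyword case:

```python
index = {
    'Michael': [['mj.com', 1], ['Nine.com', 9], ['i.com', 34]],
    'Jackson': [['One.com', 4], ['mj.com', 2], ['Nine.com', 10], ['i.com', 45]],
    'Thriller': [['Seven.com', 7], ['Ten.com', 10], ['One.com', 5], ['mj.com', 3]],
}


def phrase_lookup(index, query):
    """Links on which all words of `query` occur at consecutive positions, in order."""
    lowered = {word.lower(): entries for word, entries in index.items()}
    words = query.lower().split()
    if not words or any(w not in lowered for w in words):
        return []
    # One set of (link, position) hits per word after the first.
    hit_sets = [{(link, pos) for link, pos in lowered[w]} for w in words[1:]]
    results = []
    for link, start in lowered[words[0]]:
        # hit_sets[i] belongs to word i + 1, so it must sit at start + i + 1.
        if all((link, start + i + 1) in hits for i, hits in enumerate(hit_sets)):
            results.append(link)
    return results


print(phrase_lookup(index, 'MICHAEL JACKSON THRILLER'))  # ['mj.com']
```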
If I enter 'THRILLER JACKSON' or 'THRILLER FEDERER', the result should be None ('Thriller' is never immediately followed by 'Jackson', and 'Federer' is not in the index at all).
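An alternative formulation that also returns None as requested: if word number `offset` of the query (counting from 0) sits at position p, the phrase would have to start at p - offset, so the matches are the intersection of those candidate (link, start) sets across all words. A sketch under the same assumptions (case-insensitive keywords; `phrase_search` is a hypothetical name):

```python
index = {
    'Michael': [['mj.com', 1], ['Nine.com', 9], ['i.com', 34]],
    'Jackson': [['One.com', 4], ['mj.com', 2], ['Nine.com', 10], ['i.com', 45]],
    'Thriller': [['Seven.com', 7], ['Ten.com', 10], ['One.com', 5], ['mj.com', 3]],
}


def phrase_search(index, query):
    """Links containing the query words consecutively, or None if there are none."""
    lowered = {word.lower(): entries for word, entries in index.items()}
    words = query.lower().split()
    candidates = None
    for offset, word in enumerate(words):
        # Where the phrase would have to start for this word to fit.
        starts = {(link, pos - offset) for link, pos in lowered.get(word, [])}
        candidates = starts if candidates is None else candidates & starts
        if not candidates:
            return None  # this word rules out every remaining start
    if not candidates:
        return None  # empty query
    # Report links in the order they appear for the first word.
    return [link for link, pos in lowered[words[0]] if (link, pos) in candidates]


print(phrase_search(index, 'THRILLER JACKSON'))  # None
```

Intersecting sets lets the loop stop as soon as one word eliminates every candidate, e.g. when a word like 'FEDERER' is missing from the index.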
© Stack Overflow or respective owner