A Multimodal Approach to Exploit Similarity in Documents