Leveraging Acoustic Images for Effective Self-supervised Audio Representation Learning