Version: 1.0
Provider: North South University, Bangladesh
Maintainer: Dr. Shafin Rahman, Department of Electrical and Computer Engineering
Profile construction using human faces and probable URLs (likely containing biographical and professional information) is a critical challenge in multimodal AI. It involves extracting structured, identity-specific information by integrating visual and textual modalities. Existing datasets in this domain have poor demographic diversity, lack real-world factual information, and offer minimal alignment between visual identity and textual evidence. These limitations make it difficult to build and evaluate vision-language models that generate personalized summaries of individuals. To address these issues, we introduce a novel dataset, Face2Profile, which contains approximately 10K publicly available facial images, each person's name and professional details, curated sets of positive images that contain the person's information and negative or misleading web links, and peer-reviewed human-written summaries. The dataset also includes challenging visual conditions such as poor lighting, occluded faces, and non-frontal viewpoints to reflect real-world scenarios. Rigorous demographic stratification and annotation ensure diversity, factual consistency, and real-world relevance. We benchmark generative performance by evaluating GPT-4o and DeepSeek-R1 using Bilingual Evaluation Understudy (BLEU) and a novel Custom-BLEU metric that penalizes missing identity elements such as names and occupations. Our analysis shows that GPT-4o and DeepSeek-R1 produce fluent summaries but frequently omit key factual content. We further evaluate two lightweight language models (Phi-3 Mini and DeepSeek-1.5B) using an Entity Coverage Score (ECS), which assesses the factual precision of the structured summaries produced by small language models.
This benchmark offers a novel perspective on identity-based profile construction by evaluating models in a zero-shot setting, without any fine-tuning or task-specific training, and establishes a challenging dataset for future research on the multimodal profile construction task.
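The exact formulas for Custom-BLEU and ECS are not given in this description. As a rough illustration of the idea only, the sketch below assumes that entity coverage is the fraction of required identity strings (such as a name and an occupation) that appear in a generated summary, and that a fluency score such as BLEU is scaled by that fraction. The function names, the normalization step, and the multiplicative penalty are all assumptions made for this example and may differ from the metrics used in the benchmark.

```python
import re


def _normalize(text):
    """Lowercase and replace runs of non-alphanumerics with single spaces."""
    return " ".join(re.sub(r"[^a-z0-9]+", " ", text.lower()).split())


def entity_coverage(summary, entities):
    """Hypothetical coverage score: share of entity strings found in the summary.

    Matching is on whole normalized tokens, so "engineer" does not match
    "engineering". Returns 1.0 when no entities are required.
    """
    if not entities:
        return 1.0
    padded = f" {_normalize(summary)} "
    hits = sum(1 for e in entities if f" {_normalize(e)} " in padded)
    return hits / len(entities)


def penalized_score(base_score, summary, entities):
    """Hypothetical penalty: scale a base score (e.g. BLEU) by entity coverage."""
    return base_score * entity_coverage(summary, entities)


if __name__ == "__main__":
    # "Jane Doe" is a placeholder, not a person from the dataset.
    required = ["Jane Doe", "software engineer"]
    print(entity_coverage("Dr. Jane Doe is a software engineer.", required))
    print(penalized_score(0.8, "She works in technology.", required))
```

Under this assumed definition, a fluent summary that omits both the name and the occupation scores zero regardless of its base score, which mirrors the observation that fluent output can still miss key factual content.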
The dataset is not publicly downloadable. To request access, please send an email to:
Email: shafin.rahman@northsouth.edu
Please include "Request for Face2Profile Dataset Access" in the subject line.
Note: The dataset is provided strictly for non-commercial academic research purposes. By requesting the dataset, you agree to abide by the terms and conditions laid out in the Data Use and Confidentiality Agreement.