Title:
IndicBART Alongside Visual Element: Multimodal Summarization in Diverse Indian Languages
Abstract:
In the age of information overflow, the demand for advanced summarization techniques has surged,
especially in linguistically diverse regions such as India. This paper introduces an innovative
approach to multimodal multilingual summarization that seamlessly unites textual and visual elements.
Our research focuses on four prominent Indian languages: Hindi, Bangla, Gujarati, and Marathi, employing abstractive
summarization methods to craft coherent and concise summaries. For text summarization, we leverage
the capabilities of the pre-trained IndicBART model, known for its exceptional proficiency in
comprehending and generating text in Indian languages. We integrate an image summarization component
based on the Image Pointer model to tackle multimodal challenges. This component identifies images from
the input that enhance and complement the generated summaries, contributing to the overall comprehensiveness
of our multimodal summaries. Our proposed methodology attains excellent results, surpassing other text summarization
approaches tailored for the specified Indian languages. Furthermore, we enhance the significance of our work by
incorporating a user satisfaction evaluation method, thereby providing a robust framework for assessing the quality of
summaries. This holistic approach contributes to the advancement of summarization techniques, particularly in diverse Indian languages.
Resource link:
https://github.com/Raghvendra-14/indicBART
If you are using the above resource, please cite the following paper:
Kumar, R., Sinha, R., Saha, S., Jatowt, A.: Multimodal rumour detection: catching news that never transpired!. In: Fink, G.A., Jain, R., Kise, K., Zanibbi, R. (eds.) Document Analysis and Recognition - ICDAR 2023, ICDAR 2023, LNCS, vol. 14189, pp. 231–248. Springer, Cham (2023). https://doi.org/10.1007/978-3-031-41682-8_15
Title:
Extracting the Full Story: A Multimodal Approach and Dataset to Crisis Summarization in Tweets
Abstract:
In our digitally connected world, the influx of microblog data poses a formidable challenge
in extracting relevant information amid a continuous stream of updates. This challenge
intensifies during crises, where the demand for timely and relevant information is crucial.
Current summarization techniques often struggle with the intricacies of microblog data in
such situations. To address this, our research explores crisis-related microblogs, recognizing
the crucial role of multimedia content, such as images, in offering a comprehensive perspective.
In response to these challenges, we introduce a multimodal extractive-abstractive summarization model.
Leveraging a fusion of TF-IDF scoring and bigram filtering, coupled with the effectiveness of three distinct
models—BIGBIRD, CLIP, and bootstrapping language-image pre-training (BLIP)—we aim to overcome the limitations
of traditional extractive and text-only approaches. Our model is designed and evaluated on a newly curated Twitter
dataset featuring 12,494 tweets and 3,090 images across eight crisis events, each accompanied by gold-standard summaries.
The experimental findings showcase the efficacy of our model, surpassing current benchmarks by notable margins
of 16% and 17%. This confirms our model’s strength and its relevance in crisis scenarios with the crucial interplay of text
and multimedia. Notably, our research contributes to multimodal, abstractive microblog summarization, addressing a key gap in
the literature. It is also a valuable tool for swift information extraction in time-sensitive situations.
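As a rough illustration of the extractive stage described above, the sketch below scores tweets with unigram-plus-bigram TF-IDF and keeps the top-ranked ones as candidates for the abstractive stage; the example tweets, preprocessing, and cut-off are our own assumptions, and the full model additionally involves BIGBIRD, CLIP, and BLIP.

from sklearn.feature_extraction.text import TfidfVectorizer
import numpy as np

tweets = [
    "Flood waters rising fast near the main bridge, several roads closed",
    "Stay safe everyone, thoughts with the affected families tonight",
    "Rescue teams deploying boats to evacuate the eastern district",
]

# Unigram + bigram TF-IDF; a tweet's salience score is the sum of its term weights.
vectorizer = TfidfVectorizer(ngram_range=(1, 2), stop_words="english")
tfidf = vectorizer.fit_transform(tweets)
scores = np.asarray(tfidf.sum(axis=1)).ravel()

# Keep the top-k tweets as extractive candidates for the abstractive stage.
k = 2
candidates = [tweets[i] for i in np.argsort(scores)[::-1][:k]]
print(candidates)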
Resource link:
https://github.com/Raghvendra-14/A-Multimodal-Approach-and-Dataset-to-Crisis-Summarization-in-Tweets
If you are using the above resource, please cite the following paper:
R. Kumar, R. Sinha, S. Saha, A. Jatowt (2024), “Extracting the Full Story: A Multimodal Approach and Dataset to Crisis Summarization in Tweets”, IEEE Transactions on Computational Social Systems, DOI: 10.1109/TCSS.2024.3436690.
Title:
Silver Lining in the Fake News Cloud: Can Large Language Models Help Detect Misinformation?
Abstract:
In the era of advanced generative artificial intelligence, distinguishing truth from fallacy and deception has become a critical societal challenge. This research
analyzes the capabilities of large language models for detecting misinformation. Our study employs a versatile approach, covering multiple Large Language Models
(LLMs) with few-shot and zero-shot prompting. These models are rigorously evaluated across various fake news and rumour detection datasets. Introducing a novel
dimension, we additionally incorporate sentiment and emotion annotations to understand the emotional influence on misinformation detection using LLMs. Moreover,
to extend our inquiry, we employ ChatGPT to intentionally distort authentic news as well as human-written fake news, utilizing zero-shot and iterative prompts.
This deliberate corruption allows for a detailed examination of parameters such as abstractness, concreteness, and named entity density, providing insights into
differentiating between unaltered news, human-written fake news, and its LLM-corrupted counterpart. Our findings aim to furnish a refined framework for discerning
authentic news, human-generated misinformation, and LLM-induced distortions. This multifaceted approach, utilizing various prompting techniques, contributes to a
comprehensive understanding of the subtle variations shaping misinformation sources.
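For illustration, a zero-shot prompt of the kind explored here can be issued as follows; the model name, prompt wording, and binary label set are assumptions for the sketch, not the paper's exact setup.

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def classify(claim: str) -> str:
    prompt = (
        "You are a fact-checking assistant. Label the following news claim as "
        "REAL or FAKE, answering with a single word.\n\nClaim: " + claim
    )
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # hypothetical model choice
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    return response.choices[0].message.content.strip()

print(classify("Scientists confirm the moon is made of cheese."))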
Resource link:
https://github.com/Raghvendra-14/TAI-MISINFORMATION
If you are using the above resource, please cite the following paper:
R. Kumar, B. Goddu, S. Saha, A. Jatowt (2024) "Silver Lining in the Fake News Cloud:
Can Large Language Models Help Detect Misinformation?", IEEE Transactions on Artificial Intelligence (IEEE TAI), doi: 10.1109/TAI.2024.3440248.
Dataset:
An explainable, comprehensive financial-market dataset containing popular social media financial tweets.
Description:
We present a detailed study of how financial market behaviour is influenced by
public psychology. In this paper, we present an explainable, comprehensive financial
market dataset containing popular social media financial tweets.
The dataset carries emotion, sentiment, and causal expression labels. Using a
CentralNet-based multitasking framework, the model analyses market behaviour.
Additionally, we analyse the NASDAQ dataset with ARIMA and LSTM models.
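As an illustrative sketch of the ARIMA part of this analysis (the (5, 1, 0) order, file name, and column name are assumptions, not the paper's exact configuration):

import pandas as pd
from statsmodels.tsa.arima.model import ARIMA

# Hypothetical CSV with a 'Close' column of daily NASDAQ closing prices.
closes = pd.read_csv("nasdaq.csv")["Close"]

model = ARIMA(closes, order=(5, 1, 0)).fit()  # AR(5) on a first-differenced series
forecast = model.forecast(steps=7)            # one-week-ahead forecast
print(forecast)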
GitHub link:
https://github.com/sarmistha-D/Financial-Market-with-ESAN
References:
S. Das, U. Chowdhury, N.S. Lijin, A. Deep, S. Saha, et al., "Investigate How Market Behaves: Toward an
Explanatory Multitasking Based Analytical Model for Financial Investments", IEEE Access, 2024.
Dataset:
Unimodal explainable complaint mining dataset in Financial Domain
Description:
We present a comprehensive, unimodal, financially explainable complaint-mining dataset
derived from social media (Twitter) posts. The dataset comprises five attributes:
Complaint-NonComplaint, Severity Level, Emotion, Sentiment, and Causal Expression.
The primary objective is to classify text as either a complaint or non-complaint.
Subsequently, complaints are further categorized into severity levels based on the
associated risk tolerance. Additionally, the dataset is labelled with five distinct
emotion categories and sentiments. To enhance interpretability, we propose an
explainable complaint cause identification framework using a dyadic attention
mechanism within a multitasking CentralNet model, capturing both linguistic and
pragmatic nuances.
GitHub link:
https://github.com/sarmistha-D/Complaint-HaN
References:
S. Das, A. Singh, S. Saha, A. Maurya, "Negative Review or Complaint? Exploring Interpretability
in Financial Complaints", IEEE Transactions on Computational Social Systems, 2024.
Resource:
Video Complaint Dataset (VCD)
Description:
We introduce the Video Complaint Dataset (VCD), a novel resource aimed at advancing
research in aspect-level complaint detection. The dataset contains 450 annotated
utterances from 130 product review videos on YouTube. Each annotation includes
the product category, product name, timestamp of the utterance, and the corresponding
aspect and complaint labels.
The train and validation dataset used in the paper is present here:
https://github.com/rdev12/MAACA/tree/main/data
Please cite the following paper if you are using the above dataset:
R. Devanathan, A. Singh, A.S. Poornash, S. Saha (2024), "Seeing Beyond Words:
Multimodal Aspect-Level Complaint Detection in Ecommerce Videos", ACM Multimedia,
Melbourne, Australia, 28 October - 1 November 2024 (Core rank A*).
ToxVI: a Multimodal LLM-based Framework for Generating Intervention in Toxic Code-Mixed Videos
Description:
We can discourage social media users from sharing toxic material by automatically generating
interventions that explain why certain content is inappropriate. We introduce a Toxic Code-Mixed
Intervention Video benchmark dataset (ToxCMI), comprising 1697 code-mixed toxic video utterances
sourced from YouTube. Each utterance in this dataset has been meticulously annotated for toxicity
and severity, accompanied by interventions provided in Hindi-English code-mixed languages.
Reference:
Krishanu Maity, A.S. Poornash, Sriparna Saha and Kitsuchart Pasupa.
"ToxVI: a Multimodal LLM-based Framework for Generating Intervention in Toxic Code- Mixed Videos"
In Proceedings of the 33rd ACM International Conference on Information and Knowledge Management, October 21–25, 2024, Boise, Idaho, USA CIKM 2024, (Core Rank A)
MedSumm:
Description:
We present the Multimodal Medical Code-mixed Question Summarization (MMCQS) dataset,
the first of its kind, which pairs Hindi-English code-mixed medical queries with visual aids and
corresponding English summaries. The MMCQS dataset comprises medical queries spanning 18 different
medical disorders, offering enriched representations of patients' conditions through the integration
of visual cues.
Reference:
Ghosh, Akash, Arkadeep Acharya, Prince Jha, Sriparna Saha, Aniket Gaudgaul, Rajdeep Majumdar, Aman Chadha, Raghav Jain, Setu Sinha, and Shivani Agarwal.
"MedSumm: A Multimodal Approach to Summarizing Code-Mixed Hindi-English Clinical Queries." In European Conference
on Information Retrieval, pp. 106-120. Cham: Springer Nature Switzerland, 2024.
ClipSyntel:
Description:
We present the Multimodal Medical Question Summarization (MMQS) Dataset, designed to harness
the unexploited potential of integrating textual queries with visual representations of medical
conditions. By pairing medical inquiries with corresponding visual aids, this dataset aims to
enhance and refine the comprehension of patient needs. The MMQS Dataset comprises 3,015
instances, offering a valuable resource for the development of more sophisticated multimodal
approaches in medical query analysis.
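To illustrate the CLIP side of the CLIP-LLM synergy named in the title, the sketch below ranks candidate medical images against a textual query; the checkpoint, query, and file paths are illustrative assumptions, not the paper's exact pipeline.

import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

query = "red itchy rash spreading on the forearm"   # hypothetical patient query
paths = ["rash_1.jpg", "rash_2.jpg"]                # hypothetical image files
images = [Image.open(p) for p in paths]

inputs = processor(text=[query], images=images, return_tensors="pt", padding=True)
with torch.no_grad():
    outputs = model(**inputs)

# logits_per_image holds each image's similarity to the single query text.
scores = outputs.logits_per_image.squeeze(-1)
print(paths[int(scores.argmax())])  # most query-relevant image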
Reference:
Ghosh, Akash, Arkadeep Acharya, Raghav Jain, Sriparna Saha, Aman Chadha, and Setu Sinha.
"Clipsyntel: clip and llm synergy for multimodal question summarization in healthcare."
In Proceedings of the AAAI Conference on Artificial Intelligence, vol. 38, no. 20, pp. 22031-22039. 2024
Hate Speech Detection from Videos in Code-mixed Setting:
Paper Title:
ToxVidLLM: A Multimodal LLM-based Framework for Toxicity Detection in Code-Mixed Videos
Description:
We introduce ToxCMM, an openly accessible dataset extracted from YouTube that is meticulously
annotated for toxic speech, with utterances presented in code-mixed form. Each sentence within
the videos is annotated with three crucial labels, namely Toxic (Yes / No), Sentiment
(Positive / Negative / Neutral), and Severity levels (Non-harmful / Partially Harmful /
Very Harmful). This extensive dataset comprises 931 videos, encompassing a total of 4021
utterances. The release of the ToxCMM dataset is intended to foster further exploration in
the realm of multi-modal toxic speech detection within low-resource code-mixed languages.
Reference:
Krishanu Maity, A.S. Poornash, Sriparna Saha and Pushpak Bhattacharyya. "Multimodal Toxicity Detection in Code-Mixed Video Content: Unveiling a Hindi-English Benchmark Dataset", Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Long Papers), ACL 2024, Bangkok, Thailand, August 11-16, 2024, (Core Rank A*)
Multimodal Pharmacovigilance:
Title:
"Enhancing Adverse Drug Event Detection with Multimodal Dataset: Corpus Creation and Model Development" (Accepted in ACL findings 2024)
Description:
The mining of adverse drug events (ADEs) is pivotal in pharmacovigilance, enhancing patient safety by identifying potential risks
associated with medications. Previous ADE mining studies have focused on text-based methodologies, overlooking visual cues, limiting
contextual comprehension, and hindering accurate interpretation. To address this gap, we present a MultiModal
Adverse Drug Event (MMADE) detection dataset, merging ADE-related textual information with visual aids.
To access the dataset, please follow the GitHub link: https://github.com/singhayush27/MMADE.git. If you are using the dataset, please cite the above-mentioned original paper.
Towards Emotion-aided Multi-modal Dialogue Act Classification:
Description:
A new multimodal Emotion-aware Dialogue Act dataset, EMOTyDA, curated by collecting conversations from two open-source dialogue datasets, IEMOCAP and MELD.
Both IEMOCAP and MELD have pre-annotated emotion labels.
The 12 DA annotated categories are "Greeting (g)", "Question (q)", "Answer (ans)", "Statement-Opinion (o)", "Statement-Non-Opinion (s)", "Apology (ap)", "Command (c)", "Agreement (ag)", "Disagreement (dag)", "Acknowledge (a)", "Backchannel (b)" and "Others (oth)".
Reference:
T. Saha, A. Patra, S. Saha and P. Bhattacharyya (2020), "Towards Emotion-aided Multi-modal Dialogue Act Classification", in ACL 2020, July 5-10, 2020, Seattle, Washington (Category A*).
Sentiment and Emotion aware Multi-modal Speech Act Classification in Twitter (Tweet Act Classification) : EmoTA.
Description:
The EmoTA dataset is curated by collecting tweets from the open-source SemEval-2018 tweet dataset.
The SemEval-2018 dataset has pre-annotated multi-label emotion tags.
The 7 manually annotated TA tags are “Statement” (sta), “Expression” (exp), “Question” (que), “Request” (req), “Suggestion” (sug), “Threat” (tht) and “Others” (oth).
The sentiment labels for tweets were obtained following a semi-supervised approach using the IBM Watson Sentiment Classifier (https://cloud.ibm.com/apidocs/natural-language-understanding#sentiment). The EmoTA dataset contains these silver-standard sentiment tags.
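A minimal sketch of the Watson call used for such silver-standard tagging might look as follows; the API key, service URL region, and version string are placeholders, not the project's actual credentials or configuration.

from ibm_watson import NaturalLanguageUnderstandingV1
from ibm_watson.natural_language_understanding_v1 import Features, SentimentOptions
from ibm_cloud_sdk_core.authenticators import IAMAuthenticator

nlu = NaturalLanguageUnderstandingV1(
    version="2021-08-01",
    authenticator=IAMAuthenticator("YOUR_API_KEY"),
)
nlu.set_service_url("https://api.us-south.natural-language-understanding.watson.cloud.ibm.com")

result = nlu.analyze(
    text="Just missed my flight again, great start to the week...",
    features=Features(sentiment=SentimentOptions()),
).get_result()
print(result["sentiment"]["document"]["label"])  # positive / negative / neutral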
Reference:
T. Saha, A. Upadhyaya, S. Saha, P. Bhattacharyya (2021), "Towards Sentiment and Emotion aided Multi-modal Speech Act Classification in Twitter", in NAACL-HLT 2021, June 6-11, 2021 ( Category A).
A Multitask Framework for Sentiment, Emotion and Sarcasm aware Cyberbullying Detection from Multi-modal Code-Mixed Memes.
Description:
We have created a benchmark multi-modal (Image+Text) meme dataset called MultiBully, annotated with bully, sentiment, emotion, and sarcasm labels and collected from the open-source Twitter and Reddit platforms. Moreover, the severity of the cyberbullying posts is also investigated by adding a harmfulness score to each meme. Out of 5854 memes in our dataset, 2632 were labelled as non-bully, while 3222 were tagged as bully.
Reference:
Maity, K., Jha, P., Saha, S. and Bhattacharyya, P., 2022, July. A multitask framework for sentiment, emotion and sarcasm aware cyberbullying detection from multi-modal code-mixed memes. In Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval (pp. 1739-1749).
Ex-ThaiHate: A Generative Multi-task Framework for Sentiment and Emotion Aware Hate Speech Detection with Explanation in Thai.
Description:
We have developed Ex-ThaiHate, a new benchmark dataset for explainable hate speech detection in the Thai language. The dataset includes hate, sentiment, emotion, and rationale labels, and comprises 2685 hate and 4912 non-hate instances.
Reference:
Maity, K., Bhattacharya, S., Phosit, S., Kongsamlit, S., Saha, S. and Pasupa, K., 2023, September. Ex-ThaiHate: A Generative Multi-task Framework for Sentiment and Emotion Aware Hate Speech Detection with Explanation in Thai. In Joint European Conference on Machine Learning and Knowledge Discovery in Databases (pp. 139-156). Cham: Springer Nature Switzerland.
GenEx: A Commonsense-aware Unified Generative Framework for Explainable Cyberbullying Detection in Hindi-English Code-mixed language
Description:
We created an explainable cyberbullying dataset called BullyExplain, addressing four tasks simultaneously: Cyberbullying Detection (CD), Sentiment Analysis (SA), Target Identification (TI), and Detection of Rationales (RD). Each tweet in this dataset is annotated with four labels: Bully (Yes/No), Sentiment (Positive/Neutral/Negative), Target (Religion/Sexual-Orientation/Attacking-Relatives-and-Friends/Organization/Community/Profession/Miscellaneous), and Rationales (highlighted parts of the text justifying the classification decision). Rationales are not marked if the post is non-bullying, and the target class is then NA (Not Applicable). The BullyExplain dataset comprises a total of 6,084 samples, with 3,034 samples belonging to the non-bully class and the remaining 3,050 samples marked as bully. The number of tweets with positive and neutral sentiments is 1,536 and 1,327, respectively, while the remaining tweets express negative sentiments.
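For illustration, a single BullyExplain instance can be represented as follows; the field names are our own shorthand, not the released schema.

from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class BullyExplainInstance:
    text: str                       # Hindi-English code-mixed tweet
    bully: bool                     # CD: Bully (Yes/No)
    sentiment: str                  # SA: Positive / Neutral / Negative
    target: Optional[str] = None    # TI: e.g. "Religion"; None (NA) for non-bully posts
    rationales: List[str] = field(default_factory=list)  # RD: spans; empty for non-bully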
Reference:
Maity, K., Jain, R., Jha, P., Saha, S. and Bhattacharyya, P., 2023, December. GenEx: A Commonsense-aware Unified Generative Framework for Explainable Cyberbullying Detection. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing (pp. 16632-16645).
A deep learning framework for the detection of Malay hate speech
Description:
We created HateM, a Malay hate speech dataset in which each tweet is marked as either hate or non-hate. The dataset has 3,002 tweets marked as non-hate and 1,890 marked as hate.
Reference:
Maity, K., Bhattacharya, S., Saha, S. and Seera, M., 2023. A deep learning framework for the detection of Malay hate speech. IEEE Access.
Emotion, Sentiment, and Sarcasm aided Complaint Detection:
Description:
We extend the Twitter-based Complaints dataset with the
emotion, sentiment, and sarcasm classes. The extended Complaints dataset
consists of 2214 non-complaint and 1235 complaint tweets in English.
Reference:
A. Singh, A. Nazir, S. Saha (2021), "Adversarial Multi-task Model for Emotion, Sentiment, and Sarcasm aided Complaint Detection", in the 44th European Conference on Information Retrieval (ECIR 2022), 10-14 April 2022, Norway (Core ranking A).
Sentiment and Emotion-Aware Multi-Modal Complaint Identification:
Description:
We curate a new multimodal complaint dataset- Complaint, Emotion, and Sentiment Annotated Multi-modal Amazon Reviews Dataset (CESAMARD), a collection of opinionated texts (reviews) and images of the products posted on the website of the retail giant Amazon. The CESAMARD dataset comprises 3962 reviews with the corresponding complaint, emotion, and sentiment labels.
Reference:
A. Singh, S. Dey, A. Singha, S. Saha (2021), "Sentiment and Emotion-aware Multi-modal Complaint Identification", in AAAI 2022 (Core rank A*).
Complaint and Severity Identification from Online Financial Content:
Description:
We curate a Financial Complaints corpus (FINCORP), a collection of annotated complaints arising between financial institutions and consumers expressed in English on Twitter. The dataset has been enriched with the associated emotion, sentiment, and complaint severity classes. It comprises 3149 complaint and 3133 non-complaint instances spanning ten domains (e.g., credit cards, mortgages).
Reference:
A. Singh, R. Bhatia, and S. Saha (2022), "Complaint and Severity Identification from Online Financial Content", IEEE Transactions on Computational Social Systems.
Peeking inside the black box - A Commonsense-Aware Generative Framework for Explainable Complaint Detection:
Description:
We extended the original Complaints dataset with causal span annotations for complaint and non-complaint labels. The extended dataset (X-CI) is the first benchmark dataset for explainable complaint detection. Each instance in the X-CI dataset is annotated with five labels: complaint label, emotion label, polarity label, complaint severity level, and rationale (explainability), i.e., the causal span explaining the reason for the complaint/non-complaint label.
Reference:
A. Singh, R. Jain, P. Jha, S. Saha (2023), "Peeking inside the black box: A Commonsense-aware Generative Framework for Explainable Complaint Detection", ACL 2023 (Core rank: A*).
Knowing What and How - A Multi-modal Aspect-Based Framework for Complaint Detection:
Description:
The CESAMARD-Aspect dataset consists of aspect categories and associated complaint/non-complaint labels and spans five domains (books, electronics, edibles, fashion, and miscellaneous). The dataset comprises 3962 reviews, with 2641 reviews in the non-complaint category (66.66%) and 1321 reviews in the complaint category (33.34%). Each record in the dataset consists of the image URL, review title, review text, and corresponding complaint, polarity, and emotion labels. The instances in the original CESAMARD dataset were grouped according to various domains, such as electronics, edibles, fashion, books, and miscellaneous. We take it a step further by including a pre-defined set of aspect categories for each of the five domains with the associated complaint/non-complaint labels. All domains share three common aspects: packaging, price, and quality, which are essential considerations when shopping online.
Reference:
A. Singh, V. Gangwar, S. Sharma, S. Saha (2023), "Knowing What and How: A Multi-modal Aspect-Based Framework for Complaint Detection", ECIR 2023 (Core rank A), Dublin.
AbCoRD - Exploiting Multimodal Generative Approach for Aspect-based Complaint and Rationale Detection:
Description:
We added rationale annotation for aspect-based complaint classes to the benchmark multimodal complaint dataset (CESAMARD) covering five domains (books, electronics, edibles, fashion, and miscellaneous). The causal span that best explains the reason for the complaint label in each aspect-level complaint instance was selected. Note that if the review is categorized as non-complaint as a whole, then all aspect-level annotations will also be marked as non-complaint. However, in cases where there is a complaint at the review level, certain aspects may still be considered non-complaint. Each review instance is marked with at most six aspects in the dataset.
Reference:
R. Jain, A. Singh, V. Gangwar, S. Saha (2023), "AbCoRD: Exploiting multimodal generative approach for Aspect-based Complaint and Rationale Detection", ACM Multimedia 2023 (Core rank: A*).
Large Scale Multi-Lingual Multi-Modal Summarization Dataset:
Description:
We present M3LS, currently the largest multi-lingual multi-modal summarization dataset, consisting of over a million document-image pairs, each accompanied by a professionally annotated multi-modal summary. Spanning 20 languages and targeting diversity across five language roots, it is also the largest summarization dataset for 13 languages and includes cross-lingual summarization data for 2 languages.
Reference:
Verma, Y., Jangra, A., Verma, R. and Saha, S., 2023, May. Large Scale Multi-Lingual Multi-Modal Summarization Dataset. In Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics (pp. 3602-3614), EACL 2023 (CORE Ranking: A).
Multimodal Rumour Detection: Catching News that Never Transpired!
Description:
Extension of the PHEME 2016 dataset. The PHEME-2016 dataset initially lacked images. Images were collected for tweet threads with user-uploaded images mentioned in the metadata. For threads without images, we performed web scraping to augment visuals. Only source tweets were considered for image downloads to ensure relevance and appropriateness.
Reference:
Kumar, R., Sinha, R., Saha, S., Jatowt, A. (2023). Multimodal Rumour Detection: Catching News that Never Transpired!. In: Fink, G.A., Jain, R., Kise, K., Zanibbi, R. (eds) Document Analysis and Recognition - ICDAR 2023. ICDAR 2023 (CORE Ranking: A). Lecture Notes in Computer Science, vol 14189. Springer, Cham. https://doi.org/10.1007/978-3-031-41682-8_15