Title:
IndicBART Alongside Visual Element: Multimodal Summarization in Diverse Indian Languages
Abstract:
In the age of information overflow, the demand for advanced summarization techniques has surged,
especially in linguistically diverse regions such as India. This paper introduces an innovative
approach to multimodal multilingual summarization that seamlessly unites textual and visual elements.
Our research focuses on four prominent Indian languages: Hindi, Bangla, Gujarati, and Marathi, employing abstractive
summarization methods to craft coherent and concise summaries. For text summarization, we leverage
the capabilities of the pre-trained IndicBART model, known for its exceptional proficiency in
comprehending and generating text in Indian languages. We integrate an image summarization component
based on the Image Pointer model to tackle multimodal challenges. This component identifies images from
the input that enhance and complement the generated summaries, contributing to the overall comprehensiveness
of our multimodal summaries. Our proposed methodology attains excellent results, surpassing other text summarization
approaches tailored for the specified Indian languages. Furthermore, we enhance the significance of our work by
incorporating a user satisfaction evaluation method, thereby providing a robust framework for assessing the quality of
summaries. This holistic approach contributes to the advancement of summarization techniques, particularly in diverse Indian languages.
Resource link:
https://github.com/Raghvendra-14/indicBART
If you are using the above resource, please cite the following paper:
Kumar, R., Sinha, R., Saha, S., Jatowt, A.: Multimodal rumour detection: catching news that never transpired!. In: Fink, G.A., Jain, R., Kise, K., Zanibbi, R. (eds.) Document Analysis and Recognition - ICDAR 2023, ICDAR 2023, LNCS, vol. 14189, pp. 231–248. Springer, Cham (2023). https://doi.org/10.1007/978-3-031-41682-8_15
Title:
Extracting the Full Story: A Multimodal Approach and Dataset to Crisis Summarization in Tweets
Abstract:
In our digitally connected world, the influx of microblog data poses a formidable challenge
in extracting relevant information amid a continuous stream of updates. This challenge
intensifies during crises, where the demand for timely and relevant information is crucial.
Current summarization techniques often struggle with the intricacies of microblog data in
such situations. To address this, our research explores crisis-related microblogs, recognizing
the crucial role of multimedia content, such as images, in offering a comprehensive perspective.
In response to these challenges, we introduce a multimodal extractive-abstractive summarization model.
Leveraging a fusion of TF-IDF scoring and bigram filtering, coupled with the effectiveness of three distinct
models—BIGBIRD, CLIP, and bootstrapping language-image pre-training (BLIP)—we aim to overcome the limitations
of traditional extractive and text-only approaches. Our model is designed and evaluated on a newly curated Twitter
dataset featuring 12,494 tweets and 3,090 images across eight crisis events, each accompanied by gold-standard summaries.
The experimental findings showcase the efficacy of our model, surpassing current benchmarks by notable margins
of 16% and 17%. This confirms our model’s strength and its relevance in crisis scenarios with the crucial interplay of text
and multimedia. Notably, our research contributes to multimodal, abstractive microblog summarization, addressing a key gap in
the literature. It is also a valuable tool for swift information extraction in time-sensitive situations.
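As a rough illustration of the extractive stage described above, the sketch below scores tweets with unigram-plus-bigram TF-IDF and keeps the top-ranked ones as candidates for the abstractive stage; the example tweets, preprocessing, and cut-off are our own assumptions, and the full model additionally involves BIGBIRD, CLIP, and BLIP.

from sklearn.feature_extraction.text import TfidfVectorizer
import numpy as np

tweets = [
    "Flood waters rising fast near the main bridge, several roads closed",
    "Stay safe everyone, thoughts with the affected families tonight",
    "Rescue teams deploying boats to evacuate the eastern district",
]

# Unigram + bigram TF-IDF; a tweet's salience score is the sum of its term weights.
vectorizer = TfidfVectorizer(ngram_range=(1, 2), stop_words="english")
tfidf = vectorizer.fit_transform(tweets)
scores = np.asarray(tfidf.sum(axis=1)).ravel()

# Keep the top-k tweets as extractive candidates for the abstractive stage.
k = 2
candidates = [tweets[i] for i in np.argsort(scores)[::-1][:k]]
print(candidates)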
Resource link:
https://github.com/Raghvendra-14/A-Multimodal-Approach-and-Dataset-to-Crisis-Summarization-in-Tweets
If you are using the above resource, please cite the following paper:
R. Kumar, R. Sinha, S. Saha, A. Jatowt (2024), “Extracting the Full Story: A Multimodal Approach and Dataset to Crisis Summarization in Tweets”, IEEE Transactions on Computational Social Systems, DOI: 10.1109/TCSS.2024.3436690.
Title:
Silver Lining in the Fake News Cloud: Can Large Language Models Help Detect Misinformation?
Abstract:
In the era of advanced generative artificial intelligence, distinguishing truth from fallacy and deception has become a critical societal challenge. This research
analyzes the capabilities of large language models for detecting misinformation. Our study employs a versatile approach, covering multiple Large Language Models
(LLMs) with few-shot and zero-shot prompting. These models are rigorously evaluated across various fake news and rumour detection datasets. Introducing a novel
dimension, we additionally incorporate sentiment and emotion annotations to understand the emotional influence on misinformation detection using LLMs. Moreover,
to extend our inquiry, we employ ChatGPT to intentionally distort authentic news as well as human-written fake news, utilizing zero-shot and iterative prompts.
This deliberate corruption allows for a detailed examination of parameters such as abstractness, concreteness, and named entity density, providing insights into
differentiating between unaltered news, human-written fake news, and its LLM-corrupted counterpart. Our findings aim to furnish a refined framework for discerning
authentic news, human-generated misinformation, and LLM-induced distortions. This multifaceted approach, utilizing various prompting techniques, contributes to a
comprehensive understanding of the subtle variations shaping misinformation sources.
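For illustration, a zero-shot prompt of the kind explored here can be issued as follows; the model name, prompt wording, and binary label set are assumptions for the sketch, not the paper's exact setup.

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def classify(claim: str) -> str:
    prompt = (
        "You are a fact-checking assistant. Label the following news claim as "
        "REAL or FAKE, answering with a single word.\n\nClaim: " + claim
    )
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # hypothetical model choice
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    return response.choices[0].message.content.strip()

print(classify("Scientists confirm the moon is made of cheese."))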
Resource link:
https://github.com/Raghvendra-14/TAI-MISINFORMATION
If you are using the above resource, please cite the following paper:
R. Kumar, B. Goddu, S. Saha, A. Jatowt (2024) "Silver Lining in the Fake News Cloud:
Can Large Language Models Help Detect Misinformation?", IEEE Transactions on Artificial Intelligence (IEEE TAI), doi: 10.1109/TAI.2024.3440248.
Dataset:
An explainable, comprehensive financial-market dataset containing popular social media financial tweets.
Description:
We present a detailed study of how financial market behaviour is influenced by
public psychology. In this paper, we present an explainable, comprehensive financial
market dataset containing popular social media financial tweets.
The dataset carries emotion, sentiment, and causal expression labels. Using a
CentralNet-based multitasking framework, the model analyses market behaviour.
Additionally, we analyse the NASDAQ dataset with ARIMA and LSTM models.
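As an illustrative sketch of the ARIMA part of this analysis (the (5, 1, 0) order, file name, and column name are assumptions, not the paper's exact configuration):

import pandas as pd
from statsmodels.tsa.arima.model import ARIMA

# Hypothetical CSV with a 'Close' column of daily NASDAQ closing prices.
closes = pd.read_csv("nasdaq.csv")["Close"]

model = ARIMA(closes, order=(5, 1, 0)).fit()  # AR(5) on a first-differenced series
forecast = model.forecast(steps=7)            # one-week-ahead forecast
print(forecast)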
GitHub link:
https://github.com/sarmistha-D/Financial-Market-with-ESAN
References:
S. Das, U. Chowdhury, N.S. Lijin, A. Deep, S. Saha, et al., "Investigate How Market Behaves: Toward an
Explanatory Multitasking Based Analytical Model for Financial Investments", IEEE Access, 2024.
Dataset:
Unimodal explainable complaint mining dataset in Financial Domain
Description:
We present a comprehensive, unimodal, financially explainable complaint-mining dataset
derived from social media (Twitter) posts. The dataset comprises five attributes:
Complaint-NonComplaint, Severity Level, Emotion, Sentiment, and Causal Expression.
The primary objective is to classify text as either a complaint or non-complaint.
Subsequently, complaints are further categorized into severity levels based on the
associated risk tolerance. Additionally, the dataset is labelled with five distinct
emotion categories and sentiments. To enhance interpretability, we propose an
explainable complaint cause identification framework using a dyadic attention
mechanism within a multitasking CentralNet model, capturing both linguistic and
pragmatic nuances.
GitHub link:
https://github.com/sarmistha-D/Complaint-HaN
References:
S. Das, A. Singh, S. Saha, A. Maurya, "Negative Review or Complaint? Exploring Interpretability
in Financial Complaints", IEEE Transactions on Computational Social Systems, 2024.
Resource:
Video Complaint Dataset (VCD)
Description:
We introduce the Video Complaint Dataset (VCD), a novel resource aimed at advancing
research in aspect-level complaint detection. The dataset contains 450 annotated
utterances from 130 product review videos on YouTube. Each annotation includes
the product category, product name, timestamp of the utterance, and the corresponding
aspect and complaint labels.
The train and validation dataset used in the paper is present here:
https://github.com/rdev12/MAACA/tree/main/data
Please cite the following paper if you are using the above dataset:
R. Devanathan, A. Singh, A.S. Poornash, S. Saha (2024), "Seeing Beyond Words:
Multimodal Aspect-Level Complaint Detection in Ecommerce Videos", ACM Multimedia,
Melbourne, Australia, 28 October - 1 November 2024 (Core rank A*).
ToxVI: a Multimodal LLM-based Framework for Generating Intervention in Toxic Code-Mixed Videos
Description:
We can discourage social media users from sharing toxic material by automatically generating
interventions that explain why certain content is inappropriate. We introduce a Toxic Code-Mixed
Intervention Video benchmark dataset (ToxCMI), comprising 1697 code-mixed toxic video utterances
sourced from YouTube. Each utterance in this dataset has been meticulously annotated for toxicity
and severity, accompanied by interventions provided in Hindi-English code-mixed languages.
Reference:
Krishanu Maity, A.S. Poornash, Sriparna Saha and Kitsuchart Pasupa.
"ToxVI: a Multimodal LLM-based Framework for Generating Intervention in Toxic Code- Mixed Videos"
In Proceedings of the 33rd ACM International Conference on Information and Knowledge Management, October 21–25, 2024, Boise, Idaho, USA CIKM 2024, (Core Rank A)
MedSumm:
Description:
We present the Multimodal Medical Code-mixed Question Summarization (MMCQS) dataset,
the first of its kind, which pairs Hindi-English code-mixed medical queries with visual aids and
corresponding English summaries. The MMCQS dataset comprises medical queries spanning 18 different
medical disorders, offering enriched representations of patients' conditions through the integration
of visual cues.
Reference:
Ghosh, Akash, Arkadeep Acharya, Prince Jha, Sriparna Saha, Aniket Gaudgaul, Rajdeep Majumdar, Aman Chadha, Raghav Jain, Setu Sinha, and Shivani Agarwal.
"MedSumm: A Multimodal Approach to Summarizing Code-Mixed Hindi-English Clinical Queries." In European Conference
on Information Retrieval, pp. 106-120. Cham: Springer Nature Switzerland, 2024.
ClipSyntel:
Description:
We present the Multimodal Medical Question Summarization (MMQS) Dataset, designed to harness
the unexploited potential of integrating textual queries with visual representations of medical
conditions. By pairing medical inquiries with corresponding visual aids, this dataset aims to
enhance and refine the comprehension of patient needs. The MMQS Dataset comprises 3,015
instances, offering a valuable resource for the development of more sophisticated multimodal
approaches in medical query analysis.
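To illustrate the CLIP side of the CLIP-LLM synergy named in the title, the sketch below ranks candidate medical images against a textual query; the checkpoint, query, and file paths are illustrative assumptions, not the paper's exact pipeline.

import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

query = "red itchy rash spreading on the forearm"   # hypothetical patient query
paths = ["rash_1.jpg", "rash_2.jpg"]                # hypothetical image files
images = [Image.open(p) for p in paths]

inputs = processor(text=[query], images=images, return_tensors="pt", padding=True)
with torch.no_grad():
    outputs = model(**inputs)

# logits_per_image holds each image's similarity to the single query text.
scores = outputs.logits_per_image.squeeze(-1)
print(paths[int(scores.argmax())])  # most query-relevant image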
Reference:
Ghosh, Akash, Arkadeep Acharya, Raghav Jain, Sriparna Saha, Aman Chadha, and Setu Sinha.
"Clipsyntel: clip and llm synergy for multimodal question summarization in healthcare."
In Proceedings of the AAAI Conference on Artificial Intelligence, vol. 38, no. 20, pp. 22031-22039. 2024
Hate Speech Detection from Videos in Code-mixed Setting:
Paper Title:
ToxVidLLM: A Multimodal LLM-based Framework for Toxicity Detection in Code-Mixed Videos
Description:
We introduce ToxCMM, an openly accessible dataset extracted from YouTube that is meticulously
annotated for toxic speech, with utterances presented in code-mixed form. Each sentence within
the videos is annotated with three crucial labels, namely Toxic (Yes / No), Sentiment
(Positive / Negative / Neutral), and Severity levels (Non-harmful / Partially Harmful /
Very Harmful). This extensive dataset comprises 931 videos, encompassing a total of 4021
utterances. The release of the ToxCMM dataset is intended to foster further exploration in
the realm of multi-modal toxic speech detection within low-resource code-mixed languages.
Reference:
Krishanu Maity, A.S. Poornash, Sriparna Saha and Pushpak Bhattacharyya. "Multimodal Toxicity Detection in Code-Mixed Video Content: Unveiling a Hindi-English Benchmark Dataset", Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Long Papers), ACL 2024, Bangkok, Thailand, August 11-16, 2024, (Core Rank A*)
Multimodal Pharmacovigilance:
Title:
"Enhancing Adverse Drug Event Detection with Multimodal Dataset: Corpus Creation and Model Development" (Accepted in ACL findings 2024)
Description:
The mining of adverse drug events (ADEs) is pivotal in pharmacovigilance, enhancing patient safety by identifying potential risks
associated with medications. Previous ADE mining studies have focused on text-based methodologies, overlooking visual cues, limiting
contextual comprehension, and hindering accurate interpretation. To address this gap, we present a MultiModal
Adverse Drug Event (MMADE) detection dataset, merging ADE-related textual information with visual aids.
To access the dataset, please follow the GitHub link: https://github.com/singhayush27/MMADE.git. If you are using the dataset, please cite the above-mentioned original paper.
Towards Emotion-aided Multi-modal Dialogue Act Classification:
Description:
A new multimodal Emotion-aware Dialogue Act dataset, EMOTyDA, curated by collecting conversations from two open-source dialogue datasets, IEMOCAP and MELD.
Both IEMOCAP and MELD have pre-annotated emotion labels.
The 12 DA annotated categories are "Greeting (g)", "Question (q)", "Answer (ans)", "Statement-Opinion (o)", "Statement-Non-Opinion (s)", "Apology (ap)", "Command (c)", "Agreement (ag)", "Disagreement (dag)", "Acknowledge (a)", "Backchannel (b)" and "Others (oth)".
Reference:
T. Saha, A. Patra, S. Saha and P. Bhattacharyya (2020), "Towards Emotion-aided Multi-modal Dialogue Act Classification", in ACL 2020, July 5-10, 2020, Seattle, Washington (Category A*).
Sentiment and Emotion aware Multi-modal Speech Act Classification in Twitter (Tweet Act Classification) : EmoTA.
Description:
The EmoTA dataset is curated by collecting tweets from the open-source SemEval-2018 tweet dataset.
The SemEval-2018 dataset has pre-annotated multi-label emotion tags.
The 7 manually annotated TA tags are “Statement” (sta), “Expression” (exp), “Question” (que), “Request” (req), “Suggestion” (sug), “Threat” (tht) and “Others” (oth).
The sentiment labels for tweets were obtained following a semi-supervised approach using the IBM Watson Sentiment Classifier (https://cloud.ibm.com/apidocs/natural-language-understanding#sentiment). The EmoTA dataset contains these silver-standard sentiment tags.
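A minimal sketch of the Watson call used for such silver-standard tagging might look as follows; the API key, service URL region, and version string are placeholders, not the project's actual credentials or configuration.

from ibm_watson import NaturalLanguageUnderstandingV1
from ibm_watson.natural_language_understanding_v1 import Features, SentimentOptions
from ibm_cloud_sdk_core.authenticators import IAMAuthenticator

nlu = NaturalLanguageUnderstandingV1(
    version="2021-08-01",
    authenticator=IAMAuthenticator("YOUR_API_KEY"),
)
nlu.set_service_url("https://api.us-south.natural-language-understanding.watson.cloud.ibm.com")

result = nlu.analyze(
    text="Just missed my flight again, great start to the week...",
    features=Features(sentiment=SentimentOptions()),
).get_result()
print(result["sentiment"]["document"]["label"])  # positive / negative / neutral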
Reference:
T. Saha, A. Upadhyaya, S. Saha, P. Bhattacharyya (2021), "Towards Sentiment and Emotion aided Multi-modal Speech Act Classification in Twitter", in NAACL-HLT 2021, June 6-11, 2021 ( Category A).
A Multitask Framework for Sentiment, Emotion and Sarcasm aware Cyberbullying Detection from Multi-modal Code-Mixed Memes.
Description:
We have created a benchmark multi-modal (Image+Text) meme dataset called MultiBully, annotated with bully, sentiment, emotion, and sarcasm labels and collected from the open-source Twitter and Reddit platforms. Moreover, the severity of the cyberbullying posts is also investigated by adding a harmfulness score to each meme. Out of 5854 memes in our dataset, 2632 were labelled as non-bully, while 3222 were tagged as bully.
Reference:
Maity, K., Jha, P., Saha, S. and Bhattacharyya, P., 2022, July. A multitask framework for sentiment, emotion and sarcasm aware cyberbullying detection from multi-modal code-mixed memes. In Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval (pp. 1739-1749).
Ex-ThaiHate: A Generative Multi-task Framework for Sentiment and Emotion Aware Hate Speech Detection with Explanation in Thai.
Description:
We have developed Ex-ThaiHate, a new benchmark dataset for explainable hate speech detection in the Thai language. The dataset includes hate, sentiment, emotion, and rationale labels, and comprises 2685 hate and 4912 non-hate instances.
Reference:
Maity, K., Bhattacharya, S., Phosit, S., Kongsamlit, S., Saha, S. and Pasupa, K., 2023, September. Ex-ThaiHate: A Generative Multi-task Framework for Sentiment and Emotion Aware Hate Speech Detection with Explanation in Thai. In Joint European Conference on Machine Learning and Knowledge Discovery in Databases (pp. 139-156). Cham: Springer Nature Switzerland.
GenEx: A Commonsense-aware Unified Generative Framework for Explainable Cyberbullying Detection in Hindi-English Code-mixed language
Description:
We created an explainable cyberbullying dataset called BullyExplain, addressing four tasks simultaneously: Cyberbullying Detection (CD), Sentiment Analysis (SA), Target Identification (TI), and Detection of Rationales (RD). Each tweet in this dataset is annotated with four labels: Bully (Yes/No), Sentiment (Positive/Neutral/Negative), Target (Religion/Sexual-Orientation/Attacking-Relatives-and-Friends/Organization/Community/Profession/Miscellaneous), and Rationales (highlighted parts of the text justifying the classification decision). Rationales are not marked if the post is non-bullying, and the target class is then NA (Not Applicable). The BullyExplain dataset comprises a total of 6,084 samples, with 3,034 samples belonging to the non-bully class and the remaining 3,050 samples marked as bully. The number of tweets with positive and neutral sentiments is 1,536 and 1,327, respectively, while the remaining tweets express negative sentiments.
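For illustration, a single BullyExplain instance can be represented as follows; the field names are our own shorthand, not the released schema.

from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class BullyExplainInstance:
    text: str                       # Hindi-English code-mixed tweet
    bully: bool                     # CD: Bully (Yes/No)
    sentiment: str                  # SA: Positive / Neutral / Negative
    target: Optional[str] = None    # TI: e.g. "Religion"; None (NA) for non-bully posts
    rationales: List[str] = field(default_factory=list)  # RD: spans; empty for non-bully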
Reference:
Maity, K., Jain, R., Jha, P., Saha, S. and Bhattacharyya, P., 2023, December. GenEx: A Commonsense-aware Unified Generative Framework for Explainable Cyberbullying Detection. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing (pp. 16632-16645).
A deep learning framework for the detection of Malay hate speech
Description:
We created HateM, a Malay hate speech dataset in which each tweet is marked as either hate or non-hate. The dataset has 3,002 tweets marked as non-hate and 1,890 marked as hate.
Reference:
Maity, K., Bhattacharya, S., Saha, S. and Seera, M., 2023. A deep learning framework for the detection of Malay hate speech. IEEE Access.
Emotion, Sentiment, and Sarcasm aided Complaint Detection:
Description:
We extend the Twitter-based Complaints dataset with the
emotion, sentiment, and sarcasm classes. The extended Complaints dataset
consists of 2214 non-complaint and 1235 complaint tweets in English.
Reference:
A. Singh, A. Nazir, S. Saha (2021), "Adversarial Multi-task Model for Emotion, Sentiment, and Sarcasm aided Complaint Detection", in the 44th European Conference on Information Retrieval (ECIR 2022), 10-14 April 2022, Norway (Core ranking A).
Sentiment and Emotion-Aware Multi-Modal Complaint Identification:
Description:
We curate a new multimodal complaint dataset- Complaint, Emotion, and Sentiment Annotated Multi-modal Amazon Reviews Dataset (CESAMARD), a collection of opinionated texts (reviews) and images of the products posted on the website of the retail giant Amazon. The CESAMARD dataset comprises 3962 reviews with the corresponding complaint, emotion, and sentiment labels.
Reference:
A. Singh, S. Dey, A. Singha, S. Saha (2021), "Sentiment and Emotion-aware Multi-modal Complaint Identification", in AAAI 2022 (Core rank A*).
Complaint and Severity Identification from Online Financial Content:
Description:
We curate a Financial Complaints corpus (FINCORP), a collection of annotated complaints arising between financial institutions and consumers expressed in English on Twitter. The dataset has been enriched with the associated emotion, sentiment, and complaint severity classes. It comprises 3149 complaint and 3133 non-complaint instances spanning ten domains (e.g., credit cards, mortgages).
Reference:
A. Singh, R. Bhatia, and S. Saha (2022), "Complaint and Severity Identification from Online Financial Content", IEEE Transactions on Computational Social Systems.
Peeking inside the black box - A Commonsense-Aware Generative Framework for Explainable Complaint Detection:
Description:
We extended the original Complaints dataset with causal span annotations for complaint and non-complaint labels. The extended dataset (X-CI) is the first benchmark dataset for explainable complaint detection. Each instance in the X-CI dataset is annotated with five labels: complaint label, emotion label, polarity label, complaint severity level, and rationale (explainability), i.e., the causal span explaining the reason for the complaint/non-complaint label.
Reference:
A. Singh, R. Jain, P. Jha, S. Saha (2023), "Peeking inside the black box: A Commonsense-aware Generative Framework for Explainable Complaint Detection", ACL 2023 (Core rank: A*).
Knowing What and How - A Multi-modal Aspect-Based Framework for Complaint Detection:
Description:
The CESAMARD-Aspect dataset consists of aspect categories and associated complaint/non-complaint labels and spans five domains (books, electronics, edibles, fashion, and miscellaneous). The dataset comprises 3962 reviews, with 2641 reviews in the non-complaint category (66.66%) and 1321 reviews in the complaint category (33.34%). Each record in the dataset consists of the image URL, review title, review text, and corresponding complaint, polarity, and emotion labels. The instances in the original CESAMARD dataset were grouped according to various domains, such as electronics, edibles, fashion, books, and miscellaneous. We take it a step further by including a pre-defined set of aspect categories for each of the five domains with the associated complaint/non-complaint labels. All domains share three common aspects: packaging, price, and quality, which are essential considerations when shopping online.
Reference:
A. Singh, V. Gangwar, S. Sharma, S. Saha (2023), "Knowing What and How: A Multi-modal Aspect-Based Framework for Complaint Detection", ECIR 2023 (Core rank A), Dublin.
AbCoRD - Exploiting Multimodal Generative Approach for Aspect-based Complaint and Rationale Detection:
Description:
We added rationale annotation for aspect-based complaint classes to the benchmark multimodal complaint dataset (CESAMARD) covering five domains (books, electronics, edibles, fashion, and miscellaneous). The causal span that best explains the reason for the complaint label in each aspect-level complaint instance was selected. Note that if the review is categorized as non-complaint as a whole, then all aspect-level annotations will also be marked as non-complaint. However, in cases where there is a complaint at the review level, certain aspects may still be considered non-complaint. Each review instance is marked with at most six aspects in the dataset.
Reference:
R. Jain, A. Singh, V. Gangwar, S. Saha (2023), "AbCoRD: Exploiting multimodal generative approach for Aspect-based Complaint and Rationale Detection", ACM Multimedia 2023 (Core rank: A*).
Large Scale Multi-Lingual Multi-Modal Summarization Dataset:
Description:
We present M3LS, currently the largest multi-lingual multi-modal summarization dataset, consisting of over a million document-image pairs, each accompanied by a professionally annotated multi-modal summary. Spanning 20 languages and targeting diversity across five language roots, it is also the largest summarization dataset for 13 languages and includes cross-lingual summarization data for 2 languages.
Reference:
Verma, Y., Jangra, A., Verma, R. and Saha, S., 2023, May. Large Scale Multi-Lingual Multi-Modal Summarization Dataset. In Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics (pp. 3602-3614), EACL 2023 (CORE Ranking: A).
Multimodal Rumour Detection: Catching News that Never Transpired!
Description:
Extension of the PHEME 2016 dataset. The PHEME-2016 dataset initially lacked images. Images were collected for tweet threads with user-uploaded images mentioned in the metadata. For threads without images, we performed web scraping to augment visuals. Only source tweets were considered for image downloads to ensure relevance and appropriateness.
Reference:
Kumar, R., Sinha, R., Saha, S., Jatowt, A. (2023). Multimodal Rumour Detection: Catching News that Never Transpired!. In: Fink, G.A., Jain, R., Kise, K., Zanibbi, R. (eds) Document Analysis and Recognition - ICDAR 2023. ICDAR 2023 (CORE Ranking: A). Lecture Notes in Computer Science, vol 14189. Springer, Cham. https://doi.org/10.1007/978-3-031-41682-8_15