Projects

  1. Project Vacaspati
    Our corpus, VĀCASPATI, is varied along multiple dimensions, including type of composition, topic, author, time, and space. It contains more than 11 million sentences and 115 million words. We also built a word embedding model, VĀC-FT, by training FastText on VĀCASPATI, and trained a BERT model, VĀC-BERT, on the same corpus. VĀC-BERT has far fewer parameters and requires only a fraction of the resources of other state-of-the-art BERT models, yet it performs better than or comparably to them on various downstream tasks.
  2. Project Annotation
    If you are interested, you can contact us about annotation work on lemmatization, POS tagging, and NER for Bangla. This will be a paid position. Contact details are available on the contact page.