Natural Language Processing

NLP - Linguistic Resources

Linguistic resources are needed to build grammars in symbolic approaches and to train machine learning modules. The word corpus means "body" in Latin; in linguistics it refers to a collection of texts. Such a collection of linguistic data, whether written texts or transcriptions of recorded speech, can serve as the starting point for linguistic description. The Linguistic Data Consortium (LDC) maintains a large catalog of written and spoken corpora covering a wide range of languages. ELRA, the European Language Resources Association, collects, distributes, and validates spoken, written, and terminological linguistic resources, as well as software tools.

A corpus is a large, structured collection of machine-readable texts produced in natural communicative settings. The plural of corpus is corpora. Corpora can be derived in a variety of ways, for example from electronic text, transcripts of spoken language, or optical character recognition.
Sampling is another crucial element of corpus design. Because it largely determines how representative and balanced a corpus is, sampling cannot be avoided when building a corpus.
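
One common way to work toward balance is stratified sampling: group the candidate texts by category and draw the same number from each group. The sketch below is only an illustration under stated assumptions. The genre labels, the toy documents, and the helper name `stratified_sample` are hypothetical and are not taken from any particular corpus or toolkit.

```python
import random
from collections import defaultdict


def stratified_sample(documents, per_genre, seed=0):
    """Draw up to `per_genre` texts from each genre (hypothetical helper).

    `documents` is a list of (genre, text) pairs.
    """
    rng = random.Random(seed)  # fixed seed so the draw is repeatable

    # Group the texts by their genre label.
    by_genre = defaultdict(list)
    for genre, text in documents:
        by_genre[genre].append(text)

    # Sample the same number of texts from every genre, or all of them
    # when a genre has fewer than `per_genre` texts.
    sample = []
    for genre in sorted(by_genre):
        texts = by_genre[genre]
        k = min(per_genre, len(texts))
        for text in rng.sample(texts, k):
            sample.append((genre, text))
    return sample


# Toy, made-up data: news is over-represented in the raw pool.
documents = [
    ("news", "The council approved the budget."),
    ("news", "Rain is expected over the weekend."),
    ("news", "The team won the final match."),
    ("news", "Prices rose slightly last month."),
    ("fiction", "She opened the door and stepped into the fog."),
    ("fiction", "The old clock had stopped at midnight."),
    ("speech", "um so I was thinking we could meet later"),
    ("speech", "yeah that works for me"),
    ("speech", "did you see the game last night"),
]

balanced = stratified_sample(documents, per_genre=2)
for genre, text in balanced:
    print(genre, "|", text)
```

With these inputs the sample holds two texts per genre, so news no longer dominates. Whether equal shares are the right target depends on what the corpus is meant to represent.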

How large a corpus should be depends on its intended use and on the following practical factors:

  • The kinds of queries users are expected to make.
  • The methods users will apply to study the data.
  • The availability of the source data.
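
Corpus size is usually reported as counts of documents, tokens (running words), and types (distinct words). As a minimal sketch, assuming English text and a deliberately crude regular-expression tokenizer, these figures could be computed as follows. The function name `corpus_size` and the two sample sentences are hypothetical.

```python
import re
from collections import Counter


def corpus_size(texts):
    """Return document, token, and type counts for a list of strings."""
    tokens = []
    for text in texts:
        # Crude tokenizer (an assumption): lowercase runs of letters and
        # apostrophes. Real corpora typically need a proper tokenizer.
        tokens.extend(re.findall(r"[a-z']+", text.lower()))
    counts = Counter(tokens)
    return {
        "documents": len(texts),
        "tokens": len(tokens),  # running words
        "types": len(counts),   # distinct word forms
    }


toy_corpus = ["The cat sat on the mat.", "The dog sat too."]
stats = corpus_size(toy_corpus)
print(stats)  # {'documents': 2, 'tokens': 10, 'types': 7}
```

Counts like these make it easier to judge whether the available data is large enough for the queries and methods the corpus is meant to support.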
