Automatic text classification approach for aerospace PDF documents using NLP techniques

Nabil Abdoun, Mohammad Chami (SysDICE GmbH)

Keywords
Systems Engineering; Artificial Intelligence; Aerospace
Abstract

One of the regular activities engineers perform during the design and development of technical systems is to determine which statements in an engineering specification document represent a requirement, a functional architecture, a design solution, variability, or another type of systems engineering (SE) information. Capturing such text from the document is still done manually, which requires competence in the modeling language, takes considerable effort, and is error-prone. Automatic extraction and classification of such SE information is therefore an important task. However, approaches for understanding and extracting such information from large documents remain relatively scarce, and the task is challenging. Following suitable writing and markup conventions provides an immediate and easy way to classify and analyze a document, but such conventions are not always followed strictly. We therefore propose a solution that can label SE documents, categorize them, and classify each statement into predefined classes.