An extract is a sequential file created from a relational marketing database. The extract can then be used to prepare reports or to send data to other companies for their own use. More generally, data extraction is the process of retrieving data from a source, such as a database, so that it can be examined and analyzed. Further data processing follows, which may include adding metadata and combining the extracted data with other data. An extract is often the primary data loaded into a computer, typically from an external source such as a USB drive. Data is also commonly extracted from unstructured sources such as emails, PDF files, web pages, and scanned texts. Extracting data from these unstructured sources is a considerable technical challenge: in addition to coping with changes in physical hardware formats, data extraction must deal with content that has no predefined structure.
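The first sense of the term, a sequential file produced from a relational database, can be illustrated with a minimal sketch. It assumes a SQLite database and a CSV output file; the `customers` table, its columns, and the function names are hypothetical and serve only as an example.

```python
import csv
import io
import sqlite3


def extract_to_csv(conn, query, out):
    """Write the rows returned by `query` to `out` as CSV; return the row count."""
    cursor = conn.execute(query)
    writer = csv.writer(out, lineterminator="\n")
    # Header row taken from the column names of the result set.
    writer.writerow([col[0] for col in cursor.description])
    count = 0
    for row in cursor:
        writer.writerow(row)
        count += 1
    return count


def demo_extract():
    """Build a tiny in-memory database (hypothetical schema) and extract it."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, city TEXT)"
    )
    conn.executemany(
        "INSERT INTO customers (name, city) VALUES (?, ?)",
        [("Ada", "London"), ("Grace", "New York")],
    )
    out = io.StringIO()
    count = extract_to_csv(
        conn, "SELECT id, name, city FROM customers ORDER BY id", out
    )
    conn.close()
    return count, out.getvalue()
```

In practice `out` would be a file opened for writing rather than an in-memory buffer, and the query would select only the columns the report or the receiving company needs.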
Adding structure to unstructured data takes a number of forms: using text pattern matching to identify small-scale or large-scale structure, for example records in a report; using a table-based approach to identify common sections within a limited domain, for example in emails, to identify sections such as skills or previous work experience; and using text analytics to try to understand the text and link it to other information.
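The first two approaches can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a definitive implementation: the record layout (order id, date, amount), the heading table, and the function names are all hypothetical. Text analytics is not covered here.

```python
import re

# Hypothetical report layout: "<order id> <ISO date> <amount>" per record line.
RECORD_PATTERN = re.compile(
    r"^(?P<order_id>[A-Z]{2}\d{4})\s+"
    r"(?P<date>\d{4}-\d{2}-\d{2})\s+"
    r"(?P<amount>\d+\.\d{2})$"
)

# Hypothetical lookup table mapping known headings to section names.
SECTION_HEADINGS = {
    "skills": "skills",
    "work experience": "experience",
    "previous experience": "experience",
}


def extract_records(report_text):
    """Pattern matching: keep only the lines that look like records."""
    records = []
    for line in report_text.splitlines():
        match = RECORD_PATTERN.match(line.strip())
        if match:
            record = match.groupdict()
            record["amount"] = float(record["amount"])
            records.append(record)
    return records


def split_sections(text):
    """Table-based approach: group lines under the known headings."""
    sections = {}
    current = None
    for line in text.splitlines():
        key = line.strip().rstrip(":").lower()
        if key in SECTION_HEADINGS:
            current = SECTION_HEADINGS[key]
            sections.setdefault(current, [])
        elif current is not None and line.strip():
            sections[current].append(line.strip())
    return sections
```

Lines that do not match the pattern, such as report titles or page footers, are simply skipped, and text that appears before the first recognized heading is ignored. A real extractor would need patterns and headings tuned to the actual documents.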