Data engineering for natural language