From: Nexdata Date: 2025-10-14
To meet the client's requirement for Indonesian natural conversation data, this project collected 100 hours of recordings over a one-month period.
· Language: Indonesian
· Volume: 100 hours
· Type: Natural conversation data collection
· Duration: 1 month
· Background noise must not exceed 40 dB, and sudden noise must be avoided during recording. (Standard natural conversation recording allows background noise of up to 50 dB and tolerates sudden noise; this stricter requirement is difficult to meet with online recording.)
· Financial topics must account for at least one-third of the total content. (Standard natural conversation projects limit a single topic to no more than 30 minutes; meeting the client's financial topic ratio therefore requires a dedicated financial-topic recording team.)
· Text: Indonesian contains a large amount of colloquial vocabulary; some colloquial and formal terms differ in both spelling and meaning, with no fixed rules.
· Labels: The client required multiple types of tags; because of the client's limited Indonesian proficiency, additional language tags (e.g., Arabic, Javanese) were added during annotation.
· Background noise: Collection shifted from online to offline to control noise; locations near schools were chosen to keep collection efficient and prevent cost overruns.
· Collection subjects shifted from the general population to students; accounting/finance students were specifically recruited to record financial topics.
· Based on the acceptance report, a coordination meeting was held with the client and third-party acceptance team to reconfirm the transcription rule for colloquial terminology: "transcribe as heard." For terms with minor pronunciation differences, both the colloquial and formal versions will be considered correct (unification will be carried out in post-processing if necessary). The standard for comma/period usage was clarified: both usages are acceptable as long as they do not affect the meaning of the sentence.
· Regarding subjective background noise labeling, quality inspectors attended meetings to understand the client’s judgment criteria and achieved a high degree of subjective consistency with the client’s standards.
· Under tight schedules, project execution must still follow the process (a trial run first, with mass production starting only after approval) rather than simply chasing the deadline.
· During project execution, proactively identify the client's roles and responsibilities; clearly define the acceptance party and coordinate acceptance criteria through direct communication.
· For highly subjective issues, abandon "judgment by experience" and strictly adhere to the client's standards.
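The transcription rules reconfirmed with the client (colloquial and formal variants both count as correct, and commas and periods are interchangeable) can be sketched as an explicit comparison. This is a minimal illustration, not the project's actual tooling: the `VARIANTS` table, the function names, and the case-insensitive matching are all assumptions, and the word pairs are only examples of the kind of entry such a table might hold.

```python
import re

# Hypothetical variant table: colloquial form -> formal form.
# Illustrative entries only; a real table would come from the agreed guidelines.
VARIANTS = {
    "udah": "sudah",
    "aja": "saja",
}

def normalize(text: str) -> list[str]:
    """Reduce a transcript to tokens under the relaxed acceptance rules."""
    text = text.lower()  # assumption: casing is not part of the check
    # Commas and periods are treated as the same boundary marker.
    text = re.sub(r"[.,]", " <p> ", text)
    # Map each colloquial variant to its formal form so both compare equal.
    return [VARIANTS.get(token, token) for token in text.split()]

def transcripts_match(a: str, b: str) -> bool:
    """True when two transcripts differ only in accepted ways."""
    return normalize(a) == normalize(b)

print(transcripts_match("Saya udah makan, terus pulang.",
                        "Saya sudah makan. Terus pulang."))  # True
```

A check like this cannot judge whether a punctuation choice changes the meaning of a sentence, so that part of the rule would still need human review.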
The core project experience includes: strictly implementing the pilot process to control quality risks; clarifying acceptance criteria and responsible parties in advance; and adhering to client standards for subjective issues.
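One way to clarify a requirement in advance is to turn it into numbers before recording starts. The sketch below does this for the one-third financial-topic share over 100 hours. It assumes, purely for illustration, that the standard 30-minute single-topic cap also applies to each financial recording and that no recordings are rejected; the function name is hypothetical.

```python
def min_topic_sessions(total_hours: int, share_num: int, share_den: int,
                       max_session_minutes: int) -> int:
    """Smallest number of capped sessions needed to reach a topic share."""
    total_minutes = total_hours * 60
    # Integer ceiling division avoids floating-point rounding surprises.
    required_minutes = -(-total_minutes * share_num // share_den)
    return -(-required_minutes // max_session_minutes)

# 100 hours, at least 1/3 financial content, sessions capped at 30 minutes.
print(min_topic_sessions(100, 1, 3, 30))  # 67
```

Under these assumptions the financial share alone amounts to 2,000 minutes of audio, which is consistent with the decision to set up a dedicated recording team.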
If you have similar voice data collection/annotation needs (any language), please feel free to contact [email protected]. With over a decade of experience in professional data services, Nexdata can help you accelerate your AI journey.