Smart scroll tools
4/30/2023

Posted by Itay Inbar, Senior Software Engineer, Google Research

Last year we launched Recorder, a new kind of recording app that made audio recording smarter and more useful by leveraging on-device machine learning (ML) to transcribe the recording, highlight audio events, and suggest appropriate tags for titles. Recorder makes editing, sharing, and searching through transcripts easier. Yet because Recorder can transcribe very long recordings (up to 18 hours!), it can still be difficult for users to find specific sections, necessitating a new solution to quickly navigate such long transcripts.

To increase the navigability of content, we introduce Smart Scrolling, a new ML-based feature in Recorder that automatically marks important sections in the transcript, chooses the most representative keywords from each section, and then surfaces those keywords on the vertical scrollbar, like chapter headings. The user can then scroll through the keywords or tap on them to quickly navigate to the sections of interest. The models used are lightweight enough to be executed on-device without the need to upload the transcript, thus preserving user privacy.

The Smart Scrolling feature is composed of two distinct tasks. The first extracts representative keywords from each section, and the second picks which sections in the text are the most informative and unique. For each task, we utilize two different natural language processing (NLP) approaches: a distilled bidirectional transformer (BERT) model pre-trained on data sourced from a Wikipedia dataset, alongside a modified extractive term frequency–inverse document frequency (TF-IDF) model. By using the bidirectional transformer and the TF-IDF-based models in parallel for both the keyword extraction and important section identification tasks, alongside aggregation heuristics, we were able to harness the advantages of each approach and mitigate their respective drawbacks (more on this below).

The bidirectional transformer is a neural network architecture that employs a self-attention mechanism to achieve context-aware processing of the input text in a non-sequential fashion. This enables parallel processing of the input text to identify contextual clues both before and after a given position in the transcript.

[Figure: Bidirectional transformer-based model architecture]

The extractive TF-IDF approach rates terms based on their frequency in the text compared to their inverse frequency in the training dataset, and enables the finding of unique, representative terms in the text.

Both models were trained on publicly available conversational datasets that were labeled and evaluated by independent raters. The conversational datasets were from the same domains as the expected product use cases, focusing on meetings, lectures, and interviews, thus ensuring the same word frequency distribution (Zipf's law).

The TF-IDF-based model detects informative keywords by giving each word a score, which corresponds to how representative this keyword is within the text. The model does so, much like a standard TF-IDF model, by utilizing the ratio of the number of occurrences of a given word in the text compared to the whole of the conversational dataset, but it also takes into account the specificity of the term, i.e., how broad or specific it is. Furthermore, the model then aggregates these features into a score using a pre-trained function curve.

In parallel, the bidirectional transformer model, which was fine-tuned on the task of extracting keywords, provides a deep semantic understanding of the text, enabling it to extract precise, context-aware keywords. The TF-IDF approach is conservative in the sense that it is prone to finding uncommon keywords in the text (high bias), while the drawback of the bidirectional transformer model is the high variance of the possible keywords that can be extracted. But when used together, these two models complement each other, forming a balanced bias-variance tradeoff.

Once the keyword scores are retrieved from both models, we normalize and combine them by utilizing NLP heuristics (e.g., the weighted average), removing duplicates across sections, and eliminating stop words and verbs. The output of this process is an ordered list of suggested keywords for each of the sections.

A significant challenge in the development of Smart Scrolling was how to identify whether a section or keyword is important: what is of great importance to one person can be of less importance to another. The key was to highlight sections only when it is possible to extract helpful keywords from them.
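To make the bidirectionality point concrete, here is a minimal sketch of single-head scaled dot-product self-attention in plain Python. It is not the production model (which uses learned query/key/value projections and multiple heads); it only illustrates that each position's output mixes in context from tokens both before and after it, unlike a left-to-right model.

```python
import math

def self_attention(embeddings):
    """Minimal single-head scaled dot-product self-attention sketch.

    No learned projections: queries, keys, and values are all the raw
    input vectors. Each output position is a softmax-weighted mix of
    every position, past and future alike.
    """
    d = len(embeddings[0])
    out = []
    for q in embeddings:
        # Attention logits of this query against every position.
        logits = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in embeddings]
        # Numerically stable softmax over all positions.
        m = max(logits)
        exps = [math.exp(x - m) for x in logits]
        z = sum(exps)
        weights = [e / z for e in exps]
        # Weighted sum of value vectors (values = inputs here).
        out.append([sum(w * v[i] for w, v in zip(weights, embeddings))
                    for i in range(d)])
    return out

# Three toy 2-d token embeddings; the first output row picks up a
# nonzero second component purely from the *later* tokens.
emb = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
mixed = self_attention(emb)
```

Because position 0 attends to positions 1 and 2, its output vector already differs from its input, which is exactly the "contextual clues both before and after" behavior described above.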
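The TF-IDF-style scoring can be sketched as follows. This is a simplified stand-in, not Recorder's actual model: it keeps only the in-text frequency vs. corpus-frequency ratio, and omits the term-specificity feature and the pre-trained score curve mentioned in the post. The function name and the smoothed-IDF formula are illustrative choices.

```python
import math
from collections import Counter

def keyword_scores(section_tokens, corpus_doc_freq, n_corpus_docs):
    """Score each word by how representative it is of this section,
    comparing its in-section frequency against how often it appears
    across a conversational corpus (TF-IDF-style sketch)."""
    counts = Counter(section_tokens)
    total = len(section_tokens)
    scores = {}
    for word, count in counts.items():
        tf = count / total
        # Smoothed inverse document frequency over the corpus, so that
        # common words like "the" are heavily discounted.
        idf = math.log((1 + n_corpus_docs) /
                       (1 + corpus_doc_freq.get(word, 0))) + 1
        scores[word] = tf * idf
    return scores

# Hypothetical corpus statistics: "the" is near-ubiquitous, "roadmap" rare.
corpus_df = {"the": 990, "meeting": 400, "roadmap": 12}
tokens = "the roadmap for the meeting covers the roadmap milestones".split()
scores = keyword_scores(tokens, corpus_df, n_corpus_docs=1000)
```

Here "roadmap" outscores "the" despite appearing fewer times, because it is frequent in the section but rare in the corpus, which is the behavior the post attributes to the extractive model.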
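The normalize-and-combine step can likewise be sketched. Assumptions are flagged: the 0.5 blend weight, max-normalization, and the function name are hypothetical, and the cross-section deduplication and verb filtering mentioned in the post are omitted for brevity.

```python
def combine_keyword_scores(tfidf_scores, transformer_scores,
                           stop_words, weight=0.5):
    """Normalize each model's scores to [0, 1], blend them with a
    weighted average, drop stop words, and return keywords best-first.

    The 0.5 weight is a hypothetical choice, not the production value.
    """
    def normalize(scores):
        top = max(scores.values(), default=0.0)
        return {w: s / top for w, s in scores.items()} if top else dict(scores)

    a, b = normalize(tfidf_scores), normalize(transformer_scores)
    combined = {}
    for word in set(a) | set(b):
        if word in stop_words:
            continue  # heuristic filtering, as in the post
        combined[word] = weight * a.get(word, 0.0) + (1 - weight) * b.get(word, 0.0)
    # Ordered list of suggested keywords for the section.
    return sorted(combined, key=combined.get, reverse=True)

# Toy scores from the two models for one section.
tfidf = {"roadmap": 2.0, "the": 1.0, "launch": 0.2}
bert = {"launch": 0.9, "roadmap": 0.3}
keywords = combine_keyword_scores(tfidf, bert, stop_words={"the"})
```

A word favored by either model can still rank highly after blending, while stop words are removed outright, mirroring the aggregation heuristics described above.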