In the digital age, the number of scientific articles is growing exponentially. For instance, the Open Research Knowledge Graph’s question-answering function, ASK, has already indexed over 80 million research articles. Extracting the most relevant information from these vast collections can be daunting for researchers, students, and academics. To tackle this challenge, search engines and digital libraries often rely on advanced search techniques, with one of the most effective being faceted search.
Faceted search is an advanced search method that allows users to filter and refine search results based on multiple predefined attributes, known as facets. Each facet represents a specific category or attribute of the data, such as publication year, author, domain, journal name, or keywords. Although faceted search offers significant advantages, traditional faceted search models can still face limitations when applied to large and diverse academic datasets. Often, these models offer static, predefined facets that do not adapt based on user interactions or the nature of the data being explored. This can lead to an overwhelming or inefficient user experience, especially in environments with vast and rapidly evolving datasets like digital libraries and academic search engines.
Image 1: Static faceted search in Google Scholar.
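To make the limitation concrete, here is a minimal sketch of static faceted search. The records, field names, and the `STATIC_FACETS` tuple are hypothetical and chosen only for illustration; the point is that the facet list is fixed before any query is issued.

```python
from collections import Counter

# Hypothetical toy records; the field names are illustrative only.
ARTICLES = [
    {"title": "Paper A", "year": 2021, "domain": "Engineering"},
    {"title": "Paper B", "year": 2023, "domain": "Life Sciences"},
    {"title": "Paper C", "year": 2023, "domain": "Engineering"},
]

# Static facets: fixed in advance, regardless of the query or result set.
STATIC_FACETS = ("year", "domain")


def facet_counts(articles, facets=STATIC_FACETS):
    """Count how many articles fall under each value of each facet."""
    return {f: Counter(a[f] for a in articles) for f in facets}


def apply_filters(articles, selected):
    """Keep only the articles that match every selected facet value."""
    return [a for a in articles if all(a[f] == v for f, v in selected.items())]


counts = facet_counts(ARTICLES)
filtered = apply_filters(ARTICLES, {"domain": "Engineering", "year": 2023})
```

Whatever the user searches for, the same two facets are offered; nothing in this sketch looks at the content of the result set to decide which filters would be useful.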
This is where dynamic facet generation comes into play. Its key innovation is the ability to adapt facets in real time, based on user inputs and the evolving nature of the dataset. This approach makes the search process more flexible and personalized, and it allows relevant academic content to be discovered more efficiently and intuitively.
Our Contribution
We have developed and compared three distinct methods for Dynamic Facet Generation (DFG). These methods, illustrated in Image 2, include one symbolic approach and two neuro-symbolic approaches that integrate large language models (LLMs) and knowledge bases.
1. KB2 (Knowledge Base-based): KB2 is a symbolic approach that leverages Wikipedia-based knowledge bases to enable dynamic facet generation. In this method, the knowledge base provides structured information that helps generate relevant facets for academic content.
2. KBLLM (Knowledge Base and Large Language Model-based): KBLLM represents a neuro-symbolic approach combining knowledge bases with the language prediction and understanding capabilities of an LLM. By combining the structured knowledge of a database with the flexibility of a language model, KBLLM generates facets more adaptive to user queries, offering nuanced and contextual refinement of search results.
3. KBLLMKA (Knowledge Base and Large Language Model with Knowledge Augmentation-based): KBLLMKA is an enhanced version of KBLLM that incorporates knowledge augmentation to further improve the LLM’s facet predictions. This augmentation provides additional context and relationships from the knowledge base, refining the LLM’s understanding and facet generation capabilities.
Image 2: Diagram illustrating our methodology and the three distinct approaches KB2, KBLLM, and KBLLMKA.
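The general idea of deriving facets from the current result set can be sketched as follows. This is not the implementation of KB2, KBLLM, or KBLLMKA: the `TOY_KB` dictionary is a hypothetical stand-in for a Wikipedia-based knowledge base, the substring matching and coverage-based ranking are simplifying assumptions, and no LLM is involved.

```python
from collections import defaultdict

# Hypothetical toy knowledge base: term -> broader category (facet name).
# A real system would query a Wikipedia-based resource instead.
TOY_KB = {
    "harassment": "Workplace behavior",
    "mobbing": "Workplace behavior",
    "phd students": "Academic roles",
    "supervisors": "Academic roles",
    "survey": "Research method",
}


def generate_facets(articles, kb=TOY_KB, top_n=3):
    """Derive facets from the current result set instead of a fixed list."""
    # facet name -> facet value -> indices of the articles mentioning it
    groups = defaultdict(lambda: defaultdict(set))
    for idx, text in enumerate(articles):
        lowered = text.lower()
        for term, category in kb.items():
            if term in lowered:
                groups[category][term].add(idx)

    def coverage(item):
        # Number of distinct articles covered by any value of this facet.
        return len(set().union(*item[1].values()))

    # Rank facets by coverage, breaking ties alphabetically by facet name.
    ranked = sorted(groups.items(), key=lambda item: (-coverage(item), item[0]))
    return [(name, sorted(values)) for name, values in ranked[:top_n]]


docs = [
    "A survey of harassment reported by PhD students",
    "Mobbing by supervisors in research labs",
    "PhD students and supervisors: a qualitative study",
]
facets = generate_facets(docs)
```

With a different set of documents the same function returns different facets, which is the behavior that distinguishes dynamic from static faceting. In a neuro-symbolic variant, an LLM could propose or refine the candidate facets, with the knowledge base supplying additional context.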
Evaluation
To evaluate the effectiveness of the three proposed Dynamic Facet Generation (DFG) methods (KB2, KBLLM, and KBLLMKA), we tested them on 26 distinct sets of research articles from various academic fields ("Arts and Humanities," "Engineering," "Life Sciences," "Physical and Mathematical Sciences," and "Social and Behavioral Sciences"). Each set contained an average of 9 articles. This diverse selection allowed us to assess each method's adaptability and accuracy across a wide range of research domains. Our evaluation combined two key metrics: user ratings from a survey-based assessment and the average time taken for dynamic facet generation. KBLLM received the highest user rating (7.2/10) and took an average of 7.9 seconds for DFG, which supports quick and responsive filtering.
Image 3: Top-n facets generated using the KB2, KBLLM, and KBLLMKA facet generation methods for literature on "academic bullying evidence."
Benefits for Academic Search Engines
Implementing the KBLLM approach for Dynamic Facet Generation (DFG) offers significant benefits for digital libraries. With KBLLM's ability to dynamically generate and adapt facets in response to user inputs, digital libraries can provide a more intuitive and efficient search experience for researchers, students, and academics. By integrating the flexibility of a large language model with structured knowledge from established knowledge bases, KBLLM creates contextually relevant and adaptive filters that guide users through complex datasets. This enables users to more easily identify relevant publications, refine search queries, and explore related areas within vast collections of research documents. Currently, we are integrating the approach into ASK, the Open Research Knowledge Graph project's question-answering service, which allows users to pose research questions across approximately 80 million academic articles.
Acknowledgments
This work was co-funded by the European Research Council for the ScienceGRAPH project (grant agreement ID: 819536) as well as the NFDI4Ing project funded by the German Research Foundation (project number 442146713) and NFDI4DataScience (project number 460234259).
This work was accepted at the 27th European Conference on Artificial Intelligence (ECAI 2024).
Mutahira Khalid is a research assistant at the Knowledge Infrastructures Lab at TIB – Leibniz Information Center for Science and Technology
Sören Auer is a professor of data science and digital libraries at Leibniz Universität Hannover and director of TIB
Markus Stocker leads the Knowledge Infrastructures Lab at TIB – Leibniz Information Center for Science and Technology