Summary of my work on knowledge acquisition/retrieval

Dr Philippe Martin


Chronologically organized summary

During my PhD, I designed CGKAT, a tool that helps knowledge engineers represent, link and search information within knowledge bases (KBs) and documents. To achieve this, I integrated the structured document editor Thot (the code of which is now re-used in the W3C's Amaya browser), the Conceptual Graph workbench Cogito, models of the KADS library, my extension of WordNet 1.5 with some top-level ontologies, and the Unix shell. Because of its re-use of Thot, CGKAT remains the only tool that fully integrates and combines KB management with document management.

During my postdoc in 1997 and then as a research fellow at Griffith University, I designed WebKB-1, a KB server that enables knowledge engineers to load or execute knowledge representations/queries that they have stored in Web documents, and lets them index any part of any Web document (furthermore, as in CGKAT, associating a query with a hyperlink permits the generation of a virtual document). By the end of 1997, WebKB-1 had begun to offer user-friendly notations for indexing and representing parts of Web documents and a language of commands to retrieve them lexically, structurally and conceptually. Nowadays, many tools - generally exploiting an XML-based language - have similar purposes, but the notations and commands they propose are most often very cumbersome, restricted in scripting and knowledge representation/querying capabilities, and cannot be used within informal textual documents. In Ontobroker (Fensel, 1998) and nowadays Semantic MediaWiki, some relations about the "object that a document is about" can be hidden within hyperlinks in this document; this approach is user-friendly for casual readers of the pages but quite unfriendly and restrictive for knowledge providers since only simple relations about the object of the page can be represented (hence, for example, the semantic content of a table cannot be represented within one document). To sum up, WebKB-1 integrates KB management with document management as much as current Web browsers permit.

From July 2000 to December 2003, as a senior research fellow at DSTC, I designed WebKB-2, a KB server that can not only manage a very large KB but also allows people or software agents to store and tightly interconnect their knowledge in it without having to discuss and agree on terminology or beliefs. To that end, I built a multi-source Web-accessible knowledge base management system on top of an object-oriented database and designed special editing protocols to encourage knowledge interlinking and keep the KB consistent (as far as the inference engine can tell). To initialise the default KB (the one "proposed" to people), I transformed WordNet 1.7 into a genuine lexical ontology, corrected it, and extended it with several top-level ontologies. The result of this merge (which, unlike most other merging efforts, modifies the source ontologies only if some inconsistencies within them are detected), together with the on-going integration of the DOLCE and SUMO top-level ontologies, has been named the MSO (Multi Source Ontology). It has been voted "a material to work on" by the Standard Upper Ontology Working Group, and can be accessed and extended by any Web user or software agent via WebKB-2.

The large shared KBs of WebKB-2 are a necessary complement to the private KBs of WebKB-1: they help knowledge sharing and re-use by enabling people to interconnect their knowledge on a domain (or their top-level/generic ontologies) in an effective way. Merging/aligning ontologies is difficult even for knowledge engineers because these ontologies most often do not have enough formal and informal details to guide the alignment of the categories (i.e., the intentions of their authors are lost). Automatic merging is therefore bound to give even poorer results. With the WebKB-2 approach, people do not have to create separate ontologies to add concepts, relations or statements to existing ontologies: they add them directly to the shared KB, and when doing so, the large shared ontology guides and checks them so that they add precise knowledge. Furthermore, the result is directly re-usable (unlike the "private ontologies" approach, which inevitably leads people to create mutually redundant or inconsistent ontologies, hence to problems in choosing between them and re-using them). Stating that such a centralized approach makes entering and re-using knowledge easier does not deny that not all knowledge can be stored in the same KB, nor that there will always be competing KB servers. However, provided that each server is allowed to and actually does copy knowledge from competing or slightly more specialized knowledge servers, the advantages of centralization and distribution can be combined and it does not matter which KB server the users add to or query.

For WebKB-1 and WebKB-2, I designed and continue to refine notations that are more intuitive, expressive and normalizing than currently existing notations. One of them is adapted to the case of simple relations between categories and permits representing a large volume of knowledge in a structured way and in a small amount of space (which is important for browsing a large KB). Two others, named Frame-CG and Formalized English, are derived from the Conceptual Graph Linear Form and not only extend it but improve on the qualities that made it successful: its intuitiveness and "knowledge normalization" effect. That is, people are better led to follow good "lexical, structural and ontological principles" that I collected and refined. Thus, these notations ease the visualization, handling, entering and sharing of knowledge. WebKB-2 can also export some of its knowledge in some other notations such as RDF+OWL, CGIF and KIF. For WebKB-2, I also designed "cascading knowledge entering forms": such forms are automatically generated from definitions and general statements in the KB and they can be combined to guide knowledge entering on any object once its type has been selected. For an accommodation broker (namely, Wotif), I represented some information about accommodations on the Sunshine Coast (Australia) and complemented the predefined accommodation retrieval interfaces that I designed (e.g., using Google Maps) with access to the above-mentioned generated forms for unforeseen kinds of queries.
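
As an illustration only, the idea of "cascading knowledge entering forms" can be sketched in Python. Everything in this sketch is an assumption made for the example: the toy KB, its type and relation names, and the functions `form_fields` and `cascading_form` are hypothetical and are not the WebKB-2 implementation or any of its notations. The sketch only shows how a form could be derived from the relations declared for a type and its supertypes, with a sub-form generated for each field whose expected type is itself defined in the KB.

```python
# Hypothetical toy KB: each type gives its direct supertype and the relations
# that its definition declares, written as {relation_name: expected_type}.
KB = {
    "thing": {"supertype": None, "relations": {"name": "string"}},
    "place": {"supertype": "thing", "relations": {"region": "string"}},
    "accommodation": {
        "supertype": "thing",
        "relations": {"location": "place", "price_per_night": "number"},
    },
    "hotel": {"supertype": "accommodation", "relations": {"star_rating": "number"}},
}

# Types whose values are entered directly rather than through a sub-form.
PRIMITIVES = {"string", "number"}


def form_fields(type_name, kb=KB):
    """Return the relations of a type, including those of its supertypes."""
    chain = []
    current = type_name
    while current is not None:
        chain.append(current)
        current = kb[current]["supertype"]
    fields = {}
    # Most general type first, so that specializations can override fields.
    for name in reversed(chain):
        fields.update(kb[name]["relations"])
    return fields


def cascading_form(type_name, kb=KB, depth=2):
    """Build a nested form: fields with non-primitive types become sub-forms."""
    form = {}
    for relation, expected in form_fields(type_name, kb).items():
        if expected in PRIMITIVES or depth == 0:
            form[relation] = expected
        else:
            form[relation] = cascading_form(expected, kb, depth - 1)
    return form


hotel_form = cascading_form("hotel")
```

Selecting the type `hotel` here yields a form whose `location` field expands into the form generated for `place`; the `depth` parameter is an arbitrary safeguard chosen for this sketch to stop the expansion.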

Since February 2004, I have extended WebKB-2 to permit it to exploit some external inference engines and, most importantly, support semi-formal repositories. For example, I designed a user-friendly notation providing all the necessary constructs to engage in or represent "structured discussions" (i.e., semantically organized collections of arguments and counter-arguments for various statements) and designed an algorithm to evaluate the popularity and originality of each contribution and contributor based on the structure of the discussions and votes on each statement.
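
The evaluation algorithm itself is not described in this summary. Purely as an illustration of the general idea, the following Python sketch computes a popularity-like score from votes and from the argument/objection structure of a discussion. The data model (`Statement`, `Link`), the scoring rule and the example data are all assumptions made for this sketch, not the algorithm used in WebKB-2, and the sketch does not attempt to measure originality.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Statement:
    ident: str
    author: str
    votes_for: int = 0
    votes_against: int = 0


@dataclass(frozen=True)
class Link:
    source: str    # ident of the statement that argues
    relation: str  # "argument" (supports) or "objection" (attacks)
    target: str    # ident of the statement being argued about


def score(ident, statements, links, seen=frozenset()):
    """Assumed rule: net votes, plus successful arguments, minus successful objections."""
    stmt = statements[ident]
    total = stmt.votes_for - stmt.votes_against
    if ident in seen:  # guard against cyclic discussions
        return total
    seen = seen | {ident}
    for link in links:
        if link.target != ident:
            continue
        # An argument or objection only counts if it is itself positively scored.
        weight = max(0, score(link.source, statements, links, seen))
        total += weight if link.relation == "argument" else -weight
    return total


def contributor_scores(statements, links):
    """Sum, for each author, the scores of the statements they contributed."""
    totals = {}
    for ident, stmt in statements.items():
        totals[stmt.author] = totals.get(stmt.author, 0) + score(ident, statements, links)
    return totals


# Hypothetical discussion: s2 supports s1, s3 objects to s1, s4 objects to s2.
EXAMPLE_STATEMENTS = {
    "s1": Statement("s1", "alice", votes_for=3, votes_against=1),
    "s2": Statement("s2", "bob", votes_for=2),
    "s3": Statement("s3", "carol", votes_for=1, votes_against=3),
    "s4": Statement("s4", "dave", votes_for=1),
}
EXAMPLE_LINKS = [
    Link("s2", "argument", "s1"),
    Link("s3", "objection", "s1"),
    Link("s4", "objection", "s2"),
]
```

With this assumed rule, an objection that voters have rejected (here `s3`) does not lower the score of the statement it attacks, whereas an accepted objection (`s4`) reduces the weight of the argument it targets.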
Such cooperatively built semi-formal repositories could potentially be used for precision-oriented corporate memories and as enhanced versions of, or complements to, the current on-line yellow-pages and auction sites. However, I focused on showing how they could help researchers, lecturers and students by supporting them in cooperatively building a semantically structured "state of the art" and "learning object repository" in their domains. Although a distinction is made here and will often occur in practice, the same semantic network can, and ideally should, be used for both research and learning/teaching. I organised a workshop about such repositories and the use of structured discussions to build them.
As a starting point for a semantically structured "state of the art" in knowledge engineering and to permit the comparison of knowledge management tools, I have begun an ontology of such tools (for now an important part of this ontology is focused on Conceptual Graph tools). As part of my on-line teaching of Workflow Management and, in the last semester of 2006, as part of my "Griffith E-Learning" research grant project, I represented a good part of the lecture materials for three different courses as semantic networks within WebKB-2, asked the students to contribute to parts of these semantic networks (e.g., as a replacement for an informal "learning journal" and hence as a better way to train and evaluate their critical thinking) and asked them to fill in surveys about the approach. In general, they appreciated how the centralisation and categorisation of pieces of information scattered all across the lecture materials helped them access and understand this information, but did not enjoy having to "learn a notation" even though most of my face-to-face undergraduate students had no problem in reading it after a short explanation.


The links that companies may be most interested in

Note: the interfaces, ontologies and languages referred to above need to be adapted to an application before most employees of a company can use them.