WebKB-2 focused short-term research/design plan

This plan does not detail the potential applications
of WebKB-2 in education, research, e-health, tourism and biology.
A separate "statement of research" is also available.
Dr Philippe Martin


My research goals are the acquisition and retrieval of succinctly yet comprehensively organised information for reasoning, learning or collaboration. So far, as detailed in my "summary of works", my approach has mainly relied on the following elements:
1) the design of various user-friendly notations to represent, index or retrieve knowledge or parts of documents, some of them (such as FCG and Formalized English) being expressive and normalizing (i.e., reducing the number of incomparable ways in which some information can be represented), others being adapted to certain tasks (e.g., my semi-formal notation for structured discussions);
2) the design of (i) a large-scale KB server (WebKB-2), (ii) protocols that let users tightly interconnect their knowledge within the shared KB without having to discuss and agree on terminology or beliefs, and (iii) an algorithm to evaluate the popularity and originality of each contribution and contributor in structured discussions;
3) the transformation of WordNet into a genuine lexical ontology, its integration with several top-level ontologies, and the association of schemas with high-level or medium-level concepts, for example to guide knowledge entry (in WebKB-2, "cascading knowledge entering forms" are generated from the schemas) or to support (semi-)automatic natural language understanding (NLU);
4) the creation of querying mechanisms and browsing interfaces adapted to large KBs and to expressive knowledge created by different authors (in WebKB-2, this is achieved via concise user-friendly notations, filters on the authors and kinds of knowledge, and various efficient procedures for matching queries against the statements of the KB; much work is still required: for example, implementing or re-using a rule-based system would complement the matching procedures).
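To illustrate the idea of knowledge entering forms that cascade from schemas (item 3), here is a minimal Python sketch. The schema format (a concept type mapped to a list of relation/destination-type pairs), the example categories and the function `cascade_form` are all hypothetical and do not reflect WebKB-2's actual data structures. The sketch only shows that the field for a relation whose destination type has its own schema can open a nested sub-form.

```python
from typing import Dict, List, Tuple

# Hypothetical schemas: concept type -> list of (relation, destination type).
Schema = Dict[str, List[Tuple[str, str]]]

SCHEMAS: Schema = {
    "meeting": [("agent", "person"), ("location", "place")],
    "person": [("name", "string")],
    "place": [("name", "string")],
}


def cascade_form(concept: str, schemas: Schema, depth: int = 0,
                 max_depth: int = 3) -> List[str]:
    """Return indented form-field labels for a concept, cascading into the
    schemas of destination types (max_depth bounds cyclic schemas)."""
    lines: List[str] = []
    for relation, dest in schemas.get(concept, []):
        lines.append("  " * depth + f"{relation}: {dest}")
        if dest in schemas and depth < max_depth:
            lines.extend(cascade_form(dest, schemas, depth + 1, max_depth))
    return lines


for line in cascade_form("meeting", SCHEMAS):
    print(line)
```

Here the "agent" and "location" fields each open a sub-form for the "name" of the person or place; a real interface would render these as nested form elements rather than indented text.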

As detailed in the following sections, my current general research plan is to continue working on these elements and on some new ones (e.g., NLU), whenever possible in the context of applications and with other researchers and students, and hence also with tools other than WebKB-2.


1. Mechanisms and experiments for cooperatively built repositories

The editing protocols used in WebKB-2 (to encourage knowledge interlinking and keep the KB consistent), and the rules they exploit to detect redundancies between statements, need some refinement (e.g., the rules are currently somewhat too restrictive). The algorithm evaluating contributions and contributors in structured discussions needs to be tested in various situations so that it can be refined and then extended to other types of knowledge. To that end, and also because cooperatively built semi-formal states of the art of the concepts, ideas, tools and techniques of a domain would be very useful to students, researchers and industry (for search, comparison, learning, advertising and feedback purposes), it would be worthwhile to extend and multiply the experiments I have begun in that direction, especially my ontology of knowledge management tools (which for now mainly organises and compares features of Conceptual Graph tools). Since argumentation systems are usually applied to support group decision making, especially for the design of systems or policies (e.g., SIBYL, gIBIS, ArgNoter), my approach to structured discussions could also be applied in these areas.
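To make the notion of redundancy detection more concrete, here is a minimal Python sketch of one possible rule, not the rule set actually used in WebKB-2. It assumes a simplified statement form (subject type, relation, object type) read existentially, so that "some cat eats some fish" implies "some animal eats some food". A new statement is flagged as redundant when an existing statement specializes it. The type hierarchy, the statements and the function names are hypothetical, and relation hierarchies are ignored (relations must be identical).

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical subtype hierarchy: type -> direct supertypes.
SUPERTYPES: Dict[str, List[str]] = {
    "cat": ["animal"], "fish": ["animal", "food"],
    "animal": ["thing"], "food": ["thing"], "thing": [],
}

# Simplified statement: (subject type, relation, object type), read
# existentially: ("cat", "eats", "fish") = "some cat eats some fish".
Statement = Tuple[str, str, str]


def is_subtype_or_equal(t: str, ancestor: str) -> bool:
    """True if ancestor is t itself or reachable via supertype links."""
    stack, seen = [t], set()
    while stack:
        current = stack.pop()
        if current == ancestor:
            return True
        if current not in seen:
            seen.add(current)
            stack.extend(SUPERTYPES.get(current, []))
    return False


def specializes(s1: Statement, s2: Statement) -> bool:
    """True if s1 specializes s2 (and therefore implies it)."""
    return (s1[1] == s2[1]
            and is_subtype_or_equal(s1[0], s2[0])
            and is_subtype_or_equal(s1[2], s2[2]))


def find_redundancy(new: Statement, kb: List[Statement]) -> Optional[Statement]:
    """Return an existing statement that already implies `new`, if any."""
    for existing in kb:
        if specializes(existing, new):
            return existing
    return None


kb = [("cat", "eats", "fish")]
print(find_redundancy(("animal", "eats", "food"), kb))  # → ('cat', 'eats', 'fish')
print(find_redundancy(("animal", "eats", "cat"), kb))   # → None
```

A protocol could then ask the author of the new statement to connect it to the existing one (e.g., via a generalization link) instead of simply rejecting it; deciding when such a rule is too restrictive is precisely the refinement mentioned above.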

Currently in the shared KB of WebKB-2, every piece of knowledge is public. It would be interesting to let users specify rules that hide some of their information until they receive certain information about the reader, for example in a repository for a dating agency.
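A minimal Python sketch of such hiding rules follows, assuming the simplest possible rule form: each statement names the kinds of information a reader must have disclosed before the statement becomes visible. The class `ConditionalStatement`, the function `visible_statements` and the profile contents are hypothetical; real rules would probably also need conditions on the disclosed values, not just their presence.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class ConditionalStatement:
    """A statement shown only to readers who have disclosed every piece
    of information named in `required_disclosures`."""
    text: str
    required_disclosures: Set[str] = field(default_factory=set)


def visible_statements(statements: List[ConditionalStatement],
                       reader_disclosures: Dict[str, str]) -> List[str]:
    """Return the statements a reader may see, given what the reader disclosed."""
    disclosed = set(reader_disclosures)  # keys: the kinds of information given
    return [s.text for s in statements if s.required_disclosures <= disclosed]


# Hypothetical dating-agency profile.
profile = [
    ConditionalStatement("likes hiking"),
    ConditionalStatement("has two children", {"name", "age"}),
]
print(visible_statements(profile, {"name": "Alex"}))  # → ['likes hiking']
```

Statements without requirements stay public, which keeps the current behaviour of the shared KB as the default case.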


2. Content of the default ontology in the shared KB

Some elements of my work on the MSO (Multi Source Ontology) should be completed (the integration of DOLCE and SUMO) or updated (adaptation to the latest version of WordNet). Other sources would be interesting to add, for example OntoWordNet and the CIA World Factbook. Extended WordNet provides structural interpretations of the WordNet glosses. After these interpretations are converted into logic-based forms, some NLU can be performed on the glosses to refine these forms and hence associate definitions or schemas with the categories. The more relations there are between the categories, and the more definitions they have, the more it will be possible to match statements in the KB and hence to support manual or automatic knowledge representation, checking and retrieval.


3. Import/export of knowledge

The import/export procedures of WebKB-2 from/to a few selected languages (typically KIF, CGIF, RDF+OWL, FCG and Formalized English) should be completed for all the kinds of knowledge that these notations can represent. However, this work mainly involves syntactic translations: when the destination notation is less expressive than the source notation, comments are used in the destination notation to represent what cannot be translated directly.
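The comment-based fallback can be sketched in Python as follows. This is a hypothetical illustration, not WebKB-2's export code: it assumes each source statement has already been tagged with the logical constructs it relies on, and the construct labels, statement texts and "translations" are placeholders rather than real KIF, CGIF or RDF+OWL syntax. The comment prefix is a parameter because comment syntax differs between notations.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass
class SourceStatement:
    """A statement in the source notation, with the constructs it relies on."""
    text: str               # statement as written in the source notation
    constructs: Set[str]    # e.g. {"existential", "negation"}
    translation: str = ""   # direct translation, if the destination allows it


def export(statements: List[SourceStatement], supported: Set[str],
           comment_prefix: str = "# ") -> str:
    """Translate what the destination notation can express and keep the
    rest, verbatim, inside destination-notation comments."""
    out: List[str] = []
    for s in statements:
        if s.constructs <= supported:
            out.append(s.translation)
        else:
            missing = ", ".join(sorted(s.constructs - supported))
            out.append(f"{comment_prefix}untranslated ({missing}): {s.text}")
    return "\n".join(out)


statements = [
    SourceStatement("some cat eats some fish", {"existential"}, "(eats cat1 fish1)"),
    SourceStatement("no cat eats a dog", {"existential", "negation"}),
]
print(export(statements, supported={"existential"}))
```

Recording which constructs were missing in each comment keeps the untranslated knowledge inspectable, and would let a later import from the destination notation restore it.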

I designed FCG (Frame Conceptual Graph) and Formalized English to be more normalizing and easier to read and use than other notations, especially for the representation of expressive information and of natural language sentences. However, some expressions are still complex to represent, for example "2 by 2" and "in particular". Hence, it is important to continue refining these notations, for example by introducing formally defined syntactic elements for such uses of "by" and "in particular".

For some applications, some NLU is important, for example to allow querying in English (although the user may be asked to choose between different Formalized English interpretations of the query) or to extract certain relationships between concepts from English documents (e.g., specialization, subtask, physical part, agent, result and condition relationships). To that end, grammatical constraints and conceptual constraints (the categories associated with words and the schemas associated with these categories) should be combined to resolve as many ambiguities as possible and thus construct representations. However, it is doubtful that such NLU would significantly help in complementing ontologies or semi-formal states of the art of a domain, since these tasks require precision and human interpretation skills.
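The combination of grammatical and conceptual constraints can be illustrated with a minimal Python sketch. It assumes that a parser has already supplied the grammatical constraint (which word is the subject and which the object of the verb), and then keeps only the word-sense combinations whose categories satisfy the agent and object types required by the verb's schema. The lexicon, type hierarchy, schema and function names are hypothetical simplifications, not WebKB-2 or WordNet data.

```python
from itertools import product
from typing import Dict, List, Optional, Tuple

# Hypothetical lexicon: word -> candidate categories (word senses).
SENSES: Dict[str, List[str]] = {
    "bat": ["bat_animal", "bat_club"],
    "eats": ["eat"],
    "insect": ["insect"],
}
# Hypothetical type hierarchy: category -> direct supertype.
SUPERTYPE: Dict[str, str] = {
    "bat_animal": "animal", "insect": "animal", "animal": "physical_object",
    "bat_club": "artifact", "artifact": "physical_object",
}
# Hypothetical schemas: verb category -> (required agent type, required object type).
SCHEMAS: Dict[str, Tuple[str, str]] = {"eat": ("animal", "physical_object")}


def is_a(category: str, required: str) -> bool:
    """True if `required` is `category` or one of its supertypes."""
    current: Optional[str] = category
    while current is not None:
        if current == required:
            return True
        current = SUPERTYPE.get(current)
    return False


def interpretations(subject: str, verb: str, obj: str) -> List[Tuple[str, str, str]]:
    """Given a subject-verb-object parse (the grammatical constraint), keep the
    sense combinations satisfying the verb schema (the conceptual constraint)."""
    results = []
    for s, v, o in product(SENSES[subject], SENSES[verb], SENSES[obj]):
        agent_type, object_type = SCHEMAS[v]
        if is_a(s, agent_type) and is_a(o, object_type):
            results.append((s, v, o))
    return results


print(interpretations("bat", "eats", "insect"))  # → [('bat_animal', 'eat', 'insect')]
```

When the constraints leave several interpretations (here, "insect eats bat" keeps both senses of "bat"), the remaining choice could be delegated to the user, as suggested above for querying in English.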


4. Knowledge retrieval

WebKB-2 proposes various commands to query knowledge: search for specializations of the query (i.e., at least for simple statements, the query is a logical consequence of the retrieved statements), search for generalizations, or a mix of both. For some applications, it would be interesting to implement more advanced path retrieval mechanisms and a rule-based system. For a heavily populated KB, it becomes more important to structure the presented list of query results, or the list of statements and properties related to a concept. This structuring can be done 1) according to the specialization relations between the statements, and 2) according to the various topics to which these statements are related (for example, topics that can be used to group statements about a person include physical characteristics, administrative details, possessions and work-related activities). Both kinds of structuring can be combined. The first only requires implementation; the second requires research and experimentation.
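The first kind of structuring can be sketched in Python as follows, under strong simplifying assumptions: statements are (subject type, relation, object type) triples, a statement specializes another when its relation is identical and its types are equal or more specific, and the type hierarchy and KB contents are hypothetical. This is not WebKB-2's matching procedure. Results of a specialization (or generalization) search are indented under their direct generalizations among the results.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical single-inheritance type hierarchy: type -> direct supertype.
SUPERTYPE: Dict[str, Optional[str]] = {
    "siamese_cat": "cat", "cat": "animal", "dog": "animal", "animal": None,
    "fish": "food", "food": None,
}

# Simplified statement: (subject type, relation, object type).
Statement = Tuple[str, str, str]


def is_a(t: str, ancestor: str) -> bool:
    """True if `ancestor` is `t` or one of its supertypes."""
    current: Optional[str] = t
    while current is not None:
        if current == ancestor:
            return True
        current = SUPERTYPE.get(current)
    return False


def specializes(s1: Statement, s2: Statement) -> bool:
    """True if s1 specializes s2 (same relation, equal or more specific types)."""
    return s1[1] == s2[1] and is_a(s1[0], s2[0]) and is_a(s1[2], s2[2])


def search_specializations(query: Statement, kb: List[Statement]) -> List[Statement]:
    return [s for s in kb if specializes(s, query)]


def search_generalizations(query: Statement, kb: List[Statement]) -> List[Statement]:
    return [s for s in kb if specializes(query, s)]


def direct_parents(results: List[Statement]) -> Dict[Statement, List[Statement]]:
    """For each result, list the other results it specializes with no
    third result in between."""
    parents: Dict[Statement, List[Statement]] = {}
    for r in results:
        above = [g for g in results if g != r and specializes(r, g)]
        parents[r] = [g for g in above
                      if not any(h != g and specializes(h, g) for h in above)]
    return parents


def render_tree(results: List[Statement]) -> List[str]:
    """Indent results under their direct generalizations; a result with
    several direct generalizations appears under each of them."""
    results = list(dict.fromkeys(results))  # drop duplicates, keep order
    parents = direct_parents(results)
    children: Dict[Statement, List[Statement]] = {r: [] for r in results}
    for r, ps in parents.items():
        for p in ps:
            children[p].append(r)
    lines: List[str] = []

    def walk(node: Statement, depth: int) -> None:
        lines.append("  " * depth + " ".join(node))
        for child in children[node]:
            walk(child, depth + 1)

    for root in (r for r in results if not parents[r]):
        walk(root, 0)
    return lines


kb = [("animal", "eats", "food"), ("cat", "eats", "fish"),
      ("siamese_cat", "eats", "fish"), ("dog", "eats", "food")]
print("\n".join(render_tree(search_specializations(("animal", "eats", "food"), kb))))
```

With this KB, the specialization search returns all four statements, and the tree places "siamese_cat eats fish" under "cat eats fish" rather than directly under "animal eats food". Grouping by topic, the second kind of structuring, would need topic annotations that this sketch does not attempt to model.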