About LarKC

I: Brief Introduction

LarKC: The Large Knowledge Collider (大规模知识加速器, "large-scale knowledge accelerator")

LarKC home page: http://www.larkc.eu

LarKC Chinese home page: http://cn.larkc.eu

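The goal of the LarKC project, part of the European Union's Seventh Framework Programme (FP7), is to develop the Large Knowledge Collider (LarKC, pronounced "lark"), which is designed as a platform for large-scale, distributed, incomplete reasoning. Going by the origin of the name, the most faithful Chinese translation is 大规模知识对撞机 ("large-scale knowledge collider"), because the name was inspired by the Large Hadron Collider built by CERN, the European Organization for Nuclear Research. Put briefly, the aim of LarKC is a distributed platform for incomplete search and reasoning over massive data. "Distributed" means that the data sets (at present mainly RDF data) come from different sources, such as the Web and local stores. "Incomplete" reflects the observation that deterministic reasoning over massive data within limited time is almost impossible, so the system can only reason over incomplete data well enough to satisfy the user. "Platform" means that LarKC organizes Semantic Web problem-solving components as plug-ins and invokes them through a pipeline. The plug-in types considered for the pipeline so far include Transformer, Selector, Reasoner, and Decider. As Michael Witbrock, CTO of Cyc, described it on the LarKC blog [http://blog.larkc.eu/?p=1401], the point of this architecture is that AI researchers who face problem solving over massive data do not have to build everything from scratch. The platform is intended to break through the knowledge-processing scale bottleneck that Semantic Web reasoning systems currently face.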
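To make the plug-in pipeline idea concrete, here is a minimal Python sketch. It is only an illustration: the function names, signatures, and the toy triples are assumptions made for this example and do not reproduce the actual LarKC plug-in API.

```python
from typing import List, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object), RDF-style

# Hypothetical toy data set, not taken from LarKC.
DATA: Set[Triple] = {
    ("Tweety", "type", "Bird"),
    ("Tweety", "livesIn", "Amsterdam"),
    ("Bird", "subClassOf", "Animal"),
}

def transform(query: str) -> str:
    """Transformer: normalise the raw user query into a keyword."""
    return query.strip().lower()

def select(data: Set[Triple], keyword: str) -> Set[Triple]:
    """Selector: keep only triples whose subject or object matches the keyword."""
    return {t for t in data if keyword in (t[0].lower(), t[2].lower())}

def reason(selected: Set[Triple], keyword: str) -> List[str]:
    """Reasoner: list the types asserted for the keyword in the selected triples."""
    return sorted(o for s, p, o in selected if s.lower() == keyword and p == "type")

def decide(answers: List[str]) -> bool:
    """Decider: accept the result as soon as at least one answer exists."""
    return len(answers) > 0

def run_pipeline(data: Set[Triple], query: str) -> List[str]:
    """Call the four plug-ins in pipeline order."""
    keyword = transform(query)
    selected = select(data, keyword)
    answers = reason(selected, keyword)
    return answers if decide(answers) else []

print(run_pipeline(DATA, " Tweety "))  # ['Bird']
```

Because each stage is an ordinary function with a narrow interface, any one of them could be swapped for a different implementation without touching the others, which is the reuse argument made above.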

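Figure: Overview of the LarKC architecture

The LarKC project pursues the goals above through the following lines of work:

• Extending existing logic-based Semantic Web reasoning methods: new reasoning methods are to be developed by applying theory from information retrieval, machine learning, information theory, databases, probabilistic reasoning, and related disciplines;

• Using methods and techniques inspired by cognitive science, such as spreading activation, attention, reinforcement, habituation, relevance reasoning, and bounded rationality;

• Building a distributed reasoning platform, to be implemented both on high-performance computing clusters and on networks of interconnected home computers.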

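The deliverables published so far by the LarKC project are available at:

http://www.larkc.eu/deliverables/

Papers published with the support of the LarKC project are available at:

http://www.larkc.eu/publications/

Brief introductions to some of the key technologies related to LarKC are available at:

http://wiki.larkc.eu/TechnologyTopics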

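II: Reading the Deliverables

WP1: Conceptual framework and evaluation;

WP2: Information retrieval and selection in semantic spaces (semantic space models);

WP4: Technical reports on logical reasoning and deciding;

WP5: Analysis, design, and implementation of the various modes, with timing tests;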

2.1 larkc_d111_overview-of-relevant-work-in-other-areas_m6

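University of Innsbruck (UIBK), Austria: Semantic Technology Institute (STI Innsbruck)

Part 2, Approximation Theory, covers Case-Based Reasoning, Probabilistic Reasoning, and Granular Reasoning, and their relevance to LarKC.

Part 3 introduces Quantum Logics and their relevance to LarKC.

Part 4 introduces Meta-Reasoning and its relevance to LarKC.

Part 5, Cognitive Architecture, introduces SOAR, ACT-R, and Detractors, and their relevance to LarKC.

Part 6, Similarity Theories, covers The Trouble with Similarity, Metric Models, Feature Models, Structural Mapping Models, and Transformation Distance Models, and their relevance to LarKC.

Part 7 introduces Brain Science and its relevance to LarKC.

Part 8, Economics, covers Supply-Chain Management, Cost-Benefit Analysis, and Risk Management, and their relevance to LarKC.

Part 9 covers Bounded Rationality and its relevance to LarKC.

Part 10, Patterns and Pattern Languages, introduces the Core Concepts and their relevance to LarKC.

Part 11 covers Distributed and Parallel Computing and its relevance to LarKC.

Part 12, Software Engineering, covers the Waterfall model, the Spiral model, and Agile Software Development, and their relevance to LarKC.

Part 13 gives the summary and future work.

Part 2 gives the following outlook on reasoning in LarKC: reasoning in LarKC must consider multiple strategies, and a particular one of the strategies discussed in Part 2 can be chosen according to the requirements or constraints at hand. Using multiple strategies may speed up the reasoning process. Web reasoning also needs to integrate rule-based and case-based reasoning. (WP4)

Part 9, on bounded rationality, notes that LarKC will use both heuristic reasoning and rule-based reasoning. (WP4)

To get a quick understanding of the logical reasoning part of LarKC, we go straight to the WP4 technical reports.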

2.2 larkc_d41_survey-of-web-scale-reasoning

Vrije Universiteit Amsterdam (VUA), the Netherlands

This report surveys how Web scale reasoning relates to various branches of reasoning. These reasoning methods were developed mainly in Computer Science, Artificial Intelligence, Logics, Cognitive Science, and other related fields. The branches surveyed include approximate reasoning, resource bounded reasoning, rule-based reasoning, contextual and modular reasoning, and distributed and parallel reasoning. All of these methods have to address five aspects of Web scale reasoning: scalability, heterogeneity, dynamics, inconsistency, and parallelism.

Keywords: ontology reasoning, web scale reasoning, the Semantic Web.

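Part 1: Introduction.

General background: knowledge representation and reasoning on the Semantic Web became an important challenge from around 1999. The goal of the Semantic Web vision is to structure information at world-wide scale. However, the reasoning methods that researchers have developed so far usually target rather small, closed, trustworthy, consistent, and static domains (restricted subsets of first-order logic).

The goals of LarKC: develop reasoning plug-ins, survey existing reasoning techniques, and extend them so that they can be applied to Web scale reasoning. More exactly, the survey asks how these methods can be used to achieve the following goals:

1). Scalability; 2). Heterogeneity; 3). Dynamics; 4). Inconsistency; 5). Parallelism.

Achieving these goals calls for several different approaches. The families of approaches are:

Approximate Reasoning: mainly addresses the scalability problem;

Resource Bounded Reasoning: reasoning under resource limits using heuristic search; simple heuristics not only can perform as well as more complex algorithms, they can also perform better;

Rule-based Reasoning for dynamic and incomplete knowledge: studies how rule-based techniques can be extended to fit Web scale reasoning;

Contextual and Modular Reasoning;

Distributed and Parallel Reasoning.

Linking reasoning tasks to types of use-cases.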

Part 2: Approximate Reasoning.

Approximate reasoning is nonstandard reasoning based on the idea of sacrificing soundness or completeness in exchange for a significant speed-up of reasoning.
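One simple way to see the trade-off is a subsumption check that gives up after a fixed number of steps. The sketch below is an illustration only, with a hypothetical toy hierarchy and a hypothetical `max_depth` parameter; it is not one of the approximation methods surveyed in the report. Its positive answers are always correct (sound), but it may miss subsumptions that need a longer chain (incomplete).

```python
from typing import Dict, Set

# Hypothetical toy hierarchy: concept -> direct superconcepts.
PARENTS: Dict[str, Set[str]] = {
    "Penguin": {"Bird"},
    "Bird": {"Vertebrate"},
    "Vertebrate": {"Animal"},
    "Animal": set(),
}

def is_subclass(sub: str, sup: str, max_depth: int) -> bool:
    """Follow superconcept links for at most max_depth steps.

    True is always a correct answer; False may only mean that the
    depth bound was too small to reach `sup`.
    """
    if sub == sup:
        return True
    frontier = {sub}
    for _ in range(max_depth):
        # Move one level up the hierarchy.
        frontier = {p for c in frontier for p in PARENTS.get(c, set())}
        if sup in frontier:
            return True
        if not frontier:
            break  # reached the top of the hierarchy
    return False

print(is_subclass("Penguin", "Animal", max_depth=1))  # False: bound too small
print(is_subclass("Penguin", "Animal", max_depth=3))  # True
```

Raising `max_depth` buys back completeness at the cost of more work, which is the kind of dial an approximate reasoner exposes.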

Part 3: Resource Bounded Reasoning.

Bounded Rationality as Ecological Rationality. Heuristics may enhance the ability to solve a problem efficiently.

Part 4: Rule-Based Reasoning for Dynamic and Incomplete Knowledge.

A classical logical formalism is monotonic: adding new knowledge never invalidates a conclusion that has already been drawn. The classical example of why this is a problem is the conclusion that a certain individual can fly, based on the assertions i) that birds can fly and ii) that the individual is a bird, when the specific individual is in fact a penguin. Monotonic logical reasoning is therefore not flexible enough to reflect human common sense, and it cannot handle inconsistent data. The most well-known examples of nonmonotonic logics are:

1). Default Logic: it applies inference rules that admit exceptions;

2). Circumscription;

3). Autoepistemic Logic: modal logics are extensively studied as a basis for formalizing nonmonotonic reasoning.
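The bird and penguin example can be sketched in a few lines of Python. This only illustrates the idea of a rule that admits exceptions; it is not an implementation of Default Logic, and the exception list is a made-up assumption.

```python
from typing import Set

# Hypothetical list of known exceptions to "birds can fly".
EXCEPTIONS: Set[str] = {"penguin", "ostrich"}

def can_fly(facts: Set[str]) -> bool:
    """Default rule: a bird flies unless a known exception applies."""
    return "bird" in facts and not (facts & EXCEPTIONS)

known = {"bird"}
before = can_fly(known)   # concluded from "bird" alone

known.add("penguin")      # new knowledge arrives
after = can_fly(known)    # the earlier conclusion is withdrawn

print(before, after)  # True False
```

The conclusion changes from `True` to `False` although facts were only added, never removed. That is exactly what a monotonic formalism cannot express.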

Part 5: Contextual and Modular Reasoning.

Part 6: Distributed Reasoning and Parallel Reasoning.

We present two approaches to distribution, both of which should be considered for LarKC. The first approach is a P2P architecture and the second is based on Web Services.

Part 7: Use Cases and Reasoning Tasks.

1. Make categories of use-cases;

2. Survey the most prominent Semantic Web system applications and prototypes of the past few years, and classify each into one or more of these use-case categories.

3. Link each of the use-case categories to different basic reasoning tasks.

4. Discuss the relation between different reasoning tasks and the large scale reasoning approaches.

7.4 Semantic Web reasoning tasks

Notation: o is an ontology, c is a concept defined in that ontology, i is an instance belonging to such a concept, and r is a relation.

Realization:

Realization is the task of determining which concepts a given instance is a member of.

Signature: i: instance × o: ontology ↦ c: concept

Definition: find a c such that o ⊨ i ∈ c

Subsumption:

Subsumption is the task of determining whether one concept is a subset of another:

Signature: c1: concept × c2: concept ↦ bool

Definition: determine whether c1 ⊆ c2

Mapping:

Mapping is the task of finding a correspondence relation between two concepts.

We follow the definition proposed in the survey report:

Signature: c1: concept × c2: concept ↦ r: relation

where any relation r is either an equivalence (=), a subsumption (⊆, ⊇), or a disjointness (⊥).

Definition: find an r such that c1 r c2.

Retrieval:

Retrieval is the inverse of realization: determining which instances belong to a given concept:

Signature: c: concept ↦ i: instances

Definition: find i such that i ∈ c

Classification:

Classification is the task of determining where a given class should be placed in a subsumption hierarchy:

Signature: c: concept × h: hierarchy ↦ <cl, ch>

Definition: find a highest subclass cl and a lowest superclass ch such that cl ⊆ c ⊆ ch
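The following Python sketch shows how realization, subsumption, retrieval, and mapping could look on a toy ontology held in dictionaries. The ontology, the disjointness axiom, and all function names are assumptions made for illustration. Classification is left out because it needs concept definitions that this toy representation does not have.

```python
from typing import Dict, Optional, Set

# Hypothetical toy ontology.
SUPER: Dict[str, Set[str]] = {   # concept -> direct superconcepts
    "Athlete": {"Person"},
    "Person": {"Animal"},
    "Animal": set(),
    "Plant": set(),
}
TYPE: Dict[str, str] = {"JeremyLin": "Athlete", "Oak": "Plant"}  # instance -> asserted concept
DISJOINT = {frozenset({"Animal", "Plant"})}  # declared disjoint pairs

def ancestors(c: str) -> Set[str]:
    """All concepts that subsume c, including c itself."""
    seen, stack = {c}, [c]
    while stack:
        for p in SUPER.get(stack.pop(), set()):
            if p not in seen:
                seen.add(p)
                stack.append(p)
    return seen

def realization(i: str) -> Set[str]:
    """Concepts that instance i is a member of."""
    return ancestors(TYPE[i])

def subsumption(c1: str, c2: str) -> bool:
    """Whether c1 is subsumed by c2."""
    return c2 in ancestors(c1)

def retrieval(c: str) -> Set[str]:
    """Instances that belong to concept c (inverse of realization)."""
    return {i for i in TYPE if c in realization(i)}

def mapping(c1: str, c2: str) -> Optional[str]:
    """Relation between two concepts, or None if none can be derived."""
    if c1 == c2:
        return "="
    if subsumption(c1, c2):
        return "⊆"
    if subsumption(c2, c1):
        return "⊇"
    for a in ancestors(c1):
        for b in ancestors(c2):
            if frozenset({a, b}) in DISJOINT:
                return "⊥"  # disjointness is inherited from superconcepts
    return None

print(sorted(realization("JeremyLin")))  # ['Animal', 'Athlete', 'Person']
print(mapping("Athlete", "Plant"))       # ⊥
```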

In the report's table of use-cases against reasoning tasks (not reproduced here), X means that the task is used. Notice that the table only displays the minimally required reasoning tasks for each use-case.

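Relations between entity words and concept words.

How many hands does Jeremy Lin (林书豪) have:

Jeremy Lin ↦ Person    Realization: find the concept that the instance belongs to

Hand ↦ Limb    Realization: find the concept that the instance belongs to

Person – Limb    Mapping: look up the relation between the two concepts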

2.3 larkc_d431-strategies-design-for-interleaving-reasoning-and-selection-of-axioms_final

1). Query-based selection; 2). Granularity-based selection; 3). Language-based selection.

2.2 Framework of Web Scale Reasoning by Interleaving Reasoning and Selection

Algorithm 2.1: Selection-Reasoning-Loop

Repeat

Selection: Select a (consistent) subset

Reasoning: Reasoning with the subset to get answers

Decision: Deciding whether or not to stop the processing

Until Answers are returned.
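Algorithm 2.1 can be sketched in Python as follows. The selection function here simply takes a growing prefix of the data, the reasoner only follows `type` and `subClassOf` links, and the consistency check on the selected subset is omitted. All of these are simplifying assumptions for illustration, not the strategies defined in the deliverable.

```python
from typing import List, Set, Tuple

Triple = Tuple[str, str, str]

# Hypothetical toy data, ordered as a selector might rank it.
DATA: List[Triple] = [
    ("Tweety", "type", "Penguin"),
    ("Penguin", "subClassOf", "Bird"),
    ("Bird", "subClassOf", "Animal"),
    ("Rex", "type", "Dog"),
]

def select(data: List[Triple], size: int) -> Set[Triple]:
    """Selection: take the first `size` triples (placeholder strategy)."""
    return set(data[:size])

def reason(subset: Set[Triple], instance: str) -> Set[str]:
    """Reasoning: derive the types of `instance` from the subset only."""
    types = {o for s, p, o in subset if s == instance and p == "type"}
    while True:
        new = {o for s, p, o in subset if p == "subClassOf" and s in types} - types
        if not new:
            return types
        types |= new

def selection_reasoning_loop(data: List[Triple], instance: str, goal: str,
                             step: int = 1) -> Tuple[bool, int]:
    """Repeat select / reason / decide; return (answer, triples used)."""
    size = step
    while True:
        subset = select(data, size)
        answers = reason(subset, instance)
        # Decision: stop on success or when there is nothing left to select.
        if goal in answers:
            return True, size
        if size >= len(data):
            return False, size
        size += step

print(selection_reasoning_loop(DATA, "Tweety", "Bird"))  # (True, 2)
```

In this run the question "is Tweety a Bird?" is answered after looking at only two of the four triples, which is the intended benefit of interleaving selection with reasoning.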

Part 4: Interleaving Reasoning and Selection in the LarKC Platform

The reasoning plug-in supports the four standard methods for a SPARQL endpoint, namely select, describe, construct and ask.

4.4 Semantic Relevance based Selection Functions

Google Distances are used to measure the co-occurrence of two keywords over the Web.

Similarity(x, y) = 1−Distance(x, y).
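Assuming that "Google Distance" here refers to the Normalized Google Distance (NGD) of Cilibrasi and Vitányi, the distance is computed from search hit counts as NGD(x, y) = (max{log f(x), log f(y)} − log f(x, y)) / (log M − min{log f(x), log f(y)}), where f(x) is the number of pages containing x, f(x, y) the number containing both, and M the total number of indexed pages. The Python sketch below applies that formula. The hit counts are made-up numbers, and clamping the distance to [0, 1] is a simplification added here so that the similarity stays in the same range.

```python
import math

def google_distance(fx: float, fy: float, fxy: float, total: float) -> float:
    """Normalized Google Distance from hit counts.

    fx, fy: pages containing x, respectively y (both > 0)
    fxy:    pages containing both x and y
    total:  number of indexed pages (larger than fx and fy)
    """
    if fxy == 0:
        return 1.0  # terms never co-occur; NGD is unbounded, clamp to 1
    lx, ly, lxy = math.log(fx), math.log(fy), math.log(fxy)
    ngd = (max(lx, ly) - lxy) / (math.log(total) - min(lx, ly))
    return min(max(ngd, 0.0), 1.0)  # clamp (simplifying assumption)

def similarity(fx: float, fy: float, fxy: float, total: float) -> float:
    """Similarity(x, y) = 1 - Distance(x, y)."""
    return 1.0 - google_distance(fx, fy, fxy, total)

# Made-up hit counts for two keywords.
print(similarity(fx=1000, fy=100, fxy=100, total=1_000_000))
```

Two keywords that always appear together get similarity 1, and keywords that never co-occur get similarity 0 under this clamped variant.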

5.2.1 Knowledge Graph

2.4 LarKC_d432-Implemented-plug-ins-for-interleaving-reasoning-and-selection-of-axioms_final

• Selection: use a selector to select part of the data;

• Reasoning: use a reasoner to reason over the selected data;

• Deciding: use a decider to decide whether the procedure should be stopped and an answer returned, or whether to go back to the selection step and continue the interleaving process.

2.3 Implementation of DIGPION

The main tasks of the DIG Reasoner plug-in are:

• Data Translation. The data set imported into a reasoner plug-in in the LarKC platform is designed to be a set of statements, so the first step of the DIG reasoner plug-in is to translate that set of statements (ontology data) into DIG data that can be posted to the external DIG reasoner. If the data is OWL-DL, the system uses the OWL2DIG library to translate it into DIG data.

• Query Translation. Since the query to a reasoner plug-in in the LarKC platform is designed to be a SPARQL query, that query has to be translated into DIG queries so that they can be posted to the external DIG reasoner.

• Query and answer processing. The DIG reasoner plug-in may have to make several DIG queries to get the complete answer to a given SPARQL query. For example, a single DIG query cannot involve two variables, as in SubClassof(x,y), whereas SPARQL is expressive enough to pose queries with multiple variables. The result of such a SPARQL query can be obtained in multiple DIG query steps: one step obtains the bindings of a single variable, and another step obtains the bindings of a second variable by instantiating the first variable in the corresponding DIG query.

• Translate DIG answers into SPARQL answers. Since the output of a reasoner plug-in is designed to be a SPARQL answer (say, a variable binding for a SPARQL select query), the system has to translate the DIG answers into the corresponding SPARQL answers.
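The multi-step query processing described above can be sketched as follows. The "reasoner" here is a plain Python stand-in that can only answer questions with one unknown. It does not speak the real DIG protocol, and the class hierarchy and function names are hypothetical.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical class hierarchy: class -> direct subclasses.
HIERARCHY: Dict[str, Set[str]] = {
    "Animal": {"Bird", "Dog"},
    "Bird": {"Penguin"},
    "Dog": set(),
    "Penguin": set(),
}

def all_classes() -> List[str]:
    """Single-variable query: enumerate every named class."""
    return sorted(HIERARCHY)

def subclasses_of(cls: str) -> Set[str]:
    """Single-variable query: all direct and indirect subclasses of cls."""
    result: Set[str] = set()
    stack = list(HIERARCHY.get(cls, set()))
    while stack:
        c = stack.pop()
        if c not in result:
            result.add(c)
            stack.extend(HIERARCHY.get(c, set()))
    return result

def subclass_pairs() -> List[Tuple[str, str]]:
    """Answer the two-variable query SubClassOf(x, y) in two steps."""
    bindings = []
    for y in all_classes():                  # step 1: obtain bindings for y
        for x in sorted(subclasses_of(y)):   # step 2: bind x with y instantiated
            bindings.append((x, y))
    return bindings

for x, y in subclass_pairs():
    print(f"?x = {x}, ?y = {y}")
```

Each `(x, y)` pair plays the role of one row of variable bindings in the SPARQL select result, so the final translation step amounts to packaging these pairs in the SPARQL answer format.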

