May I help you?

Wednesday, April 1, 2015

Structured Data to Knowledge Graph, Knowledge Graph Identification, and Bayes' Theorem and the Theorem of Total Probability Used as a Predictive Algorithm.

Large-scale information processing systems are able to extract massive collections of interrelated facts, but unfortunately transforming these candidate facts into useful knowledge is a formidable challenge. In this paper, we show how uncertain extractions about entities and their relations can be transformed into a knowledge graph. The extractions form an extraction graph, and we refer to the task of removing noise, inferring missing information, and determining which candidate facts should be included in a knowledge graph as knowledge graph identification. In order to perform this task, we must reason jointly about candidate facts and their associated extraction confidences, identify co-referent entities, and incorporate ontological constraints. Our proposed approach uses probabilistic soft logic (PSL), a recently introduced probabilistic modeling framework which easily scales to millions of facts. We demonstrate the power of our method on a synthetic Linked Data corpus derived from the MusicBrainz music community and a real-world set of extractions from the NELL project containing over 1M extractions and 70K ontological relations. We show that compared to existing methods, our approach is able to achieve improved AUC and F1 with significantly lower running time.

The web is a vast repository of knowledge, but automatically extracting that knowledge at scale has proven to be a formidable challenge. Recent evaluation efforts have focused on automatic knowledge base population [1, 2], and many well-known broad domain and open information extraction systems exist, including the Never-Ending Language Learning (NELL) project, OpenIE, and efforts at Google, which use a variety of techniques to extract new knowledge, in the form of facts, from the web. These facts are interrelated, and hence this extracted knowledge has recently been referred to as a knowledge graph.

A key challenge in producing the knowledge graph is incorporating noisy information from different sources in a consistent manner. Information extraction systems operate over many source documents, such as web pages, and use a collection of strategies to generate candidate facts from the documents, spanning syntactic, lexical and structural features of text. Ultimately, these extraction systems produce candidate facts that include a set of entities, attributes of these entities, and the relations between these entities, which we refer to as the extraction graph. However, errors in the extraction process introduce inconsistencies in the extraction graph, which may contain duplicate entities and violate key ontological constraints such as subsumption, mutual exclusion, inverse, domain and range constraints. Such noise obscures the true knowledge graph, which captures a consistent set of entities, attributes and relations.

Google's work infers the knowledge graph from the extraction graph generated by an information extraction system.
We demonstrate that the errors encountered by information extraction systems require jointly reasoning over candidate facts to construct a consistent knowledge graph. Our approach performs entity resolution, collective classification and link prediction while also enforcing global constraints on the knowledge graph, a process which we refer to as knowledge graph identification.

In order to implement knowledge graph identification, we use probabilistic soft logic (PSL), a recently introduced framework for reasoning probabilistically over continuously-valued random variables. PSL provides many advantages: models are easily defined using declarative rules with first-order logic syntax, continuously-valued variables provide a convenient representation of uncertainty, weighted rules and weight learning capture the importance of model rules, and advanced features such as set-based aggregates and hard constraints are supported. In addition, inference in PSL is a convex optimization that is highly scalable, allowing us to handle millions of facts in minutes.
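To get a feel for what "continuously-valued" means here, the following is a minimal sketch in Python (my own toy code, not the actual PSL library) of the Lukasiewicz soft logic that PSL builds on: truth values live in [0, 1], and a weighted rule is penalized by its distance to satisfaction. The rule and its weight are made up for illustration.

# Minimal sketch of PSL-style soft logic (not the actual PSL implementation).
# Truth values live in [0, 1]; conjunction and disjunction use the Lukasiewicz
# relaxations, and inference minimizes each rule's weighted distance to
# satisfaction instead of demanding hard true/false assignments.

def luk_and(a, b):
    """Lukasiewicz t-norm: soft conjunction of two truth values."""
    return max(0.0, a + b - 1.0)

def luk_or(a, b):
    """Lukasiewicz t-conorm: soft disjunction of two truth values."""
    return min(1.0, a + b)

def distance_to_satisfaction(body, head):
    """For a rule body -> head, the penalty is max(0, body - head)."""
    return max(0.0, body - head)

# Hypothetical grounded rule:
#   CandidateFact(x) & ExtractorConfidence(x) -> InGraph(x)
candidate, confidence, in_graph = 1.0, 0.8, 0.6
body = luk_and(candidate, confidence)   # 0.8
weight = 2.0                            # a learned rule weight (made up here)
penalty = weight * distance_to_satisfaction(body, in_graph)
print(penalty)  # 2.0 * max(0, 0.8 - 0.6) = 0.4; inference would raise InGraph(x)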

Convex optimization is a mathematical concept: in a 3-D system, we define the quantity to be optimized along the Z axis as a function of a point in the x-y plane, where x and y form the domain, i.e. the parameters on which the quantity to be optimized depends.
Mathematically it can be represented as
Z = f(x, y), where Z is the quantity to be optimized and
x, y are the variables on which the quantity Z depends.
When f is convex, any local minimum is also the global minimum, which is what makes such problems so tractable. We can extend this concept to an n-dimensional system too.
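A toy illustration in Python, using a simple convex function of my own choosing: because f is convex, plain gradient descent from any starting point reaches the global minimum.

# Toy illustration of convex optimization: minimize z = f(x, y) over the x-y
# plane by gradient descent. f(x, y) = (x - 1)^2 + (y + 2)^2 is convex, so the
# single local minimum at (1, -2) is also the global minimum.

def f(x, y):
    return (x - 1) ** 2 + (y + 2) ** 2

def grad_f(x, y):
    """Partial derivatives of f with respect to x and y."""
    return 2 * (x - 1), 2 * (y + 2)

x, y, step = 0.0, 0.0, 0.1
for _ in range(200):
    gx, gy = grad_f(x, y)
    x, y = x - step * gx, y - step * gy

print(round(x, 4), round(y, 4), round(f(x, y), 6))  # ~ 1.0 -2.0 0.0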

Here we develop a PSL model for knowledge graph identification that both captures probabilistic dependencies between facts (which will be analysed in the next paper) and enforces global constraints between entities and relations. Through this model, we define a probability distribution over interpretations - or truth value assignments to facts - each of which corresponds to a possible knowledge graph. By performing inference using the extraction graph and an ontology, we are able to find the most probable knowledge graph. We will try to establish the benefits of our approach on two large datasets: a synthetic dataset derived from the MusicBrainz community with ontological relationships defined in the Music Ontology, as well as noisy extractions from NELL, a large-scale operational knowledge extraction system.
What you readers can try is:
1) formulating the knowledge graph identification problem so that it supports reasoning about multiple, uncertain extraction sources in the presence of ontological constraints;
2) solving knowledge graph identification efficiently with convex optimization using PSL; and
3) demonstrating the power of knowledge graph identification by presenting results on benchmark datasets that are superior to state-of-the-art methods, and generating massive knowledge graphs on the scale of minutes that are infeasible to compute in competing systems.


What is the Motivation behind Knowledge Graph Identification?
We represent the candidate facts from an information extraction system as a knowledge graph where entities are nodes, categories are labels associated with each node, and relations are directed edges between the nodes. Information extraction systems can extract such candidate facts, and these extractions can be used to construct an extraction graph. Unfortunately, the extraction graph is often incorrect, with errors such as spurious and missing nodes and edges, and missing or inaccurate node labels. Our approach, knowledge graph identification (KGI), combines the tasks of entity resolution, collective classification and link prediction, mediated by rules based on ontological information.
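To make this concrete, here is a small hypothetical sketch in Python: a few made-up candidate facts and a mutual-exclusion constraint, with a check that flags the kind of contradiction that joint reasoning must resolve.

# Hypothetical extraction graph: entities are nodes, categories are labels,
# relations would be edges. We flag candidate facts that violate a
# mutual-exclusion constraint (an entity cannot be both a City and a Person).

candidate_labels = [            # (entity, category, extractor confidence)
    ("kyoto", "City", 0.9),
    ("kyoto", "Person", 0.4),   # a noisy extraction
    ("bob_dylan", "Musician", 0.95),
]
mutually_exclusive = {("City", "Person")}

def violations(labels, mutex):
    """Return pairs of candidate labels that contradict each other."""
    by_entity = {}
    for entity, category, conf in labels:
        by_entity.setdefault(entity, []).append((category, conf))
    out = []
    for entity, cats in by_entity.items():
        for i in range(len(cats)):
            for j in range(i + 1, len(cats)):
                pair = tuple(sorted((cats[i][0], cats[j][0])))
                if pair in mutex:
                    out.append((entity, cats[i], cats[j]))
    return out

print(violations(candidate_labels, mutually_exclusive))
# [('kyoto', ('City', 0.9), ('Person', 0.4))] -> joint reasoning would keep
# the higher-confidence label and drop the contradicting one.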

From Structured Data to the Knowledge Graph
What actually transforms structured data into the Knowledge Graph?
To understand this, we need to understand the following terms, which can be of immense utility for web developers who want to introduce structured data into their HTML code and thus help produce snippets - the brief introductory pieces of information a search engine shows in its output, corresponding to the queries a customer types into the search engine.
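As a rough sketch of what such markup looks like, here is Python building a schema.org JSON-LD block. The @context, @type, name and aggregateRating keys are standard schema.org vocabulary; the movie, person and rating values are invented for illustration.

import json

# Minimal sketch: embedding schema.org structured data as JSON-LD so a search
# engine can render a rich snippet. The vocabulary is standard schema.org;
# the concrete values are made-up examples.
movie = {
    "@context": "https://schema.org",
    "@type": "Movie",
    "name": "An Example Film",
    "director": {"@type": "Person", "name": "Jane Doe"},
    "aggregateRating": {
        "@type": "AggregateRating",
        "ratingValue": "4.2",
        "ratingCount": "1234",
    },
}

# The resulting block is what a developer would paste inside
# <script type="application/ld+json"> ... </script> in the page's HTML.
print(json.dumps(movie, indent=2))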

Agenda
● The end of search as we know it
● Knowledge Graph
● Schema.org and markup
● Developer tools
● What the future holds


If you type any search query into the search engine, the intelligent search engine does the following:
  1. Answers your query according to the text entered.
  2. Anticipates from the text you have entered.
How can a search engine understand the world? Can it see the world through sensors like a camera? Can it sense the environment through sensors for temperature, humidity, sound and so on? Can it automatically learn about the environment, its people and places, through sensors and robot programs?

A Knowledge Graph about an actor can look like this, having a lot of nodes connected directly or indirectly to the central node, which here is the actor.
The Knowledge Graph helps answer users' queries: the text you enter in the search engine can give you direct answers, and the engine will also run a prediction algorithm based on the text you have entered and the history of searches from that computer. This prediction algorithm is based on Bayes' Theorem.
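As a toy sketch of how such a Bayes-based prediction could work (the query counts below are invented, and real search engines are far more sophisticated), we can rank candidate completions q of a typed prefix by P(q | prefix) ∝ P(prefix | q) · P(q):

# Toy Bayes-style query prediction: rank candidate completions q of a typed
# prefix by P(q | prefix) proportional to P(prefix | q) * P(q). The counts
# stand in for a user's search history.

history_counts = {"knowledge graph": 50, "knowledge base": 30, "knee pain": 20}
total = sum(history_counts.values())

def posterior(prefix):
    scores = {}
    for query, count in history_counts.items():
        likelihood = 1.0 if query.startswith(prefix) else 0.0  # P(prefix | q)
        scores[query] = likelihood * (count / total)           # times prior P(q)
    norm = sum(scores.values()) or 1.0
    return {q: s / norm for q, s in scores.items() if s > 0}

print(posterior("kn"))    # all three match; ranked purely by prior
print(posterior("know"))  # only the two "know..." queries remain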

Bayes' Theorem and Partition of a sample space
Consider that there are two bags I and II. Bag I contains 2 white and 3 red balls and Bag II contains 4 white and 5 red balls. One ball is drawn at random from one of the bags. We can find the probability of selecting either of the bags (i.e. 1/2), or the probability of drawing a ball of a particular colour (say white) from a particular bag (say Bag I). In other words, we can find the probability that the ball drawn is of a particular colour, if we are given the bag from which the ball is drawn. But, can we find the probability that the ball drawn is from a particular bag (say Bag II), if the colour of the ball drawn is given? Here, we have to find the reverse probability of Bag II being selected, given an event that occurred after the selection. The famous mathematician Thomas Bayes solved the problem of finding reverse probability by using conditional probability. The formula developed by him is known as 'Bayes' theorem', which was published posthumously in 1763.
Before stating and proving the Bayes' theorem, let us first take up a definition and some preliminary results.

Partition of a sample space


A set of events E1, E2, ..., En is said to represent a partition of the sample space S if
(a) Ei ∩ Ej = φ, i ≠ j, i, j = 1, 2, 3, ..., n
(b) E1 ∪ Ε2 ∪ ... ∪ En= S and
(c) P(Ei) > 0 for all i = 1, 2, ..., n.

In other words, the events E1, E2, ..., En represent a partition of the sample space S if they are pairwise disjoint, exhaustive and have nonzero probabilities. As an example, we see that any nonempty event E and its complement E′ form a partition of the sample space S, since they satisfy E ∩ E′ = φ and E ∪ E′ = S. Using a Venn diagram, one can easily observe that if E and F are any two events associated with a sample space S, then the set {E ∩ F′, E ∩ F, E′ ∩ F, E′ ∩ F′} is a partition of the sample space S. It may be mentioned that the partition of a sample space is not unique; there can be several partitions of the same sample space. We shall now prove a theorem known as the Theorem of total probability.

Theorem of total probability
Let {E1, E2,...,En} be a partition of the sample space S, and suppose that each of the
events E1, E2,..., En has nonzero probability of occurrence. Let A be any event associated
with S, then

P(A) = P(E1) P(A|E1) + P(E2) P(A|E2) + ... + P(En) P(A|En)

Proof: Given that E1, E2,..., En is a partition of the sample space S. Therefore
S = E1 ∪ E2 ∪ ... ∪ En

and Ei ∩ Ej = φ, i ≠ j, i, j = 1, 2, ..., n

Now, we know that for any event A,
 A = A ∩ S
     = A ∩ (E1 ∪ E2 ∪ ... ∪ En)
     = (A ∩ E1) ∪ (A ∩ E2) ∪ ...∪ (A ∩ En)
Also A ∩ Ei and A ∩ Ej are respectively the subsets of Ei and Ej . We know that
Ei and Ej are disjoint, for i ≠ j , therefore, A ∩ Ei and A ∩ Ej are also disjoint for all i ≠ j, i, j = 1, 2, ..., n.

Thus, P(A) = P [(A ∩ E1) ∪ (A ∩ E2)∪ .....∪ (A ∩ En)]
                  = P (A ∩ E1) + P (A ∩ E2) + ... + P (A ∩ En)

Now, by multiplication rule of probability, we have
P(A ∩ Ei) = P(Ei) P(A|Ei), as P(Ei) ≠ 0 for all i = 1, 2, ..., n

Therefore, P (A) = P (E1) P (A|E1) + P (E2) P (A|E2) + ... + P (En)P(A|En)

Or, written compactly,

P(A) = ∑ P(Ej) P(A|Ej), the sum running over j = 1, 2, ..., n.
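To make the two theorems concrete, here is a small worked computation in Python (using the standard fractions module) on the two-bags example from above: the theorem of total probability gives P(white), and Bayes' theorem, P(Ei|A) = P(Ei) P(A|Ei) / P(A), then gives the reverse probability P(Bag II | white).

from fractions import Fraction as F

# Two bags from the example above: Bag I has 2 white of 5 balls,
# Bag II has 4 white of 9 balls, and each bag is chosen with probability 1/2.
p_bag = {"I": F(1, 2), "II": F(1, 2)}
p_white_given = {"I": F(2, 5), "II": F(4, 9)}

# Theorem of total probability: P(W) = sum over j of P(Ej) P(W|Ej)
p_white = sum(p_bag[b] * p_white_given[b] for b in p_bag)
print(p_white)  # 19/45

# Bayes' theorem: P(Bag II | W) = P(Bag II) P(W | Bag II) / P(W)
print(p_bag["II"] * p_white_given["II"] / p_white)  # 10/19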
Let's come back to the topic of the Knowledge Graph discussion.
The Google Knowledge Graph helps answer users' queries; it lets the search engine:
● know what things exist for the text you type in the search engine,
● summarize relevant facts about those things, and
● discover related things of interest.
So you must have realized that Google is a pioneer in the field of information structuring and management, providing you access to information that is useful to you in the best possible way. It also learns your interests from the queries you type into the Google search engine, and makes your searches more fruitful by showing information based on your search history and your interests, using optimized predictive algorithms.

I will come back with more advertising algorithms in the future. If you have any query, ask, and subscribe to my page to get the latest updates.

To read more of my pages, click the link below.








