What is an Entity?
It’s a great question! Let’s take a moment to explain what an entity is.
Simply put, an entity is a thing. It can be anything: a person, a place, a brand, a product, an event, a team, a colour.
It is a way to represent real things in language and within databases.
Here are three examples of entities:
Bird. This is a class of vertebrate.
Bird. This is a film directed by Clint Eastwood.
Bird. This is an American jazz musician whose real name is Charlie Parker.
It is the same English word, but it refers to three very different things. Each of these things is a different entity.
In order to make sense of the world, humans have built databases that contain entities and their definitions. They’re the computer-based equivalent of dictionaries. The most famous is Wikipedia, together with its structured sister project, Wikidata. Let’s look at how they use entities.
Wikipedia contains definitions for millions of things, and each one is a unique entity.
For the three examples above we have three distinct entities, each with its own ID and entry in Wikidata.
Bird. The class of vertebrate. Q5113
Bird. The film directed by Clint Eastwood. Q865056
Bird. The American jazz musician. Q103767
So if you are talking about a ‘bird’ in the tree, you are probably talking about Q5113, the animal. Similarly, if you are saying how much you enjoyed watching Bird at the cinema last night, you are probably talking about the film, Q865056. As humans we’re good at using context to understand meaning. We don’t need the databases and the IDs; we have it all stored in our brains.
Computers are not good at understanding language in this way. That’s where entities become important. Companies (like EntityX) process large amounts of human-written content and try to extract the meaning. At EntityX we want to understand what a page means, in order to know whether it is a relevant place for an ad.
When we analyse a web page, we extract anything that looks like an entity, and we have built semantic tools that help us determine the context. If the page is talking about ostriches and kiwis and we see the word ‘bird’, it is probably Q5113. If the page contains lots of musical references, it is probably Q103767.
We take language on a web page and break it down into entities that we can manipulate and analyse. Entities help us to disambiguate words in order to get a better understanding.
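The idea of using surrounding words to pick between candidate entities can be sketched in a few lines of Python. This is a toy illustration only, not a description of EntityX’s actual tools: the function name disambiguate, the CANDIDATES table, and the hand-picked context words are all assumptions made up for this example. It simply counts how many context words for each candidate appear on the page and returns the ID with the most matches.

```python
import re

# Hypothetical candidate table: the three Wikidata IDs from the examples above,
# each paired with hand-picked context words (illustrative, not real data).
CANDIDATES = {
    "bird": {
        "Q5113": {"ostrich", "ostriches", "kiwi", "kiwis", "tree", "nest", "feathers", "wings"},
        "Q865056": {"film", "cinema", "directed", "eastwood", "movie", "watching"},
        "Q103767": {"jazz", "saxophone", "bebop", "musician", "parker", "album"},
    }
}


def disambiguate(word, page_text):
    """Return the candidate ID whose context words overlap most with the page.

    Returns None if the word has no candidates or nothing on the page matches.
    """
    candidates = CANDIDATES.get(word.lower())
    if not candidates:
        return None
    # Lowercase the page and split it into a set of alphabetic tokens.
    tokens = set(re.findall(r"[a-z]+", page_text.lower()))
    # Score each candidate by the number of its context words found on the page.
    scores = {qid: len(tokens & context) for qid, context in candidates.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


print(disambiguate("bird", "Ostriches and kiwis are flightless, but each is still a bird."))
print(disambiguate("bird", "Bird changed jazz forever with his saxophone."))
```

A real system would need far richer signals than word overlap, but the shape of the problem is the same: one word, several candidate entities, and context to choose between them.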
Entities are language agnostic
One entity can have different labels in different languages.
Take our bird example. The entity is ‘class of vertebrate’, Q5113, but its label is ‘bird’ only in English.
In French Q5113 is Oiseau
In German Q5113 is Vögel
In Spanish Q5113 is Aves
Same entity, different words.
This means that if we’re analysing a page in French and we find the word ‘oiseau’, we get the same meaning as if we were analysing a page in English and found the word ‘bird’.
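As a minimal sketch of that idea, an entity can be stored once under its ID, with its labels kept per language. The labels below are the ones listed above; the dictionary layout, the language codes, and the entity_for function are assumptions for illustration, not a real API.

```python
# Labels for Q5113 as given in the text; the layout is an assumption.
LABELS = {
    "Q5113": {"en": "bird", "fr": "oiseau", "de": "Vögel", "es": "aves"},
}

# Reverse index: (language code, lowercased label) -> entity ID.
LABEL_INDEX = {
    (lang, label.lower()): qid
    for qid, by_lang in LABELS.items()
    for lang, label in by_lang.items()
}


def entity_for(label, lang):
    """Look up the entity ID for a label in a given language, or None if unknown."""
    return LABEL_INDEX.get((lang, label.lower()))


# The French and English words resolve to the same entity.
print(entity_for("oiseau", "fr") == entity_for("bird", "en"))
```

In practice one label can point to several entities (as ‘bird’ does in English), so a lookup like this would only produce candidates, and context would still be needed to choose between them.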