6

Has any schema-agnostic database engine been implemented?

nbro
Leo
  • 1
    The same article you link points to SPARQL as a query language and https://query.wikidata.org/ and http://dbpedia.org/isparql/ as example implementations. Is that what you were looking for? If not, why not? – Alpha Jan 27 '17 at 21:27
  • @Alpha The goal is to abstract users from the data representation. A user accessing this kind of database does not need to know all the available entities that could be queried, hence a query like "SELECT * FROM customers" or "SELECT * FROM clients" must give the same result. The db engine will figure out that "clients" and "customers" reference the same entity. – Leo Jan 30 '17 at 10:08
  • 1
    For future reference: MarkLogic is not a schema-agnostic database: http://www.marklogic.com/blog/schema-agnosticism-what-it-is-and-why-you-should-care/ – Leo Jun 13 '17 at 19:13
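
The behaviour described in the comments, where "clients" and "customers" resolve to the same entity, can be sketched in a few lines. This is only an illustration under stated assumptions: the `ALIASES` mapping, `resolve_table`, and `select_all` are hypothetical names, and the synonym table is hand-written here, whereas a real schema-agnostic engine would have to infer it (with some error).

```python
import sqlite3

# Hypothetical, hand-written synonym table: each alias maps to one canonical table.
# A real engine would have to learn or infer this mapping.
ALIASES = {"customers": "customers", "clients": "customers", "buyers": "customers"}


def resolve_table(name):
    """Map a user-supplied entity name to the canonical table name."""
    key = name.strip().lower()
    if key not in ALIASES:
        raise KeyError(f"unknown entity: {name!r}")
    return ALIASES[key]


def select_all(conn, entity):
    """Run SELECT * against whichever table the entity name resolves to."""
    table = resolve_table(entity)
    # `table` only ever comes from the fixed ALIASES values, never from raw user input.
    return conn.execute(f"SELECT * FROM {table} ORDER BY id").fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO customers (name) VALUES (?)", [("Ada",), ("Grace",)])

print(select_all(conn, "clients") == select_all(conn, "customers"))
```

The interesting (and hard) part of the question is precisely what this sketch leaves out: building the alias mapping automatically instead of by hand.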

2 Answers

1

Yes, a very large one: the World Wide Web, with its highly scaled and optimized indexing by Google.com, is the most distributed and robust schema-agnostic database known today. Without the schema-awareness that Google brought by applying more rigorous information science, the web was almost useless to those who did not know the URL of the target document in advance.

Schema agnosticism is another way of saying that, without first detecting a schema from data patterns, the database cannot

  • Provide meta information to the services accessing it,
  • Normalize the structure using simple SQL query-insert combinations,
  • Proactively optimize the keys automatically, as is now possible with machine learning, or
  • Validate insertions.

Moving away from structure is appealing because you can just jam data in, like a librarian without a bookshelf. However, a data scientist will point out that doing so adds entropy, working alongside the thermodynamic drift toward randomness.
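
As a rough illustration of "detecting a schema from data patterns" and then using it to validate insertions, here is a minimal sketch. The function names `infer_schema` and `validate` and the sample documents are assumptions made up for this example; the inference is deliberately naive (field presence and Python type names only).

```python
from collections import defaultdict


def infer_schema(documents):
    """For each field, record the type names seen and whether every document has it."""
    types = defaultdict(set)
    counts = defaultdict(int)
    for doc in documents:
        for field, value in doc.items():
            types[field].add(type(value).__name__)
            counts[field] += 1
    return {
        field: {"types": types[field], "required": counts[field] == len(documents)}
        for field in types
    }


def validate(doc, schema):
    """Return a list of problems; an empty list means the document fits the schema."""
    problems = []
    for field, rule in schema.items():
        if field not in doc:
            if rule["required"]:
                problems.append(f"missing field: {field}")
        elif type(doc[field]).__name__ not in rule["types"]:
            problems.append(f"unexpected type for {field}: {type(doc[field]).__name__}")
    return problems


# Hypothetical documents that were stored without any declared schema.
stored = [
    {"name": "Ada", "age": 36},
    {"name": "Grace", "age": 45, "email": "grace@example.com"},
]
schema = infer_schema(stored)

print(validate({"name": "Linus", "age": 30}, schema))
print(validate({"age": "old"}, schema))
```

Once a schema like this has been detected, it can also serve as the meta information mentioned in the first bullet.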

The purpose of storing data is to be able to retrieve it. Feature extraction is an opportunity to improve structure automatically while the data is being stored, rather than storing documents chaotically, a trend that will not lead anywhere good for the world of IT.

Consider whether Google is successful because it organizes its data as it crawls or later as we enter key phrases. Which is the efficient sequence?

One more point: Wikipedia is a blog, and they know this, which is why they now want peer review for everything (after much of the information was added without peer review). It is a good place to find lists but not verified facts. The existence of a Wikipedia page is definitely not an indication of the value of the concept on it.

Douglas Daseeco
  • I improved the question a bit. I asked about an engine, not the data itself. The scenario is already chaotic: Google provides a search engine for the web and needs to organize the data, sure. What about the data in private silos? The amount of data is not as huge as on the web, but the chaos exists with or without a schema. Each time an enterprise deploys a new system, it has to interface with legacy data and/or perhaps creates more data, more chaos to be queried. The agnosticism is much more about how to consume the data. – Leo Jul 25 '18 at 13:52
  • Another thing. Yes, Wikipedia alone is not proof of anything, but in this case I am one of the contributors to this article and have worked with several people from academia and industry to consolidate this concept. And someone needs to write down the first lines anyway. – Leo Jul 25 '18 at 13:57
  • Summing up all your points: yes, the question is how to have a smart system that can infer (with some level of error, of course) and perhaps return useful information to the end user, in a way that lets the user further improve the query and at the same time give feedback to the system, making the engine even better. The agnosticism lies mostly in the user query language interface. About Wikipedia you are right; I just pointed out that perhaps this article, in particular, is a good one ;) – Leo Jul 27 '18 at 12:19
  • I have upvoted to balance the entropy ;) – Leo Aug 31 '18 at 08:21
0

You might be interested in orthogonally persistent systems. You could look at them as schema-agnostic database systems whose data fits entirely in RAM (think of 1980s Smalltalk, Lisp Machines, or Prolog machines, and the 1994 GrassHopper OS) or at least in virtual memory. With that approach, even SBCL almost fits your wish, since it has save-lisp-and-die. Look also into frame-based systems and object databases. Read also a good operating systems textbook and see the past discussions archived on tunes.org.
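
To give a feel for the idea, here is a small sketch of checkpoint-style persistence of schema-free, frame-like data using Python's standard `shelve` module. It is only an approximation chosen for illustration, not true orthogonal persistence (the program has to save and reload explicitly), and the frame contents and the `root` key are made-up assumptions.

```python
import os
import shelve
import tempfile

# Hypothetical location for the checkpoint file.
path = os.path.join(tempfile.mkdtemp(), "heap")

# First "session": build frame-like objects in memory and checkpoint them.
# No table or schema is declared; frames are plain dicts with arbitrary slots.
with shelve.open(path) as db:
    ada = {"name": "ada", "role": "customer", "city": "London"}
    order = {"name": "order-1", "buyer": ada, "total": 42}
    db["root"] = [ada, order]

# Second "session": reload the whole object graph from disk.
with shelve.open(path) as db:
    restored_ada, restored_order = db["root"]

# The reference from the order to its buyer survives the round trip.
print(restored_order["buyer"] is restored_ada)
```

Because both frames are stored under a single key, they are pickled together and the shared reference is preserved; storing them under separate keys would produce independent copies.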

Shameless self-promotion: my bismon system (work in progress in summer 2019) claims to be a GPLv3+ orthogonally persistent system applied to static source code analysis of IoT software. But you might reuse most of it for other kinds of orthogonal persistence (of frame-based data).