SUMMARY
Welcome to the Semantic Islands:
Mining Meaning and Significance from the LLM Seas
An essay by Gordon Freedman · Publisher, Technologist, Researcher
“Where a human judges, a model correlates. Where a human evaluates, a model predicts. Where a human engages with the world, a model engages with a distribution of words. Their architecture makes them extraordinarily good at reproducing patterns found in text. It does not give them access to the world those words refer to.”
— Walter Quattrociocchi, Scientific American, February 2026
Every day, researchers, lawyers, educators, journalists, and analysts use large language models to extract, validate, and organize knowledge from the largest body of digitized information ever assembled. And every day, when the session ends, that work disappears. The sea forgets. The AI, by design, returns to its undifferentiated state. The knowledge that was built — confirmed against sources, organized into structure, connected to other validated claims — slips back into the water. This is not a technical glitch. It is the defining structural problem of the AI age, and no one has named it yet. This essay names it.
The large language model is an ocean. It holds, in some distributed statistical sense, nearly everything that has been digitized: every research paper, policy document, court record, curriculum, dataset, and news article absorbed into one vast undifferentiated body. The sea is extraordinary. It is generative, patient, available to anyone with a connection to it, at essentially no marginal cost. It is also, as Quattrociocchi and colleagues have demonstrated empirically, structurally unable to distinguish the verified fact from the confident confabulation. It cannot validate its own outputs. And it practices a structural amnesia: every conversation begins from the same high-entropy state. The work of extracting and organizing domain knowledge does not accumulate. The sea remains the sea.
“Information is the resolution of uncertainty.”
— Claude Shannon, A Mathematical Theory of Communication, 1948
Claude Shannon gave us the mathematics for this problem in 1948. His information theory defined entropy as the measure of uncertainty in a system: the higher the entropy, the more nearly equal the probabilities of the possible states, and the less organized and less navigable the system. A large language model, relative to any specific domain question, is a near-maximum-entropy source: the space of plausible outputs is vast, and the spread across those possible answers measures how uncertain the system is about where the truth lies. Shannon called information the resolution of uncertainty. What the LLM produces, without expert mediation, resolves very little.
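The entropy argument above can be made concrete with a short calculation. The sketch below is illustrative only: the two probability distributions are invented for the example and do not come from the essay or from any measured model. It computes Shannon entropy in bits for eight candidate answers to a single question, first when all are equally plausible and then after a hypothetical expert review has concentrated belief on one of them.

```python
import math


def shannon_entropy(probs):
    """Return the Shannon entropy, in bits, of a discrete distribution."""
    if abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    # Outcomes with p == 0 contribute nothing, so they are skipped.
    return -sum(p * math.log2(p) for p in probs if p > 0)


# ASSUMED distributions over eight candidate answers (illustrative numbers).
unmediated = [1 / 8] * 8                # every answer equally plausible
expert_reviewed = [0.93] + [0.01] * 7   # validation concentrates belief

h_before = shannon_entropy(unmediated)       # log2(8) = 3 bits
h_after = shannon_entropy(expert_reviewed)   # well under 1 bit
bits_resolved = h_before - h_after           # uncertainty removed by review

print(f"before: {h_before:.2f} bits, after: {h_after:.2f} bits")
print(f"resolved: {bits_resolved:.2f} bits")
```

The difference between the two entropies is one plausible reading of the "resolution of uncertainty" the essay attributes to expert mediation; how such distributions would be estimated in practice is left open here.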
The essay proposes what needs to rise above the sea: semantic islands. When a domain expert — a cancer biologist, a labor economist, a legal analyst, an educator — interacts with an LLM purposefully, evaluating its outputs against their knowledge, confirming claims against primary sources, and organizing validated propositions into a relational structure, something thermodynamically real happens. Entropy is reduced. The specific, the verified, and the relationally organized rise above the undifferentiated water. An island appears. The island is what the essay calls a Qualified Semantic Network: a persistent, portable, validity-bearing knowledge structure that does not dissolve when the session ends. Its basic unit is the semantic knode — a validated, relationally linked proposition whose strength can be measured in Shannon bits, weighted by the expert’s confidence and the density of its connections to other validated knodes.
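One way to picture a knode and the network it belongs to is as a small data structure. The sketch below is a guess at a minimal shape, not the essay's specification: the class names, the fields, the density measure, and above all the strength formula (bits multiplied by confidence and boosted by link density) are assumptions introduced for illustration, and the claims are placeholders.

```python
from dataclasses import dataclass, field


@dataclass
class Knode:
    """One validated proposition (hypothetical schema)."""
    claim: str
    confidence: float       # expert's confidence, from 0 to 1
    bits_resolved: float    # uncertainty removed by validation, in bits
    sources: list = field(default_factory=list)
    links: set = field(default_factory=set)   # ids of related knodes


class QualifiedSemanticNetwork:
    """A tiny in-memory stand-in for a persistent knode store."""

    def __init__(self):
        self.knodes = {}

    def add(self, knode_id, knode):
        if not 0.0 <= knode.confidence <= 1.0:
            raise ValueError("confidence must be between 0 and 1")
        self.knodes[knode_id] = knode

    def link(self, a, b):
        # Relations are stored on both ends, so density is symmetric.
        self.knodes[a].links.add(b)
        self.knodes[b].links.add(a)

    def density(self, knode_id):
        """Fraction of the other knodes that this one is linked to."""
        others = len(self.knodes) - 1
        if others <= 0:
            return 0.0
        return len(self.knodes[knode_id].links) / others

    def strength(self, knode_id):
        # ASSUMED weighting: bits scaled by confidence, boosted by density.
        k = self.knodes[knode_id]
        return k.bits_resolved * k.confidence * (1.0 + self.density(knode_id))


qsn = QualifiedSemanticNetwork()
qsn.add("k1", Knode("placeholder claim A", confidence=0.9, bits_resolved=2.0))
qsn.add("k2", Knode("placeholder claim B", confidence=0.5, bits_resolved=2.0))
qsn.add("k3", Knode("placeholder claim C", confidence=0.9, bits_resolved=2.0))
qsn.link("k1", "k2")

for knode_id in qsn.knodes:
    print(knode_id, round(qsn.strength(knode_id), 3))
```

In this toy network, k1 and k3 carry the same confidence and the same bits, but k1 scores higher because it is linked to another knode. Persistence and portability, which the essay treats as essential, are omitted here; a real store would need serialization and provenance for each source.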
“Semantic information is well-formed, meaningful, and truthful data. Misinformation is not a type of semantic information but pseudo-information — not semantic information at all.”
— Luciano Floridi, Is Semantic Information Meaningful Data? Philosophy and Phenomenological Research, 2005
Luciano Floridi, the founding theorist of the philosophy of information and founding director of Yale’s Digital Ethics Center, has spent thirty years arguing that information worthy of the name must be truthful — not merely plausible, not merely fluent, but verified. The LLM produces pseudo-information as readily as it produces information, and it cannot tell the difference between them. The semantic island can, because the expert has done the work of sorting. The island carries a confidence rating on every claim. It knows what it knows with high confidence and what it holds provisionally. That epistemic self-awareness is the thing the sea most conspicuously lacks and the island most essentially possesses.
Islands grow into archipelagoes. Archipelagoes support a semantic economy of validated knowledge exchange. The essay demonstrates this with two live applications. The Mitochondria-Cancer Atlas is a Semantext archipelago in cancer biology: validated, relationally organized intelligence about the role of mitochondria across cancer types, built from a body of literature too vast for any single researcher to survey comprehensively before LLMs existed, now being assembled island by island by an international working group whose founding commentary was published in Cell Metabolism in June 2026. The E3-LLM initiative is a parallel archipelago in human capital intelligence: a proposed architecture for connecting the fragmented American education-employment-economy system into a navigable knowledge network for the student in Albuquerque, the displaced worker in Detroit, the educator designing a curriculum that leads somewhere real.
“The Semantic Web is not a separate Web but an extension of the current one, in which information is given well-defined meaning, better enabling computers and people to work in cooperation.”
— Tim Berners-Lee, James Hendler, and Ora Lassila, Scientific American, May 2001
Tim Berners-Lee saw this coming in 2001. His Semantic Web proposal — information given well-defined meaning, making the web navigable by meaning rather than merely by link — was the right vision twenty-five years too early. It had no sea vast enough to mine. The large language model is that sea. The Semantext framework, proposed in this essay, is the process that reconstitutes Berners-Lee’s vision: not by building structured ontologies from scratch, as he imagined, but by extracting them from the LLM ocean using human expert judgment as the entropy pump, Shannon bits as the unit of value, and the Qualified Semantic Network as the persistent, shareable, reusable output.
The essay closes with a structural observation about itself: it was built by the process it describes. A sustained, iterative conversation between a domain expert and an AI produced a persistent, validated knowledge structure about the construction of persistent, validated knowledge structures. A semantic island about island-building, built by island-building. In the formal language of Hofstadter’s strange loop theory, the essay is autological. In the language of dynamical systems, the iterative human-AI conversation that produced it is a strange attractor. And in the language of the argument itself: the island is here. You are standing on it.
The process that builds these islands is what this essay calls Semantext. The unit of their height is the Shannon bit, weighted by human confidence and relational density. The name for the network they form is the Qualified Semantic Network. Vannevar Bush imagined it in The Atlantic in 1945. Shannon measured it in 1948. Berners-Lee proposed it in 2001. The large language model made the sea large enough to demonstrate it in 2022. The Semantext framework, and this essay, are the point where all four lines converge.