It's Always Raining Links - But Do We Ever Have Droughts?

Not a day goes by when rain doesn't fall on the Earth. Our weather patterns have changed through the eons and there have been wetter periods and drier periods, but so far as I know the rain keeps falling and will keep falling until Earth can no longer sustain a liquid water environment.

The World Wide Web works much the same way. As we add content to the Web we "hook it up" and link to something, and something often links back to us. Even on the dark Web there are a lot of links. But do links burst into existence at a continuous rate? Is there a predictability to how much linkage we create with our new content? I'm sure a few people have tried to estimate stuff like this. But there's always a catch.

Let me back up a moment and share an odd question that perhaps only a non-physicist like me would ask: if (as was postulated in one scenario) the universe is gradually burning itself up, such that in about 100 billion years or so all that will be left is heat radiated from black holes, what if there is something exotic beyond the boundaries of our laws of physics that is capable of soaking up that heat and converting it into something else?

I can imagine the universe sort of sloshing around within whatever its boundaries may be; some matter may escape those boundaries as its journey away from the Beginning continues. As everything sloshes, it's all boiling away, gradually, passing through some intermediary stages. So, that image has stayed with me for a long time and, frankly, it makes no sense to me.

But there are situations on the Internet where virtual existence seems to mirror physical existence. For example, "out there" in the real universe, whenever two pieces of matter come near each other the chance that a third piece of matter will join one or both of them increases. We lay people call that gravity. On the Web, we Web analysts call that "the Law of Preferential Attachment".

According to the Law of Preferential Attachment, any highly visible piece of content is more likely to attract new links than any poorly visible piece of content. Mike Grehan named this the Filthy Linking Rich Principle in 2004. Another aspect of this principle is that any (naturally) newly produced piece of content is more likely to refer to an existing piece of content than not. In fact, given today's social media and blogging platforms, it's highly improbable that a real person will create a Web document that doesn't at least link to the site it's on. Most of us don't upload raw files to servers any longer.
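To make the idea concrete, here is a minimal sketch of preferential attachment as a toy simulation. It is my own illustration, not a model of how any search engine works: the function name, the rule that each new page creates exactly one link, and the "in-links plus one" weighting are all assumptions chosen for simplicity.

```python
import random


def simulate_preferential_attachment(n_new_pages, seed=0):
    """Grow a toy Web where each new page links to one existing page.

    Assumption: the target is chosen with probability proportional to
    (in-links + 1), so already-linked pages tend to collect more links.
    """
    rng = random.Random(seed)
    inlinks = [0]  # start with a single page that has no in-links
    for _ in range(n_new_pages):
        weights = [count + 1 for count in inlinks]
        target = rng.choices(range(len(inlinks)), weights=weights, k=1)[0]
        inlinks[target] += 1
        inlinks.append(0)  # the new page itself starts with no in-links
    return inlinks


counts = simulate_preferential_attachment(2000)
ranked = sorted(counts, reverse=True)
print("most-linked page:", ranked[0], "in-links")
print("median page:", ranked[len(ranked) // 2], "in-links")
print("pages with no in-links:", counts.count(0))
```

Even in a toy like this, a handful of early pages should end up with most of the links while the typical page gets few or none, which is the "linking rich" pattern in miniature.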

But there are nonetheless "dry spells" in link growth, and these dry spells have loci and local environments. For example, any non-automated Website that remains idle for an extended period of time produces no new links. Also, after an event loses popularity people stop creating content and links pertaining to that event. You usually see a huge surge in content and links following a major catastrophe, for example, and then the production gradually falls away. The less far-ranging the event, the faster the falloff occurs and the closer to zero link and content production becomes.

It can be argued that no historical event can reach a true zero rate of growth in content and link production as someone somewhere in the future may create something random and obscure about any historical event. We need a name for forgotten historical events -- things that occurred within the scope of human knowledge but which have fallen out of that scope of knowledge. For example, what was the first word Abraham Lincoln spoke as a child? What was his first full sentence? Unless that information was preserved by his family we and our successors in life will probably never know. Hence, we cannot create content about it.

The Web "forgets" things just as history "forgets" things. Websites vanish. Entire hosting services vanish. Once highly active, productive communities lose vitality and eventually die away. Their members go elsewhere, pick up new interests. As we move on to new interests we leave behind us desolated swathes of once active conversations, images, and networks of expressions and links. They remain frozen in place, perhaps occasionally visited by someone (or some thing), but the length of time between visits expands.

Analysts refer to this process by various names such as "link rot", "link decay", "link degradation", etc. It's not really the links that are rotting away, however, so much as it is portions of the Web. The Webverse is constantly changing, transforming itself into new patterns and new collections of content and connectivity. But what should we make of that old stuff?

Search engines and Web marketers alike are not sure of how to measure the Old Web. Should the content still be featured in a world where Query Deserves Freshness or should it be ignored because no one is linking to it any more? The rate at which link acquisition occurs has received much scrutiny among information retrieval scholars, search engineers, Web marketers, and geeky people trying to figure out how the Web works and grows.

Despite the best efforts of Archive.Org to preserve the Web That Was, and of notable mirror services such as ReoCities (first among several attempts to preserve the old GeoCities.com content that Yahoo! shut down), we can no longer browse the Web that existed in, say, 1996. I know because I took down my first personal Website -- built in August 1996 -- and it is not preserved in any archive on the Web I have found, although I may still have an old copy on a CD somewhere. The Old Web recedes into the past just like the Old Universe has receded into the past.

In "A Search In Time Is A Memorable Path" I wrote "We must learn how to search through Time on the Internet for Time surely sorts and separates the information we find. We don't yet have the tools to do this, but we will." Maybe, but those tools won't be able to reach everything in the past. What they might do, however, is change the environment of the Old Web and help make it interesting again. That is, of course, the mission of existing archives.

As we lose portions of the Web That Was we create frayed pathways that lead nowhere. Old Content does not fade away uniformly. Many an old page is loaded with "dead links" that point to now non-existent documents which once lay elsewhere on the Web -- which once were deemed important. As fewer and fewer people visit old pages, or archival copies of old pages, the traffic from their links dries up. As the traffic vanishes the space between the creation of the last link to an Old Document and the next link to that Old Document grows longer and longer on the timeline.
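The widening space between links can be measured directly if you know when each link to a document was created. The sketch below is only an illustration: the function name is mine and the dates are invented, not taken from any real crawl.

```python
from datetime import date


def link_gaps_in_days(link_dates):
    """Return the number of days between each successive pair of links."""
    ordered = sorted(link_dates)
    return [(later - earlier).days for earlier, later in zip(ordered, ordered[1:])]


# Hypothetical dates on which new links to one Old Document appeared.
observed = [
    date(2002, 2, 16),
    date(2002, 2, 20),
    date(2002, 4, 1),
    date(2003, 1, 15),
    date(2006, 7, 4),
]

gaps = link_gaps_in_days(observed)
# True when every dry spell is longer than the one before it.
dry_spells_lengthening = all(a < b for a, b in zip(gaps, gaps[1:]))

print(gaps)                    # [4, 40, 289, 1266]
print(dry_spells_lengthening)  # True
```

A document whose gaps keep growing like this is drifting into the Old Web; a well-linked document would show gaps that stay short.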

And people like me are asked, "Do these old links matter? Should we still consider them important?" To these questions and others we can only say we don't know. Not because the search engines have forgotten them, nor because their destinations may no longer exist. We don't know if they matter because even a dead link that leads nowhere tells us something. It may be a mistyped link or it may be a memory of the past, a faint echo of content from a time when something forgotten was important.

So today's Web is very much like the universe: we are speeding away from the Beginning in many directions, leaving behind a past that becomes more and more difficult to see. We cannot halt the progression of Time, nor preserve the value we once placed in that content. When Michael Jackson or George Harrison died we grieved with their loved ones and their friends and we poured our hearts out upon the Web, but our passions for those topics are nearly spent and one day their record will be gone.

In fact, when New Zealand actor Kevin Smith died in 2002, 15,000 people left messages of grief and condolence for his family on one of my Websites. Soon afterward I accidentally wiped out the entire server with a single UNIX command. My partner struggled ardently for a week to recover those lost messages but failed. In one heartbeat a page in human history was lost. This process has repeated itself over and over again.

That Website has gone through many transformations since 2002. Most of the old links that pointed to it now lead nowhere; their destinations no longer exist and if surfers are lucky they will find a 301-redirect to something relevant. But more likely they will simply find themselves staring at a 404 page, standing on a virtual cliff overlooking a chasm that has swallowed whole a piece of human history along with Web history, and they will not create new links pointing to that chasm.

My point in relating all this is not as simple as I would like it to be. In the measurement of information on the Web, how should we frame the context of our measurement? Should it include the frayed references to the past that no longer lead anywhere? Should it be confined to a rolling window that excludes the receding Web That Was, even though we cannot see the boundaries between the Web That Is and the one that was? How much effort should we devote to measuring those edges and understanding what they mean to us, to our information architecture and to the information ecosystem?
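The framing choice changes the numbers. As a rough sketch, suppose each link is recorded with its creation date and whether its destination still resolves. The record layout, the function name, and the sample data here are all hypothetical; I am only showing how the same links yield different counts under different frames.

```python
from datetime import date, timedelta


def count_links(links, as_of, window_days=None, include_dead=True):
    """Count links under a chosen measurement frame.

    links: list of (created_on, target_is_live) tuples (assumed layout).
    window_days: if given, ignore links older than this many days.
    include_dead: if False, ignore links whose destination is gone.
    """
    total = 0
    for created_on, target_is_live in links:
        if created_on > as_of:
            continue  # not yet created at measurement time
        if window_days is not None and created_on < as_of - timedelta(days=window_days):
            continue  # outside the rolling window
        if not include_dead and not target_is_live:
            continue  # frayed reference that leads nowhere
        total += 1
    return total


# Invented sample: two old links to vanished pages, two newer live ones.
sample = [
    (date(1996, 8, 1), False),
    (date(2002, 2, 20), False),
    (date(2009, 6, 26), True),
    (date(2011, 3, 1), True),
]
today = date(2011, 6, 1)

print(count_links(sample, today))                      # 4: everything counts
print(count_links(sample, today, window_days=730))     # 2: two-year window
print(count_links(sample, today, include_dead=False))  # 2: live targets only
```

None of these three answers is the correct one. Each simply answers a different question about which Web we think we are measuring.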

Search engineers tidy things up with clever formulae that evaporate PageRank, devalue dead links, delist unreachable URLs -- but they cannot erase the stelae and remove the references to the past. So the Web That Was remains with us, a Shadow Web that we cannot explore and which we may no longer be able to measure. It is always there, growing larger, swallowing up everything we leave behind. It has become a Thing Worth Mentioning, and an anomaly in our measurements, but we avoid measuring it because we see no value in it.

Perhaps that is why in the physical world we have historians and archaeologists and paleontologists, because we cannot escape the past no matter how much we are pushed away from it. It looms larger behind us and grows at the expense of the future. Which leads me to ask another of my ridiculous questions: Is there some exotic thing out there beyond the boundaries of Time that converts the past into something else, and will we one day hear the first words spoken by the first human being or find the Web That Was?

If that seems too metaphysical for science, it may be. But I find myself looking at Web measurements that include the Web That Was and I have found no satisfying formula for resolving the anomalies it creates in the data I must weigh and interpret. It is no small thing to be asked, "Should these links matter?"

I still don't know. I suppose it depends on whether anyone ever again decides to link to the documents with the dead links. And the long dry spells between links grow ever longer, except for really well-linked documents that people keep referring to.