Uncovering Switched HTML End Tags in a Long Page


Part of every page had disappeared in one browser but displayed fine in two others. Everything from a midpoint in the page to the end was missing. The pages are written in HTML, and I had to figure out why the content disappeared so I could fix it.

I copied one page's source into a new file and saved the copy as a text file (*.txt), not as an HTML file.

Then I deleted every line that could not possibly have caused the problem. That left only lines that might have contributed to it. By removing what was obviously irrelevant, I could gradually see more clearly where the problem might lie. I also didn’t have to scroll as much, which helps when looking over a page and remembering what I have already seen.

The disappearance affected several similar pages, and all of the defective pages began the loss at the same spot. Looking at the source code at that spot did not lead to an obvious answer, because the problem turned out to be more complicated, but the spot was still a clue.

If I deleted too much, I could simply start again, because I was working only on a copy and could not harm my live website.

In my case, I deleted the doctype preamble, the whole head element, the header, comments, and all or most paragraphs, lists, horizontal rules, and line breaks, and probably a few other elements as well.

I don’t generally use leading spaces or blank lines. If I had, I would have deleted them, so everything would have been flush left.

I don’t minify, so I have many elements on separate lines. If that were not my custom, I would have split the lines in this text file, to make any problem more visible.
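The manual pruning described above can be approximated in code. This is only a minimal sketch, not what I actually did: the function name `keep_structural_lines` and the choice of `main` and `div` as the tags worth keeping are assumptions for illustration. It works on the text of a copy and keeps only the lines that mention one of the chosen tags, flush left.

```python
import re


def keep_structural_lines(text, tag_names=("main", "div")):
    """Return only the lines that contain a start or end tag for tag_names.

    Hypothetical helper: it drops blank lines and leading spaces too,
    so that everything that remains is flush left.
    """
    names = "|".join(re.escape(name) for name in tag_names)
    # Matches "<div", "</div", "<main", "</main" as whole tag names.
    pattern = re.compile(r"</?(?:%s)\b" % names, re.IGNORECASE)
    kept = []
    for line in text.splitlines():
        stripped = line.strip()
        if stripped and pattern.search(stripped):
            kept.append(stripped)
    return "\n".join(kept)


sample = "\n".join([
    "<!DOCTYPE html>",
    "<main>",
    '  <div class="a">',
    "<p>Some paragraph.</p>",
    "",
    "<hr>",
    "</div>",
    "</main>",
])
print(keep_structural_lines(sample))
```

A filter like this is cruder than deleting by eye, because it assumes in advance which elements are suspects. It is best run on a copy, for the same reason the manual deletion was.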

I began to suspect the problem was with misnested tags. Misnesting is against the HTML 5.2 specification, sections 8.2.8.1 and 8.2.8.2 (although non-normative). This is not allowed:

 

<b>X<i>Y</b>Z</i>

 

So, if I had misnesting, I had to find it. And it might not even be there, if the problem was something else. However, now I made a new mistake: suspecting misnesting means suspecting my HTML 5.x was invalid, and I could have diagnosed that faster by using an HTML validator. But I forgot that, so I worked manually.

I suspected the div element, which I use a lot. I needed accurate counts of start tags and end tags, and once there are enough of something to count, miscounting becomes likely, so counting manually was out. My text editor has a useful feature: if I do a mass replacement of any string with any string, the editor tells me how many replacements it made. So I replaced a string with an identical string, just to get a count without functionally changing anything. I replaced “<div” with “<div” (both identical, without quotation marks and without closing angle brackets, since I often add attributes) and took the count. Then I replaced “</div>” with “</div>” (both of these also identical and without quotation marks) and took the new count. The two counts were the same, which meant that no div tag was missing. So all I had to find out was whether a tag was misplaced.
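The same count can be taken without an editor. The following is a minimal sketch under the assumption that the page source is available as a string; the function name `count_div_tags` is made up for illustration. Like the editor trick, it is plain substring counting, so “<div” would also count any other tag whose name merely begins with “div”.

```python
def count_div_tags(source):
    """Count div start tags and div end tags by plain substring matching."""
    lowered = source.lower()
    # "<div" has no closing bracket, so tags with attributes are counted too.
    start_count = lowered.count("<div")
    end_count = lowered.count("</div>")
    return start_count, end_count


# The counts are equal here even though one end tag is in the wrong place.
sample = '<main><div class="a"><div>text</div></main></div>'
starts, ends = count_div_tags(sample)
print(starts, ends)
```

Equal counts only show that no tag is missing. They say nothing about whether each end tag sits in the right place, which is why a second step is still needed.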

I did a search for the three-letter string “div”. My text editor highlights all the results in yellow. If the string appeared without being a tag name (for example, if I had written “redivide” in text), I ignored it, so I considered only start and end tags. I matched every end tag to a specific start tag, and marked both tags of each matched pair by adding “<!-- ok -->” (without quotation marks) to the left of each. Very quickly, almost the entire page showed plenty of “<!-- ok -->” in a neat column from top to bottom.

Almost, because a few exceptions suddenly stood out like sore thumbs. It helps that the text editor applies color-coding, but even without that, the neat column makes the unmarked tags easier to isolate and therefore easier to find.

I found it. A div end tag (“</div>”) way down at the bottom of the page should have been a whole bunch of lines up, before the main end tag (“</main>”). It matched a div start tag that came after the main start tag, so the div end tag had to come before the main end tag. The sequence of tags should have been like this: <main><div>article and other elements</div></main>
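A misplaced end tag of this kind can also be found mechanically by keeping a stack of open elements. The following is a minimal sketch using Python's standard `html.parser` module, not a replacement for a real HTML validator. The class name `NestingChecker` and the function `find_misnested` are made up for illustration, and the sketch assumes every non-void element is closed explicitly, so pages that rely on optional end tags (such as an omitted `</p>` or `</li>`) would produce false reports.

```python
from html.parser import HTMLParser

# Elements that never take an end tag, so they are never pushed on the stack.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class NestingChecker(HTMLParser):
    """Hypothetical checker: records end tags that do not close the innermost open element."""

    def __init__(self):
        super().__init__()
        self.open_tags = []  # stack of open tag names
        self.problems = []   # (line, end tag name, innermost open tag or None)

    def handle_starttag(self, tag, attrs):
        if tag not in VOID:
            self.open_tags.append(tag)

    def handle_startendtag(self, tag, attrs):
        pass  # self-closing syntax such as <br/> opens nothing

    def handle_endtag(self, tag):
        if tag in VOID:
            return
        if self.open_tags and self.open_tags[-1] == tag:
            self.open_tags.pop()
            return
        innermost = self.open_tags[-1] if self.open_tags else None
        self.problems.append((self.getpos()[0], tag, innermost))
        # Recover: if this element is open further out, close everything inside it.
        for i in range(len(self.open_tags) - 1, -1, -1):
            if self.open_tags[i] == tag:
                del self.open_tags[i:]
                break


def find_misnested(source):
    checker = NestingChecker()
    checker.feed(source)
    checker.close()
    return checker.problems


sample = "\n".join([
    "<main>",
    "<div>",
    "<article>text</article>",
    "</main>",
    "</div>",
])
for line, tag, innermost in find_misnested(sample):
    print(f"line {line}: </{tag}> seen while innermost open element is {innermost}")
```

On the sample, which puts `</div>` after `</main>`, the checker reports both end tags: the `</main>` that arrives while the div is still open, and the stray `</div>` that follows it.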

I already knew this. I had made a good template to guide my writing of code, and some of my pages are good. But I had a lot of code on my pages, and I had slipped.

The reason one browser broke my pages while two others saved them is that the browser that broke them was following my HTML as written, while the browsers that showed what I intended were patiently tolerating my mistaken HTML (and the mistake is not absolutely forbidden by the spec). They were like indulgent grownups around a misbehaving child, so they showed what I must have meant.

The browser that held me to a standard was Microsoft Internet Explorer, likely a recent version, even though IE is often notorious for not following standards. The browsers that did what I meant were Firefox 47.0 and Chromium 51.0.2704.106 (64-bit) (don’t ask why I didn’t check in later ones). I don’t keep IE at home, because I erase Windows from every machine I get, but I eventually got around to opening my pages at a public library, and that is where I discovered the symptom.

I edited my page code, so now my pages are well-behaved.