<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0"><channel><title><![CDATA[The Bag of Holding]]></title><description><![CDATA[A bottomless satchel once possessed by a knight-errant.]]></description><link>https://apoch.github.io/blog</link><image><url>/ThorDogOfThunder.jpg</url><title>The Bag of Holding</title><link>https://apoch.github.io/blog</link></image><generator>RSS for Node</generator><lastBuildDate>Tue, 27 Feb 2018 17:03:12 GMT</lastBuildDate><atom:link href="https://apoch.github.io/blog/rss/" rel="self" type="application/rss+xml"/><ttl>60</ttl><item><title><![CDATA[A Few Farewells]]></title><description><![CDATA[<div id="preamble">
<div class="sectionbody">
<div class="paragraph">
<p>I&#8217;m going to depart from my habit of posting predominantly technical stuff here, and write a bit about some issues that I find very important on a personal level. This is going to be more for my own benefit than anything else, but I truly hope that the exploration is helpful to someone out there who may be confronting similar challenges. If nothing else, it will be a useful reminder to Future Me of why certain things have happened the way they have.</p>
</div>
</div>
</div>
<div class="sect1">
<h2 id="_internet_communities">Internet Communities</h2>
<div class="sectionbody">
<div class="paragraph">
<p>I joined my first Internet community in the waning years of the 1900s. (OK, it was 1998 or so, but it&#8217;s fun to say it that way.) It was a programming-centric web forum, geared around offering advice and assistance with whatever random code thing people asked about. Forums were, in that era, kind of the dominant precursor to things like Stack Overflow - but much less formalized. One step past a mailing list, really.</p>
</div>
<div class="paragraph">
<p>I dropped in and out of that community for a few years before moving on more or less forever in 2002 or so. By that point I had joined a couple of other notable forums and had spread out into other online communities.</p>
</div>
<div class="paragraph">
<p>Even then, it was clear to me that something about certain forums made them <strong>stick</strong> so much better than others. I couldn&#8217;t articulate it then, and I can barely identify the components now, but the general principle is that some communities resonate with me more than others. It has a lot to do with the internal culture and attitudes and general temperature of how people interact with each other within the community - and outward facing interactions, as well.</p>
</div>
<div class="paragraph">
<p>That first forum I joined turned into a cesspit. Or maybe it always <strong>had</strong> been a cesspit, and it just took me four years to catch on. Either way, I decided to move on from it because I just wasn&#8217;t having a good time there anymore.</p>
</div>
<div class="paragraph">
<p>Culture is hugely important to me - personally, professionally, philosophically - and I find that online communities have some very extreme cultures in a lot of cases. Over the years I&#8217;ve wandered into (and, frequently, right back out of) a lot of places on the web, and one of the most reliable predictors of me <strong>leaving</strong> a community is a deteriorated culture.</p>
</div>
<div class="paragraph">
<p>Much of this is coming to a head right now because of my own personal circumstances, but a lot of it is a sign of the times as well. Current events being what they are, a lot of communities that <strong>used to be</strong> friendly, engaging, and enjoyable to participate in&#8230;&#8203; well, they&#8217;re turning into polarized battlegrounds littered with meme-ridden "arguments" where people basically just sling rhetoric (or outright insults) at each other until someone gives up.</p>
</div>
<div class="paragraph">
<p>Don&#8217;t get me wrong, people have been terrible to each other since before fire was invented. This is not news. What&#8217;s news is that the terrible behavior has finally overtaken a handful of places I used to like to hang out in, and now I don&#8217;t like to hang out there anymore.</p>
</div>
<div class="paragraph">
<p>I want to specifically point out a couple of examples.</p>
</div>
</div>
</div>
<div class="sect1">
<h2 id="_twitter">Twitter</h2>
<div class="sectionbody">
<div class="paragraph">
<p>I joined Twitter originally because I thought it seemed like a cool way to stay in touch with a bunch of game development friends and acquaintances. And for a while, it was good for that; I could follow the people I liked to hear from, and anyone who wanted to listen to my inanity could follow me in return.</p>
</div>
<div class="paragraph">
<p>Slowly, the platform evolved, and eventually turned into a monster. Curated feeds actually prompted me to quit Twitter once before. I discovered much later that third-party apps could display the <strong>uncurated</strong> feed, and joined back up; but things were already palpably different.</p>
</div>
<div class="paragraph">
<p>The community feel of Twitter is gone. Rampant retweet-sprees and quote chains have destroyed the sensation of being able to control what viewpoints you have to be subjected to. This is a touchy issue for me, because I genuinely believe three things about viewpoints:</p>
</div>
<div class="olist arabic">
<ol class="arabic">
<li>
<p>Listening to more viewpoints, on average, is healthier than listening to a select few.</p>
</li>
<li>
<p>Some viewpoints do not deserve to be listened to, full stop.</p>
</li>
<li>
<p>These two things are in tension but not fundamentally contradictory.</p>
</li>
</ol>
</div>
<div class="paragraph">
<p>I liked Twitter once upon a time because it gave me the ability to hear things I otherwise would not have heard, but I could still lock away the really egregious garbage and ignore it.</p>
</div>
<div class="paragraph">
<p>For a myriad of reasons, that ability has been lost.</p>
</div>
<div class="paragraph">
<p>I dread opening Twitter now because I don&#8217;t want to read another thread about how gun control is against the Constitution. I don&#8217;t want to read another thread about how women don&#8217;t deserve to play video games, or make video games for that matter. I don&#8217;t want to have to hear echoes of my friends and colleagues fighting an endless war against people who quite honestly have some reprehensible opinions and habits.</p>
</div>
<div class="paragraph">
<p>I believe in listening to the opposing side. I believe in finding common ground. I believe in the spirit of equitable compromise.</p>
</div>
<div class="paragraph">
<p>And yet it hurts <strong>intensely</strong> to watch yet another conversation scroll by where someone I respect and care about in my industry has to defend her right to be a part of that industry at all; where good people who just want the endless stream of shootings to <strong>stop already</strong> are drowned out by expressions of hate and disdain simply because they had the audacity to say guns are probably something we should take a little more seriously.</p>
</div>
<div class="paragraph">
<p>I have no intention of walking away from the issues or even the debate, such as it is, in any area really. But I do feel, quite keenly, the lack of a social space where I can just engage with people in a healthy and enjoyable way. I don&#8217;t need it to be "safe." I don&#8217;t need it to align exclusively with my own biases and preconceptions. I just want a place to talk about stuff - <strong>any</strong> stuff - where everyone is a rational adult and acting otherwise gets you a nice shiny boot back to whatever hellhole you came from.</p>
</div>
<div class="paragraph">
<p>For the things I see in my feed these days, the ratio of "interesting idea that I want to think about" to "oh no why" is perilously low. Twitter no longer delivers pleasant, thoughtful discourse at a sufficient volume. (Maybe it never did, but that&#8217;s beside the point.)</p>
</div>
<div class="paragraph">
<p>So I&#8217;m phasing out my Twitter account. I will probably hang on to it so people can reach me until I have alternative arrangements made, but I&#8217;m actively cutting it out of my life.</p>
</div>
</div>
</div>
<div class="sect1">
<h2 id="_gamedev_net">GameDev.Net</h2>
<div class="sectionbody">
<div class="paragraph">
<p>This one is a lot more emotionally charged for me. I&#8217;ve been lurking around and participating in the GDNet community since 2002. I&#8217;ve held a moderator position there for a long time. I used to sink a lot of effort into helping people with projects and questions there. I attended my first couple of GDCs under the banner of GDNet, doing "media coverage" - i.e. I&#8217;d write an article about every session I went to, and in exchange, I got a free pass to the conference.</p>
</div>
<div class="paragraph">
<p>The "Bag of Holding" started as my journal on GDNet. I wrote a lot of stuff there - ranging from the flippant and silly, to the highly technical, to the intensely personal.</p>
</div>
<div class="paragraph">
<p>Much like Twitter, I&#8217;ve simply stopped having fun on GDNet. But the reasons are much less clear to me. It isn&#8217;t an outright toxic or hostile place to hang out. There&#8217;s still plenty of good people there and plenty of interesting discussion.</p>
</div>
<div class="paragraph">
<p>I think that in the case of GDNet, it isn&#8217;t so much the site that has changed, as it is myself that&#8217;s changed. To be clear, there has been a <strong>lot</strong> of change over the past decade and a half on that site. Some of it was harder to swallow than others. But ultimately, I don&#8217;t think my desire to leave GDNet is the "fault" of that community.</p>
</div>
<div class="paragraph">
<p>Rather, I think I&#8217;ve simply had my time, and now it&#8217;s time to move on. These things happen.</p>
</div>
</div>
</div>
<div class="sect1">
<h2 id="_reddit">Reddit</h2>
<div class="sectionbody">
<div class="paragraph">
<p>I&#8217;ve been using Reddit for nearly 12 years - most of its existence. I predate subreddits and a lot of other features of the community. When I first started hanging out there, it was a pretty niche sort of place. It wasn&#8217;t <strong>exclusively</strong> about programmers or software developers, but the vast majority of the population had <strong>some</strong> connection to software.</p>
</div>
<div class="paragraph">
<p>To me, my desire to leave Reddit is a hybrid of my desire to leave Twitter and my desire to leave GDNet. It <strong>is</strong> partially a failing of the community. Back in the early days, if you were a jerk, you got called out on it and the problematic behavior was strongly discouraged in a number of ways. Since most people on the site had some common ground (i.e. programming or one of the other early interest groups) it was easy to defend each other from hostility. It was easy to <strong>want</strong> to be nice.</p>
</div>
<div class="paragraph">
<p>Now it&#8217;s become a very different place. And that&#8217;s fine; I don&#8217;t pretend to be entitled to it staying a particular way indefinitely. The community has moved on and evolved.</p>
</div>
<div class="paragraph">
<p>I just don&#8217;t want to be part of it anymore.</p>
</div>
<div class="paragraph">
<p>There are still a lot of fantastic people on Reddit, and a lot of great discussion and exploration happens there. But it&#8217;s also exquisitely easy for moderation to slip. Once a sub gets a certain critical mass, if the mod staff is not ready and willing to tackle the volume, bad behavior becomes rampant.</p>
</div>
<div class="paragraph">
<p>What made me step back and think about all this was not a feeling of being attacked or wronged. What made me step back was realizing that <strong>I was also starting to act like a jerk</strong> - and very frequently. Not only was I doing things I found distasteful, but there was minimal response or reaction from the community itself to discourage more of the same.</p>
</div>
<div class="paragraph">
<p>I&#8217;m not leaving Reddit because it&#8217;s devoid of merit. I&#8217;m leaving Reddit because I can&#8217;t enjoy it anymore, and <strong>my own behavior</strong> is part of that problem.</p>
</div>
</div>
</div>
<div class="sect1">
<h2 id="_what_s_next">What&#8217;s Next</h2>
<div class="sectionbody">
<div class="paragraph">
<p>I&#8217;m not going to cite the "new" places I&#8217;ve taken to hanging out in - partially because I like the idea of keeping them pristine, but mostly because I&#8217;m not ready to commit to them just yet. Until I actually settle somewhere, I may be a little less visible and a little harder to get ahold of.</p>
</div>
<div class="paragraph">
<p>On a general level, though, there are a couple of things I think are interesting to consider.</p>
</div>
<div class="paragraph">
<p>Culture, as I cited earlier, is hugely important to me. The fundamental structure and even the UX of a community can have massive impacts on the culture. I&#8217;m interested in communities where the culture is generally positive, supportive, introspective, self-aware, and self-healing. I think some technological approaches to "Internet community" are more likely to produce these qualities. But again, I&#8217;m really early in this process, and I want to gather more data before making any strong statements.</p>
</div>
<div class="paragraph">
<p>Another interesting factor is what I&#8217;ve been mentally referring to as "taxonomy." Some communities are organized around sets of ideas - like subreddits, for example. If you care about Subject X, you go into the section of the community that talks about Subject X, and you hopefully encounter people who are fun to talk to. This is a totally legitimate way to build a community. In fact, I know that some people actually prefer this approach.</p>
</div>
<div class="paragraph">
<p>The other angle is to organize around <strong>individuals</strong>. Twitter and Facebook are much more oriented around people than subjects, for example. It wasn&#8217;t always the case, but for me now, at this point in my life, this organizational taxonomy is much more appealing.</p>
</div>
<div class="paragraph">
<p>I would like to join communities where I know interesting people hang out. I want to know what those people have to say. I want to know who <strong>they</strong> listen to and find interesting. I want to use the network effect to expand my own sphere of awareness and perhaps even influence.</p>
</div>
<div class="paragraph">
<p>I think a community that focuses on <strong>people</strong> and <strong>culture</strong> would really hit the spot.</p>
</div>
</div>
</div>]]></description><link>https://apoch.github.io/blog/2018/02/27/A-Few-Farewells.html</link><guid isPermaLink="true">https://apoch.github.io/blog/2018/02/27/A-Few-Farewells.html</guid><dc:creator><![CDATA[Mike Lewis]]></dc:creator><pubDate>Tue, 27 Feb 2018 00:00:00 GMT</pubDate></item><item><title><![CDATA[Code Reuse In Actual Practice]]></title><description><![CDATA[<div id="preamble">
<div class="sectionbody">
<div class="paragraph">
<p>It&#8217;s very common to hear engineers talking about "code reuse" - particularly in a positive light. We love to say that we&#8217;ll make our designs "reusable". Most of the time the meaning of this is pretty well understood; someday, we want our code to be able to be applied to some different use case and still work without extensive changes.</p>
</div>
<div class="paragraph">
<p>But in practice, code reuse tends to fall flat. A common bit of wisdom is that you shouldn&#8217;t even <em>try</em> to make code reusable until you have three different use cases that would benefit from it. This is actually very good advice, and I&#8217;ve found it helps a lot to step back from the obsession with reusability for a moment and just let oneself write some "one-off" code that <em>actually works</em>.</p>
</div>
<div class="paragraph">
<p>This hints at the possibility of a few flaws in the engineering mindset that reuse is a noble goal.</p>
</div>
</div>
</div>
<div class="sect1">
<h2 id="_why_not_reuse">Why Not Reuse?</h2>
<div class="sectionbody">
<div class="paragraph">
<p>Arguing <em>for</em> reuse is easy: if you only have to write and debug the code once, but can benefit from it multiple times, it&#8217;s clearly better than writing very similar code five or six times&#8230;&#8203; right?</p>
</div>
<div class="paragraph">
<p>Yes and no. Premature generalization is a very real thing. Sometimes we can&#8217;t even <em>see</em> reuse potential until we&#8217;ve written similar systems repeatedly, and <em>then</em> it becomes clear that they could be unified. On the flip side, sometimes we design reusable components that are so generic they don&#8217;t actually do what we needed them to do in the first place.</p>
</div>
<div class="paragraph">
<p>This is a central theme of the story of <em>Design Patterns</em> as a cultural phenomenon. Patterns were originally a <em>descriptive</em> thing. You find a common thread in five or six different systems, and you give it a name.</p>
</div>
<div class="paragraph">
<p>Accumulate enough named things, though, and people start wanting to put the cart before the horse. Patterns became <em>prescriptive</em> - if you want to build a Foo, you use the Bar pattern, duh!</p>
</div>
<div class="paragraph">
<p>So clearly there is a balancing act here. Something is wrong with the idea that all code should be reusable, but something is equally wrong with copy/pasting functions and never unifying them.</p>
</div>
<div class="paragraph">
<p>But another, more insidious factor is at play here. Most of the time we don&#8217;t actually reuse code, <em>even if it was designed to be reusable</em>. And identifying reasons for this lapse is going to be central to making software development scalable into the future. If we keep rewriting the same few thousand systems we&#8217;re never going to do anything fun.</p>
</div>
</div>
</div>
<div class="sect1">
<h2 id="_identifying_why_we_don_t_reuse">Identifying Why We Don&#8217;t Reuse</h2>
<div class="sectionbody">
<div class="paragraph">
<p>Here&#8217;s a real world use case. I want to design a system for handling callbacks in a video game engine. But I&#8217;ve already <em>got</em> several such systems, built for me by previous development efforts in the company. Most of them are basically the exact same thing with minor tweaks:</p>
</div>
<div class="ulist">
<ul>
<li>
<p>Define an "event source"</p>
</li>
<li>
<p>Define some mechanism by which objects can tell the event source that they are "interested" in some particular events</p>
</li>
<li>
<p>When the event source says so, go through the container of listeners and give them a callback to tell them that an event happened</p>
</li>
</ul>
</div>
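<div class="paragraph">
<p>Here is a minimal sketch of those three steps, assuming a single event type that carries an <code>int</code> payload. The names <code>EventSource</code>, <code>Subscribe</code>, and <code>Fire</code> are made up for illustration and are not taken from any real engine code. The sketch also leans on <code>std::function</code> and <code>std::vector</code> purely to stay short.</p>
</div>

```cpp
#include <functional>
#include <utility>
#include <vector>

// Hypothetical event source; every name here is illustrative only.
class EventSource
{
public:
    // A listener is anything callable with the event payload
    using Listener = std::function<void (int)>;

    // Step 2: a listener tells the source it is interested
    void Subscribe (Listener listener)
    {
        Listeners.push_back(std::move(listener));
    }

    // Step 3: walk the container of listeners and call each one back
    void Fire (int payload) const
    {
        for (const Listener & listener : Listeners)
            listener(payload);
    }

private:
    // Step 1: the source owns the container of interested parties
    std::vector<Listener> Listeners;
};
```

<div class="paragraph">
<p>A client would call <code>Subscribe</code> with a lambda or function object, and the owner of the source would call <code>Fire</code> whenever the event happens. Notice that even this tiny version already depends on a particular container and a particular callable wrapper.</p>
</div>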
<div class="paragraph">
<p>Easy. Except <em>Guild Wars 2</em> alone has around half a dozen different mechanisms for accomplishing this basic arrangement. Some are client-side, some are server-side, some relay messages between client and server, but ultimately they all do the exact same job.</p>
</div>
<div class="paragraph">
<p>This is a classic example of looking at existing code and deciding it might be good to refactor it into a simpler form. Except GW2 is a multi-million line-of-code behemoth, and I sure as hell don&#8217;t want to wade through that much code to replace a fundamental mechanism.</p>
</div>
<div class="paragraph">
<p>So the question becomes, if we&#8217;re going to make a better version, who&#8217;s gonna use it?</p>
</div>
<div class="paragraph">
<p>For now the question is academic, but it&#8217;s worth thinking about. We&#8217;re certainly not going to stop making games any time soon, so eventually we should have a standardized callback library that everyone agrees on. So far so good.</p>
</div>
<div class="paragraph">
<p>But what if I want to open-source the callback system, and let other people use it? If it&#8217;s good enough to serve all of ArenaNet&#8217;s myriad uses, surely it&#8217;d be handy elsewhere! Of course, nobody wants a callback system that&#8217;s tied to implementation details of Guild Wars 2, so we need to make the code <em>genuinely</em> reusable.</p>
</div>
<div class="paragraph">
<p>There are plenty of reasons <em>not</em> to use an open-source callback library, especially if you have particular needs that aren&#8217;t represented by the library&#8217;s design. But the single biggest killer of code reuse is <em>dependencies</em>.</p>
</div>
<div class="paragraph">
<p>Some dependencies are obvious. Foo derives from base class Bar, therefore there is a dependency between Foo and Bar, for just one example. But others are more devilish.</p>
</div>
<div class="paragraph">
<p>Say I published my callback library. Somewhere in there, the library has to maintain a container of "things that care about Event X." How do we implement the container?</p>
</div>
<div class="paragraph">
<p>Code reuse is the name of the game here. The obvious answer (outside of game dev) is to use a container from the C++ Standard Library, such as a <code>std::vector</code> or <code>std::map</code> (or both).</p>
</div>
<div class="paragraph">
<p>In games, though, the standard library is often forbidden. I won&#8217;t get into the argument here, but let&#8217;s just say that sometimes you don&#8217;t get to choose what libraries you rely on.</p>
</div>
<div class="paragraph">
<p>So I have a couple of options. I can release my library with <code>std</code> dependencies, which immediately means it&#8217;s useless to half my audience. They have to rewrite a bunch of junk to make <em>my</em> code interoperate with <em>their</em> code and suddenly we&#8217;re not reusing anything anymore.</p>
</div>
<div class="paragraph">
<p>The other option is to roll my own container, such as a trivial linked list. But that&#8217;s even worse, because <em>everyone</em> has a container library, and adding yet another lousy linked list implementation to the world isn&#8217;t reuse <em>either</em>.</p>
</div>
</div>
</div>
<div class="sect1">
<h2 id="_policy_based_programming_to_the_rescue">Policy-Based Programming to the Rescue</h2>
<div class="sectionbody">
<div class="paragraph">
<p>The notion of policy-based architecture is hardly new, but it <em>is</em> sadly underused in most practical applications. I won&#8217;t get into the whole exploration of the idea here, since that&#8217;d take a lot of space, and I mostly just want to give readers a taste of what it can do.</p>
</div>
<div class="paragraph">
<p>Here&#8217;s the basic idea. Let&#8217;s start with a simple container dependency.</p>
</div>
<div class="literalblock">
<div class="content">
<pre>class ThingWhatDoesCoolStuff
{
    std::vector&lt;int&gt; Stuff;
};</pre>
</div>
</div>
<div class="paragraph">
<p>This clearly makes our nifty class dependent on <code>std::vector</code>, which is not great for people who don&#8217;t have <code>std::vector</code> in their acceptable tools list.</p>
</div>
<div class="paragraph">
<p>Let&#8217;s make this a bit better, shall we?</p>
</div>
<div class="literalblock">
<div class="content">
<pre>template &lt;typename ContainerType&gt;
class ThingWhatDoesCoolStuff
{
    ContainerType Stuff;
};

// Clients do this
ThingWhatDoesCoolStuff&lt;std::vector&lt;int&gt;&gt; Thing;</pre>
</div>
</div>
<div class="paragraph">
<p>Slightly better, but now clients have to spell a really weird name all the time (which admittedly can be solved to a great extent with a <code>typedef</code> or a C++11 <code>using</code> alias).</p>
</div>
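<div class="paragraph">
<p>For reference, a rough sketch of what those shorter spellings might look like. The alias names <code>IntThingTypedef</code>, <code>IntThing</code>, and <code>VectorThing</code> are hypothetical and only exist for this example.</p>
</div>

```cpp
#include <type_traits>
#include <vector>

template <typename ContainerType>
class ThingWhatDoesCoolStuff
{
    ContainerType Stuff;
};

// Old-style spelling
typedef ThingWhatDoesCoolStuff<std::vector<int>> IntThingTypedef;

// C++11 alias declaration; names exactly the same type
using IntThing = ThingWhatDoesCoolStuff<std::vector<int>>;

// Alias template: leaves the element type open, which a plain typedef cannot do
template <typename T>
using VectorThing = ThingWhatDoesCoolStuff<std::vector<T>>;

// All three spellings refer to one and the same type
static_assert(std::is_same<IntThing, IntThingTypedef>::value, "same type");
static_assert(std::is_same<IntThing, VectorThing<int>>::value, "same type");
```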
<div class="paragraph">
<p>This also breaks when we actually write code:</p>
</div>
<div class="literalblock">
<div class="content">
<pre>template &lt;typename ContainerType&gt;
class ThingWhatDoesCoolStuff
{
public:
    void AddStuff (int stuff)
    {
        Stuff.push_back(stuff);
    }

private:
    ContainerType Stuff;
};</pre>
</div>
</div>
<div class="paragraph">
<p>This works <em>provided</em> that the container we give it has a method called <code>push_back</code>. What if the method in <em>my</em> library is called <code>Add</code> instead? Now we have a compiler error, and I have to rewrite the nifty class to conform to <em>my</em> container&#8217;s API instead of the C++ Standard Library API. So much for reuse.</p>
</div>
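<div class="paragraph">
<p>To make the failure concrete, here is a small sketch using a made-up in-house container, <code>HouseIntList</code>, whose append method is called <code>Add</code>. The container and the <code>Demo</code> function are assumptions for illustration only. The offending call is left commented out so the rest of the snippet still compiles.</p>
</div>

```cpp
#include <cstddef>
#include <vector>

// Hypothetical in-house container: fixed capacity, append method named Add
class HouseIntList
{
public:
    void Add (int value)
    {
        if (Used < Capacity)          // silently drops when full; fine for a sketch
            Items[Used++] = value;
    }

    std::size_t Count () const { return Used; }

private:
    static constexpr std::size_t Capacity = 16;
    int Items[Capacity] = {};
    std::size_t Used = 0;
};

template <typename ContainerType>
class ThingWhatDoesCoolStuff
{
public:
    void AddStuff (int stuff)
    {
        Stuff.push_back(stuff);       // hard-wired to the standard library's method name
    }

private:
    ContainerType Stuff;
};

inline void Demo ()
{
    ThingWhatDoesCoolStuff<std::vector<int>> fine;
    fine.AddStuff(42);                // OK: std::vector has push_back

    ThingWhatDoesCoolStuff<HouseIntList> broken;   // merely declaring it compiles...
    // broken.AddStuff(42);           // ...but this call would not: HouseIntList has no push_back
    (void) broken;
}
```

<div class="paragraph">
<p>The error only shows up when <code>AddStuff</code> is actually called, because member functions of a class template are not instantiated until they are used.</p>
</div>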
<div class="paragraph">
<p>You know what they say, you can solve any problem by adding enough layers of indirection! So let&#8217;s do that real quick.</p>
</div>
<div class="literalblock">
<div class="content">
<pre>// This goes in the reusable library
template &lt;typename Policy&gt;
class ThingWhatDoesCoolStuff
{
private:
    // YES I SWEAR THIS IS REAL SYNTAX
    typedef typename Policy::template ContainerType&lt;int&gt; Container;

    // Give us a member container of the desired type!
    Container Stuff;

public:
    void AddStuff (int stuff)
    {
        using Adapter = typename Policy::template ContainerAdapter&lt;int&gt;;
        Adapter::PushBack(&amp;Stuff, stuff);
    }
};

// Users of the library just need to write this once:
struct MyPolicy
{
    // This just needs to point to the container we want
    template &lt;typename T&gt; using ContainerType = std::vector&lt;T&gt;;

    template &lt;typename T&gt;
    struct ContainerAdapter
    {
        static inline void PushBack (ContainerType&lt;T&gt; * container, const T &amp; element)
        {
            // This would change based on the API we use
            container-&gt;push_back(element);
        }
    };
};</pre>
</div>
</div>
<div class="paragraph">
<p>Let&#8217;s pull this apart and see how it works.</p>
</div>
<div class="paragraph">
<p>First, we introduce a template "policy" which lets us decouple our nifty class from all the things it relies on, such as container classes. <strong>Any "reusable" code should be decoupled from its dependencies.</strong> (This is by no means the only way to do so, even in C++, but it&#8217;s a nice trick to have in your kit.)</p>
</div>
<div class="paragraph">
<p>The hairy parts of this are really just the syntax for it all. Effectively, our nifty class just says "hey I want to use some container, and an adapter API that I know how to talk to. If you can give me an adapter to <em>your</em> container I&#8217;ll happily use it!"</p>
</div>
<div class="paragraph">
<p>Here we use templates to avoid a lot of virtual dispatch overhead. Theoretically I could make a base class like "Container" and inherit from it and blah blah vomit I hate myself for just thinking this. Let&#8217;s not explore that notion any further.</p>
</div>
<div class="paragraph">
<p>What&#8217;s cool is that I can keep the library code 100% identical between projects that <em>do</em> use the C++ Standard Library, and projects which <em>don&#8217;t</em>. So I could publish my callback system exactly once, and nobody would have to edit the code to use it.</p>
</div>
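<div class="paragraph">
<p>As a rough illustration of that claim, the sketch below feeds one copy of the library class two different policies. Several things here are assumptions added for the example: the <code>HouseArray</code> container and its <code>Add</code>/<code>Length</code> API, the policy names <code>StdPolicy</code> and <code>HousePolicy</code>, and the extra <code>Count</code>/<code>Size</code> pair, which exists only so there is something observable. The dependent names are spelled out with <code>typename</code> and <code>template</code>, and elements are taken by const reference, so that the snippet compiles on its own.</p>
</div>

```cpp
#include <cstddef>
#include <vector>

// Library side: written once, never edited per project
template <typename Policy>
class ThingWhatDoesCoolStuff
{
private:
    typedef typename Policy::template ContainerType<int> Container;
    typedef typename Policy::template ContainerAdapter<int> Adapter;

    Container Stuff;

public:
    void AddStuff (int stuff) { Adapter::PushBack(&Stuff, stuff); }

    // Hypothetical addition, only here so the sketch can be checked
    std::size_t Count () const { return Adapter::Size(Stuff); }
};

// Project A: standard library containers are allowed
struct StdPolicy
{
    template <typename T> using ContainerType = std::vector<T>;

    template <typename T>
    struct ContainerAdapter
    {
        static void PushBack (ContainerType<T> * container, const T & element)
        {
            container->push_back(element);
        }

        static std::size_t Size (const ContainerType<T> & container)
        {
            return container.size();
        }
    };
};

// Project B: a made-up in-house container with a different API
template <typename T>
class HouseArray
{
public:
    void Add (const T & element)
    {
        if (Used < Capacity)          // silently drops when full; fine for a sketch
            Items[Used++] = element;
    }

    std::size_t Length () const { return Used; }

private:
    static constexpr std::size_t Capacity = 16;
    T Items[Capacity] = {};
    std::size_t Used = 0;
};

struct HousePolicy
{
    template <typename T> using ContainerType = HouseArray<T>;

    template <typename T>
    struct ContainerAdapter
    {
        static void PushBack (ContainerType<T> * container, const T & element)
        {
            container->Add(element);  // only the adapter knows the local API
        }

        static std::size_t Size (const ContainerType<T> & container)
        {
            return container.Length();
        }
    };
};
```

<div class="paragraph">
<p>Both <code>ThingWhatDoesCoolStuff&lt;StdPolicy&gt;</code> and <code>ThingWhatDoesCoolStuff&lt;HousePolicy&gt;</code> compile against the same library code; the only per-project work is the policy struct.</p>
</div>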
<div class="paragraph">
<p>There is a cost here, and it&#8217;s worth thinking about: any time someone reuses my code, they have to write a suitable policy. In practice, this means you write a policy about once for every time you change your entire code base to use a different container API. In other words, pffffft.</p>
</div>
<div class="paragraph">
<p>For things which aren&#8217;t as stable as containers, the policy cost may become more significant. This is why you want to reuse in only carefully considered ways, preferably (as mentioned earlier) when you have several use cases that can benefit from that shared abstraction.</p>
</div>
</div>
</div>
<div class="sect1">
<h2 id="_concluding_thoughts">Concluding Thoughts</h2>
<div class="sectionbody">
<div class="paragraph">
<p>One last idea to consider is how the performance of this technique measures up. In debug builds, it can be a little ugly, but optimized builds strip away essentially all of the overhead of the templates.</p>
</div>
<div class="paragraph">
<p>So runtime performance is fine, but what about <em>build times</em> themselves?</p>
</div>
<div class="paragraph">
<p>Admittedly this does require a lot of templates going around. But the hope is that you&#8217;re reusing simple and composable components, not huge swaths of logic. So it&#8217;s easy to go wrong here if you don&#8217;t carefully consider what to apply this trick to. Used judiciously, however, it&#8217;s actually a bit better of a deal than defining a lot of shared abstract interfaces to decouple your APIs.</p>
</div>
<div class="paragraph">
<p>I&#8217;ll go into the specific considerations of the actual callback system later. For now, I hope the peek at policy-based decoupling has been useful.</p>
</div>
<div class="paragraph">
<p>Remember: three examples or you don&#8217;t have a valid generalization!</p>
</div>
</div>
</div>]]></description><link>https://apoch.github.io/blog/2017/10/25/Code-Reuse-In-Actual-Practice.html</link><guid isPermaLink="true">https://apoch.github.io/blog/2017/10/25/Code-Reuse-In-Actual-Practice.html</guid><dc:creator><![CDATA[Mike Lewis]]></dc:creator><pubDate>Wed, 25 Oct 2017 00:00:00 GMT</pubDate></item><item><title><![CDATA[Source-Level Debugging For Epoch Programs]]></title><description><![CDATA[<div id="preamble">
<div class="sectionbody">
<div class="paragraph">
<p>This weekend marks a major milestone for the development of the <a href="https://github.com/apoch/epoch-language">Epoch programming language</a>. For the first time, Windows debuggers such as Visual Studio and WinDbg can perform <strong>source-level debugging</strong> on Epoch programs.</p>
</div>
<div class="paragraph">
<p>In a nutshell, this means that the comfortable modern development features of setting breakpoints and stepping through code are now available to Epoch programmers.</p>
</div>
<div class="paragraph">
<p>One notable thing left to achieve is <strong>runtime state inspection</strong>. There is currently not enough data generated by the Epoch compiler to reliably inspect variables, function parameters, and so on in the debugger. This will be my next major point of focus.</p>
</div>
<div class="imageblock">
<div class="content">
<img src="https://github.com/apoch/epoch-language/raw/master/Images/Screenshots/Debugging-VS2015-Source-Breakpoint.png" alt="Visual Studio 2015 Debugging a Simple Epoch Program">
</div>
</div>
<div class="imageblock">
<div class="content">
<img src="https://github.com/apoch/epoch-language/raw/master/Images/Screenshots/Debugging-WinDbg-Source-Breakpoint.png" alt="WinDbg Debugging the Same Epoch Program">
</div>
</div>
</div>
</div>
<div class="sect1">
<h2 id="_how_we_got_here">How We Got Here</h2>
<div class="sectionbody">
<div class="paragraph">
<p>Attaining this functionality was not easy, but it was definitely worth the investment. It all started almost exactly a year ago, when I decided that being unable to debug the self-hosting process for 64-bit Epoch was unacceptable.</p>
</div>
<div class="paragraph">
<p>Initially debug information was generated via piping some bogus line numbers into <a href="http://llvm.org/">LLVM</a> and then routing the generated block of CodeView symbols into <code>MSPDB140.dll</code> to generate a somewhat-working PDB file on disk. This implementation took about two weeks.</p>
</div>
<div class="paragraph">
<p>That wasn&#8217;t enough, though; it introduced a heavy dependency on Visual Studio (something I&#8217;ve been keen to avoid, despite strongly encouraging use of VS with Epoch) and also had limitations via the API of <code>MSPDB140.dll</code> that were&#8230;&#8203; inscrutable, to say the least.</p>
</div>
<div class="paragraph">
<p>So I set out in search of a complete understanding of the PDB file format and how to generate my own debug information for it. The intervening year wasn&#8217;t all dedicated to PDB work; a fair amount of time went into Visual Studio integration and other tidbits of self-hosting effort. (Not to mention there were a few major spans of downtime. This gets exhausting after a while!)</p>
</div>
<div class="paragraph">
<p>The <a href="https://github.com/apoch/epoch-language/commits/master">Epoch repo commit log</a> shows the gory details of how everything came together, but the high-level summary is pretty simple: using a suite of tools, I reverse engineered large sections of the PDB format and wrote Epoch code to emit them.</p>
</div>
<div class="paragraph">
<p>Noteworthy projects:</p>
</div>
<div class="olist arabic">
<ol class="arabic">
<li>
<p><a href="https://github.com/llvm-mirror/llvm/tree/master/tools/llvm-pdbutil">LLVM&#8217;s pdbutil</a></p>
</li>
<li>
<p><a href="https://github.com/Microsoft/microsoft-pdb">Microsoft&#8217;s PDB repository</a> (and especially the bundled tool <code>cvdump</code>)</p>
</li>
<li>
<p><a href="https://msdn.microsoft.com/en-us/library/b5ke49f5.aspx?f=255&amp;MSPPError=-2147217396">Microsoft&#8217;s DIA2Dump utility</a></p>
</li>
<li>
<p><a href="https://github.com/apoch/epoch-language/tree/master/Tools/MSFViewer">My own concoction, MSFViewer, which displays structural information about PDBs</a></p>
</li>
</ol>
</div>
<div class="paragraph">
<p>Target debuggers have been VS2015 and WinDbg, both of which now work with source-level breakpoints and stepping. x64dbg also sort-of works, although it doesn&#8217;t like to display source for some reason; everything else seems to be fine, and I don&#8217;t know the tool well enough to say whether it&#8217;s my bug or theirs.</p>
</div>
</div>
</div>
<div class="sect1">
<h2 id="_what_comes_next">What Comes Next</h2>
<div class="sectionbody">
<div class="paragraph">
<p>As alluded to above, my next major project is to get variable data inspection working. This is a dark corner of the PDB format that seems poorly understood in the community, so it should be exciting to try and forge ahead here.</p>
</div>
<div class="paragraph">
<p>At a minimum, I&#8217;ll need to start generating <strong>type data</strong> in the PDBs, so that debuggers know how to interpret various memory addresses correctly. There&#8217;s probably some other voodoo too, but that&#8217;ll have to wait until I discover what&#8217;s in store.</p>
</div>
</div>
</div>]]></description><link>https://apoch.github.io/blog/2017/07/09/Source-Level-Debugging-For-Epoch-Programs.html</link><guid isPermaLink="true">https://apoch.github.io/blog/2017/07/09/Source-Level-Debugging-For-Epoch-Programs.html</guid><dc:creator><![CDATA[Mike Lewis]]></dc:creator><pubDate>Sun, 09 Jul 2017 00:00:00 GMT</pubDate></item><item><title><![CDATA[Using Poison to Reverse Engineer Code]]></title><description><![CDATA[<div id="preamble">
<div class="sectionbody">
<div class="paragraph">
<p>Recently I&#8217;ve been working on a rather difficult task, namely creating <a href="https://docs.microsoft.com/en-us/visualstudio/debugger/specify-symbol-dot-pdb-and-source-files-in-the-visual-studio-debugger">PDB debug database files</a> from the <a href="https://github.com/apoch/epoch-language">Epoch Language</a> compiler toolchain.</p>
</div>
<div class="paragraph">
<p>This is difficult in part because the format of PDB files is generally not well-understood, and is certainly poorly documented. I can&#8217;t go much further without a hearty thanks to the LLVM project and particularly their tool <code>llvm-pdbdump</code> which makes it much easier to test whether or not a generated PDB is sane. When <code>llvm-pdbdump</code> has good information about the state of a given PDB, it is invaluable; and when it falls short, as is inevitably the case with a format like PDB, it at least gives me a starting point for understanding why things have gone wrong.</p>
</div>
<div class="paragraph">
<p>However, there is another tool, from Microsoft themselves, called <code>Dia2Dump.exe</code> which uses an <em>authoritative</em> implementation of the PDB format, via the file <code>MsDia140.dll</code> on Visual Studio 2015. This library is (as near as I can tell) close to or identical to the code used by Visual Studio itself for debugging programs. It also seems to parallel the implementations in <code>WinDbg</code> and <code>DbgHelp.dll</code>, both of which I use extensively in my research.</p>
</div>
<div class="paragraph">
<p>Last but not least, I must mention the <a href="https://github.com/Microsoft/microsoft-pdb/">Microsoft-PDB</a> repo on GitHub, which is <em>partial</em> source for the implementation of the PDB format. It does not actually compile right now, so it&#8217;s hard to use, but it has a significant purpose for me: I can cross-reference functions in <code>MsDia140.dll</code> with this code, and use that for some serious reverse-engineering.</p>
</div>
</div>
</div>
<div class="sect1">
<h2 id="_debugging">Debugging</h2>
<div class="sectionbody">
<div class="paragraph">
<p>Sometimes when feeding data into a black box like <code>MsDia140.dll</code> it can be hard to know what code paths are taken and why. For example, let&#8217;s look at the function <code>GSI1::readHash</code> (see <a href="https://github.com/Microsoft/microsoft-pdb/blob/master/PDB/dbi/gsi.cpp#L66">here</a> to follow along in the source).</p>
</div>
<div class="paragraph">
<p>This function does some stuff I still don&#8217;t fully understand, so let&#8217;s walk through the process of gaining more understanding.</p>
</div>
<div class="paragraph">
<p>First we need a partially malformed PDB. This is easy to do since PDB files are sensitive to tiny changes, often in non-obvious ways. In particular, I&#8217;m going to work on the <code>Publics</code> stream. This is a fragment of a <em>Multi-Stream File</em> (aka MSF) which contains, among other things, publicly visible <em>debug symbols</em> for some program.</p>
</div>
<div class="paragraph">
<p>At the beginning of the stream, there is a structure which <code>llvm-pdbdump</code> is sadly cryptic about. Thankfully, <code>llvm-pdbdump</code> contains some sanity checks which seem to align well with the checks made by Microsoft&#8217;s code, so it&#8217;s at least easy to use the tool to verify what we&#8217;re spitting out.</p>
</div>
<div class="paragraph">
<p><code>readHash</code> is responsible for decoding part of this data structure, which appears to be some kind of hash table for accelerating symbol lookups. Inside the code for <code>readHash</code> (see link above) there is a call to a pesky function called <code>fixHashIn</code>. By attaching WinDbg to a running copy of <code>Dia2Dump.exe</code> and setting liberal numbers of breakpoints, I traced a failure in my PDB generation code to this single function. <code>fixHashIn</code> is vomiting because I&#8217;m feeding it data it doesn&#8217;t like.</p>
</div>
<div class="paragraph">
<p>The first thing to note is that <code>fixHashIn</code> begins with a decrement instruction to decrease the value of one of its parameters. This parameter is supposedly the number of buckets in the hash table, or so I extrapolate from the source.</p>
</div>
<div class="paragraph">
<p>In my case, the parameter has a value of zero! Clearly I don&#8217;t want my hash table to have zero buckets, so it becomes apparent why <code>fixHashIn</code> is choking. What I <em>don&#8217;t</em> immediately understand is <em>why</em> it thinks zero is the number of buckets&#8230;&#8203; I had <em>thought</em> that I was passing a value in (8 bytes per entry * 16 entries) that would work. Clearly I was wrong, but where was the zero coming from?</p>
</div>
</div>
</div>
<div class="sect1">
<h2 id="_msf_files">MSF Files</h2>
<div class="sectionbody">
<div class="paragraph">
<p>A little more background is in order. In an MSF file (MSF being the container format that PDB files are built on), data is divided into <em>streams</em>, each of which is built up of one or more <em>blocks</em>. Blocks can come in different sizes, but I&#8217;m using 1KB (1024 bytes) for convenience. Any space in a block that the stream doesn&#8217;t use is filled with junk bytes.</p>
</div>
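<div class="paragraph">
<p>To make the block arithmetic concrete, here is a minimal Python sketch of padding a stream out to a whole number of 1KB blocks. The helper names <code>blocks_needed</code> and <code>pad_to_block</code> and the 1500-byte stream are made up for illustration; this is not the Epoch implementation.</p>
</div>
<div class="literalblock">
<div class="content">
<pre>BLOCK_SIZE = 1024  # block size used in this post; MSF allows other sizes


def blocks_needed(stream_size, block_size=BLOCK_SIZE):
    """Number of whole blocks required to hold stream_size bytes."""
    # Ceiling division without floating point.
    return -(-stream_size // block_size)


def pad_to_block(data, fill=b"\x00", block_size=BLOCK_SIZE):
    """Pad data with a fill byte up to the next block boundary."""
    padded_size = blocks_needed(len(data), block_size) * block_size
    return data + fill * (padded_size - len(data))


stream = b"\x2a" * 1500            # a hypothetical 1500-byte stream
padded = pad_to_block(stream)
print(blocks_needed(len(stream)))  # 2
print(len(padded))                 # 2048</pre>
</div>
</div>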
<div class="paragraph">
<p>Crucially, I pad my blocks with zeroes. If somehow the PDB interpreter is reading one of my padding bytes, it might be incorrectly assuming I want to feed it a zero-size hash table&#8230;&#8203; obviously a problem. So what to do?</p>
</div>
</div>
</div>
<div class="sect1">
<h2 id="_poison">Poison</h2>
<div class="sectionbody">
<div class="paragraph">
<p>And now the meat of everything!</p>
</div>
<div class="paragraph">
<p>Instead of padding my file with zeroes, I use carefully crafted <em>poison values</em>. For my purposes I&#8217;m working with 32-bit data, so a poison value is usually 4 bytes long. A good example is <code>0xfeedface</code> which is a funny but valid hex number that happens to be the right size.</p>
</div>
<div class="paragraph">
<p>The important thing is that we can&#8217;t just pad <em>every</em> 32-bit slot with <code>0xfeedface</code>. Instead, we want to make <em>permutations</em> of the poison value - one unique permutation per slot. Every 4-byte slot of my PDB&#8217;s "padding" now holds a unique, recognizable value, so any poison value that shows up later can be traced back to exactly one location in the file.</p>
</div>
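<div class="paragraph">
<p>One way to build such padding is sketched below in Python. The scheme is an assumption chosen for illustration, not necessarily the one I use in the Epoch code: the high 16 bits are a fixed tag and the low 16 bits are the slot index, so a value spotted in the debugger maps straight back to a slot. <code>POISON_BASE</code>, <code>poison_padding</code>, and <code>slot_for_value</code> are hypothetical names.</p>
</div>
<div class="literalblock">
<div class="content">
<pre>POISON_BASE = 0xFEED0000  # hypothetical tag in the high 16 bits
MAX_SLOTS = 0x10000       # the low 16 bits hold the slot index


def poison_padding(slot_count):
    """Build padding in which every 32-bit slot holds a distinct value."""
    if slot_count not in range(MAX_SLOTS + 1):
        raise ValueError("slot count must fit the 16-bit index space")
    return b"".join(
        (POISON_BASE + index).to_bytes(4, "little")
        for index in range(slot_count)
    )


def slot_for_value(value):
    """Map a value seen in the debugger back to its slot index, or None."""
    index = value - POISON_BASE
    return index if index in range(MAX_SLOTS) else None


padding = poison_padding(256)  # one 1KB block of poison
seen = int.from_bytes(padding[40:44], "little")
print(hex(seen))               # 0xfeed000a
print(slot_for_value(seen))    # 10
print(slot_for_value(0))       # None: a zero did not come from the padding</pre>
</div>
</div>
<div class="paragraph">
<p>With a 16-bit index this covers 256KB of padding; a larger file would need a wider index or several tags.</p>
</div>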
<div class="paragraph">
<p>Here&#8217;s the magic part: when I run this in the debugger, I can walk into the <code>fixHashIn</code> function, and look at its parameters.</p>
</div>
<div class="paragraph">
<p>My first run of this process is surprising - despite poisoning a bunch of data around where I thought this zero was coming from, the value is still zero when we reach the <code>fixHashIn</code> function! This indicates one of two things.</p>
</div>
<div class="olist arabic">
<ol class="arabic">
<li>
<p>The value is read from a place I didn&#8217;t poison</p>
</li>
<li>
<p>The value might be computed somehow</p>
</li>
</ol>
</div>
<div class="paragraph">
<p>To rule out the possibility that I&#8217;m not poisoning enough, I expand the poison to the entire file instead of just one block&#8217;s worth of padding bytes. The debugger still stubbornly shows the parameter as zero, meaning that <em>the zero is being computed from some other data being fed in, not read directly from the file on disk</em>.</p>
</div>
<div class="paragraph">
<p><a href="https://github.com/Microsoft/microsoft-pdb/blob/master/PDB/dbi/gsi.cpp#L96">This line</a> of the Microsoft PDB source is illuminating&#8230;&#8203; but <a href="https://github.com/Microsoft/microsoft-pdb/blob/master/PDB/dbi/gsi.cpp#L65">this line</a> even more so. At line 65 is a comment stating that <code>fixHashIn</code> is called from <em>two</em> places&#8230;&#8203; one of them is the loader for the Publics stream, but one is for a totally unrelated stream called <code>Globals</code>!</p>
</div>
</div>
</div>
<div class="sect1">
<h2 id="_conclusions">Conclusions</h2>
<div class="sectionbody">
<div class="paragraph">
<p>It turns out I&#8217;ve been hitting breakpoints all evening in <code>fixHashIn</code>, but the call stack is wrong. The calls I&#8217;ve been seeing are from a totally different stream of data.</p>
</div>
<div class="paragraph">
<p>This post may not have a cheerful ending, but I hope the value of poisoning data is clear: without 100% proof that the evil zero was not coming from my Publics stream, I might have taken <em>days</em> to realize my mistake.</p>
</div>
<div class="paragraph">
<p>In any event, I use the poison technique a lot, and this is just one sampling of my adventures with the PDB format. Maybe I&#8217;ll have a better story of success tomorrow!</p>
</div>
</div>
</div>]]></description><link>https://apoch.github.io/blog/2017/06/13/Using-Poison-to-Reverse-Engineer-Code.html</link><guid isPermaLink="true">https://apoch.github.io/blog/2017/06/13/Using-Poison-to-Reverse-Engineer-Code.html</guid><dc:creator><![CDATA[Mike Lewis]]></dc:creator><pubDate>Tue, 13 Jun 2017 00:00:00 GMT</pubDate></item><item><title><![CDATA[Debugging Information Success]]></title><description><![CDATA[<div class="paragraph">
<p>Early this morning, <a href="https://github.com/apoch/epoch-language">Epoch</a> achieved a minor success in the debugging department. I was able to generate a working PDB file with symbols for a small test program, run the program in Visual Studio and WinDbg, and step through the disassembly.</p>
</div>
<div class="paragraph">
<p>Along the way, the current Epoch function was correctly tracked, indicating two things: stack unwinding information is working correctly, and function-address-to-symbol mapping is working as well.</p>
</div>
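<div class="paragraph">
<p>For readers unfamiliar with the idea, function-address-to-symbol mapping boils down to "find the closest function start at or below this address". The Python sketch below shows that lookup over a made-up symbol table; the addresses and names are hypothetical, and a real debugger works from the PDB symbol records instead.</p>
</div>
<div class="literalblock">
<div class="content">
<pre>import bisect

# Hypothetical symbol table: (start address, name), sorted by address.
SYMBOLS = [
    (0x1000, "entrypoint"),
    (0x1040, "helper"),
    (0x10C0, "cleanup"),
]
STARTS = [start for start, _ in SYMBOLS]


def symbol_for_address(address):
    """Name of the function whose start is closest at or below address."""
    index = bisect.bisect_right(STARTS, address)
    if index == 0:
        return None  # address precedes the first known function
    return SYMBOLS[index - 1][1]


print(symbol_for_address(0x1044))  # helper
print(symbol_for_address(0x0FFF))  # None</pre>
</div>
</div>
<div class="paragraph">
<p>The sketch ignores function sizes, so any address past the last start is attributed to the last function.</p>
</div>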
<div class="paragraph">
<p>The next step is to get line number information piped through to the PDB from the parser. This will be a major haul, but well worth it since it will allow source-level debugging of Epoch programs.</p>
</div>
<div class="paragraph">
<p>Once that is complete, I plan to tackle type metadata and variable tracking, so that the values of variables can be visualized in the debugger. That&#8217;s an even bigger lift, so I don&#8217;t expect it any time soon.</p>
</div>
<div class="paragraph">
<p>That said, I plan on waiting until debugging is in a good state before resuming work on self-hosting, because having debug information available makes that process vastly more convenient and approachable.</p>
</div>
<div class="paragraph">
<p>All in all it&#8217;s a good day for Epoch!</p>
</div>]]></description><link>https://apoch.github.io/blog/2017/06/07/Debugging-Information-Success.html</link><guid isPermaLink="true">https://apoch.github.io/blog/2017/06/07/Debugging-Information-Success.html</guid><dc:creator><![CDATA[Mike Lewis]]></dc:creator><pubDate>Wed, 07 Jun 2017 00:00:00 GMT</pubDate></item><item><title><![CDATA[Debugging Epoch Programs]]></title><description><![CDATA[<div class="paragraph">
<p>My recent adventures in self-hosting the 64-bit Epoch compiler have led me to a significant conclusion: it isn&#8217;t worth trying to self-host a compiler when you can&#8217;t debug the target language.</p>
</div>
<div class="paragraph">
<p>A much better use of my time would be to improve the <a href="https://github.com/apoch/epoch-language/wiki/Knowledge-Dump---Debugging-Epoch-Programs">languishing PDB generation experiment</a> and get the code set up to actually emit usable debug symbols for Visual Studio and WinDbg.</p>
</div>
<div class="paragraph">
<p>It presently takes several minutes to build a candidate compiler; given that fact, it makes little sense to try and brute-force my way to correctness. Debuggers are valuable tools and shouldn&#8217;t be left as afterthoughts in the development of what aims to be a production language.</p>
</div>
<div class="paragraph">
<p>So I&#8217;m dusting off the code for PDB emission and working on a tiny shim DLL that will provide some hard-coded one-off features that might be needed in the course of getting the legacy 32-bit compiler to generate debug information about 64-bit executables.</p>
</div>
<div class="paragraph">
<p>One such need has already come up: since vanilla Epoch lacks pointer arithmetic, it is hard to do serialization well. The shim DLL currently contains a single function, <code>GetBufferPtr</code>, which takes an input pointer and offset and returns the pointer adjusted by that offset. In other words, it&#8217;s a glorified pointer-add.</p>
</div>
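<div class="paragraph">
<p>To show how little that function does, here is the same operation sketched in Python with <code>ctypes</code>. The name <code>get_buffer_ptr</code> and its signature are stand-ins for illustration; the real shim is a native DLL export whose exact signature is not shown here.</p>
</div>
<div class="literalblock">
<div class="content">
<pre>import ctypes


def get_buffer_ptr(pointer, offset):
    """Return pointer advanced by offset bytes (a plain pointer-add)."""
    return ctypes.c_void_p(pointer.value + offset)


buffer = ctypes.create_string_buffer(b"EPOCHDATA")
base = ctypes.cast(buffer, ctypes.c_void_p)
moved = get_buffer_ptr(base, 5)
print(ctypes.string_at(moved.value, 4))  # b'DATA'</pre>
</div>
</div>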
<div class="paragraph">
<p>This isn&#8217;t really satisfying to me as a long-term way to write Epoch code, but I&#8217;ve decided that debug information is more important than implementing 64-bit features, including self-hosting. As such, it&#8217;ll have to suffice for a while.</p>
</div>]]></description><link>https://apoch.github.io/blog/2017/06/03/Debugging-Epoch-Programs.html</link><guid isPermaLink="true">https://apoch.github.io/blog/2017/06/03/Debugging-Epoch-Programs.html</guid><dc:creator><![CDATA[Mike Lewis]]></dc:creator><pubDate>Sat, 03 Jun 2017 00:00:00 GMT</pubDate></item><item><title><![CDATA[Epoch 64-bit compiler progress]]></title><description><![CDATA[<div class="paragraph">
<p>Just a short while ago, the first working 64-bit compiler for <a href="https://github.com/apoch/epoch-language">Epoch</a> was produced!</p>
</div>
<div class="paragraph">
<p>Well, "working" might be a minor stretch; it launches, prints a simple text banner message, and then exits cleanly. But that represents a lot of operational code by itself.</p>
</div>
<div class="ulist">
<ul>
<li>
<p>The 32-bit compiler is able to lex, parse, type-check, and code-gen the entirety of the 64-bit compiler&#8217;s source code.</p>
</li>
<li>
<p>The 32-bit linker can emit 64-bit binaries, assisted by LLVM&#8217;s machine code generation facilities.</p>
</li>
<li>
<p>The 64-bit compiler binary is a completely functional Windows executable image.</p>
</li>
<li>
<p>This executable can run to completion on 64-bit Windows versions.</p>
</li>
<li>
<p>Inside the compiled binary is a table of string constants.</p>
</li>
<li>
<p>64-bit Epoch code can load those strings and route them out to the command-line console.</p>
</li>
<li>
<p>A number of support DLL calls are involved in this process, including loading garbage collection metadata and stack tracing for identifying GC roots.</p>
</li>
</ul>
</div>
<div class="paragraph">
<p>All told there are hundreds of thousands of lines of code involved. Building the 64-bit compiler takes about 164 seconds (just over two and a half minutes) when using debug versions of LLVM. (For comparison, the 32-bit compiler can self-host in under 20 seconds, but that&#8217;s an unfair comparison because that build process uses optimized Release versions of LLVM.)</p>
</div>
<div class="paragraph">
<p>I&#8217;m pretty pleased with this progress. There are still many things left to get working, though.</p>
</div>
<div class="ulist">
<ul>
<li>
<p>64-bit globals do not work correctly; all of them are currently stuffed into a single random address which may or may not crash when dereferenced.</p>
</li>
<li>
<p>More support DLL calls need to be implemented or eliminated.</p>
</li>
<li>
<p>Certain code constructs do not work correctly yet; this is worked around for the time being by not using them in the compiler, but they will be good to get working as soon as is practical.</p>
</li>
<li>
<p>A large number of hacks and temporary shims exist in the linker. This will need to be cleaned up substantially before self-hosting is really practical.</p>
</li>
<li>
<p>Debug metadata and symbols are not generated correctly yet.</p>
</li>
<li>
<p>Visual Studio integration has a number of bugs, ranging from the pesky to the outright unusable.</p>
</li>
<li>
<p>It is exceedingly likely that there will be bugs in the compiler, meaning that 64-bit self-hosting is still a ways out even if the basics are operational.</p>
</li>
</ul>
</div>
<div class="paragraph">
<p>Making the highly optimistic presumption that this will all happen soon, I think it&#8217;s fair to say that once all of the above is addressed (and 64-bit self-hosting is complete) it will be time to cut another release of Epoch.</p>
</div>
<div class="paragraph">
<p>In all probability I&#8217;ll course-correct sometime between now and then, but it never hurts to have objectives!</p>
</div>]]></description><link>https://apoch.github.io/blog/2017/05/30/Epoch-64-bit-compiler-progress.html</link><guid isPermaLink="true">https://apoch.github.io/blog/2017/05/30/Epoch-64-bit-compiler-progress.html</guid><dc:creator><![CDATA[Mike Lewis]]></dc:creator><pubDate>Tue, 30 May 2017 00:00:00 GMT</pubDate></item><item><title><![CDATA[Epoch Code-Generation Update]]></title><description><![CDATA[<div id="preamble">
<div class="sectionbody">
<div class="paragraph">
<p>A few minutes ago, the first 64-bit self-hosted compiler for Epoch finished the code-generation process&#8230;&#8203; unsuccessfully.</p>
</div>
<div class="paragraph">
<p>For context, this means that the 64-bit compiler (as built by the existing 32-bit compiler) was lexed, parsed, type-checked, and turned into LLVM IR. What <em>didn&#8217;t</em> happen is a successful code emission, i.e. the compiler is not yet producing a working executable image.</p>
</div>
<div class="paragraph">
<p>What it <em>does</em> produce is about 6.8 MB of errors, or just over 121,000 lines of output. This indicates that something in the code-gen process is off. We&#8217;re generating LLVM IR but it can&#8217;t be turned into machine code because it is malformed in some way.</p>
</div>
<div class="paragraph">
<p>Inspection of the error output shows that one of the biggest offenses is bad linkage on a global variable. Epoch aspires to minimize the use of global state but it&#8217;s a useful construct while bootstrapping a compiler. Fixing this mistake is trivial and cuts the error volume substantially.</p>
</div>
<div class="paragraph">
<p>In fact, the vast bulk of the output is actually the text of the LLVM IR in pretty-printed form. This "dump" is generated to help diagnose code-gen bugs, but it&#8217;s meant for much smaller programs than the entire compiler! Culling the dumped IR shows that there are in fact only <strong>208</strong> errors left (after the global linkage fiasco was addressed). And all of them are the same "sort" of error&#8230;&#8203;</p>
</div>
</div>
</div>
<div class="sect1">
<h2 id="_terminator_found_in_the_middle_of_a_basic_block">Terminator found in the middle of a basic block!</h2>
<div class="sectionbody">
<div class="paragraph">
<p>LLVM divides code into sections called <em>basic blocks</em>. A single basic block represents a linear sequence of instructions, i.e. whenever the block is entered, every instruction in it is executed exactly once, in order. The way to accomplish branching flow control is to <strong>end</strong> a basic block with a branch instruction, likely a conditional branch of some kind. Branches target <em>other</em> basic blocks to allow different code paths to execute based on the branch.</p>
</div>
<div class="paragraph">
<p>The dreaded error "Terminator found in the middle of a basic block!" means that this constraint has been violated. Someone tried to issue a branch instruction in the middle of a block, which ruins the idea that every instruction in the block executes exactly once.</p>
</div>
<div class="paragraph">
<p>In concrete terms, this error signals a bug in the code generation process. It means that somewhere along the line, the Epoch compiler lost track of a basic block being terminated, and continued shoving instructions into it after a branch.</p>
</div>
<div class="paragraph">
<p>Thankfully, LLVM barfs a "label" when it emits this error, and that label is sufficient to locate the offending basic block:</p>
</div>
<div class="literalblock">
<div class="content">
<pre>1&gt;  ; &lt;label&gt;:41                                      ; preds = %11
1&gt;    br label %9, !dbg !3338
1&gt;    br label %42</pre>
</div>
</div>
<div class="paragraph">
<p>Sure enough, there are two branches being attempted here. The larger context is uninteresting (it&#8217;s a nested if-statement inside a binary tree insertion routine) but the specific failure appears many times, meaning that there are probably only a small number of actual code-generation bugs to solve.</p>
</div>
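<div class="paragraph">
<p>The rule being enforced is easy to state in code. The Python sketch below is a toy check, not LLVM&#8217;s verifier: it treats a block as a list of instruction strings, recognizes only a handful of terminator opcodes, and reports any terminator that is not the final instruction. The function names are invented for illustration.</p>
</div>
<div class="literalblock">
<div class="content">
<pre># A subset of LLVM's terminator opcodes, enough for this illustration.
TERMINATORS = ("br", "ret", "switch", "unreachable")


def is_terminator(instruction):
    """True when the instruction text starts with a terminator opcode."""
    words = instruction.split()
    return bool(words) and words[0] in TERMINATORS


def misplaced_terminators(block):
    """Indices of terminators that are not the final instruction."""
    last = len(block) - 1
    return [
        index
        for index, text in enumerate(block)
        if index != last and is_terminator(text)
    ]


# The offending block from the dump above.
block_41 = [
    "br label %9, !dbg !3338",
    "br label %42",
]
print(misplaced_terminators(block_41))  # [0]</pre>
</div>
</div>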
</div>
</div>
<div class="sect1">
<h2 id="_testing_1_2_3">Testing, 1 2 3</h2>
<div class="sectionbody">
<div class="paragraph">
<p>As with any good software, robustness in a compiler happens only when enough bugs have been fixed <em>while simultaneously ensuring that no new ones are introduced</em>. The best tool I know of for doing this is <em>automated testing</em>. Now that a compiler bug has been identified, the objective is to replicate it in as tiny a program as possible.</p>
</div>
<div class="paragraph">
<p>This "test case" provides two things: a way to reproduce the bug on-demand so a fix can be tested, and a way to detect if the bug ever reappears. The Epoch compiler test suite is still small, but invaluable for addressing this sort of problem. I will add this particular code to the test suite and hopefully have a fix in short order.</p>
</div>
</div>
</div>]]></description><link>https://apoch.github.io/blog/2017/05/29/Epoch-Code-Generation-Update.html</link><guid isPermaLink="true">https://apoch.github.io/blog/2017/05/29/Epoch-Code-Generation-Update.html</guid><dc:creator><![CDATA[Mike Lewis]]></dc:creator><pubDate>Mon, 29 May 2017 00:00:00 GMT</pubDate></item><item><title><![CDATA[Epoch 64-bit self-hosting progress]]></title><description><![CDATA[<div class="paragraph">
<p>For a decent while now, I&#8217;ve been working on <em>self-hosting</em> the <a href="https://github.com/apoch/epoch-language">Epoch</a> 64-bit compiler. This involves getting the compiler to a point where it is robust enough to actually compile itself. In order to do this, I&#8217;m using a modified 32-bit compiler which generates 64-bit binaries. Once a working 64-bit compiler is emitted, I can feed <em>that</em> compiler back into itself, thus completing the head-trip ouroboros that is self-hosting or "bootstrapping" a compiler.</p>
</div>
<div class="paragraph">
<p>At the moment, the compiler can successfully lex, parse, type-check, and partially code-gen itself. In practical terms, this means that the <em>front-end</em> of the compiler is working fine, but the <em>back-end</em> - the set of systems responsible for turning code into machine language and emitting a working executable - remains incomplete. For a slightly different perspective, I&#8217;m generating LLVM IR for <em>most</em> of the compiler at this point.</p>
</div>
<div class="paragraph">
<p>The bits that are left are corner cases in the code generation engine. There are things like intrinsic functions that need to be wired up, special semantics to implement, and so on. In particular, right now, I&#8217;m working on solving a corner case with the <code>nothing</code> concept. <code>nothing</code> is an Epoch idiom for expressing the idea that there is no data; except, unlike traditional <code>null</code>, <code>nothing</code> is its own <em>type</em>. If something has a type it cannot be <code>nothing</code> - again, unlike <code>null</code>. The usefulness of this may seem questionable, but the distinction makes it possible to avoid entire classes of runtime bugs, because you can never "forget" to write code that handles <code>nothing</code> - the compiler enforces this for you!</p>
</div>
<div class="paragraph">
<p>Anyways, the trick with <code>nothing</code> is that you can pass a literal <code>nothing</code> to a function as an argument, to signify that you have no semantically valid data to pass in. This is handled correctly by the parser and type checker, but falls down in code generation because we can&#8217;t actually omit the parameter from the function call.</p>
</div>
<div class="paragraph">
<p>What happens is the code generator creates a function with, say, 3 parameters. If the second parameter is <code>nothing</code> at a call site, we have to still pass <em>something</em> over to the function, from LLVM&#8217;s perspective. So we generate a dummy parameter that essentially translates the <code>nothing</code> semantics into <code>null</code> semantics - something LLVM can recognize.</p>
</div>
<div class="paragraph">
<p>Now things get complicated.</p>
</div>
<div class="paragraph">
<p>If we have an algebraic sum type that includes the type <code>nothing</code>, and we pass a sum-typed variable into a function which expects <em>concrete</em> types, the code goes through a process called <em>type dispatching</em>. This process basically matches an overload of a function with the <em>runtime</em> types of the arguments passed in. Think of it like virtual dispatch with no objects involved. (Strictly speaking, type dispatch in Epoch is <em>multiple dispatch</em> rather than the <em>single dispatch</em> seen in more popular languages.)</p>
</div>
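<div class="paragraph">
<p>As a rough analogy only, multiple dispatch can be sketched in Python as a table keyed on the runtime types of <em>all</em> arguments. Nothing here reflects how the Epoch runtime actually implements dispatch; the <code>Nothing</code> class stands in for Epoch&#8217;s <code>nothing</code> type, and <code>overload</code> and <code>dispatch</code> are hypothetical helpers.</p>
</div>
<div class="literalblock">
<div class="content">
<pre>class Nothing:
    """Stand-in for Epoch's nothing type (an assumption for illustration)."""


OVERLOADS = {}


def overload(*types):
    """Register a function under the tuple of parameter types it accepts."""
    def register(function):
        OVERLOADS[types] = function
        return function
    return register


@overload(int, int)
def describe_ints(left, right):
    return "two integers"


@overload(int, Nothing)
def describe_missing(left, right):
    return "integer and nothing"


def dispatch(*arguments):
    """Pick the overload matching the runtime type of every argument."""
    key = tuple(type(argument) for argument in arguments)
    if key not in OVERLOADS:
        raise TypeError("no overload for {}".format(key))
    return OVERLOADS[key](*arguments)


print(dispatch(1, 2))          # two integers
print(dispatch(1, Nothing()))  # integer and nothing</pre>
</div>
</div>
<div class="paragraph">
<p>Single dispatch would look only at the first argument; here the type of the second argument changes which function runs.</p>
</div>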
<div class="paragraph">
<p>To facilitate all this, the compiler inserts <em>annotations</em> into the code, so that it can deduce what set of overloads to choose from when the runtime dispatcher is invoked. Some of these annotations survive at runtime - analogs of <em>virtual-table pointers</em> in C++.</p>
</div>
<div class="paragraph">
<p>Annotations are passed as hidden parameters on the stack when invoking a function. And at last we reach the real wrinkle: a <code>nothing</code> annotation can come from <em>two distinct places</em>: either the construction of a sum-typed variable which allows <code>nothing</code> as a base type, or a literal <code>nothing</code> passed to a function call.</p>
</div>
<div class="paragraph">
<p>The headache is that, to LLVM, <em>both uses look like a function call</em>. There is special case logic that exists to fix up the annotations for sum-typed constructors. Unfortunately, that logic collides with the logic needed to fix up annotations for general function call usage because LLVM doesn&#8217;t know the difference.</p>
</div>
<div class="paragraph">
<p>It&#8217;s an eminently solvable problem, but it&#8217;s a headache. Hopefully once this bug is gone there won&#8217;t be <em>too</em> many more to swat before I can start code-generating working 64-bit compilers.</p>
</div>
<div class="paragraph">
<p>(Spoiler: I&#8217;m not optimistic.)</p>
</div>]]></description><link>https://apoch.github.io/blog/2017/05/29/Epoch-64-bit-self-hosting-progress.html</link><guid isPermaLink="true">https://apoch.github.io/blog/2017/05/29/Epoch-64-bit-self-hosting-progress.html</guid><dc:creator><![CDATA[Mike Lewis]]></dc:creator><pubDate>Mon, 29 May 2017 00:00:00 GMT</pubDate></item><item><title><![CDATA[Welcome to the Bag of Holding]]></title><description><![CDATA[<div class="paragraph">
<p>This is a quick test of HubPress.io to see how I like it. Assuming all goes well, I will probably resume posting Bag of Holding entries here soon. Ancient archives from the Bag of Holding are on <a href="https://www.gamedev.net/blog/355-the-bag-of-holding/">GameDev.net</a> if you happen to like my writing. They are very, very old though.</p>
</div>
<div class="paragraph">
<p>For now, here&#8217;s a sneak preview of where <a href="https://github.com/apoch/epoch-language">Epoch</a> is headed:</p>
</div>
<div class="literalblock">
<div class="content">
<pre>simplelist&lt;integer&gt; types = new 0, nothing</pre>
</div>
</div>]]></description><link>https://apoch.github.io/blog/2017/05/28/Welcome-to-the-Bag-of-Holding.html</link><guid isPermaLink="true">https://apoch.github.io/blog/2017/05/28/Welcome-to-the-Bag-of-Holding.html</guid><dc:creator><![CDATA[Mike Lewis]]></dc:creator><pubDate>Sun, 28 May 2017 00:00:00 GMT</pubDate></item></channel></rss>