
Hi Aaron,

We don't currently have a resource on this, but it's a good idea and we'll consider maintaining some sort of document going forward.

Generally, the dependencies for each tsv are set based on a combination of engineering judgement, which variables are available in the input dataset, and how large the dataset is. When we observe that a housing characteristic varies with a variable, we tend to include that variable as a dependency. We make this observation either by simply plotting the distributions or by comparing how much information is gained marginally when a variable is included versus excluded as a dependency, using a tool we developed during the End Use Load Profiles project. We rely more on the latter when the size of the dataset forces us to limit the number of dependencies a tsv file can have. We try to ensure that each row in a tsv is informed by at least 10 data samples, so a smaller input dataset usually means a smaller or coarser set of dependencies to maintain this fidelity.
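
As a rough illustration of the two checks described above, here is a minimal Python sketch. It is not the tool from the End Use Load Profiles project: it assumes that "information gained" can be approximated by the drop in conditional entropy when a candidate dependency is added, and the sample records, field names, and function names are all hypothetical.

```python
from collections import Counter, defaultdict
from math import log2

MIN_SAMPLES = 10  # each tsv row should be informed by at least this many samples


def rows_below_min_samples(samples, dependencies, min_samples=MIN_SAMPLES):
    """Return {dependency combination: sample count} for under-sampled rows."""
    counts = Counter(tuple(s[d] for d in dependencies) for s in samples)
    return {combo: n for combo, n in counts.items() if n < min_samples}


def conditional_entropy(samples, characteristic, dependencies):
    """Estimate H(characteristic | dependencies) in bits from sample counts."""
    groups = defaultdict(Counter)
    for s in samples:
        groups[tuple(s[d] for d in dependencies)][s[characteristic]] += 1
    total = len(samples)
    h = 0.0
    for option_counts in groups.values():
        n = sum(option_counts.values())
        for c in option_counts.values():
            # p(x, y) * log2 p(y | x), summed over all observed (x, y) pairs
            h -= (c / total) * log2(c / n)
    return h


def marginal_information_gain(samples, characteristic, base_deps, candidate):
    """Entropy reduction from adding `candidate` to the existing dependencies."""
    before = conditional_entropy(samples, characteristic, base_deps)
    after = conditional_entropy(samples, characteristic, base_deps + [candidate])
    return before - after


# Hypothetical input dataset: one dict per surveyed home.
samples = (
    [{"Vintage": "<1980", "Building Type": "SF", "Heating Fuel": "Gas"}] * 12
    + [{"Vintage": "1980+", "Building Type": "SF", "Heating Fuel": "Electricity"}] * 12
    + [{"Vintage": "<1980", "Building Type": "MF", "Heating Fuel": "Gas"}] * 3
)

print(rows_below_min_samples(samples, ["Vintage", "Building Type"]))
print(marginal_information_gain(samples, "Heating Fuel", [], "Vintage"))
print(marginal_information_gain(samples, "Heating Fuel", ["Vintage"], "Building Type"))
```

In this toy dataset, adding Building Type as a second dependency creates a row backed by only 3 samples and adds no information beyond Vintage, so it would be a natural candidate to drop or coarsen.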

Another thing we do when developing the tsv distributions is to check that we are capturing the relationship between each housing characteristic and each of its dependencies in the input dataset. We do this by visually comparing the marginal probabilities from the tsv (i.e., the distributions by one dependency at a time), as determined by the row probabilities and the sampling_probability column (which accounts for the impacts of upstream dependencies in ResStock), with those from the input dataset.
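
The marginal comparison described above could be sketched as follows. This is an assumption-laden illustration rather than ResStock code: it assumes tsv columns named with `Dependency=` and `Option=` prefixes plus a `sampling_probability` column, treats `sampling_probability` as the weight of each row's dependency combination, and uses made-up rows, samples, and function names. The numeric difference computed at the end stands in for the visual comparison.

```python
from collections import defaultdict


def tsv_marginal(rows, dependency, options):
    """P(option | dependency value) implied by the tsv rows.

    Each row's option probabilities are weighted by its sampling_probability
    and then normalized within each value of the chosen dependency.
    """
    weighted = defaultdict(lambda: defaultdict(float))
    weight = defaultdict(float)
    for row in rows:
        value = row["Dependency=" + dependency]
        w = row["sampling_probability"]
        weight[value] += w
        for opt in options:
            weighted[value][opt] += w * row["Option=" + opt]
    return {v: {o: weighted[v][o] / weight[v] for o in options} for v in weight}


def data_marginal(samples, characteristic, dependency, options):
    """P(option | dependency value) estimated directly from the input dataset."""
    counts = defaultdict(lambda: defaultdict(int))
    for s in samples:
        counts[s[dependency]][s[characteristic]] += 1
    result = {}
    for value, option_counts in counts.items():
        n = sum(option_counts.values())
        result[value] = {o: option_counts[o] / n for o in options}
    return result


options = ["Gas", "Electricity"]

# Hypothetical tsv content, one dict per row.
rows = [
    {"Dependency=Vintage": "<1980", "Dependency=Building Type": "SF",
     "Option=Gas": 0.8, "Option=Electricity": 0.2, "sampling_probability": 0.3},
    {"Dependency=Vintage": "<1980", "Dependency=Building Type": "MF",
     "Option=Gas": 0.5, "Option=Electricity": 0.5, "sampling_probability": 0.1},
    {"Dependency=Vintage": "1980+", "Dependency=Building Type": "SF",
     "Option=Gas": 0.4, "Option=Electricity": 0.6, "sampling_probability": 0.4},
    {"Dependency=Vintage": "1980+", "Dependency=Building Type": "MF",
     "Option=Gas": 0.2, "Option=Electricity": 0.8, "sampling_probability": 0.2},
]

# Hypothetical input dataset with matching marginals by Vintage.
input_samples = (
    [{"Vintage": "<1980", "Heating Fuel": "Gas"}] * 29
    + [{"Vintage": "<1980", "Heating Fuel": "Electricity"}] * 11
    + [{"Vintage": "1980+", "Heating Fuel": "Gas"}] * 20
    + [{"Vintage": "1980+", "Heating Fuel": "Electricity"}] * 40
)

from_tsv = tsv_marginal(rows, "Vintage", options)
from_data = data_marginal(input_samples, "Heating Fuel", "Vintage", options)

# Largest absolute gap between the two marginals, per dependency value.
max_gap = {
    v: max(abs(from_tsv[v][o] - from_data[v][o]) for o in options)
    for v in from_tsv
}
print(max_gap)
```

A large gap for any dependency value would suggest the tsv is not reproducing the relationship seen in the input dataset for that dependency.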

Hope this helps provide some context as to how our dependencies are determined.

Best,

Lixi