Where is the resource outlining how the dependencies of the ResStock housing characteristics were determined?

asked 2023-12-28 12:35:39 -0500

carolyng
75 ●1 ●6

updated 2023-12-29 14:44:24 -0500

14060 ●81 ●24 http://bigladdersoftware.com/

I am looking to use the ResStock housing characteristics probability distributions to generate residential populations for a different modeling tool. Within each tsv file, the dependencies of each characteristic are noted. I am looking for a resource that outlines how those dependencies were identified. I would assume some correlation analysis was completed to identify the interdependencies.


answered 2024-01-03 11:27:01 -0500

lixil
31 ●1 ●1

Hi Aaron,

We don't have a resource on this, but keeping some sort of document going forward is a good idea and we'll consider it.

Generally, the dependencies for each tsv are set based on a combination of engineering judgement, which variables are available in the input dataset, and how large the dataset is. When we observe that a housing characteristic varies with a variable, we tend to include that variable as a dependency. We make this observation either by simply plotting the distributions or by comparing the marginal information gained when a variable is included versus excluded as a dependency, using a tool we developed during the End Use Load Profiles project. We rely on the latter more when the size of the dataset forces us to limit the number of dependencies a tsv file can have. We try to ensure that each row in a tsv is informed by at least 10 data samples, so smaller input datasets usually mean a smaller or coarser set of dependencies to maintain this fidelity.
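To make the two checks above concrete, here is a minimal sketch in Python. It is not the End Use Load Profiles tool, whose internals are not described here. It assumes "information gained" can be approximated by the reduction in conditional entropy of the characteristic when a candidate dependency is added, and it checks the 10-sample minimum only over dependency combinations that actually occur in the data. The records, the field names (`vintage`, `tenure`, `heating_fuel`), and the function names are all hypothetical.

```python
import math
from collections import Counter, defaultdict

MIN_SAMPLES_PER_ROW = 10  # the rule of thumb mentioned in the answer


def entropy(counts):
    """Shannon entropy (bits) of a Counter of category counts."""
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values() if c > 0)


def conditional_entropy(records, target, deps):
    """Estimate H(target | deps) in bits from raw sample counts."""
    groups = defaultdict(Counter)
    for rec in records:
        groups[tuple(rec[d] for d in deps)][rec[target]] += 1
    n = len(records)
    return sum(sum(c.values()) / n * entropy(c) for c in groups.values())


def min_samples_per_row(records, deps):
    """Smallest sample count among the dependency combinations that occur.

    Combinations that never occur in the data are not counted here; they
    would have zero samples and need separate handling.
    """
    counts = Counter(tuple(rec[d] for d in deps) for rec in records)
    return min(counts.values())


# Hypothetical survey records: heating fuel varies with vintage, not with tenure.
records = []
for vintage, fuel, n in [
    ("pre-1980", "gas", 30),
    ("pre-1980", "electric", 10),
    ("post-1980", "gas", 10),
    ("post-1980", "electric", 30),
]:
    for i in range(n):
        tenure = "owner" if i % 2 == 0 else "renter"
        records.append({"vintage": vintage, "heating_fuel": fuel, "tenure": tenure})

# Information gained by adding "vintage" as the first dependency.
gain_vintage = (
    conditional_entropy(records, "heating_fuel", [])
    - conditional_entropy(records, "heating_fuel", ["vintage"])
)

# Marginal information gained by adding "tenure" on top of "vintage".
gain_tenure = (
    conditional_entropy(records, "heating_fuel", ["vintage"])
    - conditional_entropy(records, "heating_fuel", ["vintage", "tenure"])
)

smallest_row = min_samples_per_row(records, ["vintage", "tenure"])
enough_samples = smallest_row >= MIN_SAMPLES_PER_ROW

print(f"gain from vintage: {gain_vintage:.4f} bits")
print(f"gain from tenure given vintage: {gain_tenure:.4f} bits")
print(f"smallest row sample count: {smallest_row}, ok: {enough_samples}")
```

In this toy dataset, vintage reduces the uncertainty in heating fuel while tenure adds nothing once vintage is known, so tenure would be a natural candidate to drop if the sample count per row became too small.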

Another thing we do when developing the tsv distributions is to ensure that they capture the relationship in the input dataset between each housing characteristic and each of its dependencies. We do this by visually comparing the marginal probabilities from the tsv (i.e., the distributions by one dependency at a time) with those from the input dataset. The tsv marginals are determined by the row probabilities and the sampling_probability column, the latter of which accounts for the impacts of upstream dependencies in ResStock.
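The marginal calculation described above can be sketched as follows. This is an illustration under stated assumptions rather than the ResStock team's own script: it assumes the tsv uses `Dependency=<name>` and `Option=<name>` column headers alongside `sampling_probability`, and the table contents and the `marginal_by_dependency` function are made up for the example. Each row's option probabilities are weighted by that row's sampling_probability and then normalized within each value of the chosen dependency.

```python
import csv
import io
from collections import defaultdict

# Hypothetical tsv content; the values are invented for illustration.
TSV_TEXT = (
    "Dependency=Vintage\tDependency=Tenure\tOption=Gas\tOption=Electric\tsampling_probability\n"
    "pre-1980\towner\t0.8\t0.2\t0.3\n"
    "pre-1980\trenter\t0.6\t0.4\t0.2\n"
    "post-1980\towner\t0.3\t0.7\t0.3\n"
    "post-1980\trenter\t0.2\t0.8\t0.2\n"
)


def marginal_by_dependency(tsv_text, dependency):
    """Return P(option | dependency value), collapsing all other dependencies.

    Rows are weighted by sampling_probability, then normalized per value
    of the chosen dependency.
    """
    dep_col = f"Dependency={dependency}"
    weighted = defaultdict(lambda: defaultdict(float))
    weight_total = defaultdict(float)
    for row in csv.DictReader(io.StringIO(tsv_text), delimiter="\t"):
        w = float(row["sampling_probability"])
        dep_value = row[dep_col]
        weight_total[dep_value] += w
        for col, value in row.items():
            if col.startswith("Option="):
                weighted[dep_value][col[len("Option="):]] += w * float(value)
    return {
        dep_value: {opt: total / weight_total[dep_value] for opt, total in opts.items()}
        for dep_value, opts in weighted.items()
    }


by_vintage = marginal_by_dependency(TSV_TEXT, "Vintage")
by_tenure = marginal_by_dependency(TSV_TEXT, "Tenure")

for dep_value, opts in by_vintage.items():
    print(dep_value, {opt: round(p, 3) for opt, p in opts.items()})
```

The resulting per-dependency distributions are what would then be plotted next to the same marginals computed directly from the input dataset.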

Hope this helps provide some context as to how our dependencies are determined.

Best,

Lixi