I support the idea of data sharing and transparency of analyses. This is particularly relevant for clinical trials that may produce data useful for ancillary analyses.
There are some issues that may arise when considering secondary analyses of administrative data. The most significant issue related to my work has to do with the data use agreement (DUA) with CMS, which requires data destruction after the study is complete and the DUA expires. It's possible that CMS would allow continued holding of data to satisfy a requirement like this, but I'm not sure. CMS has also become more restrictive with DUA durations, now requiring renewal on a yearly basis. This adds personnel time to the cost of holding data for 7 years, in addition to data storage costs. Holding data may also require an IRB-approved project, which means additional administrative costs for maintaining IRB approval and oversight. I'm also all for reusing purchased data for new studies, or even replications, but these require data reuse agreements, which carry a cost. The bottom line of these comments is to ask that CMS regulations, and the contract arrangements of other administrative claims data providers (e.g., large insurers), be considered when shaping this requirement. I suspect the cost of maintaining copies of these large data sets is also somewhat higher than for more typical smaller data sets, simply given their size, though I don't find our data storage costs particularly prohibitive.
From a practical standpoint, I also want to note that processing raw CMS data into an analytic file isn't typically completed in one concise analytic program. Our core set of programs for our primary study runs to a dozen or more, and someone new would need significant orientation to the data and programs to interpret what they do. Even the source files used by each program may not be obvious, depending on the naming conventions applied when reading the raw data into SAS or another package's format. Beyond that, we have a large number of exploratory analyses that led us to the methods we chose; presumably these would not be required to be shared. The main point is simply to consider the additional complexity introduced when multiple programs in different places on a server all combine to produce an analysis. We have organized and documented ours so that they could be shared, and we are open to sharing them with other investigators looking to reproduce our methods, but doing so adds layers of effort and complexity beyond sharing the typical analytic program for an analysis-ready data set.
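As one illustration of the kind of lightweight documentation that can make such a multi-program pipeline easier to hand off, a simple machine-checkable manifest can record which files each program reads and writes. This is only a sketch; all program and file names below are invented for illustration (our actual programs are SAS, and no real CMS file layout is implied):

```python
# Hypothetical manifest for a multi-step claims-processing pipeline.
# Each entry: (program name, input files, output files). Names are
# invented for illustration and do not reflect any real CMS workflow.
PIPELINE = [
    ("01_read_claims.sas",   ["raw/claims_2019.csv"],      ["work/claims.sas7bdat"]),
    ("02_build_cohort.sas",  ["work/claims.sas7bdat"],     ["work/cohort.sas7bdat"]),
    ("03_merge_covars.sas",  ["work/cohort.sas7bdat",
                              "raw/enrollment_2019.csv"],  ["work/analytic.sas7bdat"]),
    ("04_primary_model.sas", ["work/analytic.sas7bdat"],   ["out/results.csv"]),
]

def data_flow(pipeline):
    """Return one human-readable line per program: inputs -> outputs."""
    return [
        f"{program}: {', '.join(inputs)} -> {', '.join(outputs)}"
        for program, inputs, outputs in pipeline
    ]

def check_order(pipeline):
    """Verify every input is raw data or was produced by an earlier step."""
    produced = set()
    for program, inputs, outputs in pipeline:
        for f in inputs:
            if not f.startswith("raw/") and f not in produced:
                raise ValueError(f"{program} reads {f} before it is produced")
        produced.update(outputs)
    return True
```

Even a minimal record like this answers the two questions an outside investigator asks first (what order do the programs run in, and which files feed which step) without requiring them to open each program and reverse-engineer its naming conventions.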
In summary, I support the idea of data sharing, but am hopeful that the complexities of doing this for studies using large administrative data sets such as CMS data sources are considered when shaping the requirements.