Reading and using EpiJSON data
EpiJSON is a framework for capturing epidemiological information in JSON format [3].
PyGOM can process EpiJSON data and prepare it for the modelling features discussed earlier in this guide.
The input can be a string, a file, or an existing dict.
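The three accepted input forms can be pictured with the following standard-library sketch, which normalizes a dict, JSON text, raw bytes, or a file path into a Python dict. The helper name `as_epijson_dict` is hypothetical and this is not PyGOM's internal implementation. Treating a string as a file path whenever such a file exists is an assumption made for illustration, and bytes are handled because the example below loads the data with `pkgutil.get_data`, which returns bytes.

```python
import json
import os
from typing import Any, Dict, Union


def as_epijson_dict(source: Union[str, bytes, Dict[str, Any]]) -> Dict[str, Any]:
    """Hypothetical helper: turn a dict, JSON text, bytes, or a file path into a dict."""
    # Already parsed: nothing to do.
    if isinstance(source, dict):
        return source
    # Raw bytes (e.g. from pkgutil.get_data) are decoded as UTF-8 text first.
    if isinstance(source, bytes):
        source = source.decode("utf-8")
    # Assumption for this sketch: a string naming an existing file is read from disk.
    if os.path.isfile(source):
        with open(source, encoding="utf-8") as fh:
            return json.load(fh)
    # Otherwise the string is assumed to hold the JSON text itself.
    return json.loads(source)
```

Checking for a file before parsing keeps the three cases unambiguous for well-formed input, since JSON text essentially never coincides with an existing file name.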
By default, the output is returned in cumulative form as a pandas.DataFrame, as shown below.
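import pkgutil
from pygom.loss.read_epijson import epijson_to_data_frame
# load the example EpiJSON file shipped with PyGOM (returned as bytes)
data = pkgutil.get_data('pygom', 'data/eg1.json')
# parse it into a pandas.DataFrame of cumulative counts indexed by time
df = epijson_to_data_frame(data)
print(df)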
                           Death
1854-04-12 17:56:51+00:00    1.0
1854-04-14 18:02:26+00:00    2.0
1854-04-15 01:09:18+00:00    3.0
1854-04-15 20:09:58+00:00    4.0
1854-04-15 21:24:46+00:00    5.0
1854-04-16 03:45:29+00:00    6.0
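In cumulative form, the value at each timestamp is the running total of events observed up to and including that time. The sketch below reproduces that transformation with the standard library, assuming the events arrive as ISO 8601 timestamp strings in arbitrary order (here, the six death times printed above, shuffled). The function name `cumulative_counts` is hypothetical, and PyGOM itself returns a pandas.DataFrame rather than a list of tuples.

```python
from datetime import datetime
from typing import List, Tuple


def cumulative_counts(timestamps: List[str]) -> List[Tuple[datetime, float]]:
    """Hypothetical helper: running total of events, ordered by time."""
    # Parse ISO 8601 strings (with UTC offset) and sort them chronologically.
    times = sorted(datetime.fromisoformat(ts) for ts in timestamps)
    # The i-th event in time order brings the running total to i + 1.
    return [(t, float(i + 1)) for i, t in enumerate(times)]


deaths = [
    "1854-04-15 01:09:18+00:00",
    "1854-04-12 17:56:51+00:00",
    "1854-04-16 03:45:29+00:00",
    "1854-04-14 18:02:26+00:00",
    "1854-04-15 21:24:46+00:00",
    "1854-04-15 20:09:58+00:00",
]
for t, total in cumulative_counts(deaths):
    print(t, total)
```

Running it prints the same six rows as the data frame above, from 1.0 at the earliest death to 6.0 at the latest.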
Since data are usually loaded for the purpose of model fitting, EpiJSON can also be passed directly to the loss class
EpijsonLoss,
which uses a Poisson loss under the hood.
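As a rough illustration of what a Poisson loss measures, the sketch below computes the textbook Poisson negative log-likelihood between observed counts \(y_i\) and model predictions \(\mu_i\), namely \(\sum_i (\mu_i - y_i \log \mu_i + \log y_i!)\). It uses only the standard library, and the name `poisson_nll` is hypothetical. PyGOM's own Poisson loss may drop the constant \(\log y_i!\) term or differ in other details, so treat this as an illustration rather than the library's formula.

```python
import math
from typing import Sequence


def poisson_nll(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Generic Poisson negative log-likelihood, summed over data points."""
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    total = 0.0
    for y, mu in zip(observed, predicted):
        if mu <= 0:
            raise ValueError("Poisson means must be positive")
        # -log P(y | mu) = mu - y*log(mu) + log(y!), with log(y!) = lgamma(y + 1)
        total += mu - y * math.log(mu) + math.lgamma(y + 1)
    return total


# Predictions equal to the data minimize each term, but the cost stays nonzero.
print(poisson_nll([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))
# Over-predicting every count gives a larger cost.
print(poisson_nll([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))
```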
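from pygom.model import common_models
from pygom.loss.epijson_loss import EpijsonLoss
# SIR-type model from PyGOM's collection of common models
ode = common_models.SIR_norm([0.5, 0.3])
# arguments: parameter values, model, EpiJSON data (loaded above),
# data column to fit, model state it corresponds to, initial state values
obj = EpijsonLoss([0.005, 0.03], ode, data, 'Death', 'R', [300, 2, 0])
print(obj.cost())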
10.865594602563727
print(obj._df)
                           Death
1854-04-12 17:56:51+00:00    1.0
1854-04-14 18:02:26+00:00    2.0
1854-04-15 01:09:18+00:00    3.0
1854-04-15 20:09:58+00:00    4.0
1854-04-15 21:24:46+00:00    5.0
1854-04-16 03:45:29+00:00    6.0
Once initialized, the object inherits all of its operations from BaseLoss.
The cost calculation is demonstrated above; the remaining operations are omitted for brevity.
The data frame is stored inside the loss object and can be retrieved for inspection at any time, as with obj._df above.
Note
Initial values for the states are required, but the initial time is not. When the initial time is not supplied, the first time point in the data is treated as \(t_0\). The input Death indicates which column of the data is used, and \(R\) is the model state that this column corresponds to.
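To make the time handling in the note concrete, the sketch below converts timestamps into elapsed times measured from an origin, which defaults to the first time point in the data and so plays the role of \(t_0\). The helper name `elapsed_days` and the choice of days as the unit are assumptions for illustration; the unit PyGOM uses internally may differ.

```python
from datetime import datetime
from typing import List, Optional


def elapsed_days(timestamps: List[str], t0: Optional[str] = None) -> List[float]:
    """Hypothetical helper: times in days since t0 (defaults to the earliest timestamp)."""
    times = sorted(datetime.fromisoformat(ts) for ts in timestamps)
    # With no explicit t0, the first time point in the data is used as the origin.
    origin = datetime.fromisoformat(t0) if t0 is not None else times[0]
    # Assumption for this sketch: express elapsed time in (fractional) days.
    return [(t - origin).total_seconds() / 86400.0 for t in times]


print(elapsed_days(["1854-04-12 17:56:51+00:00", "1854-04-14 18:02:26+00:00"]))
```

With the default origin the first observation sits at time 0.0, and the second, two days and 335 seconds later, at just over 2.0.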