Questions about data

For the gender dimension, which genders do 1 and 2 refer to, respectively?

Are you looking in the GitHub repo? That is sample data, and in it 1 is male and 2 is female. We include those files only to show an example of preprocessing. The actual predictor takes the strings “male” and “female”.
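If your own data uses the same numeric coding as the sample files, a small mapping like the one below could convert it to the strings the predictor expects. This is only a sketch: `decode_gender` and the example rows are hypothetical and not part of cv19index. Only the coding itself (1 is male, 2 is female) comes from the answer above.

```python
# Coding used in the sample data, per the answer above: 1 = male, 2 = female.
GENDER_CODES = {1: "male", 2: "female"}


def decode_gender(code):
    """Hypothetical helper: turn a numeric gender code into 'male' or 'female'."""
    try:
        return GENDER_CODES[int(code)]
    except (KeyError, ValueError, TypeError):
        # Fail loudly on anything other than 1 or 2 instead of guessing.
        raise ValueError("unexpected gender code: {!r}".format(code))


# Made-up rows; codes often arrive as strings when read from a CSV file.
rows = [{"personId": "a1", "gender": 1}, {"personId": "b2", "gender": "2"}]
decoded = [dict(row, gender=decode_gender(row["gender"])) for row in rows]
print(decoded)
```

Raising on unknown codes is a deliberate choice here, so that a different coding scheme in your data is noticed rather than silently mislabeled.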

Yes, was looking at the example input. Thanks, that’s helpful!

Is there a step missing in the example code to map the ICD10 codes to the corresponding CCSR category? As far as I can tell, the nodes specified in ccsrNodes.txt are not mapped to ICD10 codes.

In the example code there is a function called “getTestDataFrame”. Inside this function if you look at the section titled “Generating the features for each node”, it contains the logic to map the ICD-10 codes to the appropriate CCSR column.

We are coming out with a new version of the code that will automate this more.
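Until then, the general idea behind that section can be sketched roughly as follows. This is not the code from “getTestDataFrame”: the lookup table, the category names, and the `category_flags` helper are placeholders invented for illustration, and the real mapping should be taken from the repository.

```python
# Placeholder lookup: ICD-10 code -> diagnosis category column.
# These pairs are invented for illustration and are NOT the real CCSR mapping.
ICD10_TO_CATEGORY = {"E119": "CATEGORY_A", "J449": "CATEGORY_B"}

# Every category gets a column, whether or not a given patient has it.
CATEGORIES = ["CATEGORY_A", "CATEGORY_B", "CATEGORY_C"]


def category_flags(icd10_codes):
    """Return one True/False flag per category for one patient's codes."""
    present = {
        ICD10_TO_CATEGORY[code]
        for code in icd10_codes
        if code in ICD10_TO_CATEGORY  # codes outside the lookup set no flag
    }
    return {category: category in present for category in CATEGORIES}


flags = category_flags(["E119", "Z0000"])
print(flags)
```

A code only sets a flag when it matches a key in the lookup exactly, so codes written in a different format than the lookup expects would leave every column False.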

In person.csv, what does the flag column represent? I’m wondering if I need to include it in the person input. Thank you!

No, you don’t need to include that column. The example data we include here is part of a larger synthetic data set we have and the flag column came from that. It isn’t used for this model.

Yes, I am using that function but the features are mostly showing up as False when applied to our claims data. I was trying to investigate, but I will be on the lookout for a newer code version.

Never mind, I discovered the cleanICD10Syntax function was not necessary because our ICD10 codes were already formatted correctly.

The diagnosis categories are pretty specific, so most of the columns will be false most of the time.

I have been running your notebook on my Linux VM (Python version 3.5.2) and I run into the same error every time I try to run your code “from cv19index.predict import do_run”. I’ll paste it here.

Traceback (most recent call last):

File “/usr/local/lib/python3.5/dist-packages/IPython/core/”, line 3326, in run_code
exec(code_obj, self.user_global_ns, self.user_ns)

File “”, line 1, in
from cv19index.predict import do_run

File “/usr/local/lib/python3.5/dist-packages/cv19index/”, line 92
SyntaxError: invalid syntax

I thought it might be a data problem so I ran it with your data and hit the same error. Installing your package works fine but I hit the wall here every time.

Mike Joyce
Senior Data Scientist
Ascension Technologies

Hello. I have a question regarding the number of observations in the input and output data files. The example data has 1000 patients, based on the ‘personId’ column in the person.csv file. However, the example_input.csv file has 1069 rows and some patients have more than one row (for example, the person with id ‘57dec386b49374b3’ is listed twice). When I looked at the example_output.csv file, the same id is listed 8 times. Please help me understand these differences in the number of unique patients. Many thanks in advance!

That is actually an error and will be fixed in an upcoming version.
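For anyone who wants to check their own files for the same kind of duplication, a check along these lines may help. It is a minimal sketch that uses only the standard library. The `duplicated_ids` helper is hypothetical, the ‘age’ column and the second id in the sample are made up, and it assumes the id column is named ‘personId’ as in person.csv.

```python
import csv
import io
from collections import Counter


def duplicated_ids(csv_file, id_column="personId"):
    """Return {id: row count} for every id that appears on more than one row."""
    counts = Counter(row[id_column] for row in csv.DictReader(csv_file))
    return {person_id: n for person_id, n in counts.items() if n > 1}


# Tiny in-memory stand-in for a real file; only the first id comes from the thread.
sample = io.StringIO(
    "personId,age\n"
    "57dec386b49374b3,64\n"
    "0000000000000001,51\n"
    "57dec386b49374b3,64\n"
)
dupes = duplicated_ids(sample)
print(dupes)
```

With a real file, the same function can be called on an open handle, for example `with open("example_input.csv", newline="") as f: print(duplicated_ids(f))`.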