Resources

Benefits and constraints of shared software teams

2024-04-08T15:54:40+00:00

Shared Software Team

If you, as a customer, want software developed, supported and maintained, but your project is too small to need a full-time person, let alone a software team of several people, then a shared software team is an option for you. Some examples of where it is commonly seen:

B2B product implementation / customisation projects
CMS (like Wordpress) based website development
Small custom application development projects

It is a lot more common among smaller business customers and non-profit organisations. In my experience, it was rarely seen in large business software projects.

What problem it solves and how

When software is developed over disjointed periods of time, it may still need technical support unpredictably, at any point in time. A shared software team can solve this problem. As the name suggests, there is a permanent software team that works on multiple software projects and provides production support for them. While the projects start, stop, end, resume, etc., the team is always in place and busy working on some project or the other.

Such software teams are built around a theme. The theme could be a product (like Avni, Wordpress) or simple applications developed on a common tech stack (like Ruby on Rails, Django, etc). This theme allows the team members to switch across projects and still be quite productive.

Economically, a shared team saves customers from paying for software services when they have no work to be done, while still giving them a team they can depend on when they do have work. It also fits the need for ongoing production support, which the team can provide on a continuous basis.

The above is quite an intuitive idea and that is why shared teams are quite prevalent. But the downsides or tradeoffs are not obvious or well understood - at least they were not for me. Lack of awareness of these downsides can make the experience of the software teams, as well as that of customers, quite frustrating and unpleasant. Customers may even believe that their software team is not capable, while the issues are often inherent to the model itself. The main goal of this article is to enumerate and explain these tradeoffs.

A shared software team technically works because of its familiarity with the codebases across projects. The familiarity comes from uniformity: a common approach, language, libraries, code patterns, etc. The less uniform the code is, even on the same platform, the fewer projects the team will scale to. Standardisation is key. It could be enforced by the underlying platform (done quite well by Wordpress and the web in general) or by the software team themselves.

Tradeoffs and constraints of Shared Team model

Unless the team is created for a specific domain, its customers can come from many domains. This means that the software team is not really an expert in any of them.
Discontinuity has a cost for the software team. Even for the same person working on a project, coming back after 3-6 months may mean significant loss of context and understanding. This is especially severe for production support. The customers, on the other hand, have no discontinuity on their side as they are using the software regularly. The loss of context causes basic errors that are difficult for customers to accept or appreciate. Customers may end up explaining the same business / domain concepts and sharing the same information with "each new person". This can be frustrating for them.
I have found that switching across software projects is psychologically quite demanding for all members / roles of the software team. From their perspective, as soon as they start becoming comfortable with a codebase/project/domain/customer, they have to switch to another project.
Attrition has a more severe negative impact on a shared team than on a dedicated team. One person leaving may take away a significant percentage of the context for any given customer, sometimes nearly 100%. This is a risk that the customer carries.

What can customers and the software team do?

It is better to work in larger batches as much as possible. It is better to batch a customer's smaller requirements over time into a small project and execute it in one go than to work on them piecemeal. This reduces discontinuity in work and switching to some extent.
It is better for customers to have more quality safeguards on their end compared to dedicated team software projects. The defect rates, after the first release, are likely to be high from shared teams.
People with some technology skills on the customer's end can be quite a boon in these arrangements. Such people lend continuity, context, and specific technical skills that the shared software team lacks by design.
Sustainable and useful documentation. Good documentation can help team members new to a project learn quickly - but it has to be done well and then actually used. Having said this, I increasingly find that people have less appetite to learn from documentation and prefer learning by doing the work themselves or learning orally from other team members. The team should decide on this item based on their own context.
Customers should ask the software team to explain their working model, so that they can adapt their own processes to it.
Finally, I believe the most helpful thing is to reset expectations if one has worked in or with a dedicated software team. Those expectations are unlikely to be met and may only cause frustration for all involved. And of course, there is nothing like having a permanent team.

What one should avoid?

Avoid non-standard types of work, even if they are technically possible for the team. Always ask the question: will someone else in your team be able to understand this after 6 months? It is very important to reflect on and understand what "standard" work is. The scalability of the software team depends on this. It is possible that both the team and the customers want non-standard projects. The team may want them because they provide some diversity in work. This is where the "business" should come in and decide whether it is a good idea from a long-term perspective or not.
Sometimes frustrated customers may want to take the path of creating their own in-house software team at reduced cost so that they have a permanent team. But in my view, it is quite difficult to pull this off for non-profits. We have written about this.

Planning for security testing of open source projects

2023-09-05T04:39:16+00:00

Open source projects in the social sector undergo security testing when they deal with a largish entity like a government, which has security standards for what it will run in its data centre. Having gone through security testing for three projects now - Bahmni, Gunak and Avni - we see that there are many commonalities in the process, testing, issue types, and their resolution. These two write-ups (this is the first) are aimed at helping anyone going through such a process. In this one we discuss considerations for planning for security testing. That is, imagine you have a software product and the customer wants to take it through a security audit for clearance.

Following are the main topics that one needs to think about and plan for:

Scope of testing
Server environment in which the testing will be done
Process - from testing start to closing all the issues
Preparedness for negotiation on the resolution of the issues

Scope of testing

Usually the scope of testing may seem clear, i.e. it involves everything. But it may be useful to work backwards from the production environment to make a list of components that may be in/out of scope. This is quite useful in the following scenarios:

You have an open source project with multiple components and not everything may be used in the project/implementation in question. For example, in the Bahmni deployment we worked on, Odoo and PACS were not used, so those go out of scope.
You may have replaceable components. We use Cognito and S3 for our own hosting of Avni, but the customer's deployment may use open source equivalents like Keycloak and Minio. This is of course assuming that your product allows for such switching.
There are components that need not be made available publicly. For example, you may be using a BI tool that falls here because it is meant to be used by the customer internally. If possible, you may want to restrict its usage behind a VPN for select users - hence taking it out of scope.
For platforms like Avni that are configured per customer, one can limit the scope of testing to the features relevant to that customer.

In the case of software platforms, the above decisions should be reflected in the deployed system. If the deployed system doesn't reflect the scope of testing, it will likely cause a lot of communication issues and perhaps wasted effort for both teams.

This process should also help bring out questions like: we are using Power BI, but that is not something we can do anything about, so should it even be part of the testing? If such a component is not part of the testing, then how one should go about certifying it requires discussion.

Isolate important systems from the blast radius

In social sector open source projects, funds are scarce and the temptation may be to use one of the existing deployments to perform security testing on. But this may create bigger problems. It is important to set up the security test environment in such a way that there are no important systems in its blast radius.
About blast radius
When offering one's system for security testing, one must be pessimistic. For example, one should assume, whether likely or not, that the security testing team can find a SQL injection vulnerability and execute any SQL on your database. If it is even remotely possible for them to do this, then you must set up your security test environment to protect other important systems from such an event. It may be important to treat shared development environments as production environments as well. If shared development environments are corrupted, the entire team can lose days/weeks of productivity.
It would be ideal to have an automated process and backups for setting up an environment, as it is quite likely that your security testing team will come back to you to "reset" their test environment so that they can resume testing, because they have broken something and made the system un-testable.
Practical issues that I have seen here are that the security team has:

changed the super-admin password and does not remember it
deleted data/tables (including the base/user data)

Process

The team that is going for security testing for the first or second time may have certain perceptions of the complete security testing process. The team that does security testing does it daily. Their process is very clear to them. There is usually, and naturally, a large mismatch in process understanding between these two teams.
It is much better for the software team to get into as much detail as possible, right at the beginning, to understand the process. The questions you can ask:

How many rounds of testing will there be?
How will issues be reported - severity/priority/versions?
How will issues be tracked?
- This can be done quite poorly. The issues can end up being all over emails, PDFs, and Excel files. It is better to agree on a shared editable document (like Google Doc) for error reporting.
Open source software teams must avoid putting these issues in the public domain or in an open issue tracking system, to avoid making the job easier for an exploiter and to not make your customers nervous.

Negotiation

This is quite an important part of security testing and issue fixing projects. While there are standards, the application of standards has to be negotiated. This is because:

There is little nuance in the standard. The standard is based on the idea that all applications are the same and must adhere to the same standard irrespective of the number of users, type of usage, and the criticality of the data. While some issues, like SQL injection and unprotected URLs, are non-negotiable, how much importance should be placed on social engineering attacks or rogue users can vary.
While all the parties may want to fix all the security issues, there may not be time or resources available to fix everything.

That is why relationships, consulting skills, negotiation approach, and above all a collaborative process matter a lot in such projects. It is in the software team's interest to explain, even before security testing starts, that issues of resources and time may come up - as many a time the customer may also be going through this process for the first time.

In the next 1 or 2 articles we will discuss the most common issues (90%) that get reported and the technical approach to solve them (in a stack similar to ours).

Published On: 05 September 2023

Author: Vivek Singh

Logo credit: Raj Agarwal, https://www.canva.com/p/templates/EAE-T26xHMo-internet-security-logo/

Software volatility is a good thing

2023-06-02T07:08:11+00:00

What do we mean by software volatility

Software volatility is the variation in existing working functionality of a software product. A less volatile software is one that is quite stable in how predictably it performs all its user functions over time. Of course, we all want such systems and want that new features get added over time.

Users and customers convey this expectation to the leaders of the software team. The leaders communicate this to the engineers. They may even have quality metrics that measure the number of new defects per release and try to keep them as low as possible. They may even judge the software team's capability based on this. THIS IS A FOLLY. Complex systems do not evolve in this way (maybe simple software systems do). Similarly, it is a misjudgment for customers to lose their confidence when a product degrades in certain areas once in a while. Why is this so?

Low volatility system

When a software engineer is asked to add new functionality, he/she may, instinctively or due to the nature of the work environment, avoid changing code that is already working and add the new functionality in such a way that the code for the old and the new doesn't overlap much. This is a low risk way of adding software features and is quite common in our experience. When this process is repeated over and over again - everyone is happy.

This was one example but a software team can take other types of such short-sighted approaches:
- forking the code for a new customer because it is too risky to generalise existing code and jeopardise existing customers' functionality
- not upgrading the underlying software libraries because the impact can be high
- not changing the code so that it is more testable
- not deleting code

But this cannot last very long. Slowly, this option of not changing the old code much (hence not improving it) and adding only the new becomes less and less available to the engineer. This leads to a very bad place. Now there is a lot of duplication in the code. Changes need to be made in several places for one thing. Some changes get missed out. But most importantly, it is no longer possible to reason about the code and change it. This is because the code was written with a different objective in mind - delivering new things without breaking old things - and not the understandability of the code. Such low-volatility software hides much bigger risks.

Why you must not take volatility for sign of a problem

Volatility by itself is not what we want, but if we build the software the right way there is no escaping functional volatility. Making software involves regularly improving the internal structure and organisation of the code - to be in line with the new reality i.e. updated functionality, usage, ecosystem. This internal improvement while in very short term (a few weeks) introduces defects, but the decision to go ahead and take this risk, improve the code, is essential to creating long lasting good quality software.

Published on: 02 Jun 2023

Author: Vivek Singh

Product integration as a solution approach

2023-04-05T15:08:23+00:00

There are several projects that do not neatly fit into a single product's target domain. For example, let's consider a project that requires beneficiary data management from the field (a typical Avni use case), but also requires stock/inventory management of items distributed by the field workers. When creating a technical solution for such a project, there are the following options:

Find a product that does most and extend its scope
Develop a custom solution
Use two functionally complementary products
Integrate two functionally complementary products but create a new user facing app

In this article we will discuss these options so that we can make a decision.

1. Find a product that does most and extend its scope

Usually people who have this need approach a product maker (like Samanvay) and enquire whether the project's requirement fits their product. When there is a project, extending the scope of product can be quite tempting (because makers like to make things). The customers, in nonprofits, find it comforting too - that they can just use one product.
But mature product teams will avoid this route because:

Adding unrelated features to one's product makes the product development unsustainable because of increased complexity.
Orthogonal features usually get used by very few customers and slowly get less love and the quality deteriorates.

The decision of what goes in the product is a subjective one and the product team is best placed to make that decision. The product teams must explain their rationale to the customers.

2. Develop a custom solution

We have written a lot about this earlier here, where we give reasons for why in most cases for nonprofits this is not a good idea.

3. Use two functionally complementary products

Before getting into this option, let us further detail our example. In our example project, let's say we have two sets of users - field users and stock managers. The field users use the mobile app while they provide service in the field, and the stock managers use a desktop based web interface.
Potentially, we can use two separate products and give one to each type of user. But what happens when we want the field users to see/manipulate stock data? They can use both apps - the field app and the stock management app. But what if the stock management app is not capable of working offline, or it is too complex for the user to learn and use two apps? More importantly, what if the field users' workflow requires managing beneficiary and stock data in the same flow (e.g. the fieldworker hands over certain medicines to the beneficiary while doing anaemia screening)?

Integrating two products can solve these issues. The field users can capture/view data related to stocks in field app's regular workflows (e.g. filling a form) and let the integration component ensure that the stock management system is updated. But the idea in this approach is not to develop all the stock management screens in field app.

The benefit of this approach lies in the reusability of the integration component. The integration component can be generalised to cater to more use cases where there are similar requirements. This makes the approach sustainable for the development team, as they have less software to maintain, and hence for the customer ecosystem too.
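As a rough illustration of what such an integration component does, the sketch below turns one submitted field form into stock movements. Everything in it is hypothetical: the form fields (`id`, `items_given`), the `StockMovement` shape and the in-memory stock system are stand-ins, not the API of Avni or of any stock management product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StockMovement:
    item: str
    quantity: int
    reference: str  # unique per form line, used to ignore repeated deliveries


class InMemoryStockSystem:
    """Stand-in for the stock management product (hypothetical interface)."""

    def __init__(self, opening_levels):
        self.levels = dict(opening_levels)
        self.applied = set()

    def issue(self, movement):
        # Applying the same movement twice must not reduce stock twice.
        if movement.reference in self.applied:
            return
        self.applied.add(movement.reference)
        self.levels[movement.item] = self.levels.get(movement.item, 0) - movement.quantity


def sync_form(form, stock):
    """Translate one field-app form into stock movements."""
    for line in form.get("items_given", []):
        reference = f"{form['id']}:{line['item']}"
        stock.issue(StockMovement(line["item"], line["quantity"], reference))


stock = InMemoryStockSystem({"iron tablets": 100})
form = {
    "id": "visit-1",
    "items_given": [{"item": "iron tablets", "quantity": 30}],
}
sync_form(form, stock)
sync_form(form, stock)  # the same form delivered again changes nothing
print(stock.levels)
```

The field app keeps its normal form workflow, and the only product-specific parts are the mapping in `sync_form` and the adapter behind `issue`, which is what makes the component reusable across similar projects.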

4. Integrate two functionally complementary products but create a new user facing app

Another approach is to not do integration but develop a new user facing app that uses the API of both the products. This approach is useful if the use case is simple. But as the use case becomes complex one may find that one is duplicating the user interface of both the products into this new app.

At Samanvay we have used option 3 quite a lot. In fact, one of the value propositions of Bahmni was that, instead of re-developing everything into the product, it integrated 3 different products - OpenMRS, OpenELIS, and Odoo. Similarly, we have also integrated Avni with Bahmni using the same idea.

Author: Vivek Singh

Published On: 05 Apr 2023

REST API Pagination and race condition

2022-07-19T10:38:42+00:00

When we integrate two systems, we use REST APIs to access the data. Using these APIs we get access to the historical data, and once we have processed all the historical data, we want to get any newer data as time passes. The source system's APIs present the data as pages of records (for more, see the Feed and Pagination patterns). The pagination is usually based on the last modified date time of the record. This is quite simple for the consumer of the API to understand, as the data is arranged in chronological order of change. But there is something interesting going on with the records that have been modified in the past few seconds. If these few seconds are not handled correctly by the API provider, it can cause a lot of issues for the consumer of the API and/or lost records.

To understand this, let's start with a simple system. The table below describes the sequence of the last modified date time assigned to a record, it getting saved in the database, and then appearing in a paginated feed.

STEPS	TIME	EVENT
S0	T0	The user saves a record
S1	T1	Server application assigns current time = T1 to the record's last modified date time
S2	T2	The record gets saved in the database
S3	T3	This record appears on the page requested by the API user at time = T3

Let's see what happens to the records coming into the system from various users.

STEPS	USER 1 R1	USER 2 R2	USER 3 R3	USER 4 R4	USER 5 R5
S0	T0	T100	T200	T300	T410
S1	T10	T110	T210	T310	T420
S2	T20	T120	T220	T320	T430

If an API user asks for all records newer than T0 with pageSize = 3, the API will provide the data in two pages as [R1 (T10), R2 (T110), R3 (T210)] followed by [R4 (T310), R5 (T420)] in step S3. For simplicity, we assumed that the time gap between the various steps is the same across different users.

But this assumption cannot be made, for a few reasons: different amounts of data processed for each record, thread scheduling, IO availability, other processes interfering, etc. This lack of consistency in the time gap between S1 and S2, referred to as a race condition, causes step S3 not to work as one would expect. Let’s see how.

Let’s use the scenario same as above but change the timings for USER 4 - note the time gap between steps.

STEPS	USER 1 R1	USER 2 R2	USER 3 R3	USER 4 R4	USER 5 R5
S0	T00	T100	T200	T202	T410
S1	T10	T110	T210	T208	T420
S2	T20	T120	T220	T223	T430

Since there are a couple of common ways to implement pagination, both described here - we will try to see what resources we get in response for both approaches.

1. Using offset and limit

(a) An API user requests the first page at T222 (limit=3, offset=0) and gets R1, R2, and R3. Then the API user navigates to the next page at time T230.

(b) When the API user calls at T230 (limit=3, offset=3), s/he gets R3 again on the second page, because R4 (T208) has by now been saved and sorts ahead of R3 (T210). R3 and R5 will both appear on this page after T430. Note that R4 will never appear for this user. But if the user came much later (let's say T500) and asked for offset=0 and offset=3, s/he would get all the records as {R1, R2, R4} and {R3, R5}.

2. Using seek pagination

(a) An API user requests the first page at time T222 (limit=3, time > T00) and gets R1, R2, and R3.

(b) Then the API user navigates to the next page at time T230 (limit=3, time > T210, the last modified time of R3). In this case, the page is empty until R5 is saved at T430, after which the API user gets only R5 on the second page (again missing R4). Here too, a user who comes later will get all the records, as above.

The main reason R4 is being missed out is that the timestamp assignment has happened but it is not visible to the API user yet as it is not saved. The caller of API and the system assigning timestamp plus saving are in a race condition with each other.
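The two walkthroughs above can be replayed with a small simulation. This is only a sketch of the scenario in the second table, not of any real API: the `Record` type and the function names are made up for illustration, and it assumes the seek consumer resumes from the last modified time of the last record it received (T210).

```python
from typing import NamedTuple


class Record(NamedTuple):
    name: str
    last_modified: int  # S1: timestamp assigned by the server application
    saved_at: int       # S2: time at which the record becomes readable


# Timings from the second table: R4 is stamped before R3 but saved after it.
RECORDS = [
    Record("R1", 10, 20),
    Record("R2", 110, 120),
    Record("R3", 210, 220),
    Record("R4", 208, 223),
    Record("R5", 420, 430),
]


def visible(now):
    """Records already saved at time `now`, in last-modified order."""
    saved = [r for r in RECORDS if r.saved_at <= now]
    return sorted(saved, key=lambda r: r.last_modified)


def offset_page(now, offset, limit):
    """Offset/limit pagination as seen by a caller at time `now`."""
    return [r.name for r in visible(now)[offset:offset + limit]]


def seek_page(now, after, limit):
    """Seek pagination: records modified strictly after `after`."""
    newer = [r for r in visible(now) if r.last_modified > after]
    return [r.name for r in newer[:limit]]


# Offset/limit: R3 is repeated and R4 is skipped.
offset_first = offset_page(now=222, offset=0, limit=3)
offset_second = offset_page(now=230, offset=3, limit=3)

# Seek: the consumer resumes after R3's timestamp (210), which is past R4's (208).
seek_first = seek_page(now=222, after=0, limit=3)
seek_second = seek_page(now=230, after=210, limit=3)
seek_later = seek_page(now=500, after=210, limit=3)

print(offset_first, offset_second)
print(seek_first, seek_second, seek_later)
```

In both styles R4 is never delivered to a consumer that read its first page at T222, even though it is present in the database from T223 onwards.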

Consequence

In some software systems, missing odd records may not be an issue e.g. getting notifications. But in some systems e.g. in the business process flow, this could be a bigger problem. If one is trying to read through multiple such APIs from a source, of dependent entities (e.g. customer, account, transaction) then missing customer records will mean also not being able to process all accounts of such customer records. In a large graph of entities, this can have a cascading effect.

Solutions

API Provider

Once we are aware of this issue there can be a number of solutions we can come up with using messaging, another data store, etc. But there is also a simple solution that one can employ without increasing the complexity by adding additional infrastructure. We have already hinted at that above.

Records can be missed only when the API call and the timestamp assignment happen at nearly the same time. What if we avoid this? The API provider can cut from the paginated response all resources that have been created/updated in, let's say, the last 1 minute.

Assuming that 1 minute translates to 15 units of T (the S1-to-S2 gap of R4, the largest in the table above), let's see what happens.

1. Using offset and limit

(a) at time T222 (limit=3, offset=0) user gets R1, R2, because R3 (T210) is newer than the cutoff T207

(b) at time T230 (limit=3, offset=2) user gets R4, R3; R5 follows on a later call, once it has been saved and is older than the cutoff

2. Using seek pagination

(a) at time T222 (limit=3, time > T00) user gets R1, R2

(b) at time T230 (limit=3, time > T110) user gets R4, R3; R5 follows on a later call

In your system, you can decide the right duration to use in place of the 1 minute used here. This duration essentially needs to be at least the maximum expected time between S1 and S2. In most systems 1 minute will be sufficient.
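A minimal sketch of the provider-side cutoff, using the timings of the second table. The names are illustrative, and the window of 15 units is an assumption chosen to match the largest S1-to-S2 gap in that table (R4: T208 to T223); a window of 10 units is included to show what happens when the window is shorter than that gap.

```python
from typing import NamedTuple


class Record(NamedTuple):
    name: str
    last_modified: int  # S1
    saved_at: int       # S2


RECORDS = [
    Record("R1", 10, 20),
    Record("R2", 110, 120),
    Record("R3", 210, 220),
    Record("R4", 208, 223),
    Record("R5", 420, 430),
]
BY_NAME = {r.name: r for r in RECORDS}


def settled(now, window):
    """Saved records whose last modified time is at least `window` old."""
    cutoff = now - window
    rows = [r for r in RECORDS if r.saved_at <= now and r.last_modified <= cutoff]
    return sorted(rows, key=lambda r: r.last_modified)


def seek_page(now, after, limit, window):
    """Seek pagination over settled records only."""
    newer = [r for r in settled(now, window) if r.last_modified > after]
    return [r.name for r in newer[:limit]]


def read_feed(call_times, limit, window):
    """Follow the feed, one API call per entry in `call_times`."""
    delivered, after = [], 0
    for now in call_times:
        page = seek_page(now, after, limit, window)
        delivered.extend(page)
        if page:
            # Resume from the last modified time of the last record received.
            after = BY_NAME[page[-1]].last_modified
    return delivered


safe = read_feed([222, 230, 500], limit=3, window=15)
too_small = read_feed([222, 230, 500], limit=3, window=10)
print(safe)
print(too_small)
```

With the 15 unit window every record is delivered exactly once, in last-modified order. With the 10 unit window R3 is released at T222 before R4 has been saved, and R4 is lost again, which is why the window has to cover the longest expected gap between S1 and S2.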

API Consumer

It is possible that you are integrating with another system and you do not have a way to get this race condition fixed. In such a case, as a consumer of the API, you can choose not to process records whose timestamps fall within the last minute (or a duration of your judgment). It is important to note that you will need to ascertain that the timestamp you are getting in the API resources is semantically the same as what you expect it to be. Basically, you should rule out timezone and other such issues, so that the timestamp you get matches the wall clock time.
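On the consumer side, the same idea can be sketched as a filter applied to each fetched page. The `lastModified` field name, the record shape and the one minute window are assumptions for illustration. The sketch refuses timestamps that carry no UTC offset, since those cannot be compared reliably with the wall clock.

```python
from datetime import datetime, timedelta, timezone

SAFETY_WINDOW = timedelta(minutes=1)  # assumption: longer than the provider's S1-to-S2 gap


def split_settled(records, now, window=SAFETY_WINDOW):
    """Split one fetched page into records safe to process and records to defer."""
    cutoff = now - window
    settled, deferred = [], []
    for record in records:
        modified = datetime.fromisoformat(record["lastModified"])
        if modified.tzinfo is None:
            raise ValueError("timestamp has no UTC offset; cannot compare with wall clock")
        if modified <= cutoff:
            settled.append(record)
        else:
            deferred.append(record)
    return settled, deferred


now = datetime(2022, 7, 19, 10, 30, 0, tzinfo=timezone.utc)
page = [
    {"id": "R1", "lastModified": "2022-07-19T10:20:00+00:00"},
    {"id": "R2", "lastModified": "2022-07-19T15:58:30+05:30"},  # 10:28:30 UTC
    {"id": "R3", "lastModified": "2022-07-19T10:29:40+00:00"},  # only 20 seconds old
]
settled, deferred = split_settled(page, now)

# The next call should resume from the newest settled timestamp, not from R3's.
checkpoint = max(datetime.fromisoformat(r["lastModified"]) for r in settled)
print([r["id"] for r in settled], [r["id"] for r in deferred], checkpoint.isoformat())
```

Deferred records are not dropped. Because the checkpoint only advances to the newest settled record, they are fetched again on the next call, by which time any late-saved neighbours have also become visible.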

Conclusion

This issue may seem trivial to consider and solve - but for someone to technically support such systems in production, it is quite important that it was never there.

Author: Vivek Singh

Published on: 19-July-2022

Factoring software codebase size and complexity is most important factor to consider when taking ownership of generic open source products

2022-05-27T10:44:30+00:00

Since the time we have made Bahmni but more so with Avni, we get inquiries during our sales about customers wanting to own the software solution after the initial few years. This happens mostly with large nonprofits or when government entities are involved. Since these products are open source technically there is no restriction in taking the entire source code of the solution - without even asking the authors of the software. But in practice taking over open-source products is far more complex than it appears. In this article, we explain why this is so.

Bahmni, Avni, and other such products are generic in nature by design, so they can serve the purposes of multiple organizations with slightly different needs. We have written about the generic and specific nature of such software previously and discussed what it technically means to use open source. When considering such a solution from an ownership point of view one has to account for the complexity of ownership. These platforms are much larger in their codebase size compared to if the same solution was developed for one customer for the same use case. This is for two reasons:
1. Most customers of these products will not use all the features of the software. But there is usually no clean way to remove such unwanted code. Some components can be done away with but feature-wise removal of unnecessary code is quite difficult.
2. This is the more important reason. A generic software solution implements many things in an indirect manner. For example in the case of Avni usage for maternal and child health programs, one will not be able to find even the mention of the term pregnancy in the code, or any table called mother and so on. These are concepts implemented on top of generic entities like Subject, Encounters, Visits, and so on. For any specific customer, most of the generalization done in code is of little value. But more importantly, the code base is far more complex due to this necessary generalization.

Overall, one may end up owning a codebase that is 5x to 10x larger than if one had just developed a custom solution for oneself. For a software team that will maintain and enhance this software on their own, there is a lot of unnecessary complexity in the code without a corresponding value. The value of this complexity is only for the product team, because they want to solve the problems of multiple customers. Hence owning such a codebase requires a higher level of skills and a larger team, making it more expensive and not practical.

Surely for smaller generic open-source solutions some of these problems are easier to tackle - as even a generalized product code may not be so difficult to maintain.

This doesn't imply that one should develop one's own solution. We have discussed here why it is a better idea to use open source products and here why developing one's own solution is difficult. Developing in-house poses risks and costs which are not obvious. But in generic products, there is a complexity-related tradeoff involved which one should be aware of.

We have a couple of suggestions to resolve this problem to some extent.

Support the existing product team in ensuring that they continue to support you in the long run. Maybe for other reasons, but this is getting recognized across the world with open source software projects that underlie many software systems in use.
Expect these software products to have the integration ability so that in the future one can slowly migrate to a different system.

Author: Vivek Singh

Published on: 27 May 2022

Measuring and scaling community engagement

2022-03-23T15:12:57+00:00

The social programs in the domains of environment and water (like others) eventually aim to improve the lives of the communities in which they work. These programs aim to achieve this by working with governments and nonprofit providers. The last mile of stakeholders, the community members (or citizens), is difficult to bring into the intervention, as it increases the scale of effort required by a few orders of magnitude. This has been a challenge. The community engagement project described here is trying a simple technology-based idea to make a small headway in this direction. We have developed the solution, but the idea itself came from the work of Arghyam and Vrutti. In this article, we try to explain the idea.

Context

Let's say a social organization wants to improve the condition in villages in the following areas:

Creating pasture for farm animals
Pest and disease management in farms
Water, soil moisture, soil nutrients management

...and other such topics of concern. While one may have some knowledge and beliefs about what a particular community's needs are, how does one find out what the community's top needs are, and hence what to focus on and improve upon? Secondly, how do organizations reach more and more people to spread awareness on these topics? For organizational activities related to information dissemination at scale, this is a challenge.

Idea

The idea being tried out towards this is as follows.

Create QR codes where each code links to a piece of web content (video, pdf in local language about a topic). One such sample content can be seen here.
These QR codes are distributed at the village level via posters, WA messages. The posters are put up at prominent places in the villages like the gram panchayat office, Anganwadi center.
When people scan these QR codes the link takes the user to a website that asks the user to share their location. Irrespective of whether they share location or not they are taken to the content. If they share their location then it gets recorded too. No other details are asked for.
If people find content relevant they would spread it themselves (instead of organizational effort alone to scale).

One of the posters used in Karnataka

This creates 3 major data points for each scan - which QR code is scanned, when, and from where. This data (as explained in the technical section) allows the people who created these QR codes to understand the interest level in topics by village/block/district and over time. Based on this data, community-based organizations can further engage with the communities on the topics of the communities' interest - for better outcomes.
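As a rough illustration of these three data points, the sketch below models a single scan with an optional location and counts scans per QR code per month. The names Scan, qr_id, and scans_per_month, as well as the sample values, are assumptions made for this example and are not taken from the actual system.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime
from typing import Optional, Tuple


@dataclass(frozen=True)
class Scan:
    """One scan event: which QR code, when, and (optionally) from where."""
    qr_id: str                                      # hypothetical identifier of the QR code / topic
    scanned_at: datetime
    location: Optional[Tuple[float, float]] = None  # (latitude, longitude); None if not shared


def scans_per_month(scans):
    """Count scans per (qr_id, 'YYYY-MM'), whether or not the location was shared."""
    return Counter((s.qr_id, s.scanned_at.strftime("%Y-%m")) for s in scans)


# Made-up sample data; the second scan has no location because the user declined.
scans = [
    Scan("pasture", datetime(2022, 1, 5, 10, 30), (15.35, 75.12)),
    Scan("pasture", datetime(2022, 1, 9, 18, 0)),
    Scan("soil-moisture", datetime(2022, 2, 1, 9, 15), (15.36, 75.10)),
]
counts = scans_per_month(scans)
```

Keeping the location optional mirrors the behaviour described above: a scan is still counted towards interest in a topic over time even when the user does not share where they are.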

Technical Implementation

The diagram below explains how the system works. qrd.by is a service (QRD) where one can design the QR code containing a hyperlink to the content which should be provided to the user upon scanning.
Once the user scans the QR code, the user is first redirected to QRD. QRD stores the location (if provided by accepting a pop-up message), the date and time, an anonymized IP, and a few other device details. The user is immediately redirected to the educational content.
Then (on a regular basis) the community engagement solution developed for this project downloads this data from QRD.
The service then uses the reverse geocoding service API provided by FES to resolve latitude/longitude data from QRD into the village, panchayat, block, district, and state information.
The service provides a dashboard (using Google Data Studio) based on all the data for organizations who created the QR codes and content, to understand the nature of engagement from the communities. This is achieved by securely exposing the PostgreSQL database to Google Data Studio.
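The enrichment and roll-up steps can be sketched roughly as follows. The request and response format of the FES reverse geocoding API is not described in this article, so the sketch takes the resolver as a parameter; fake_resolver, the district names, and the coordinates are all made up for illustration.

```python
from collections import Counter
from typing import Callable, Dict, Iterable, Optional, Tuple

Coordinates = Tuple[float, float]
# A resolver turns (lat, lon) into administrative names, e.g. {"district": ...}.
# In the real system this role is played by the FES reverse geocoding API;
# its interface is not shown here, so the function is injected instead.
Resolver = Callable[[Coordinates], Dict[str, str]]


def scans_by_district(
    scans: Iterable[Tuple[str, Optional[Coordinates]]], resolve: Resolver
) -> Counter:
    """Count scans per (district, qr_id); scans without a location go under 'unknown'."""
    counts: Counter = Counter()
    for qr_id, location in scans:
        district = resolve(location)["district"] if location is not None else "unknown"
        counts[(district, qr_id)] += 1
    return counts


def fake_resolver(location: Coordinates) -> Dict[str, str]:
    """Stand-in for the geocoding call: splits made-up coordinates at an arbitrary latitude."""
    lat, _lon = location
    return {"district": "District A" if lat < 20.0 else "District B"}


rollup = scans_by_district(
    [("pasture", (15.3, 75.1)), ("pasture", None), ("pest-management", (26.9, 75.8))],
    fake_resolver,
)
```

A table shaped like rollup (one row per district and topic with a count) is the kind of data a dashboard tool can chart directly.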

Conceptual Data Flow

Screenshot of dashboard

Conclusion and Possibilities

It is important to mention that the content to which the QR codes link can be continuously improved upon based on feedback and changing times - with no on-ground work required.

The project is currently rolled out by FES in Rajasthan, Odisha, and Karnataka. If you have similar requirements please get in touch with us, FES, Arghyam, or Vrutti.

Author: Vivek Singh

Published On: 23-Mar-2022

Making Gunak - enabling journey from access to healthcare to quality of healthcare

2022-03-07T06:22:28+00:00

Background

The first objective of the National Rural Health Mission (now just National Health Mission) was to provide access to the public healthcare system over almost the last two decades. As this objective is getting met to a great extent, the focus in the last few years has evolved from access to quality of health care provided in these hospitals and clinics. We are not experts but as we understand this quality involves - assessment/measurement, training/mentoring to improve quality, and certification. The scope of the Gunak system is helping in assessment and certification.

Towards this, we* have been engaged with NHSRC India in the development of Gunak for the last 5-6 years. Gunak is a software platform for various quality improvement programs run by the ministries of health (national and state) across India.

About the program

The public healthcare system consists of various types of health facilities (clinics, hospitals) right from health and wellness centers (for every half a dozen villages) to medical colleges (which are tertiary hospitals, one or more in number in each state). NHSRC (part of the national MoH) has been working towards improving the quality of these health facilities. These quality improvements are related to the core functioning of the hospitals, focus areas of maternal and child health, and the cleanliness of facilities (under Swachh Bharat).

Following are the key activities involved in this.
1. Development of very detailed checklists on which the health facilities can be measured. The checklists do vary by the level of facilities, programs, and sometimes by state. Some of the checklists are shared here, here, and here - under the tools section. For a quick feel of these checklists please see the images below.
2. Conducting training for quality assessors who can be internal or external. External assessors can be state or national-level assessors.
3. Verify assessments done by assessors and provide certification reports to the facility administrators.

SAMPLE from KAYAKALP Checkpoints

Sample of NQAS Checkpoints

What Gunak does

Broadly, the scope of the Gunak system is to allow for management of the checklists and performing assessments. For context, the Gunak system supports:

42 different types of assessment checklists
1,29,176 individual assessment units (called checkpoints) across them. A single assessment may involve assessing thousands of checkpoints across various departments in a large hospital.
Gunak mobile is currently active on 9700+ mobile devices. These are assessors across the country who use the app to perform assessments.

Technically the software system to support this must be able to do the following (apart from the regular functions of such a system):

A platform for checklists where the users can define them instead of programmers. Essentially the checklists should be data and not code.
The assessors should be able to perform assessments and get insights into them from their own devices without unnecessary restrictions.
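To make "checklists should be data and not code" concrete, here is a minimal sketch in which a checklist is a plain list of records and one generic function scores any checklist. The field names, the sample checkpoints, and the 0/1/2 compliance scale are assumptions for illustration; they are not taken from Gunak's actual data model.

```python
from collections import defaultdict

# A checklist expressed purely as data. Departments and checkpoints are invented.
checklist = [
    {"id": "A1", "department": "Emergency", "text": "Triage area is demarcated"},
    {"id": "A2", "department": "Emergency", "text": "Crash cart is stocked"},
    {"id": "B1", "department": "Laboratory", "text": "Reagents are within expiry"},
]

MAX_SCORE = 2  # assumed scale: 0 = non-compliant, 1 = partially compliant, 2 = fully compliant


def department_scores(checklist, answers):
    """Return the percentage score per department; unanswered checkpoints are skipped."""
    earned = defaultdict(int)
    possible = defaultdict(int)
    for checkpoint in checklist:
        score = answers.get(checkpoint["id"])
        if score is None:
            continue
        earned[checkpoint["department"]] += score
        possible[checkpoint["department"]] += MAX_SCORE
    return {dept: 100.0 * earned[dept] / possible[dept] for dept in possible}


scores = department_scores(checklist, {"A1": 2, "A2": 1, "B1": 2})
```

Adding a new checklist type in this style means adding rows, not writing a new scoring function, which is what lets non-programmers define checklists.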

Main user components of Gunak

1. Mobile app which can be used by assessors to perform assessment while on the move within the premises of the hospital.
The mobile app allows for performing assessments without requiring the internet. Hospital buildings, the remoteness of some facilities, and the number of data points to be collected require the mobile app to work as efficiently as possible for the user.
The mobile app (available from the Play Store) doesn't require any login, except when the user wants to submit their assessments. The assessor can perform a basic analysis of scores on the device itself and share the assessments in CSV format via email or other channels. The ability to perform assessments without login is a win-win. The people operating the system do not have to issue logins to everyone (as there are many assessors who may be doing internal assessments). Certain types of assessments require login as they may result in rewards or official certificates being awarded.

Few screenshots of mobile app used by assessors

2. Web application to manage the checklists
Checklists are managed in two modes - bulk and corrections. Since a checklist could contain thousands of checkpoints, entering them one by one is tedious and error-prone. Excel is a much better tool for this. Gunak supports importing checklists in Excel form. Once checklists are imported into the system, Gunak provides the ability to edit them for making corrections, retiring them, and adding new items.

3. Dashboard and reports for various state and national stakeholders to understand and analyze the quality of health facilities, districts, states across various types of checklists.

Comparison of average and median scores of various departments for a state (demo data not actual)

4. REST API for other systems to access the checklists and assessments done using them.
Almost all the data managed in Gunak is available via REST API for third-party systems to integrate with. The API documentation is available here. Gunak doesn't provide API for facilities because it integrates with the national registry for the same - hence for facilities, one must integrate with the national registry.

The software for Gunak is available here - https://github.com/nhsrc

* Disclaimer

All the ideas about the domain and use cases came from specialists at NHSRC India. We have interacted very closely with (in no particular order) - Dr. J.N. Srivastava, Dr. Deepika Sharma, Dr. Nikhil Prakash, Dr. Rashmi Wadhwa, and Anand Yadav.

Folks at Samanvay have all worked in an *individual* capacity with NHSRC over the last 5-6 years.

The initial mobile app design was created by Varun Pai.

Author: Vivek Singh

Published on: 07-Mar-2022

Deployment Architecture for Low Resource Contexts

2021-12-07T09:05:36+00:00

Point of work and reporting systems is one way to classify the low resource systems (LRS). It classifies the LRS based on the objective of such systems. LRS when considered from a technical implementation perspective, can be classified differently. In this article, we shall try to understand the key features of LRS - when considered technically.

Technical architectures of LRS make them different because of two main constraints - availability of Internet and technical complexity of deployment. These constraints restrict what we can offer to the users and where.
a) The user dimension can be split into single-user or multiple users applications.
b) Further, these users can use the system in the facility or in the field (on the move).

Let's cross these two dimensions and see an example of each type of LRS.

1. Single user in a facility - Classic data entry application
2. Multiple users in a facility collaborating with each other - Rural hospital system
3. Single user in the field - Community health worker application
4. Multiple users in the field collaborating with each other - Large health camps or refuge camps

1. Single user in a facility
From an engineering standpoint, there aren't many interesting things about such systems. So let's jump to the remaining three.

2. Multiple users in a facility collaborating with each other
Engineers who are used to developing software systems in high-resource environments have a very high probability of making mistakes here - based on our numerous discussions. The biggest illusion is that a progressive web app can be used because they provide higher availability of the application to the user despite poor internet. But the users need to collaborate with each other in real-time e.g. if the doctor orders a lab test, the pathologist should be able to view the order when the patient turns up in their department. Each of these two users having a PWA is not sufficient.
There are two options here.
a) Deploy the server on the premises of the hospital available to the users from LAN. This provides a great user experience but requires local technical support to deal with issues of network, server, power. At scale, with a lot of hospitals, this becomes quite an expensive proposition.
b) Use the service from the cloud, but with each facility having two independent Internet connections. The higher the independence of these connections, the higher the likelihood of Internet availability. We have not seen this tried out anywhere but it seems like a good idea - and less resource-poor contexts can benefit from it.
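The arithmetic behind option (b) can be sketched as follows, assuming the connections fail independently of each other. The 90% availability figure is invented purely for illustration.

```python
def combined_availability(*availabilities: float) -> float:
    """Probability that at least one of several independent connections is up."""
    all_down = 1.0
    for availability in availabilities:
        all_down *= (1.0 - availability)  # chance that this link is down as well
    return 1.0 - all_down


single = 0.90                              # assumed availability of one connection
dual = combined_availability(0.90, 0.90)   # roughly 0.99 if failures are independent
```

If both connections depend on the same tower, the same provider backbone, or the same power supply, their failures are correlated and the real gain is smaller than this calculation suggests, which is why the independence of the two connections matters.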

3. Single user in the field
Offline mobile applications are now widely used in low-resource settings. In simple terms, each mobile device needs to keep ALL the data, for their work area, that the field user will require. Important again to highlight that offline applications are not typical PWAs - which can hold user-created data on the device and save when the internet is available. The issue with PWAs is that they do not hold all the data that the user needs - hence the application becomes useless in many use cases.
But there are certain issues with offline applications too.
a) At some point the volume of data even for a single work area becomes very large. This causes the first-time download of data to take a long time.
b) Management of data for clients moving from one work area to another is a tricky problem.
These are not insurmountable problems but they do require better engineering and user experience design to make it work. Even we in Avni have not fully tackled these.
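A minimal sketch of the offline idea, under simplifying assumptions: the device keeps every record for its work area locally, reads never need the network, and locally saved records wait in an outbox until a sync is possible. The class and field names are hypothetical, and this is not how Avni is implemented.

```python
class OfflineStore:
    """Holds the full dataset for one work area on the device, plus unsynced edits."""

    def __init__(self, work_area):
        self.work_area = work_area
        self.records = {}  # record id -> record, readable with no connectivity
        self.outbox = []   # locally created/edited records awaiting upload

    def initial_download(self, server_records):
        """First-time sync: copy every record that belongs to this work area."""
        for record in server_records:
            if record["work_area"] == self.work_area:
                self.records[record["id"]] = record

    def save(self, record):
        """Save locally first; upload happens later, whenever sync() is called."""
        self.records[record["id"]] = record
        self.outbox.append(record)

    def sync(self, upload):
        """Push pending records through the given upload function; return how many were sent."""
        sent = 0
        while self.outbox:
            upload(self.outbox[0])  # if this raises, the record stays in the outbox
            self.outbox.pop(0)
            sent += 1
        return sent


# Made-up server data covering two work areas.
server = [
    {"id": "p1", "work_area": "village-1", "name": "Asha"},
    {"id": "p2", "work_area": "village-2", "name": "Ravi"},
]
store = OfflineStore("village-1")
store.initial_download(server)
store.save({"id": "p3", "work_area": "village-1", "name": "Meena"})

uploaded = []
sent = store.sync(uploaded.append)  # a list stands in for the server here
```

Both issues listed above show up even in this toy version: initial_download grows with the size of the work area, and a client who moves from village-1 to village-2 is simply absent from the second device's store.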

4. Multiple users in the field collaborating with each other
This is the same as 2, but in the field. In 2, technical operations are a challenge; in 4, deployment is also difficult. This requires multiple computing devices, each having the application and connected to each other. Given the technical complexity and the place of technology in the hierarchy of requirements - such systems are not seen in the real world. Perhaps this may remain so.

Author: Vivek Singh

Published on: 07-Dec-2021

Evolution of public system from reporting systems to point-of-work systems

2021-11-22T06:37:21+00:00

In a reporting software system (unlike point of work), the users don't use it when they are providing services to their clients (citizens). They use it later on, periodically, to report data about their work. Intuitively we like the point of work system, but it is the reporting systems that have dominated.

Why do reporting systems dominate?

The success and dominance of dhis2, which is a reporting system, over the last 2 decades is because it understood three important aspects of technology for development. The first two have been understood well; the last one is appreciated much less.

1. No code > Low code >> Programming - when it comes to distribution in LMICs. It pushes the cost down significantly, reduces project risk, requires less maintenance, and drives down the need to have developers available all the time.
2. Be a platform for local tech communities, funders, and health organizations like WHO alike - making it easy for governments to get things done.
3. Only reporting systems are feasible and point of work systems are not.

There is criticism of dhis2 based on what it offers as a user experience (not the user interface, which certainly can improve). The lament is that dhis2 is successful because of non-technical reasons. This is not true. The user experience offered by dhis2 is a natural outcome of the design of reporting systems. The real question is why point-of-work systems haven't succeeded.

Point of work and reporting system by example
Very simply, in a point of work system, the user does their work, and data gets generated as a result and is useful for other purposes like reporting. An example of a point of work system - a person doing vaccination of children is the user and the system offers a workflow like - register, look up child, create a new immunization schedule, check due/overdue vaccines, give vaccine dose, and so on. A reporting system, on the other hand, may be of one of the following types:
- At the end of the day report the number of jabs given, the number of children who came, missed jabs
- At the end of the day report child name, age, vaccine-given, etc.

The reporting systems can be quite sophisticated and can allow for longitudinal records, but essentially the design of such a system is to report and not to use previously reported data much, at least in ways that support the workflows of the user at the point of work.
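The relationship between the two kinds of data can be sketched as follows: the individual records captured at the point of work are enough to derive the aggregate numbers a reporting system asks for. The field names, child identifiers, and dates are made up for illustration.

```python
from collections import Counter
from datetime import date

# Point-of-work data: one row per dose given. All values are illustrative.
doses = [
    {"child": "C-101", "vaccine": "BCG", "given_on": date(2021, 11, 1)},
    {"child": "C-101", "vaccine": "OPV-0", "given_on": date(2021, 11, 1)},
    {"child": "C-102", "vaccine": "BCG", "given_on": date(2021, 11, 1)},
    {"child": "C-103", "vaccine": "BCG", "given_on": date(2021, 11, 2)},
]


def daily_report(doses, day):
    """Aggregate individual dose records into end-of-day reporting numbers."""
    todays = [d for d in doses if d["given_on"] == day]
    return {
        "doses_given": len(todays),
        "children_seen": len({d["child"] for d in todays}),
        "by_vaccine": dict(Counter(d["vaccine"] for d in todays)),
    }


report = daily_report(doses, date(2021, 11, 1))
```

The derivation only works in one direction: from the aggregate counts alone, one cannot reconstruct which child received which dose, which is why reported data supports the workflow at the point of work so little.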

Reasons reporting systems dominate in governmental systems

1. A point of work system requires a mobile device or a personal computer for each user. On the national/sub-national scale, providing so many devices/computers to the users requires a very large budget. Whereas with a reporting system the users can use paper at their point of work and later use a common or public device/computer to report. Such implementations decide the frequency of reporting based on the volume of data and availability/access of common/public computers.
2. The reporting of data from paper can also be done by a data entry person (supporting several end service providers) - if not everyone can be trained to use a computer, or if workloads do not permit it.
3. A user who is once familiar with dhis2 can use the same product to report some other type of data for another program. Hence such users need less training subsequently.

In short reporting systems are very cost-effective.

Point of work systems will replace reporting systems with economic growth/development

As a country/state develops and can afford to spend more on technology, the reporting systems will have to give way to point of work systems. This is because the point of work systems can deliver significantly more value to all the users and stakeholders compared to a reporting system.
1. When software is used at the point of work, the user can become more effective and efficient - supported by features and data. They can provide better services to their clients (citizens) and provide feedback to the organization to improve the processes.
2. The quality of data generated is far superior because:
- Unlike in reporting where data moves from real-world to paper, then paper to the software system, in point of work it goes in straight.
- Data entry is quite mundane work and the users lose focus when they are entering data in bulk and they make mistakes.

Unlike reporting systems like dhis2 which can be developed once and used again and again, one will require many point-of-work systems, one for each domain-specific use case. This is likely to further prolong the life of reporting systems, as they will be competing with point-of-work systems of widely varying quality.

Reporting systems will give way eventually, but they have served very well and they should get their due appreciation.

Author: Vivek Singh

Published on: 22-Nov-2021

Domain expertise doesn't travel from high-resource to a low-resource setup

2021-11-09T05:46:53+00:00

In software development, it is well understood that product development requires knowledge of the domain in which the software will be used. People who have this knowledge are called subject matter (domain) experts or business analysts. For example, for developing a hospital system you require hospital administration understanding, or for a tax filing system tax-accounting knowledge.

But, is hospital administration in a regular metro city hospital the same as the hospital administration in a rural (i.e. rural, tier-3, tier-4 city) hospital? At a technical level, it may appear that both have patients, departments, doctors, and similar processes e.g. for outpatient registration->queue->doctor->test->prescription->pharmacy->home. One may use this understanding and develop a hospital management system - but will it work in the rural hospitals? We argue that a domain expert not exposed to rural hospitals will bake numerous bad decisions into the product, making it an inappropriate solution eventually. This phenomenon applies to many domains.

Let's take some examples of a rural hospital and understand how it is materially different from a tier-1 hospital.
Almost all patients do not take appointments before visiting the hospital. So if one develops a patient-facing appointment system, it is completely useless.
Patients are not good at keeping their older medical records. Hence usually hospitals tend to keep their record with them instead of handing it over to them.
Patient's phone number registered with the hospital changes all the time (either because their number changes or they provide different family members' numbers every time). As a result, patients take a lot of time at the registration desk - as the reidentification process takes a lot of time. The software should be sophisticated in this area.
Rural doctors see 2 to 3 times more patients. They simply do not have time to enter any data in the hospital system. Also, a token-based queue in front of the doctors' room is useless in rural setups, as many patients are not that literate to follow it. Usually, there is an usher.

You can read more about two other examples, recently in the news where this same phenomenon is in play.
a) COVID Vaccination system - https://www.livemint.com/opinion/columns/the-challenges-of-our-vaccination-drive-s-final-stretch-11631122982430.html
The system was developed as a pull-based system, i.e. citizens take appointments or walk in and take vaccines. But in a low resource setup, vaccination works based on a push mechanism - pushed by health workers. Secondly, it makes the very basic mistake of assuming good internet.

b) Nutrition system to be used by AWW - https://www.livemint.com/science/health/why-childcare-workers-are-suddenly-up-in-arms-11631117319254.html (use reader mode in firefox). The same wrong assumption about the internet. The workflows in the app are too complex.

Examining this as software practitioners, we see that although we have a domain expert among ourselves, the domain itself is quite different when we switch context to a low resource setup. We cannot assume even a basic understanding of the low resource setup since most domain experts (like many urban folks) have almost never set foot in such places - for any meaningful length of time. It doesn't come very naturally to someone who is in a high-resource setup.

In essence, we are dealing with an orthogonal dimension (see the diagram below) - to the extent that each domain expertise needs to be further qualified into a high resource (being default) and low resource.

Broadly, the general low resource expertise consists of deeply understanding how the following factors impact the inner workings of the domain for which software is built. This is not an exhaustive list.

Internet is not everywhere and/or is not available all the time.
Demand massively surpasses the supply of providers.
Local IT support is difficult to get and retain.
Client literacy levels are low.
Client finances are poor.
Clients have to work hard for their livelihood hence leaving less time for availing themselves of even important services.
Distances are high and play an important role (unlike cities)

In software teams, we should strive to have domain experts who have low resource understanding to avoid having serious design mistakes in our solution.

Author: Vivek Singh

Published on: 09-Nov-2021

Taking stock of the state of in-house lightweight data analysis

2021-10-13T05:17:04+00:00

Nonprofit projects are adopting software and data systems for monitoring and tracking purposes. These are done through pre-developed dashboards and reports. But in this article, we look at whether organizations are doing ad-hoc data analysis to gain insights about their work, what determines its success, and the inherent challenges nonprofits face in analyzing the data in-house.

In-house data analysis in nonprofits has traditionally been done by using tools like SPSS, Excel, Stata, and R. There are trained public sector professionals who specialize in these tools. This approach fits projects that require research. Usually, this is done not by someone who is running the program but by dedicated personnel or a dedicated team. Let's call it heavyweight in-house data analysis; very few projects are required to do this.

But is there a much wider need for in-house data analysis which can be done in a few hours rather than a few weeks?
1. People who run programs have questions and curiosity about their projects. Since they also have data, they want to query the data to get some rough answers to their questions. This is extremely important for the continuous improvement of the program.
2. When a project is completed, the software provider hands over dashboards and reports. Before the data is in place and the project is in full swing - it is very difficult to pin these down exactly. But by the time the data is in place, the funds for software are often exhausted.

Even when the best of the analytics and data warehousing tools are available, user-driven data analysis has a learning curve. Without the conceptual/intuitive understanding of entities/tables, relationships between them, grouping methods, drilling along dimensions, etc - there is a limit to how much one can do. Many tools allow the above using visual methods, but one still will have to tell it what it should do. The tool cannot do that by itself. Hence in my assessment, there is not that much scope for tools to help you further.
On the other hand, budgets are constrained hence paying a software provider to help with reports, dashboards on an ongoing basis is an expensive proposition.

Some organizations that have long-running programs have been able to learn and get better over time with data analysis. Shorter programs that run only for a few months do not provide this opportunity - hence unless one already has someone in-house who can do it, it doesn't materialize - given other responsibilities.
Secondly, what follows is that there should be some funds available (or not used upfront) so that the right type of dashboard, reports can be built at the right time.

Finally, economically speaking, one can argue that the value of such in-house ad-hoc analysis has not been recognized yet fully, or maybe the value isn't there (at least relatively speaking compared to other priorities). But if the value is there and once it is recognized an economic solution will emerge. Organizations will have in-house people to do the analysis, or have formal training in data analysis for their program people, or will have funds to hire technical support for such work. To a large extent, it is not a technology problem.

Date published: 13-Oct-2021

Author: Vivek Singh

Functional considerations for integrating community health and hospital systems

2021-09-03T10:04:02+00:00

The main objective of the integration of data between hospital and community programs is to make the patient/member health data accessible to the counterparties - so that better health services can be provided to them. This is explained in more detail here. For the integration between these software systems to be of value let's consider the following four scenarios.

Common Identifier

It is perhaps obvious, but if this data has to be made available across these systems then we need an individual identifier (ID) or a reliable set of fields that can be used to link records. In Ashwini's case, community members are provided an ID in the community by the health workers. If the patient brings their ID card with them to the hospital then the hospital uses the same ID to register the person by putting some prefix to it which separates such patients from non-community patients.
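The prefix approach can be sketched as below. The actual prefix and ID format used in Ashwini's case are not stated here, so COMMUNITY_PREFIX and the sample IDs are hypothetical.

```python
from typing import Optional

COMMUNITY_PREFIX = "COM-"  # hypothetical prefix; the real one is not given in this article


def hospital_id_for(community_id: str) -> str:
    """Build the hospital registration ID for a patient who brings a community ID card."""
    return COMMUNITY_PREFIX + community_id


def community_id_from(hospital_id: str) -> Optional[str]:
    """Return the community ID if this hospital ID belongs to a community member, else None."""
    if hospital_id.startswith(COMMUNITY_PREFIX):
        return hospital_id[len(COMMUNITY_PREFIX):]
    return None
```

Because the community ID can be recovered from the hospital ID, records in the two systems can be linked without any fuzzy matching, and non-community patients are recognisable by the absence of the prefix.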

Let's assume for the rest of this writeup that we have scenario 3 above as that allows us to explore all other issues that we may have to deal with - which may not arise in other scenarios.

Should a community member be automatically registered as a hospital patient by the integration process - if they are not present?
Automatically registering a patient in a hospital system may confuse the registration and hospital administration department - when the patient in future visits the hospital for the first time. It will also inflate the number of patients registered with the hospital, used for operational planning. Bahmni allows for keeping individual records without registration with the hospital - this provision is quite useful because if and when such patient visits for the first time, his/her community records will be ready to use after registration.

Now let's look at the reverse.

If there is a patient in the hospital but not present in the community program - should such a patient be automatically registered? Intuitively, we should follow the same as above, but if the community program is where the ID is provided then a member not existing there indicates that something is wrong, and hence it is better resolved manually.
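The two decisions above can be summarised in a small sketch. The enum values and function names are invented for this example; the second function assumes the hospital patient carries a community-issued ID.

```python
from enum import Enum


class Action(Enum):
    LINK = "link to the existing record"
    HOLD_UNREGISTERED = "keep the records without registering the person"
    FLAG_FOR_MANUAL_REVIEW = "ask a person to resolve the mismatch"


def on_community_member(exists_in_hospital: bool) -> Action:
    """Community record arriving at the hospital system: never auto-register."""
    return Action.LINK if exists_in_hospital else Action.HOLD_UNREGISTERED


def on_hospital_patient_with_community_id(exists_in_community: bool) -> Action:
    """Hospital record arriving at the community system: a missing member signals a data problem."""
    return Action.LINK if exists_in_community else Action.FLAG_FOR_MANUAL_REVIEW
```

Neither direction creates a registration automatically; the difference is only in what happens to the data while a human decision is pending.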

Do not synchronize - Lab, Radiology Orders, and other workflow-related data
We decided that making these available in the community health system is not of much value. What community health workers will require is the results, not the order information. The order information is really about the workflow within the hospital. An even simpler example is if the hospital provides a token number to the patient, it is not all that relevant to anyone after the patient leaves the hospital. Making such records available in the community health system is not very useful.

For inpatient records, synchronize only the discharge summary
Making the outpatient clinical records available to the community health worker is quite useful. But when a patient is admitted to a hospital, s/he is prescribed and administered several medicines and many tests are conducted. We thought that making all these drug orders and lab results available to the community health worker is not going to be helpful (too much detail, perhaps also requires deeper clinical understanding to make sense of it). In the same way, a patient rarely uses/requires such inpatient records after being discharged. Patients mainly use discharge summaries. We decided to do the same for the integration, i.e. in case of inpatient visits, we synchronize the discharge summary with the community health system and leave out drugs administered and lab results.
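The two synchronization rules above can be expressed as a single filter. This is only a sketch: the record type names and the shape of a visit are assumptions for illustration and do not reflect the Bahmni or Avni data models.

```python
# Record types that only describe workflow inside the hospital (assumed names).
WORKFLOW_TYPES = {"lab_order", "radiology_order", "token"}


def records_to_sync(visit):
    """Select the records from one hospital visit to send to the community health system."""
    records = [r for r in visit["records"] if r["type"] not in WORKFLOW_TYPES]
    if visit["kind"] == "inpatient":
        # For admissions, only the discharge summary is shared.
        return [r for r in records if r["type"] == "discharge_summary"]
    return records


outpatient = {"kind": "outpatient", "records": [
    {"type": "lab_order"}, {"type": "lab_result"}, {"type": "prescription"}]}
inpatient = {"kind": "inpatient", "records": [
    {"type": "drug_order"}, {"type": "lab_result"}, {"type": "discharge_summary"}]}

outpatient_types = [r["type"] for r in records_to_sync(outpatient)]
inpatient_types = [r["type"] for r in records_to_sync(inpatient)]
```

For the outpatient visit the lab result and prescription go through while the lab order is dropped; for the inpatient visit only the discharge summary is synchronized.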

Hospital record drives the workflow of CHW, but the reverse is not true
Let's say a woman delivers in the hospital. When she is discharged the CHW should follow up with her (and her child) for post-natal care (PNC). This implies that there must be a mechanism by which the CHW can find out this information. There are a couple of features in Avni (the CHW system) that help here - Avni provides a dashboard where the CHW can find such women from their catchment and perform the next step. The Avni android app performs periodic auto-sync which ensures that hospital records are made available at the earliest.
The follow-up within the community after someone comes back from the hospital to CHW needs to be supported digitally because CHW's role is to provide proactive care. The same is not true for the hospital. Hospitals typically provide care only when a patient visits the hospital. Hospitals are not structured to provide proactive health services. In fact, the workflow from the CHW to the hospital is also facilitated by the CHW - by asking patients to visit the hospital. Hospitals are reactive setups.

To summarise, the integration between community health and the hospital system should be based on understanding how patients behave and how community programs and hospitals operate. Lastly, the integration should leave difficult decisions such as deduplicating individuals, deleting records, when to register, and when to enroll in or exit from disease programs to humans - because doing them automatically can create difficult-to-resolve data problems and may impact the health services provided to an individual. Following do no harm is a good rule of thumb.

Author: Vivek Singh

Date: 03-Sept-2021

The challenge of building technology team in nonprofit organisations

2021-01-07T09:37:50+00:00

The usage of software and technology in nonprofits is increasing. Nonprofits use general-purpose software like office tools, video conferencing, Excel, etc. But there are also solutions that are specific to the sector (customisable products) or that need to be custom made for an organisation. Acquiring or developing them requires access to software development professionals. There are various ways in which organisations can go about it.

1. Hire external software service provider for software project execution
2. Do long term partnership with a software service provider
3. Hire senior technology person inhouse and do one of 1 or 2 for junior people, or hire junior technology people inhouse and do one of 1 or 2 for senior people
4. Develop in-house software team

To help you decide what the right choice is for you, let us lay down a few fundamental issues that have a bearing on it and how they impact the outcome.

It is found that software teams understand less about your nonprofit ecosystem, organisation's context, and their requirements - since they often do not work for nonprofits. It is common to hear from nonprofits that communicating requirements and expectations is one of the biggest challenges. A common complaint is - "they (software people) just don't understand what we are saying and we don't understand what they tell us". Needless to say, this leads to unsatisfactory outcomes. This drives nonprofits towards seeking to develop in-house software teams as a solution. The logic is right but the conclusion is not. Let us see why.

Talented software professionals like working in teams where they can learn from each other about technology, design, etc. They also prefer a working culture suited for technologists. We cannot explain what this culture means, as it would take up a lot of space. But the key is that this culture is unique and is not easy to re-create/replicate in small embedded teams within large non-technology organisations. Hence when non-profits try to create in-house teams, the inability to hire the right skills, high attrition rates, short tenures etc - plague them. There is also the issue of compensation, which is well known and plays an important role too. Many organisations which have tried, find it extremely difficult to keep a stable team in house.

(here it must be said that there are exceptional individuals who are beyond these rules for their own special reasons)

So what should organisations do? Before we get to that it is worth reminding that the need for software teams is a second-order question. The first-order question is - do you need your own software built or can you use a product already available or customise one. Even if one customises an existing product, the demand for having a technical team reduces significantly. With that dealt with, let us get back to the question at hand - what should organisations do, if they do need a software team.

We believe that partnering with software organisations that have a good understanding of the social sector is a sustainable long-term approach. One can offload the problem of developing a technology culture and competence to the partner, who can focus on that part. Please also note that we are saying partner, not provider. A partnership, most importantly, develops an understanding of your organisation among the people of the software organisation.

Finally, what about the other hybrid approach mentioned in point 3? We reckon that in theory, it is easier to have an inhouse senior person working with the external software team. But this runs the risk of a single point of failure which one needs to reconcile with. Lack of continuity can be quite challenging especially if technology is strategic to your organisation.

Conclusion

Professionals like doing work they find meaningful. Finding meaning is usually not an issue in the social sector. But professionals also seek fulfilment from their work which they get by applying themselves fully and constantly getting better at what they do. Meaningful work without fulfilment is insufficient and is not sustainable. A software provider can afford to assemble a larger group of people, in which people can be inspired and learn from each other, as it serves multiple organisations. This also provides opportunity for people to work on different problems - which will be difficult working in a single nonprofit.

Published on: 07-Jan-2021

Author: Vivek Singh

What does it mean to use open source?

2020-12-23T10:05:29+00:00

If I use open source then does it mean that

...everyone will have access to our data?

...we have to give our data to the organisation which provided us with the software?

...anyone can change the code and introduce defects in our system?

These are actual questions that get asked by our nonprofit customers. We usually take a step back and discuss what software is, how its ownership works, data, and hosting. In this article, we go through the same process and try to explain this in as simple terms as we can.

These questions are not about licensing but about the overall "customer experience" of open source. Through these conversations, we have understood that it is about "open source", "software" and "reusable products". We have gathered that we cannot explain open source alone without explaining software and products as well. But to keep the article shorter, we will stick to work software products that are used by organisations (like donor management systems, accounting systems) and not by individuals for personal/official use (like MS Office, browser, WhatsApp etc).

Let's start by understanding what software is. Each software project exists in two forms - as passive source code in the form of text files, and as a running package on the server or your device. Please see the diagram below.

Let us expand the server a bit to see that the server also has your data stored in the database.

The power of software, unlike physical goods, is that it can be copied any number of times - provided it is functionally reusable (e.g. an organisation's website code is unlikely to be reusable for other companies). Hence we can provide it to multiple organisations at almost no additional cost. But, when we do that, it is important to note that each organisation has its own servers running the software and its own databases.

Now let us get back to the top part of the above diagram - the three arrows. Even though the software has the ability to be copied an infinite number of times, there are legal and commercial restrictions to it. The organisation which owns the software programs decides who gets the software - usually with permission and fees. The other difference is that the organisation usually provides only the installable files and not the source programs. This is because mostly the customer organisation doesn't have much use for the source code.

Let us look at the ownership of each of the pieces now.

Now let us look at how it works in the case of open-source software. The software source code and installable are both available free of cost, publicly and anonymously. That is, you don't need to pay anything, take any permission, nor do you need to tell anyone. As with the commercial software described earlier, you retain the ownership of the server and database.

With that context, let us try to answer the questions at the top of this article.

1. Does opensource mean everyone will have access to our data?

As you can see from the diagram above, there is no relationship between your data and whether the software is open source or not. Your data is owned by you.

2. Does opensource mean we have to give our data to the organisation which provided us with the software?

Again as you can see in the diagram, the open-source software provider (community) doesn't impose any such conditions on the users of their output. While they would like attributions, it is not binding.

3. Does opensource mean anyone can change the code and introduce defects?

...since anyone can copy the code and change it. Let us look at what would happen in this case. The diagram below has a couple of examples of such copying - one by another open source community or by someone in your organisation. Pertaining to the question - other people can make changes to the code but to their "own version" of it and their version generates their own installers. It doesn't affect the software that you are running from your provider (community 1).

Conclusion

One of the goals of this publication is to reduce information asymmetry between software providers and nonprofits. There are a number of misconceptions about open source which are not in the interest of nonprofits. In fact, we believe that given the public nature of the work, open source and nonprofits are naturally aligned with each other. Finally, if there are other topics/questions about open source that you would like us to write about, please do contact us.

Author: Vivek Singh

Published On: 23-Dec-2020

Factors to consider when procuring mobile devices

2020-12-11T08:21:01+00:00

Mobile devices are increasingly used by frontline workers, and are typically procured by nonprofits and provided to them. These devices need to be procured in bulk and hence it becomes important that the right decision is made. Let's look at the various factors we should consider.

Perhaps this is obvious but better to get this out of our way. When we are discussing mobile devices it is implicit that it is an android device :-). In our experience, we have never seen anyone consider iOS devices because of the cost.

Form factor

Usually, a large-screen device of around 6 inches serves most needs of frontline work. There may be scenarios where an even larger form factor is ideal - like showing videos to a group, performing complex work with maps etc.

Tablet or Mobile

The answer overwhelmingly is to procure a mobile device. The fundamental reason is that the Android tablet market overall is very small compared to mobile. The larger mobile market has led to much better build quality and choices for the price.

Battery life

The frontline worker is likely to use the device for work for 2-3 hours daily, hence one must choose devices with a higher battery capacity (more amp-hours).

Pocket/Purse friendliness

Pocket friendliness of mobile devices is quite an important factor from wear and tear perspective. Tablet devices being larger require a separate bag/box which is not only cumbersome to carry around (frontline work involves a lot of travel) but also prone to getting damaged during travel/movement. Pocket/purse is the best place for convenience and long life of the device.

Personal device or work only device

Should the frontline worker be expected to use the device for personal purposes or only for work? This is quite tricky to answer emphatically - hence we would only list down factors and let you make the decision.

Mobile device with the Internet is such a promising instrument for human learning and development that restricting its use - feels criminal.
But, when the device is allowed to be used for personal reasons it can result in various issues like virus, excessive use (limiting its overall life), running out of mobile data quota. Allowing personal use can also result in the device not being available for the intended purpose itself - because of potential "misuse".

Electricity

If homes of frontline workers are plagued by long and regular power outages then we should choose a device which has fast charging facility. One can consider using power banks if electricity supply is even poorer.

Backup devices

We have often found that exactly as many devices as there are frontline workers are procured. When some frontline workers' devices break down, they switch back to the paper system. Such arrangements send mixed signals to the frontline workers - that the paper system and the digital system must both be maintained. This is sub-optimal and to some extent defeats the purpose of having a digital system. We recommend that backup devices be planned for and made available in case of breakages.

Author: Vivek Singh and Arjun Khandelwal

Published On: 11-Dec-2020

Cover photo by pics_pd on Pixnio

Why it is difficult to implement a useful national personal health records system

2020-11-19T10:21:40+00:00

I fully appreciate the value of having an integrated digital personal health record which I can use when I go to the doctor, to monitor my own health, and to save money on repeated tests, while avoiding carrying papers and the anxiety of losing records, and so on - and do all of the same for my family members too. Creating this is a noble idea. But it is difficult to implement. It is difficult even for governments to do this. I will try to go into the details of why it is so difficult to do this and make it useful. (I fully understand that there is a privacy and data exploitation aspect to this too, and I am sure many people have written about it; I am not focussing on that here.)

Where are our important health records stored?

Independent laboratories
Radiology (X-Ray, MRI, Ultrasound) centres
Clinics
Hospitals

(in India pharmacies do not keep much health record of you as they are mostly like shops)

Of these, laboratories and radiology centres are usually technically geared up, and they could email you a PDF file of results if you ask for it. Lab Information Systems and Radiology Information Systems are common, and your diagnostic results are kept in structured, digital form there.

Clinics are digitally least equipped. Well equipped ones may have an appointment system and billing system, but that's not what we are talking about here. All health records in clinics are in paper-pen form.

Hospitals are more complex places. One way to understand them is to consider them as a combination of lab, radiology and clinic plus few more like inpatient/critical-care wards, operation theatres and emergency. Now from a digital perspective, the following could be said about many hospitals in urban areas.

Interestingly, you do not readily get a PDF file from the radiology and lab departments in a hospital. Unlike independent diagnostic centres, their business is not about producing this output (PDF file), as most of the time the output is given to the doctors (who use paper records readily since it is a lot easier for them).
All the examination, assessment, diagnosis and treatment details are on paper. This is primarily because the workload on doctors is way too high and EMR technology is not mature enough yet to solve this problem. I know this from personal experience of developing Bahmni, and it is also well documented in the book Digital Doctor (highly recommended).
Good hospitals are great at doing operational things using technology (billing, inventory, appointment, registration, bed assignment and so on) since this has economic value. But when it comes to your clinical record, it is all paper. Some may keep scanned copies, but that's mostly to solve the archival problem, since medical records do take up expensive real estate.

I have purposefully painted a picture that is true for urban setups rather than for rural or resource-poor setups, because I can imagine rural hospitals replicating this over time, having done many of these myself as part of Bahmni. Government hospitals are more difficult because of resources, but it could be done in the operational areas of the hospitals.

Let's examine the technology solution space

Broadly…

India luckily has had good IT capacity in the country leading to widespread implementation of software systems across the private eco-system (profit and non-profit) and to some extent in public too.
We perhaps have hundreds (touching maybe even thousands) of software vendors who have health information system business, having one or more products in the area.
If we count all the lab, radiology, clinic, hospitals systems we must have tens of thousands of distinct software running across the country (I mean unique software codebases). Sometimes even in a single large hospital, there are multiple software running e.g. different departments running their own specialised (or even non-specialised) software.
Almost all of them have been built without any agreed standard for storage and communication of medical records. The reason is simple: if a hospital wants to shell out only 10–15 lakhs for software, the software vendor cannot find engineers and domain experts who understand HL7, SNOMED, ICD10, and then implement these in the code (radiology images are an exception to this). In such competitive ecosystems (of hundreds of software vendors) this is expected. I do not blame the hospitals/clinics either; these standards make software expensive and their economic value to the hospital is non-existent.

Realising the value of personal health records

I have also been part of developing personal health records in the past, and I think that developing a cloud-based, API-based, secure, generic, extensible, downloadable personal health record service is doable and fulfilling. It is not very expensive to build; especially when one is doing it for the whole country, the funding needed is comparatively negligible.

But what happens after we have an API in the cloud? How does the data come into it, such that it is useful for citizens?

The obvious idea is that all these labs, radiology centres, clinics and hospitals must integrate with it. But this is not one-time work. Each one of these tens of thousands of systems needs to be enhanced independently. There is no module that one can just stick into this software so that it will simply integrate - remember they don’t talk in standards. Standards for data, metadata, data transfer, protocols - nothing. Who will pay for this? Once you start examining these systems, you would find all sorts of software, including ones developed by someone’s relative for free, who doesn’t write software anymore. Software integration requires better skills than writing applications.
The other idea is to take a few good software products that already integrate with the API and ask everyone to only use “certified” software. What would happen to the rest of the software vendors? Who would pay for the replacement? This is neither practical nor cost-effective.
Let's keep ideating. What about placing data entry persons in each hospital? They feed in data directly into the personal health records system. Even if we set aside the cost of salary of data entry person, which hospitals now need to bear — what would be the quality of such health records? Remember our health records are very technical to understand, plus not always very legible.

I have been involved in brainstorming this even for a network of hospitals dealing with Cancer, which is much smaller scale, but pulling this off seemed like a mammoth task — for the same set of reasons.

There are examples of other countries where this has been tried and we can learn from them (I would love it if it can be done, ensuring privacy issues are well taken care of). But it is important to understand that we do not have the leapfrogging advantage in this case that applied in other infrastructure/technology rollouts. The fact that we have been successful at digitising our hospitals etc. is, sadly, a disadvantage in this case. Many countries which have not digitised to this extent may be able to leapfrog*, but for India, it seems difficult, quite difficult.

*Poorer countries have to decide whether to spend on personal health record systems or spend that limited fund for buying medicines, setting up hospitals, labs, paying doctors. The choice then becomes obvious.

Published on: 18-Jul-2018

Author: Vivek Singh

Technology architecture for public health service delivery and why designing system for PHC is difficult

2020-11-19T10:16:36+00:00

Health services are delivered in four kinds of setups, illustrated by the following matrix of resource availability against service team size (number of people working together to provide the service to their clients, aka patients). It will become clear why I have chosen to highlight the number of people as a factor: it is an important aspect of designing such systems, but not because of scale.

With this diagram, I am also trying to highlight that, resource availability takes a significant jump from secondary hospitals (running at district head-quarters) to tertiary hospitals in cities. As far as workload per staff is concerned, it is roughly the same throughout (or in the same order of magnitude).

What characterises a low resource setup though, from a technology rollout perspective?

Poor quality Internet (speed, latency, uptime, intermittency)
Low proficiency levels of (presumed) users
Low availability of IT support and administration personnel
Lack of funds for tech

Since many systems have been designed for large organisations in abundant resource setups, I will not delve into the architectural characteristics of such systems— there is more than sufficient material on this. Also, to highlight an architectural blind spot, I will get into the Primary Health Centre last.

Why do we need software systems in health service delivery setups?

Management of the facilities' operations (hospital/clinic/centre)
Storage and retrieval of health records of the clients
Standardising the work
Improving the availability of data within the organisation
Work-schedule management (applicable to primary and frontline healthcare which is proactive in nature — e.g. screening, followup)

Given this background of needs, context — let us try to examine the technical architecture constraints and options at each level.

Rural hospitals — District Hospital (DH) and Community Health Centres (CHC)

All hospital systems need to be highly available (one cannot afford to make the patients wait or have long queues because “system is down or slow”). Rural hospitals of this type are at district headquarters or places like it. Most district headquarters in India are not so well connected to have a highly available Internet connection — essentially ruling out the possibility of running such systems off the Internet*. One needs to run the server in a local area network (LAN) so that the connectivity is good and response time is small.

Running a server in a hospital requires a local IT team to support the server, application, network and the user devices/machines. If this has to be done in hundreds of hospitals (for state government, missionary hospital chains), it becomes expensive and dependent on human capacity.

This is not to suggest that it is technically not possible. Some motivated organisations have been able to do this (running a local IT setup), and as we get richer as a country, more hospitals will be doing the same, but at scale it is difficult. One important thing to note is that the health data of patients is available only from the internal network of the hospital and is not available to someone who is outside the hospital (for example doctors who consult remotely or want to provide an opinion in certain cases). There are a few options:

Buy a public IP and set up a VPN (but poor Internet bandwidth and lack of knowledge of how to connect via VPN act as barriers)
Publish data to a centralised server (the cost of the software and of operating it further increases the economic burden on a non-profit or public hospital)

* One can definitely run off the Internet those systems that do not make patients wait if the Internet is down, e.g. inventory management, periodic reporting systems, etc.

Front line health workers

On another extreme end of service delivery hierarchy are Frontline Health Workers (FHWs), who usually work by themselves in villages or in outposts alone, covering few villages. So, when one thinks of bringing digital technology to these workers, helping them in their work, the first thought is — well there is no Internet. But thankfully there are android devices — robust full-fledged computers with few GBs of storage.

First, let’s step back and understand the context of FHWs and their work. They are responsible for small areas. At sub-centre an ANM works alone. A sub-centre area is roughly five villages, i.e. around 5 thousand population — let’s call this their service area. Similarly, a village level worker, e.g. ASHA, is responsible for one village. ANM, ASHA and similar FHWs’ work require them to have historical primary health data of all the people in their service area, with them (i.e. without requiring Internet connection).

If we calculate how much data this really is, we would see that it is not all that much. At the rate of 25 KB per citizen, we are looking in the ballpark of 125 MB. Modern Android devices can handle this much data in databases like SQLite or Realm — in hands of good developers, providing great performance. Once this seems feasible, we have found a way out of Internet quality problems.
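The estimate above can be reproduced with a few lines of arithmetic. This is only a sketch: the 25 KB per citizen and the population of roughly 5,000 come from the text, while the function name and the decimal convention (1 MB = 1000 KB) are assumptions made for illustration.

```python
# Back-of-the-envelope estimate of the data one FHW device needs to hold.
KB_PER_CITIZEN = 25   # per-citizen figure used in the text
POPULATION = 5_000    # one sub-centre service area, roughly five villages


def service_area_size_mb(population: int, kb_per_citizen: float) -> float:
    """Approximate on-device data size in MB, taking 1 MB = 1000 KB."""
    return population * kb_per_citizen / 1000


size_mb = service_area_size_mb(POPULATION, KB_PER_CITIZEN)
print(f"Approximate data per service area: {size_mb:.0f} MB")
```

Changing either input shows how quickly (or slowly) the storage need grows, for example for a worker who covers more villages.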

Each FHW can carry one of these Android devices with the work-app and the complete database of their service area. Usually, this data is also required by people in other parts of their organisation like medical officers, program designers, and the ministry. But this access is not needed in real-time. In most scenarios, a delay of even a day is fine, in some scenarios even more. This allows the FHW's application to perform incremental sync (sync = send and receive data; incremental = any data not known to the other end) of the data with a central server, whenever Internet becomes available.
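As a rough illustration of the incremental sync idea described above, here is a minimal in-memory sketch. The class and method names (Server, Device, push, pull, sync) are hypothetical and not taken from any real product; the server simply stamps every change with an increasing sequence number, and each device remembers the last number it has seen. Conflict handling is deliberately left out (the last write wins).

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass
class Server:
    """Central store; every change gets an increasing sequence number."""
    records: Dict[str, Tuple[int, dict]] = field(default_factory=dict)
    seq: int = 0

    def push(self, changes: Dict[str, dict]) -> None:
        for record_id, payload in changes.items():
            self.seq += 1
            self.records[record_id] = (self.seq, payload)

    def pull(self, since: int) -> Tuple[Dict[str, dict], int]:
        # Return only the records changed after the caller's last known point.
        changed = {rid: payload
                   for rid, (seq, payload) in self.records.items()
                   if seq > since}
        return changed, self.seq


@dataclass
class Device:
    """Offline-first device: works locally, syncs when a connection exists."""
    records: Dict[str, dict] = field(default_factory=dict)
    pending: Dict[str, dict] = field(default_factory=dict)  # not yet sent
    last_seq: int = 0

    def save(self, record_id: str, payload: dict) -> None:
        self.records[record_id] = payload
        self.pending[record_id] = payload

    def sync(self, server: Server) -> None:
        server.push(self.pending)            # send what the server lacks
        self.pending = {}
        changed, self.last_seq = server.pull(self.last_seq)
        self.records.update(changed)         # receive what the device lacks
```

A device can call save() any number of times while offline and call sync() whenever the Internet becomes available; only records changed since the previous sync travel in either direction.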

A few challenges still remain (but largely this is a viable and useful solution)

Sometimes users want to keep a copy of the data on paper, because if their device is not working they cannot access the records to provide the service, even though this feels wasteful. But even in this scenario, the fact that the FHW doesn’t have to travel miles to file monthly reports is a big plus, to name one.

Primary Health Centres (PHCs)

There is one primary health centre, per 5–6 sub-centres, so we are talking about roughly 30 villages now in a PHC area. The typical rural primary health centre has (rather is supposed to have) a doctor, a nurse, a small laboratory, a small pharmacy and a registration person.

When designing software systems for PHCs, the problem is usually approached from two ends, i.e. either thinking of a PHC as a smaller hospital or as a collection of front line workers in a facility. But both of these approaches are fallacious. Let us see why.

When a PHC is considered a smaller hospital, one tries to put a server (even if smaller in specs or running on one user’s desktop) in each PHC, running an application server and a database. But there are far more PHCs than District/Community Hospitals, multiplying all the operational challenges listed above for the district hospital. The operational expense for support is too high to be acceptable. The users by themselves cannot support this server and the network needed to connect to it.

The other approach is to think of a PHC as a collection of field health workers. Hence instead of hosting the server locally, it is put in the cloud, and each user has an app like the one an FLW has, fully capable in offline mode. But this design is flawed technically, as it misses an important difference: a PHC, unlike the field worker’s scenario, needs “continuous sharing of data captured by one user with other users” (in other words, users collaborate among themselves to provide service to the patient). For example, the data captured at the registration desk should be sent to the server and be pulled onto the doctor’s (or any other department’s) device before the patient physically moves from registration to the doctor. This is not guaranteed to happen if the Internet is unstable. Soon there will be a lot of real-user downtime. For the first few occasions, people may tolerate such downtime, but over time, users are bound to get frustrated and junk such a system because they cannot depend on it.

Then there are ideas like peer-to-peer data synchronisation between the devices in a PHC. As far as I know, there isn’t a mature general-purpose database that runs on Android or in a web browser and can do peer-to-peer multi-node data replication**.

(Many a time, progressive web apps are thrown in as a solution, and ordinary consultants like me need to explain why PWA is no magic to solve no Internet problem, at the risk of coming across as someone who is not latest with one’s tech)

Overall, designing a system for PHC is tricky and requires making more tradeoffs. We do not have standard technical architectures to solve this because there are not many use cases like this. While FLW and Hospital use cases are standard, PHC sort of falls in the weird and interesting middle.

More on this, including other options that don’t work and how we can think of solving it, in another post.

UPDATE (31/08/18): ** My colleagues at Samanvay pointed me to a few databases that promise to be handy in solving this problem because they provide peer-to-peer replication: gundb, rxdb, orbitdb, couchbase lite. These indeed look promising, especially gundb.

Published On: 28-Aug-2018

Author: Vivek Singh

Creating a cost-effective multi-tenant platform

2020-11-19T10:16:35+00:00

Introduction

Multitenant products have become quite commonplace nowadays. Any B2B product deployed on the cloud needs to have some kind of multitenancy. The exact kind depends on the business case and the product infrastructure and development team structure. You have a bunch of options, each with its reason to exist.

Multitenancy provides isolation of data or processing (or both) of one tenant from others. This isolation could serve multiple purposes. It could be for better debuggability, for giving tenants access to the parts of the data/infrastructure they care about, for allocating resources differently to tenants, or for invoicing.

In this post, we talk about the different areas of an application, and how multitenancy manifests itself in each one of them.

The frontend server

Many workloads are frontend heavy, such as e-commerce solutions where each seller can white-label their site, or hosted content-management solutions. If you are working on such a solution, then you have a few problems to deal with.

Isolation of issues to specific tenants — You might not want high traffic on one site to disrupt others.
Provision for additional infrastructure for specific tenants — You might want to measure how much each tenant uses, or provide a better experience to paying customers.
Separate logging, which might need to be exposed to tenants directly.

Usual patterns

One fat frontend — If you are not working on a frontend heavy workload, this is the best way to go. The one fat frontend means you still can scale (horizontally, or vertically based on what your preferences are).
One or more virtual machines per tenant — If all your tenants are decent sized to take up space of at least one virtual machine, have a lot of frontend customization, then providing virtual machines per tenant is a decent approach. Logs are automatically partitioned, machine stats are easy to debug per tenant, and billing is easy.
One or more containers per tenant — If you still need all the isolation that approximately equals a virtual machine, but have many small tenants, this might be a more attractive approach.

In any of these cases, keeping user state out of the frontend is usually a good idea. This is true even without multitenancy.

Another point of interest here is that isolation is often necessary when the assets of each tenant are different. If they are all the same, returns on isolation are lower.

Choose sophistication on the frontend based on requirements; often, less is good.

Database

The next most important decision is how to isolate different tenants on the database. There are 3 approaches when it comes to relational databases, plus a variation on the third.

1. Single database per tenant

If your tenants can pay for it, and the situation demands it (legal constraints etc), then this is the safest solution. It also happens to be the costliest. Why?

10 databases are roughly 10 times as costly to purchase on the cloud as one.
Multiple databases mean multiple migrations which increase product release times. Imagine a 1-minute migration taking place for 100 tenants.
Multiple databases potentially mean different versions of the software on each tenant. It is normal for each tenant to have their team managing their software, which can lead to the addition of new items on the schema (if you are not strict enough), differing software versions per tenant, eventually leading up to multiple deployments of the same software. This is just multiple deployments of the same software in adjacent hardware infrastructure, not multitenancy.

There could be valid reasons to go this way but know that your deployment is getting costly.

Database-per-tenant provides the highest isolation levels in data at the highest cost.

2. Single database, one schema per tenant

There are some minor differences between database-per-tenant and schema-per-tenant.

Schemas are comparatively cheaper but do not offer the same levels of hardware isolation that database-per-tenant provides
Database versioning pains still exist
You can afford to keep the same connection pool. Remember that this brings some complexity within the middleware code. We will need to prefix every query with the schema corresponding to the tenant.
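The schema prefixing mentioned above could look roughly like the sketch below. It uses Python's built-in sqlite3 module so that it runs anywhere; SQLite has no schemas in the PostgreSQL sense, so attached in-memory databases stand in for per-tenant schemas here, which is an assumption made purely for illustration. The tenant names, the patient table and the helper functions are all hypothetical.

```python
import sqlite3

# Hypothetical tenants; in this sketch each one gets its own "schema".
TENANT_SCHEMAS = {"clinic_a", "clinic_b"}


def qualified(table: str, tenant: str) -> str:
    """Prefix a table name with the tenant's schema.

    Identifiers cannot be bound as query parameters, so the tenant is
    checked against an allow-list before being placed in the SQL text.
    """
    if tenant not in TENANT_SCHEMAS:
        raise ValueError(f"unknown tenant: {tenant}")
    return f"{tenant}.{table}"


# One connection (and hence one pool in a real deployment) serves all tenants.
conn = sqlite3.connect(":memory:")
for schema in sorted(TENANT_SCHEMAS):
    conn.execute(f"ATTACH DATABASE ':memory:' AS {schema}")
    conn.execute(f"CREATE TABLE {schema}.patient (name TEXT NOT NULL)")

conn.execute(
    f"INSERT INTO {qualified('patient', 'clinic_a')} (name) VALUES (?)",
    ("Asha",),
)


def patient_names(conn: sqlite3.Connection, tenant: str) -> list:
    """Run the same query against whichever schema belongs to the tenant."""
    sql = f"SELECT name FROM {qualified('patient', tenant)} ORDER BY name"
    return [name for (name,) in conn.execute(sql)]
```

The complexity the text warns about is visible even at this size: every query has to pass through something like qualified(), and a query that forgets to do so silently reads the wrong tenant's data or fails.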

3. Single database with a discriminator column

Here, a column (tenant_id) is present on every table of the database where there is tenant data. This provides the least isolation among the different solutions available. It also is the cheapest possible solution in the medium term.
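A minimal sketch of the discriminator column, again using Python's built-in sqlite3 module. The tenant_id column name comes from the text; the patient table, the sample rows and the patients_for helper are hypothetical and only there to make the example runnable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE patient (
        id        INTEGER PRIMARY KEY,
        tenant_id INTEGER NOT NULL,   -- the discriminator column
        name      TEXT NOT NULL
    )
    """
)
conn.executemany(
    "INSERT INTO patient (tenant_id, name) VALUES (?, ?)",
    [(1, "Asha"), (1, "Meera"), (2, "Ravi")],
)


def patients_for(conn: sqlite3.Connection, tenant_id: int) -> list:
    """Every read filters on tenant_id; all tenants share the same table."""
    rows = conn.execute(
        "SELECT name FROM patient WHERE tenant_id = ? ORDER BY name",
        (tenant_id,),
    )
    return [name for (name,) in rows]
```

All isolation here depends on the WHERE clause being present in every query, which is why this approach gives the least isolation. Some databases offer row-level security features that can enforce such a filter centrally, but that is outside the scope of this sketch.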

You cannot have different versions of the software for every tenant if you use this approach (there are ways such as multiple deployments, but then you don’t call it multitenancy anymore).

Consider having a single database with discriminator option if you can. It works well in the long term.
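A minimal sketch of the discriminator column, again using sqlite3 for illustration. The patient table and the patients_for helper are hypothetical; the point is only that every tenant-owned row carries a tenant_id and every query filters on it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE patient ("
    "id INTEGER PRIMARY KEY, tenant_id INTEGER NOT NULL, name TEXT)"
)
conn.executemany(
    "INSERT INTO patient (tenant_id, name) VALUES (?, ?)",
    [(1, "Asha"), (1, "Ravi"), (2, "Meena")],
)

def patients_for(tenant_id: int) -> list[str]:
    """Return patient names for one tenant; the WHERE clause is the isolation."""
    rows = conn.execute(
        "SELECT name FROM patient WHERE tenant_id = ? ORDER BY name",
        (tenant_id,),
    )
    return [name for (name,) in rows]
```

All tenants share one table and one set of migrations, which is where the cost advantage comes from. The isolation is only as good as the discipline of adding the filter everywhere.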

4. Single database with a hierarchical discriminator

This is a slight twist on the regular discriminator column. If you have a lot of reference data that is common to many organisations, then it might make sense to have the ability to inherit that data. In this case, your tenant table will have a hierarchical structure, and your queries will start retrieving data belonging to you and your parent(s).
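One way to picture "data belonging to you and your parent(s)" is a recursive query that walks up the tenant hierarchy. The sketch below uses sqlite3 and a recursive common table expression; the tenant and concept tables and their contents are hypothetical examples, not Avni's actual schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tenant (
    id INTEGER PRIMARY KEY, name TEXT, parent_id INTEGER REFERENCES tenant(id)
);
CREATE TABLE concept (
    id INTEGER PRIMARY KEY, tenant_id INTEGER NOT NULL, name TEXT
);
INSERT INTO tenant VALUES (1, 'base', NULL), (2, 'org_a', 1), (3, 'org_b', 1);
INSERT INTO concept (tenant_id, name)
VALUES (1, 'Weight'), (2, 'Village'), (3, 'Crop');
""")

# lineage starts at the requesting tenant and follows parent_id upwards.
VISIBLE_CONCEPTS = """
WITH RECURSIVE lineage(id, parent_id) AS (
    SELECT id, parent_id FROM tenant WHERE id = ?
    UNION ALL
    SELECT t.id, t.parent_id FROM tenant t JOIN lineage l ON t.id = l.parent_id
)
SELECT c.name FROM concept c JOIN lineage l ON c.tenant_id = l.id
ORDER BY c.name
"""

def visible_concepts(tenant_id: int) -> list[str]:
    """Concepts owned by the tenant itself plus those inherited from ancestors."""
    return [name for (name,) in conn.execute(VISIBLE_CONCEPTS, (tenant_id,))]
```

Here org_a and org_b both inherit from base but cannot see each other's rows.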

The case for hierarchical discriminators

Using hierarchical discriminators comes with its own troubles. Use with caution. At Avni, we were planning to have common metadata for programs that we hoped would be used by many tenants. To enable this, we started with hierarchical discriminators. Soon, this approach proved to be hard to maintain.

Sharing of data between unrelated organisations is painful — First, we noticed that there were minor changes each tenant wanted on the base metadata. We tried to fix this problem by providing options to override. Providing options to override came with its maintenance problems. The base common metadata was hard to modify because each change needed to be tested on all tenants that use the metadata. Sometimes, it was easy to test, but we had to go to every tenant to verify if the change makes sense to their organisation.
Slicing of common metadata is hard — The other problem with hierarchical discriminators was to find out the right size to slice the base metadata. We were adding base metadata for multiple health programmes into one big tenant, but our tenants needed only one. We could not implement multiple-inheritance on the metadata, which means a lot of metadata each tenant received was not required for their operations.

Finally, we ended up using plain old discriminator-based multitenancy, with the option to copy over metadata to a tenant where necessary. We still have hierarchical discriminators, but for a different reason. We are also seeing customers that want to run a program through their partners, where the customer decides the metadata. The partners need to see only the transactional data they generate. Hierarchical discriminators make sense in such a scenario.

Different tenants MUST not share common metadata

Middleware design

If you are using discriminator column-based multitenancy, every query needs to be filtered using the tenant id. While it is possible to audit all your queries before deployment, it is an error-prone approach. All the more so when you have your ORM silently running queries on your behalf. The best way forward is to bake multitenancy into the design of the server, so that a small part of the middleware applies the tenant filter and all other code can assume that there is just one tenant in the database.
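As a rough sketch of this idea, the tenant filter can live in one small data-access class while business code never mentions tenants. The example below uses Python's contextvars and sqlite3; TenantScopedTable, current_tenant and the task table are hypothetical names chosen for illustration.

```python
import sqlite3
from contextvars import ContextVar

# Set once per request by the middleware; read only by the data-access layer.
current_tenant: ContextVar[int] = ContextVar("current_tenant")

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE task ("
    "id INTEGER PRIMARY KEY, tenant_id INTEGER NOT NULL, title TEXT)"
)

class TenantScopedTable:
    """The only place that knows about tenant_id for a given table."""

    def __init__(self, table: str):
        self.table = table  # table names come from code, not from user input

    def insert(self, **values):
        values["tenant_id"] = current_tenant.get()  # LookupError if unset
        cols = ", ".join(values)
        marks = ", ".join("?" for _ in values)
        conn.execute(
            f"INSERT INTO {self.table} ({cols}) VALUES ({marks})",
            tuple(values.values()),
        )

    def all(self, column: str):
        sql = f"SELECT {column} FROM {self.table} WHERE tenant_id = ? ORDER BY id"
        return [row[0] for row in conn.execute(sql, (current_tenant.get(),))]

tasks = TenantScopedTable("task")

def add_and_list(title: str):
    """Business logic, written as if the database held a single tenant."""
    tasks.insert(title=title)
    return tasks.all("title")
```

If no tenant has been set, the lookup fails loudly instead of silently returning another tenant's rows, which is the safer default.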

If you use Hibernate, it provides options for switching databases when using database-level or schema-level multitenancy. It is comparatively simple.

Isolate multitenancy functions of your middleware through design

Multitenancy in Avni

We now discuss a concrete implementation — Avni.

Frontend — Avni has a fat frontend. This is because there are no tenant-specific modifications in either our API, admin app or our web-based data entry modules. We maintain the servers ourselves, and we don’t have a scaling problem yet. A fat frontend is sufficient for now.

Database - At the backend, we use a single database, separating data using discriminators. This was primarily because our tenants mostly had low volume (10,000–100,000 transactions per annum), so having multiple databases or schemas would have been unnecessary complexity.

We use the row-level security (RLS) feature available in Postgres. Each tenant gets their own Postgres role. All tables have RLS policies applied to ensure data does not leak between tenants. Now all that remains is to ensure that operations are performed using the role assigned to a tenant. The middleware takes care of this.
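To make the RLS setup more concrete, the sketch below generates the kind of Postgres statements such a setup might use. It only builds SQL strings and does not connect to a database. The tenant table, its db_role column and the policy name are assumptions made for illustration; the article does not describe Avni's actual policies.

```python
def rls_statements(table: str) -> list[str]:
    """Build statements that restrict a table to rows of the connected role's tenant.

    Assumes a hypothetical `tenant` table whose `db_role` column holds the
    Postgres role name assigned to each tenant.
    """
    predicate = "tenant_id = (SELECT id FROM tenant WHERE db_role = current_user)"
    return [
        f"ALTER TABLE {table} ENABLE ROW LEVEL SECURITY;",
        # USING filters the rows that can be read; WITH CHECK guards writes.
        f"CREATE POLICY {table}_tenant_isolation ON {table} "
        f"USING ({predicate}) WITH CHECK ({predicate});",
    ]
```

In Postgres, table owners bypass row-level security by default, so a setup like this normally relies on requests running under the tenant roles rather than the owner role.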

Middleware - Avni server identifies the user and their tenant from a JWT that comes from the client. This information is set at the thread level in a UserContext. The user context is then used by a Tomcat JDBC interceptor to change the role of the database connection. The Tomcat JDBC interceptor provides a hook at the connection level. This helps us keep all business logic free from the knowledge of multitenancy.

Before committing to the database, a Hibernate JDBC interceptor ensures that all data inserted or updated by a user has their tenant_id. Once the request is serviced, we reset the role on the database connection.

Reports - Reporting servers treat each tenant as its own database, with a tenant-specific database user. This is not very efficient because Metabase collects database statistics on all the databases it has. This became a bottleneck because the number of tenants keeps increasing, adding to the load on the database. We switched Metabase statistics to manual. Metabase uses a JDBC connection pool, which provides control over the number of database connections per tenant.

The fat frontend — the web app and the android app are not multitenant aware. Nor are the queries for reporting or the business logic of the server. By containing multitenant behaviour to a small slow-moving area of the system, we can confidently work on newer features easily.

Published On: 03-Feb-2020

Author: Vinay Venu

Schemaless platforms

2020-11-19T10:16:32+00:00

A lot has been written about schemaless databases, but not much on schemaless platforms: how to make them, or how to choose one for your own software project, even though there are so many of them around us. CMS, electronic medical record systems, ERP, CRM, and forms utilities, to name a few in common domains. Each niche sector/vertical has such platforms too. In the social sector, for example, there are Avni, CommCare, and Bahmni.

What are schemaless platforms?

Schemaless databases allow you to perform data operations, without telling the database about the schema of your data. And they deliver good performance for such use cases.

Schemaless platforms, on the other hand, are designed to tackle the generic end-use cases in a domain and leave the specific parts for the user to define on the platform. It is the second part (its design and tradeoffs) that this article deals with. At this point, it is important to delineate the term “user”. In most schemaless platforms there is a platform-user who defines their specific data model and an end-user who simply uses that solution. Consider Google Forms, which is also a schemaless platform. The person who defines the form is a platform user and the one who fills in the form is an end-user. We will refer to the platform user as the user here.

Unlike Google Forms, where the whole platform is about providing a schemaless facility, many products may have a schemaless platform embedded within the product. For example, a health record management platform may provide features like registration, out-patient, in-patient, laboratory, etc. But within some of these modules, the user can define their own schema for data. There could be a facility to define forms for different diseases within the out-patient/in-patient modules. The product maker makes a design choice that it is best to leave some data modeling to the user.

Three types of schemaless platforms

Products where schemalessness is the defining feature of the product — like Google Forms, ODK, AirTable.
Products with an embedded schemaless facility in multiple parts of the system— electronic medical records, SalesForce.
Products that support the definition of custom fields, but they are not very powerful in what they allow — like multiple data types, skip logic, validations, calculated fields, schema migration, etc.

Why are schemaless platforms important?

Schemaless platforms offer an alternative to fully custom software in many scenarios — especially type 2 above. Type 2 schemaless platforms if done right are compelling because they solve a domain-specific problem and at the same time provide the user with the ability to customize as per their needs. We have also written about this here.

This article deals with types 1 and 2, because schemalessness is quite important to these products. We will cover the broader technical issues here and go into the details in subsequent articles in this series.

Making schemaless platforms

Database for schemaless platforms

Do I need to use NoSQL databases for creating schemaless platforms? Strictly speaking no, because there are ways to achieve schemalessness on relational databases as well. This can be done using one of the following approaches:

Entity Attribute Value (EAV) — In a nutshell, keep one row per field value. A key column that represents the name of the field and a column for storing the value. This is a good article that explains this pattern.
Embedded schemaless facility within relational database products. For example support for JSONB within PostgreSQL.
User-defined database schema - Here the user can specify the schema, using which the platform creates database objects (tables, columns, indexes, etc), providing a schema-full structure when deployed. This is followed by Strapi and Drupal. This approach does not work if you want to use a single database schema for multiple customers who will each define their own schema.
Spare columns - The platform provides spare columns in the database tables where it wants to support user-defined fields. It can choose to represent all data types as a string or provide spare columns for multiple data types. Obviously inelegant, but clever from a performance perspective, as we will see later. This can be further extended to have spare tables with spare columns.
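Of the approaches listed above, EAV is perhaps the easiest to show in a few lines. The sketch below uses sqlite3 to store one row per field value and then folds the rows back into records. The field_value table and the save and load_all helpers are hypothetical.

```python
import sqlite3
from collections import defaultdict

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE field_value (
    entity_id  INTEGER NOT NULL,
    field_name TEXT NOT NULL,
    value      TEXT,
    PRIMARY KEY (entity_id, field_name)
)""")

def save(entity_id: int, fields: dict) -> None:
    """Store each user-defined field as its own row; values are kept as text."""
    conn.executemany(
        "INSERT OR REPLACE INTO field_value VALUES (?, ?, ?)",
        [(entity_id, name, str(value)) for name, value in fields.items()],
    )

def load_all() -> dict:
    """Fold the one-row-per-field layout back into one dict per entity."""
    records = defaultdict(dict)
    query = "SELECT entity_id, field_name, value FROM field_value"
    for entity_id, name, value in conn.execute(query):
        records[entity_id][name] = value
    return dict(records)

# Two entities with different, user-defined sets of fields.
save(1, {"name": "Asha", "age": 34})
save(2, {"name": "Ravi", "village": "Rampur"})
```

Note that the age comes back as the string "34": with a single value column, the data type has to be tracked somewhere else, typically in the user schema.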

If you are developing a schemaless platform of type 1 (above) then choosing a NoSQL database may make sense. NoSQL databases are diverse in how to model the data and allow querying of it, unlike relational database products which are quite similar to each other. The choice of the right NoSQL database is out of the scope of this article — but the checklist here applies to them as well.

For a platform of type 2, you may need to make the decision based on how schemaless vs schema-full you are required to be. That is, for what percentage of your product is the schema known? We cannot prescribe an answer, but we hope to provide enough details in this series to create a checklist that helps in making a decision.

Management of user schema

While your platform is schemaless, it doesn’t imply that there is no schema. There is almost always a schema. It is with your user. Hence, all schemaless platforms need to provide the ability for the user to define their schema and for the platform to store and serve it. The success of schemaless platforms depends a lot on how simple one makes the process of defining and managing the user schema.

The user-defined schema consists of the same elements that software developers have always dealt with, except they are now in the realm of the users. In relational thinking it would be:

Entities/Tables
Fields (with name, data type, behavior)
Relationships (one-to-one, one-to-many, many-to-many)

Similarly, in document/object modeling, it will be — Aggregates, Objects, and Fields.
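As an illustration of the relational elements listed above, a platform could hold the user-defined schema in plain data structures like the ones below. This is only a sketch; the class names, the data_type strings and the dangling_targets check are hypothetical and not taken from any specific platform.

```python
from dataclasses import dataclass, field

@dataclass
class Field:
    name: str
    data_type: str          # e.g. "text", "number", "date"
    mandatory: bool = False

@dataclass
class Relationship:
    target: str             # name of the related entity
    kind: str               # "one-to-one", "one-to-many" or "many-to-many"

@dataclass
class Entity:
    name: str
    fields: list[Field] = field(default_factory=list)
    relationships: list[Relationship] = field(default_factory=list)

def dangling_targets(entities: list[Entity]) -> list[str]:
    """Names of relationship targets that no entity in the schema defines."""
    known = {e.name for e in entities}
    referenced = {r.target for e in entities for r in e.relationships}
    return sorted(referenced - known)
```

A check like dangling_targets is the kind of validation the platform has to run on the user's schema, since no database engine will do it on the user's behalf.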

Elevating these concepts into the realm of the users where they can use them via a GUI, is a difficult design problem, which has not succeeded at scale. It is tackled in two ways, both of which are tradeoffs:

Service providers for schemaless platforms—They become the “users” and set up the solution for their customers. Most such products have an ecosystem of “product-implementers”.
Less featured schemaless platforms make the tradeoff by keeping their platform anemic, by avoiding some features like entity relationships, field-level behavior. These are the most difficult concepts for a non-technical platform user.

Lack of standards - Even though there are many schemaless platforms, this space has lacked standards for defining user schema. Perhaps there are not enough platforms to warrant the emergence of standards. Platform users need to learn to define their schema afresh for every new product. XForms is one such standard, but it has not seen wide adoption, nor does it provide a specification for everything in the schemaless platform world.

Promoting user schema through environments

When you publish a Google Form you are basically deploying the solution from a “development environment” to the “production environment”. All schemaless platforms have to eventually support this process in their product. The more feature-rich user schema the platform supports, the more complex it is to implement the deployment. Products may also be required to support multiple logical environments like development, test, staging, and production — through which the user-schema can be promoted.

At the core of the solution to schema promotion, is the ability to maintain multiple versions of user schema (one per environment) and to merge one version onto the other. Anyone who has written code for merging objects, or version control systems :-), will appreciate the complexity. You can simplify a bit by supporting merge paths in only one direction, i.e. from development to production, and not paths like production to staging (for such scenarios one can simply implement a delete followed by copy).
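A one-directional merge can be sketched as below, assuming the user schema is held as a nested dict of entity name to field name to data type. That representation, the promote function, and the policy that the source wins on conflicts while target-only items are kept are all assumptions made for this illustration.

```python
import copy

def promote(source: dict, target: dict) -> dict:
    """Return a new target schema after a one-way merge from source.

    Fields present in source overwrite the same fields in target; entities
    and fields that exist only in target are left untouched.
    """
    merged = copy.deepcopy(target)  # never mutate the live environment's schema
    for entity, fields in source.items():
        merged.setdefault(entity, {}).update(copy.deepcopy(fields))
    return merged
```

Even this small sketch has to pick a policy for conflicts and for items that exist only in the target. Deletions, renames and relationships are where the real complexity sits.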

Technical tradeoffs in schemaless platforms

Schemaless platforms, not surprisingly, are not a silver bullet. There is a price to be paid for going schemaless. Schemaless platforms succeed in scenarios where they keep this cost low. Let us look at the technical issues that emerge with schemaless platforms, to help in making the tradeoff. This section is not about consumer-targeted platforms like Google Forms, but about scenarios where custom development is a real option.

Reporting tools

Relational database schemas are quite standardized. This allows reporting tools like Metabase, Tableau, and others to provide numerous features, because they can decipher and create an internal model of the user’s schema automatically by using metadata maintained by databases. With schemaless platforms, we lose these benefits. These reporting tools do not understand EAV, JSONB, and NoSQL very well (standardization could help here too). There are techniques that can be employed to get around some of these issues, as we will see later in this series, but they require additional work. Overall, schemaless data models lead to more complex queries and comparatively worse performance.

Database level checks

Database constraints like foreign-key, unique, not null and custom constraints cannot be taken for granted anymore. Depending on the approach taken, one may have to give up one or more of these. For example, in EAV and JSONB you cannot implement null/not null and unique constraints for user-defined fields. In JSONB you cannot do foreign keys either. You have to handle these in the code, but they work less well because:

the database can also be updated directly (via migrations, and data fixes)

database checks are far more efficient

code can have defects in how it implements these checks
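A sketch of what such code-level checks might look like for user-defined fields is shown below. The rules format (field name mapped to required and unique flags) and the violations function are hypothetical.

```python
def violations(record: dict, existing: list[dict], rules: dict) -> list[str]:
    """Check a record against user-defined rules.

    `rules` maps field name -> {"required": bool, "unique": bool}
    (a hypothetical format); `existing` holds the records already stored.
    """
    problems = []
    for name, rule in rules.items():
        value = record.get(name)
        if rule.get("required") and value in (None, ""):
            problems.append(f"{name} is required")
        elif rule.get("unique") and value is not None and any(
            other.get(name) == value for other in existing
        ):
            problems.append(f"{name} must be unique")
    return problems
```

The sketch also shows the weaknesses listed above. The uniqueness check scans existing records, it is skipped entirely by any direct database update, and two concurrent requests could both pass it before either one commits.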

Data migration on schema change

In schema full applications, the schema change and its associated re-arrangement of data are handled by the programmers using SQL (with flyway, Liquibase, etc). In schemaless platforms, supporting the change in user schema over time is simpler to implement but performing data migration to the new schema is tricky.

For example — adding a new field is simple. But adding a default value for that field (like column default) is complex to implement in a performant way. Similarly, scenarios like moving a field from one entity to all its children, keep getting more and more complex to implement. Most schemaless platforms shy away from implementing these features because they may have to implement complete SQL via GUI — which is a project in itself. So when required, such problems are resolved by contacting technical support.
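One possible way to sidestep rewriting every stored record when a default is added is to apply the default at read time. This is an assumed technique offered for illustration, not something the article prescribes, and read_record and the defaults format are hypothetical.

```python
def read_record(stored: dict, defaults: dict) -> dict:
    """Overlay a stored record on the schema's current defaults.

    Fields missing from the stored record take the default; stored values
    always win. Nothing is written back to storage.
    """
    return {**defaults, **stored}
```

The tradeoff is that the default never reaches the database, so reporting queries that run directly against storage will not see it.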

Published On: 17-Feb-2020

Author: Vivek Singh

Open-source products offer a technology evolution path and help avoid risks in technology projects

2020-11-18T10:11:27+00:00

There are several uncertainties with technology endeavours of nonprofit organisations. They pose significant risks of wasted resources, inability to evolve with time, and obsolescence in the long term - to nonprofits. Open source products offer a sustainable path.

Uncertainties faced by non-profits

Beginning
- To what extent can your organisation adopt technology sustainably?
- If it is a public system (i.e. not to be used by the organisation's staff) - what would be the uptake of the system?
Medium-term
- The context in which you plan to use technology can change because of government decisions, the situation on the ground, etc.
- Your own organisational uncertainties (e.g. funding outlook, attrition of key employees) can have an impact.
- On scaling, do some unforeseen risks come to the surface?
Long term
- What happens if one is successful with all of the above but the technology becomes obsolete?

For an organisational leader to make investments in technology so that one can reap benefits on the other side - depending solely on organisational wisdom seems unsatisfactory. We believe that open source* products offer a compelling tool against these uncertainties. So how do they do this? Let us see the key characteristics of open source products and how they help in each of the distinct phases.

But the core, perhaps counterintuitive, idea is to not worry about procuring the exact technology solution at the beginning so that one can continue using it in the distant future.

Key characteristics of open source products

Open source products are designed to provide the common functionality needed by the sector (and sub-sector) and allow for customisation of what is specific to one organisation in how it works.
They are modularized such that certain components can be replaced with bespoke solutions while retaining the rest of the product. For example, Moodle allows one to replace the mobile app, the student-facing desktop app, the reporting platform, and so on - while retaining the other components.
Open source products track the sector and the technology evolution and evolve themselves continuously.
They can be used by cloud providers or be self-hosted.

Open source products provide 90-100% of what you need out of the box

Starting on a new technology project

In this phase, beginner's risk must be disproved - else a lot of invested resources could go to waste (as often happens). Here open source allows you to get off the ground quickly and cheaply via the customisation option - so that one can test out whether it is indeed a good idea. Even though the software can be evolved quite quickly, our intuition tells us that the first version must be really good and do everything. If we have such an expectation then we will be disappointed - although there is no real reason to be, since we are trying to uncover the risks.

Medium-term

Once we have successfully demonstrated the value and feasibility, in the medium term we would like to scale at a pace suitable to our organisation. This could imply that we expand the scope of the software or improve the user experience. If we want to expand the scope, there are likely existing features and components in the open-source product that we were not using earlier and can now put to use. This is because other organisations in our sector may have required the same. Additionally:

we would like the mobile app to be more specific to our use case so that it is easy to scale to a large number of users
we may want to improve the user interface of a component to look specific to our needs or branded as per our organisation
we may want to add a new component not available in the product

The flexible, modular and extensible nature of the open-source product allows us to achieve this at a reasonable cost without having to rewrite everything.

Long term technology risk

In the long run, new ways of using the software and mainstreaming of new capabilities may emerge - unforeseeable at the start of the project. What has happened in the last ten years namely mobile, social media, WhatsApp, the cloud, may continue in the future - with a voice interface, chatbots, AI, beneficiary facing software becoming ready to use widely.

If we take the bespoke solutions route, we will have to keep developing them in future as well. In the long run, bespoke solutions usually get thrown away and new ones are developed - because the software provider changes or no one can understand the software program well enough to change it anymore. But open-source products keep evolving on their own without us having to do anything - since they cater to the wider sector and someone or other is looking for new things. e.g. When mobile apps became mainstream, Moodle** added this component to its portfolio.

Open source products protect you against technology obsolescence

Conclusion

The benefits of open source products can be seen in the short to long run. Note that it is in the economic interest of custom software solution providers to develop bespoke solutions for every customer as they are larger in the ticket size. But in our experience, it leaves non-profits worse-off in the long run and doesn't cover for risk in the short and medium-term.

* most products with widespread use in the social sector are open source

** we have used Moodle as an example but the same can be said about DHIS2, OpenMRS, WordPress, and ODK too.

Published: 18-Nov-2020

Author: Vivek Singh

Difference between software platform and bespoke solution. How to make the choice?

2020-11-04T09:37:54+00:00

In the last few years, we have suddenly seen a spike in the usage of the term software platform. All types of software applications started getting called platforms. But the term platform has a specific meaning which differentiates it from technology/software applications. A platform enables other software applications to run on it, mostly to be used by end-users. In other words, we could say that a platform by itself is useless to the end-users unless it is programmed or customized for specific usage. WordPress, Moodle, Google Forms, ODK, Avni - they are all platforms. Google Docs, MS Paint are not platforms as they are mostly consumed directly by the end-users. Though the ease of customization can blur the lines between the user and the customizer and the same person may be playing both roles.

It was important to establish this distinction so that we can answer the more relevant question - how should we decide when to use a platform and when to develop a bespoke software application? Bespoke means custom-made—made based on the specifications of the person ordering it, as in a bespoke suit (from dictionary.com). The diagram below shows two distinct scenarios.

The essential tradeoff between these two options is cost, time, risk, and requirement match. The tradeoffs involved are as follows:

	Bespoke application	Customization on platform
COST, TIME	Requires software development, hence high cost and time	On the right platform, an order of magnitude smaller than for a bespoke application.
RISK	Uncertainties in quality being poor, cost, and time overruns of the software development undertaking. Often getting all three right has been difficult for most projects. Finding the right technology development organization (important to consider that domain expertise doesn't travel from high resource to low resource context)	Quality can be pre-determined. Platform quality matures over time. The extent of requirement fit offered by the platform has to be determined at the start itself, which could be technically challenging. The roadmap of the platform is determined by the third party. Depending on the platform roadmap too much, could be a double-edged sword. On one hand, you get features for free. On the other, they may take longer to arrive than expected.
Requirement Match	The end solution will meet all your requirements. In control of your destiny for subsequent development, albeit it will continue to require resources.	Cannot expect a 100% match, but one can aspire and achieve 90-95% in most situations. A solution developed on platforms, with the platform-provided user interface, may require higher levels of training for end-users.

Let us look a bit deeper into what this means more specifically for community and field programs. Such programs lack the following resources.

Funds available to use for technology deployment
In-house technical skills to develop such solutions

Hence the best solutions are those which require fewer funds and can be managed with technical skills available within the organization, while largely satisfying the requirements. With these key parameters of evaluation in mind let us see the matrix below which presents various scenarios and our recommendations.

Whenever our functional requirements are commonly shared by a large number of people in various disciplines - the widely available consumer platforms can serve such needs.
When additional features are required that are specific to the domain of work, we have two options: either use a platform or get the software developed in-house (via a software development partner). Unless the features required are not present in the platforms available, going for bespoke software development incurs a much higher cost for the same result. Additionally, platforms let you perform customizations in-house, as they require far fewer software development skills. Hence you are less dependent on your software partner (for certain customizations one may still require a software vendor's help, but with good platforms such cases are few and keep decreasing over time).
Quite often one finds that only a few of the required functionalities are missing from the platform. One can then choose the bespoke application route or use the platform without the missing functionality. We have observed that there is an additional route available with open source platforms, where one of the following may work out.
- You may check the feature's availability in the product roadmap of the platform. Open source products share their roadmap publicly.
- You should connect via the community route and work out a mechanism to add the functionality to the product roadmap or even get it done by paying a small amount (which may still be much less than the cost of developing a bespoke application).

ps: Lastly, the question mark space may be of academic interest to some. Why does such a blank space exist? As we understand, it is difficult for platforms to move up the feature ladder without moving right in customization complexity as well (and vice-versa). Hence where a platform places itself is a strategic tradeoff made by the organization/people behind the platform. Overall one should always expect the blank space.

Author: Vivek Singh

Published On: 04 November 2020

Why your field team procrastinates data reporting

2020-10-22T06:47:23+00:00

Context

In most field programs there is a time lag between data collection and data reporting, since reporting is expected only periodically. The collected data is compiled manually by the field staff at the end of the period and reported up the hierarchy. There are two types of compilation - aggregation and data filtering. Examples of aggregation are calculating the number of new houses registered, the number of people who tested positive for a disease, and so on. The aggregate data elements to be reported periodically may run into dozens. Data filtering applies when the reporting formats are at the intervention unit level - e.g. name, age, gender, caste, religion, APL/BPL status of the newly registered. Here the field staff is essentially reporting a subset of the fields they have collected for each intervention unit.

It is possible that the data entry is done by the field staff themselves at the end of the period

Reason for procrastination

Field teams in the social sector enjoy working with their communities, servicing community members, and meeting their own program targets - but they do not particularly like entering data into Excel, paper, or a data entry system (aka MIS) the way it happens today. Hence, it is not uncommon that one of the tasks of program coordinators is to follow up with the field team about data submission at the end of the reporting period. On the other hand, funders and senior leaders do want to know about the exact impact being created - and data has become the primary evidence and communication tool. So, nonprofit projects/programs face these two opposing forces and look for solutions.

Before we think about potential solutions to this, we must look deeper into the issue itself, by asking a very simple and obvious question - why does the field team dislike entering data in the first place? The answer is not very difficult to discover. Some of the common root causes are as follows:

The data compilation and entry is monotonous and time-consuming work.
The person performing it doesn't experience any benefit from it. As far as they are concerned, the data compiled and submitted by them pretty much vanishes into the ether, never to be heard about again.
He/she is stressed about making mistakes while doing such work. This is not an experience people want to go through - hence one instinctively procrastinates on such work for as long as one can.

Our experience suggests that these issues are recognised by all levels of field workers, not simply by the more skilled ones. Even community workers at village level experience this, quite naturally. It’s just that in most cases they hesitate to express their emotions (because of social hierarchy) - but upon genuine prodding in a safe environment one can discover the same reasons at play, quite easily.

But if, instead of empathising with these circumstances, the organisation’s response is one of economic reasoning, like:

but funders need the data after all so what to do
without funds, there would not be any fieldwork in first place

...then it creates a no-win situation and the status quo continues. The field team procrastinates, data is delayed and of poor quality, and more time is spent cleaning it up - burning out everyone involved.

As we can see from our root cause analysis, procrastination and its direct effects present only half the picture. The harm and the underlying missed opportunities it hides are far more serious issues for our programs - worthy of our attention.

Solution ideas

The root causes listed above themselves point to the solution. In most cases, a "happier" system looks like the following.

The data capture happens as part of the users’ workflow instead of being batched together into a separate activity to be done later. The crux of the solution is that 1 hour of data entry per day gets broken down into 30 instances of 2 minutes each, one for each of the 30 clients the field worker interacts with on that day. The data capture work mixes with, and pleasantly disappears into, the other tasks one is doing (as elaborated here more).
The software has functionality that directly benefits the field user - unlike a black box system to which the fieldworker only adds data but never gets back anything useful. e.g. community health workers can view critical medical information from their client's history; in agriculture, a community resource person can give their client information calculated from previous data about land/inputs/micro-plan. A few other sample benefits are (there are many):
- Software system maintains users work schedule based on their client data.
- The software provides reports to the field user based on their work. e.g. the number of clients they have supported, outcomes achieved, and so on. This gives a sense of accomplishment which is highly important.
The software aggressively validates data and performs computations for the user. One comes across numerous programs where the user is burdened with tasks that are much easier done by software. These software systems consider validation to be merely about mandatory, character size, number range etc. This is simply lack of imagination. The system can enforce much more useful validations like:
- The client cannot be born before the date of their registration
- A male client cannot be pregnant. A child cannot be older than his/her mother (or only 5 years younger).
- One cannot produce more than certain kilos of wheat from an acre plot size.
This may not be possible everywhere, but if the software system can go even further such that in monthly meetings (which is a common practice), the fieldworkers can present their data using the software itself - it can have quite an empowering effect. We have seen it in a few places, this needs to be tried more.

We highly recommend doing the first two, at a bare minimum. The third is also important if you find yourself spending too much time on what is often described as a necessary activity: "cleaning of data" or "validating data". In fact, these are unnecessary and quite stressful tasks, because human beings are not good at them.
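The kind of validation described in the third idea can be sketched in a few lines. This is only an illustration: the field names, the `Client` structure, and the `MAX_WHEAT_KG_PER_ACRE` ceiling are all assumptions made for the example, not part of any particular product. The checks are that a date of birth cannot be later than the registration date, a male client cannot be marked pregnant, a child cannot be older than their mother, and a wheat yield cannot exceed a plausible ceiling per acre.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional

# Hypothetical ceiling, for illustration only; a real limit would come from domain experts.
MAX_WHEAT_KG_PER_ACRE = 2500.0


@dataclass
class Client:
    # All field names are assumptions made for this sketch.
    date_of_birth: date
    registration_date: date
    gender: str
    is_pregnant: bool = False
    mother_date_of_birth: Optional[date] = None


def validate_client(client: Client) -> List[str]:
    """Return human-readable problems; an empty list means the record passes."""
    errors = []
    if client.date_of_birth > client.registration_date:
        errors.append("Date of birth is after the date of registration.")
    if client.gender == "male" and client.is_pregnant:
        errors.append("A male client cannot be marked as pregnant.")
    if (client.mother_date_of_birth is not None
            and client.date_of_birth <= client.mother_date_of_birth):
        errors.append("A child cannot be older than their mother.")
    return errors


def validate_wheat_yield(yield_kg: float, plot_acres: float) -> List[str]:
    """Flag a yield that is implausibly high for the plot size."""
    if plot_acres <= 0:
        return ["Plot size must be greater than zero."]
    if yield_kg / plot_acres > MAX_WHEAT_KG_PER_ACRE:
        return ["Reported wheat yield is implausibly high for this plot size."]
    return []
```

Running such checks at the moment of data entry lets the field worker correct a mistake while the client is still in front of them, rather than leaving it for a later cleaning exercise.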

In conclusion

The fundamental idea that moves a field team from procrastination to involvement is to make the fieldworkers productive and involve them fully in the data system. Most importantly, it will improve the quality of service provided to clients. Finally, the resulting data flow looks rather unimpressive in the diagram :-). We also recommend reading this article which covers issues around transitioning from paper to digital.

Author: Vivek Singh

Published: 22-October-2020

(Icons thanks to icons8)

Designing systems that use paper and technology together

2020-10-13T06:06:29+00:00

Many NGOs are thinking of adopting software technology (instead of paper) for data management and reporting in their community/field-based programs. When evaluating such projects, there is usually a lingering doubt about the extent to which technology can be used in these programs. What is the anthropological and economic feasibility of doing so?

In this article we have:

created a checklist of issues to help you assess the profile of technology that can be rolled out.
classified the different data storage options.
provided one example of how one can think about using hybrid approaches, since almost all such technology deployments use multiple data storage options, i.e. they are not fully digital or completely paperless.

Checklist of issues to consider

Infrastructure
- Electricity availability - A bare minimum number of hours of electricity is required so that the devices can be charged. The minimum depends on the extent of usage you expect.
- Electricity quality - The electricity should be such that it doesn’t harm the devices (via voltage spikes for example), requiring very frequent maintenance or replacement. Gathering anecdotal evidence from each cluster may be sufficient to determine this if one is not sure.
Anthropological issues - Many of the village/slum or even higher level community workers are not fluent in using smartphone apps. They still don’t have a smartphone of their own. But before you get disappointed by this and make a decision, it is better to look deeper: put people into the four groups below, look at the breakup, and then decide based on what it looks like.
- Community workers not literate in reading/writing the local language
- literate in written language but not used to smartphones and probably not trainable yet
- literate in written language, not used to smartphones but trainable
- literate in written language and using a smartphone already
Cost
- What will be the total cost of mobile devices, data connections, repairs, and replacements?
  - This can be easily calculated using Excel.
- If you depend on an external trainer to train your staff in technology usage, then inducting new staff will require that training as well. In our assessment, the train-the-trainer approach works quite well, and hence we feel that attrition/replacement is not an issue other than adding one more item to your training curriculum.
- How much value will you derive from spending on technology per field worker? What percentage is this cost of all other costs associated with a field worker?
Data strategy - Please note that you may be dealing with different types of data and your issues may not be relevant to all types of data.
- Is it time-consuming to enter data into a mobile app? Yes, entering data on a mobile is slower than on paper. But this tells only half the story. One should also factor in how much of the community worker’s client interaction time goes into entering data versus talking, listening, thinking, and travelling/switching between clients. Usually, it is a very small percentage, and even a two-fold increase in data entry time has negligible impact on the overall client interaction time. In our view, you should not factor this point into your decision making - unless the data being entered is free-text notes. Free-text notes running beyond tens of words could start adding significantly to interaction time, and also be quite error-prone and frustrating for the user.
- Should one keep a paper backup of the data entered in the mobile app? Otherwise, how would the fieldworker access data if the app is unavailable because the mobile has broken down? Do consider the following factors:
  - Do you have beneficiary retained records? If yes, do they serve as a backup in need?
  - How many mobile phone outages are probable in a year?
  - How much of an outage in the community worker’s service to their clients is realistic?
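The data entry point in the checklist above is an arithmetic argument, so a rough calculation may help. The numbers below are hypothetical (a 30-minute client interaction with 2 minutes of structured data entry, versus one with 10 minutes of free-text notes); substitute figures from your own program.

```python
def interaction_time_increase(interaction_minutes: float,
                              data_entry_minutes: float,
                              slowdown_factor: float) -> float:
    """Fractional increase in total interaction time when data entry slows down.

    slowdown_factor is how many times longer data entry takes on the mobile
    than on paper (2.0 means a two-fold increase).
    """
    extra_minutes = data_entry_minutes * (slowdown_factor - 1)
    return extra_minutes / interaction_minutes


# Hypothetical visit: 30 minutes with the client, 2 of them spent on structured data entry.
structured = interaction_time_increase(30, 2, 2.0)
print(f"{structured:.1%}")  # 6.7%

# Hypothetical visit where 10 of the 30 minutes go into writing free-text notes.
free_text = interaction_time_increase(30, 10, 2.0)
print(f"{free_text:.1%}")  # 33.3%
```

Under these assumed numbers, doubling structured data entry time lengthens the visit by only a few percent, while the same slowdown applied to long free-text notes lengthens it by a third.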

We expect that you may still get a non-binary picture even after considering all of the above. But hopefully, you now have more concrete data to base your decisions on. To further deepen our understanding of the utility of the technology, let us compare various types of data storage and access mediums.

Types of data storage

There are three types of data storage options - paper, paper scanned (or unstructured digital), and structured digital. When thinking about data management, it is important to view paper and scanned paper also as data tools so that we can do comparisons and understand tradeoffs.

The term structured is applied from the perspective of whether computers can fully understand the contents. Unstructured to structured is a spectrum, not a binary, as computers’ abilities continue to improve. For generally available “mainstream” technology, though, we would consider it to be binary and assume that computers can tell us very little about the unstructured content in a practical sense (practical being the operative word here).

1. Paper

Advantages

Everyone in your organization knows how to use it. It is an implicit entry criterion to an organization which we don’t have to even think about.
Paper is flexible and efficient for entering and reading large amounts of data. Beneficiary records often hold a lot of information, e.g. a health record or an agriculture plan. This is one of the main reasons digital has not replaced paper health records.
Paper has almost no upfront or one time costs.

Disadvantages of paper

Mobility - Data on paper moves slowly, i.e. it has to be moved physically to another person for them to access. This restricts several useful applications of such data related to information sharing, speed of availability, timely action based on the information and so on.
Paper records cannot be "queried". One cannot ask a set of paper records any question and get an answer unlike digital. e.g. One cannot ask patient records room in a hospital “how many patients have diabetes and hypertension both” - and get a count as an answer!
Paper quality deteriorates over time. Paper records can get lost. One can get around both by making copies of the paper, though.

2. Paper scanned

This gets around two of the disadvantages of paper - mobility and quality deterioration. Mobile technology is good enough to scan paper, and the scans can be transmitted easily over data networks (though this does require slightly better networks than transferring structured digital records, which are quite size-efficient).

3. Structured digital

This does away with all the disadvantages of paper mentioned earlier. It does have a few disadvantages of its own, though, which we discussed in the checklist of issues in the first section.
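To make the "querying" advantage concrete, here is a minimal sketch of the diabetes-and-hypertension question from the paper section, asked of structured records. The record layout and the sample patients are made up for the example.

```python
from typing import Dict, Iterable, List

# Made-up records; a real system would read these from its database.
patients: List[Dict] = [
    {"id": "P1", "conditions": {"diabetes", "hypertension"}},
    {"id": "P2", "conditions": {"diabetes"}},
    {"id": "P3", "conditions": {"hypertension", "asthma"}},
    {"id": "P4", "conditions": {"diabetes", "hypertension", "asthma"}},
]


def count_with_all(records: Iterable[Dict], required: Iterable[str]) -> int:
    """Count records whose conditions include every required condition."""
    required_set = set(required)
    return sum(1 for record in records if required_set <= record["conditions"])


print(count_with_all(patients, ["diabetes", "hypertension"]))  # 2
```

The same question asked of a room full of paper files, or of scanned images, would need someone to read every record.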

Example of a hybrid approach

Most community programs that employ software use paper also. In a way, they are hybrid even when they are using technology extensively. Since you as a reader of this article may have already come across how mobile applications have been used in community programs, we restrict ourselves here to discussing one uncommon but interesting approach, which may help you think in creative ways about getting around the issues we discussed at the beginning.

Paper records scanned but with associated structured digital metadata

Context

Village water survey has to be conducted and submitted to the block level office.
Block-level office checks the survey and approves it.
This water survey record is of interest to the wider community.
There is no requirement to perform data analysis of the water surveys from different villages, except for a few data elements like village name, date of survey, duration of the survey, etc.

A solution

The village resource person (VRP) conducts the survey using paper forms.
The VRP scans the paper forms using a mobile and submits them into a software system.
Along with the scan, the VRP also fills in a small structured form on the mobile app with the data elements mentioned above, such as the date of the survey.

The salience of this solution

Our technology instinct would be to turn the paper forms into mobile forms and have VRPs fill in hundreds of data points on the mobile. But in this particular case, there is not much value in having all the data points as structured data. Secondly, the user will use this app only once a year, so training them to use a more complex app is not worthwhile.
The few key data points captured in the structured form allow basic analysis to be done. These structured data points also allow the water surveys to be searched online. Once found, the scanned images of the water surveys themselves can be viewed.
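A minimal sketch of what such a hybrid record could look like is given below. The class name, field names, and file paths are assumptions made for illustration; the point is only that a handful of structured fields sit next to references to the scanned pages, and search runs on the structured fields alone.

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable, List, Optional, Tuple


@dataclass(frozen=True)
class WaterSurveyRecord:
    # Structured metadata filled in by the VRP on the mobile app.
    village: str
    survey_date: date
    duration_days: int
    # Unstructured part: references to the scanned page images (hypothetical paths).
    scanned_pages: Tuple[str, ...]
    approved_by_block_office: bool = False


def find_surveys(records: Iterable[WaterSurveyRecord],
                 village: Optional[str] = None,
                 year: Optional[int] = None) -> List[WaterSurveyRecord]:
    """Search on the structured fields only; the scans are opened once a record is found."""
    matches = []
    for record in records:
        if village is not None and record.village != village:
            continue
        if year is not None and record.survey_date.year != year:
            continue
        matches.append(record)
    return matches


surveys = [
    WaterSurveyRecord("Village A", date(2020, 4, 12), 3, ("a_2020_p1.jpg", "a_2020_p2.jpg")),
    WaterSurveyRecord("Village B", date(2020, 4, 20), 2, ("b_2020_p1.jpg",)),
    WaterSurveyRecord("Village A", date(2019, 4, 9), 4, ("a_2019_p1.jpg",), True),
]

for record in find_surveys(surveys, village="Village A", year=2020):
    print(record.survey_date, record.scanned_pages)
```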

Conclusion

Ultimately, we use technology to solve problems. Deployment of technology without considering its feasibility and its impact on the problems is pointless and wasteful. At Samanvay we brainstorm with our nonprofit customers and come up with solutions that use paper and software together. As organisations and the context of use evolve, the solution usually evolves steadily towards using more software.

Author: Vivek Singh

Date: 14-October-2020

Architecture of community program organisations and their data systems

2020-10-13T06:05:32+00:00

In this article, we create the foundations on which the rest of the publication can be developed. Some of the fundamental concepts are explained here in a generic form which will be referred to from the rest of the publication. It is also an attempt to develop a common vocabulary that hopefully makes us more efficient in communicating with each other.

About community program organisation and its beneficiaries

Unit of intervention

This is the most fundamental unit towards which the activities of the social program are directed.

Client
Household
Group (self-help group, village committee)
Facility (hospital, school, water source, farming polyhouse etc)
Location (village, block, cluster, etc)

When the unit of intervention is a facility or location, the activities are aimed at improving it in such a way that its members ultimately benefit.

Organisation levels in a community service intervention (or program)

1. Service or intervention target - Client, household, community, group
2. Community/field workers
3. Technical specialist
4. Field-based coordinators/facilitators/supervisors
5. Program coordinators/managers
6. Funding organization(s)

Please note that levels 2-4 (community/field workers, technical specialists, and field-based coordinators) may not map to the same legal organisation, but that is not of much interest to us when trying to understand the logical structure of community program organisations.

Let’s take two examples, one from a health program and another from an education program, and apply the above. The concepts explained above have been highlighted in bold in the diagrams.

In our attempt to develop this knowledge base we have generalised concepts for all sectors and all types of community programs. While this is consistent with the 75-odd programs we have come across, we would welcome feedback if your program differs from the ones we have described here.

We would also like to state that we have excluded programs which perform periodic activities like health camps, disease screening, one-time surveys, etc. Having said that, we believe most of the concepts discussed here and later may still be of interest to you.

Schema of data managed in community programs

Input/source data

Input data is the data which organisation members add to the system.

Longitudinal data

When one or more service providers collect information about a unit of intervention over a period of time, the entire dataset about that unit of intervention is called longitudinal data. This type of data is of most interest to us. There are two types of longitudinal data we may have.

Observational data - This could be for establishing the intervention units’ status at a point in time, e.g. baseline, midline, end-line data.
Service data - Data on service provisioning like individual’s health record, farmer’s agricultural activities/intervention record. The diagram below illustrates two such data examples.

Cross-sectional data

It could also be referred to as survey data. It is the data collected about multiple entities (like intervention units) at the same point in time. Since it is collected at a single point in time, it is not multi-level and rich like longitudinal data. In the diagram above for the water body, if the desilting program and water quality program are absent and the organisation is performing only annual surveys on water bodies, in the month of April, in a district - then that would be like cross-sectional data.

Event data

Like longitudinal data, these are generated over time, but they do not belong to or apply to an intervention unit. This could be the activity data logged by the community service provider, like transportation records, supplies received, expense reports, and so on.

Sometimes event data can be about the intervention unit but there is no established trusted identity of the intervention unit. This is usually because establishing the identity is difficult, cost-prohibitive, or not useful. e.g. this could be the list of people who fell sick with Dengue in a given season, in a water and sanitation program which is more interested in the maintenance of water bodies (in epidemiology these are called line lists).

Finally, in case you are wondering, we have not covered the structure of supporting data like master data and metadata (e.g. question-answer data, answer options, etc). These are generally well understood since they apply very broadly and not just to community programs.
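The three shapes of input data described above can be sketched as simple data structures. This is one possible illustration, not a prescribed schema: every class and field name here is an assumption made for the example.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Any, Dict, List, Optional, Tuple


@dataclass
class Encounter:
    recorded_on: date
    kind: str  # "observational" (e.g. baseline) or "service"
    values: Dict[str, Any]


@dataclass
class InterventionUnit:
    """Longitudinal data: everything recorded about one unit, over time."""
    unit_id: str
    unit_type: str  # client, household, group, facility, location
    encounters: List[Encounter] = field(default_factory=list)


@dataclass
class SurveyRound:
    """Cross-sectional data: many units, one point in time."""
    survey_date: date
    rows: Dict[str, Dict[str, Any]]  # unit_id -> values captured in this round


@dataclass
class Event:
    """Event data: generated over time, usually not tied to an identified unit."""
    occurred_on: date
    event_type: str  # e.g. transportation record, supply received, expense report
    details: Dict[str, Any]
    unit_id: Optional[str] = None  # left empty when there is no trusted identity


def values_over_time(unit: InterventionUnit, key: str) -> List[Tuple[date, Any]]:
    """Read one data element across a unit's history, oldest first."""
    points = [(e.recorded_on, e.values[key]) for e in unit.encounters if key in e.values]
    return sorted(points, key=lambda point: point[0])


pond = InterventionUnit("pond-1", "water source", [
    Encounter(date(2020, 4, 1), "observational", {"depth_m": 2.0}),
    Encounter(date(2019, 4, 1), "observational", {"depth_m": 1.5}),
    Encounter(date(2019, 11, 5), "service", {"activity": "desilting"}),
])
print(values_over_time(pond, "depth_m"))
```

The practical difference shows up in `values_over_time`: only longitudinal data lets one follow a single unit across time, whereas a `SurveyRound` answers questions about many units at one moment.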

Input data in an organisational context

When the above data is overlaid on the service-providing organisation, we can see the following classification of data.

1. Beneficiary retained record (e.g. health card). This is a record which is maintained by the intervention units themselves and could be longitudinal in nature.
2. Intervention unit service record, maintained by the community worker for each intervention unit (e.g. individual-level health record, student’s record, household’s livelihood data, water source monitoring record).
3. Service monitoring record, which could be the same as the intervention unit service record or could have fewer data elements, based on relevance to field coordination. This is for the purpose of project monitoring and execution planning.
4. Service management record, which could be maintained by any/all levels of the servicing organisation. These records are not specific to any beneficiary, although they could sometimes be linked to multiple beneficiaries. e.g. transportation details of a community worker, allocation of certain assets to multiple intervention units, training sessions attended, etc. This is of type event data.
- It could also include data related to financial accounting, though that should not fall under this classification as it has independent standard structures.

Output data

Output data can be completely generated from the input data. It is of three types.

Service indicators, which are derived from the underlying input data (the four record types above). They are derived and maintained against aggregation dimensions. E.g. number of children vaccinated (indicator) in a month (time-period) in a village (location) from the CSR XYZ fund (funding source).
Outcome/output targets created for assessing the intervention/project. These are for measuring achievement and not for service delivery. e.g. Target for number of villages reached, self-help groups formed, total-patients-cured.
Insight data is created by running a domain-specific computation on the input data to derive some actionable insights. e.g. getting a list (or count) of children who are not gaining weight with age.

The annual work plan which is maintained by program managers can be thought of as data sets consisting of service indicators by dimension and targets. Similarly, a comparison of service indicators against organisational or external standards is another type of data set used to evaluate output/outcomes of such programs.
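As an illustration of insight data, the weight example above can be sketched as a small computation over longitudinal records. The rule used here (the latest recorded weight is not higher than the one before it) is a deliberately crude assumption for the example; a real program would use proper growth standards.

```python
from datetime import date
from typing import Dict, List, Tuple

# child_id -> list of (measurement date, weight in kg); made-up data for illustration.
WeightHistory = Dict[str, List[Tuple[date, float]]]


def children_not_gaining_weight(weights_by_child: WeightHistory) -> List[str]:
    """Flag children whose latest weight is not above their previous measurement."""
    flagged = []
    for child_id, measurements in weights_by_child.items():
        ordered = sorted(measurements)  # tuples sort by date first
        if len(ordered) >= 2 and ordered[-1][1] <= ordered[-2][1]:
            flagged.append(child_id)
    return sorted(flagged)


history: WeightHistory = {
    "child-1": [(date(2020, 1, 5), 8.0), (date(2020, 2, 5), 8.4)],
    "child-2": [(date(2020, 2, 7), 9.0), (date(2020, 1, 7), 9.1)],
    "child-3": [(date(2020, 2, 9), 7.5)],
}
print(children_not_gaining_weight(history))  # ['child-2']
```

Note that the output is generated entirely from input data, which is what distinguishes it from something the field worker has to enter.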

Measurement dimensions of output data

Social intervention programs capture data to provide services and carry out their activities. Along with this, the input data is also important in understanding how their intervention is performing. This is usually done by tracking a set of indicators over certain dimensions. Indicators can be, for example, the number of SHGs formed, the number of handpumps installed, and so on. These are usually tracked along the following dimensions.

Location hierarchy - e.g. village, block, district, etc.
Period - e.g. week, month, quarter, season, so on.
Organisational entities - Funding organisation, NGO Partner, Projects
Intervention unit's attribute classification - these dimensions are derived from the values of attributes of the intervention unit. e.g. age group, gender, caste, religion, water source type, school-level (primary, secondary etc).

These indicators are also usually, but not necessarily, rolled up additively along the location hierarchy, period, and entity. The number of handpumps installed in a block is the sum of handpumps installed in the villages under it, and similarly over period and entity. This rolling up along the dimensions is an important attribute of the output service indicators data.
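The additive roll-up can be sketched as a group-and-sum over whichever dimensions one wants to keep. The row layout, village and block names, and counts below are made up for illustration, and the sketch covers only indicators that are genuinely additive.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# One row per village per month: handpumps installed (made-up numbers).
rows: List[Dict] = [
    {"village": "Village A", "block": "Block X", "month": "2020-04", "count": 3},
    {"village": "Village B", "block": "Block X", "month": "2020-04", "count": 2},
    {"village": "Village A", "block": "Block X", "month": "2020-05", "count": 1},
    {"village": "Village C", "block": "Block Y", "month": "2020-04", "count": 4},
]


def roll_up(indicator_rows: Iterable[Dict], dimensions: List[str]) -> Dict[Tuple, int]:
    """Sum the indicator over every dimension that is not listed in `dimensions`."""
    totals: Dict[Tuple, int] = defaultdict(int)
    for row in indicator_rows:
        key = tuple(row[d] for d in dimensions)
        totals[key] += row["count"]
    return dict(totals)


print(roll_up(rows, ["block"]))
# {('Block X',): 6, ('Block Y',): 4}
print(roll_up(rows, ["block", "month"]))
# {('Block X', '2020-04'): 5, ('Block X', '2020-05'): 1, ('Block Y', '2020-04'): 4}
```

An indicator such as "number of distinct clients served" would not roll up this way, since the same client can appear in more than one month; that is one reading of the "not necessarily" caveat.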

Author: Vivek Singh

Published: 14-October-2020

(Icons thanks to icons8)
