Data and Public Policy in a New Era

Artificial Intelligence and Machine Learning (AIML) and ‘disruption’: we can’t keep them out of most discussions anymore. There is hope and despair, often expressed in the same breath. The case for disruption is real, and it is often built on the foundation laid by AIML, particularly in industries related to service delivery. It often leaves us wondering: what about the most essential service provider, the Government? Would AIML disrupt it too? If it does, what form will that take? Is that desirable? Avoidable, even? Or will it improve what the service provider is supposed to do, which is to deliver more efficiently to the most people? Some answers are obvious. We are not talking of disrupting or toppling a Government, least of all one elected by popular mandate. But AIML, leveraged well, can certainly be just the disruption we need to solve the service delivery bottlenecks faced in India. What needs to be done to leverage it fully? As a starting point, let us acknowledge the importance of data.


Public policy needs to be data driven because data is objective. This is particularly true for India, where the elected Government has to address diverse groups in society, differentiated by demographic parameters; any policy announcement thus not only needs to be unbiased, it must appear so too. What is more objective than data? So far, most of the data used for policy making has been generated through survey rounds, that is, by asking respondents. Specific programs were then designed and targeted at solving problems based on the data gathered. While the data gathering process has become more efficient over the years, some problems remain. The enormous time usually taken to finish each survey round across India inevitably means that the data becomes available for analysis, and hence for decision making, more than two years after the process started. Such lags can often render decisions based on the data ineffective. More seriously, data collected this way are often subject to Moral Hazard: respondents may misreport if they anticipate that such misrepresentation could benefit them. Economists have long tried to address the issue by designing appropriate contracts. However, in spite of their best efforts, the solution they propose is often the second best, and is termed so even in the literature. Simply put, forming policies on data that cannot be manipulated by the respondents will always give higher benefits to society than relying on data that is subject to moral hazard. This is where AIML, through innovative approaches and the use of unstructured data, can help. For example, estimating property tax using Satellite images of constructed area will be a far better and more accurate proxy than sending boots on the ground to verify the same. Data from such images are real time and almost impossible to manipulate, either by the respondents or by the inspectors. Citizens expressing dissatisfaction about local or national level Governance on social media are often far better proxies than the responses obtained when the same individuals are coaxed to respond to a set of questionnaires. While the importance of such data and the associated techniques is by now well appreciated, the focus is rapidly shifting towards the role of the State in it.


In this setup, the Government has two major roles: that of the enabler and that of the regulator. The two need to be finely balanced when it comes to using AIML for service delivery. Too conservative an approach, and we lose the opportunity to remove the prevailing bottlenecks; too slack a regulatory regime, and the damage could be irreversible.


The role of the Government as the regulator is mainly examined in the context of ownership, privacy and security of such data. While these issues are not new, addressing them swiftly has become necessary owing to the process by which, and the speed at which, unstructured data is captured. Unlike the data we have been using so far, capturing such data does not require consent from the individual. For example, we do not need the permission of an individual to calculate their built up area in square feet if we use Satellite images. Further, most of the data is generated in real time, implying that neither the process nor the frequency of such data collection can be regulated. It would be futile to put restrictions on what data one can capture, unlike in the case of questionnaires, where one can restrict the types of questions that can be asked and where the individual may even refuse to divulge sensitive information. Thus, the regulatory framework must address what part of the data can be used, by whom, for what, when and why! The issues are far too complex to hope that we can codify laws prescribing do’s and don’ts. However, it would be worthwhile to separate the subjective elements from the ones which are objective. ‘Privacy’ may be one such subjective element, and defining it is an exercise we are trying to codify in detail. It is doubtful how far we will get by this route. After all, what constitutes ‘privacy’ may have different interpretations across societies and, within a society, across individuals. Perhaps focusing on anonymity would be easier to address and hence to regulate. A structure that preserves the anonymity of an individual’s behavior and preferences is more likely to elicit true responses than one that doesn’t. While the State needs to quickly put down a structure, an incremental approach is the order of the day. With a fast-changing scenario and new challenges pertaining to data secrecy being posed daily, one cannot wait long for a law that would eventually address all issues. It is time that we address the anonymity versus privacy debate.


However, the more interesting role of the Government is that of the enabler. With the advent of AIML, disruption in Public Policy is imminent too. We no longer need the Government to solve problems all by itself, nor do we need Governments to appoint expert committees to solve them. Using newer data and compatible techniques is the domain of experts in AIML. They merely need the Government to devise the appropriate platform to showcase the solutions. Using data from drone or Satellite images to map agricultural production and predict the harvest doesn’t require the Government to assign the task. The Government merely needs to maintain and open up data access so that such findings can be validated, often against its own databases. New ideas and solutions based on modern techniques will compete and prescribe the best outcomes without explicit guidance from the Government. The solutions from such exercises are customized to local needs. The main task of the State in this framework is to ensure that the data it has are regularly updated and, most importantly, consistent across its various arms. The biggest complaint of Data Scientists interested in Policy related analytics is that the databases of the Government itself ‘do not talk to each other’. To illustrate the point simply, it would be impossible to find any two Government records that have identical spellings of all the Districts in India! Getting data on even a simple time series of any Economic variable is a haphazard exercise, with too many missing years and observations. With the appointment of the new Chief Data Officer, one hopes that the office focuses on making reliable and complete data available to all, much more than on collecting new data.
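The District spelling problem described above can be made concrete with a small sketch. This is only an illustration of one possible approach, fuzzy string matching with Python's standard difflib module, and not a description of any Government system. The function names, the 0.8 similarity cutoff and the sample District names are assumptions chosen for the example.

```python
from difflib import get_close_matches


def normalize(name: str) -> str:
    """Lowercase and keep only letters, so spacing and punctuation differences vanish."""
    return "".join(ch for ch in name.lower() if ch.isalpha())


def reconcile(names, canonical, cutoff=0.8):
    """Map each raw District name to its closest canonical spelling, or None.

    The cutoff is an illustrative assumption; a real exercise would tune it
    and review the unmatched names by hand.
    """
    lookup = {normalize(c): c for c in canonical}
    result = {}
    for raw in names:
        match = get_close_matches(normalize(raw), list(lookup), n=1, cutoff=cutoff)
        result[raw] = lookup[match[0]] if match else None
    return result


# Hypothetical sample: one canonical list and spellings as they might
# appear in a second database.
canonical = ["Ahmedabad", "Tiruchirappalli", "Kanchipuram"]
raw = ["Ahmadabad", "Tiruchirapalli", "Kancheepuram", "Unknownpur"]
mapping = reconcile(raw, canonical)

for source, target in mapping.items():
    print(source, "->", target)
```

Names that fall below the cutoff are returned as None rather than forced onto the nearest spelling, since a wrong match between two databases would be worse than a gap flagged for manual review.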


With new techniques comes an entirely new approach to solving problems. Let me end with an example shared by the panelists. An experiment was done in Delhi regarding areas that were ‘unsafe’ for women after dark. Women were given a simple device which they would click whenever they felt unsafe in an area. The data was collated and a heat map of troubled areas was drawn up. The list was then shared with the local SHO. This is what the power of Data driven Analytics can do: identify problems where they are, even locally. But what about the solution? For the police to increase the ‘rounds’ of the beat cop in the identified areas required clearing bureaucratic red tape. The proposal, therefore, although ingenious, could not be acted upon. A few hundred Kilometers away in Punjab, a similar exercise was conducted. Only this time, the solution did not require State intervention. Instead, what was proposed was to encourage street hawkers to run their businesses in these dark spots. It is a classic example of a complex problem being solved through Data Science by stakeholders at a very local level, with a solution that did not require a Bureaucratic nod. AIML enabled policy recommendations will work best when the State merely allows them to identify, design and solve problems.
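The collation step in the Delhi experiment, turning individual clicks into a heat map, can be sketched in a few lines. The panelists did not describe how the data was actually processed, so everything here is an assumption made for illustration: the grid-based binning, the cell size, the function name and the coordinates, which are made up.

```python
import math
from collections import Counter


def hotspot_cells(clicks, cell_size=0.005, top=3):
    """Count clicks per grid cell and return the busiest cells first.

    Each click is a (latitude, longitude) pair. A cell is identified by the
    integer grid indices obtained by dividing each coordinate by cell_size
    (in degrees) and rounding down.
    """
    counts = Counter(
        (math.floor(lat / cell_size), math.floor(lon / cell_size))
        for lat, lon in clicks
    )
    return counts.most_common(top)


# Made-up clicks: three close together, one elsewhere.
clicks = [
    (28.6139, 77.2090),
    (28.6141, 77.2093),
    (28.6140, 77.2091),
    (28.7041, 77.1025),
]
hotspots = hotspot_cells(clicks)

for cell, count in hotspots:
    print(cell, count)
```

Reporting counts per cell rather than individual clicks also means the list handed over need not identify who pressed the device, only where the presses cluster.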


Disruptions will happen in Public policy too. As long as the Government manages to balance its regulatory role with that of a limited interference enabler, such disruptions will augur well for India. AIML is based on the flow of information and on processing it efficiently; any step to curtail or structure it is futile.


Bappaditya Mukhopadhyay

