🚀 Launching the Zombie Tracker
We (CivicDataLab & Internet Freedom Foundation) launched the Zombie Tracker on January 15th, 2021. For the event, we invited Mr Zakir Ali Tyagi to share his experiences after he was charged under section 66A of the Information Technology Act. Joining us for the discussion were Abhinav Sekhri, who wrote a research paper in collaboration with Apar Gupta on Section 66A and other legal zombies, and Sanjana Srikumar, who was part of the PUCL litigation team that filed an application in the Supreme Court to ensure full compliance with the Shreya Singhal judgment.
As per our plan for the first phase of the project, our aim was to develop and launch the platform within the first six months, using data from seven states. We ended up including eleven states, and it took us close to ten months to reach where we are today. Our work started in March, just before the lockdown. The lockdown definitely contributed to the delay, but we were also working with a few assumptions that were exposed once we started working with the datasets from e-Courts. In this post, we write about our experiences in building the tracker and the problems that we faced. We believe that the lessons we learnt in the process might help people who are working, or will work, with similar datasets in the future.
Note: For the scope of this research, our primary database was created using data from the e-Courts portal. While most of the points mentioned below should be applicable to a diverse set of datasets and databases for legal research, we’re writing this on the basis of our experience of working with the e-Courts database.
Geographical scope of the project
We selected seven states for the first phase of this project, a decision based on IFF’s experience of dealing with similar matters. Additionally, we analysed the trends of cyber crime cases from the NCRB portal and checked a few responses shared by the government during parliamentary sessions.
What we learnt
- We should start small and then gradually increase the geographical coverage of the research.
- For certain key data fields like type of case, section, etc., we found a lot of variability among states in the way data was recorded on e-Courts. This meant spending more time on standardising data before we could even begin the research.
- Availability of older cases (cases registered before 2015) on e-Courts varies a lot among states.
- This should be a criterion in selecting geographies for the analysis. The level of case digitisation varies among states, which might be a reason for the lack of data from certain states.
Selecting time period for data curation and analysis
We wanted to collect data from October 2009, when the provision came into effect, and set Feb 15th, 2020 as the last date for collecting data from e-Courts.
What we learnt
- The quality and completeness of data have improved in recent years. We found more case records, more orders and judgements, and better quality of information from 2015 onwards.
- Out of the 236 judgements that we could find for disposed cases, the period between Oct 2009 and Dec 2014 accounted for less than 10%.
- So, if our objective is to curate data points using the unstructured data from judgements, we should fetch the data for more recent years.
End dates (the date when the information was obtained from the portal) are important to mention, especially when you’re including cases that are still pending as on that date. In the case of periodic data updates, the status of pending cases should be refreshed as well, in addition to adding newer cases to the database.
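The periodic update described above can be sketched in a few lines. This is only an illustration and not the code behind the tracker: the record layout, the field names (`case_no`, `status`, `as_on`), the case numbers and the second fetch date are all assumptions made for the example.

```python
from datetime import date


def refresh_cases(database, fetched, fetched_on):
    """Return a new database with the latest fetch merged in.

    database:   dict mapping case number -> record dict (hypothetical layout)
    fetched:    list of record dicts from the latest fetch
    fetched_on: date on which the portal was queried (the "end date")
    """
    # Copy existing records so the previous snapshot is left untouched.
    merged = {case_no: dict(rec) for case_no, rec in database.items()}
    for rec in fetched:
        # Overwrites cases we already hold (for example Pending -> Disposed)
        # and adds cases we have not seen before, stamping each with the
        # date on which it was fetched.
        merged[rec["case_no"]] = {**rec, "as_on": fetched_on}
    return merged


# Made-up records for illustration only.
existing = {
    "CASE-1": {"case_no": "CASE-1", "status": "Pending", "as_on": date(2020, 2, 15)},
}
latest = [
    {"case_no": "CASE-1", "status": "Disposed"},
    {"case_no": "CASE-2", "status": "Pending"},
]
updated = refresh_cases(existing, latest, date(2020, 8, 15))
```

Storing the fetch date on every record makes it possible to say "pending as on" a specific date for each case, rather than for the database as a whole.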
Fetching case records under section 66A
- Act - We selected Information Technology Act
- Section - We selected 66A
What we learnt
- Unlike Acts, sections don’t have standardised values in the e-Courts ecosystem. This puts the burden of standardising section details on researchers. For our use-case, this step took more time than expected. We have also written in detail about the problems we faced here.
- Focusing on fewer states or districts could have helped in dealing with the standardisation problem. Generally, the problem gets bigger if we include more geographies in our analysis as that leads to more diverse patterns of dealing with unstandardised information. We should focus on how we can minimise the standardisation and pre-processing of data instead of focusing on more data for the analysis.
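To give a sense of what standardising a free-text section field involves, here is a minimal sketch in Python. It is not the method we used for the tracker: the pattern, the function name `mentions_66a` and the sample strings are assumptions made for illustration, and real records will contain variants that this pattern misses or wrongly accepts.

```python
import re

# Hypothetical pattern: accepts spellings such as "66A", "66-A", "66 (A)"
# and "66a", and rejects "66", "66B" and "166A". The lookarounds make sure
# the match is not part of a longer number or word.
_SECTION_66A = re.compile(
    r"(?<![0-9A-Za-z])66\s*[-(]?\s*A\)?(?![0-9A-Za-z])",
    re.IGNORECASE,
)


def mentions_66a(raw_section):
    """Return True if the free-text section field appears to cite 66A."""
    return bool(_SECTION_66A.search(raw_section or ""))


# Made-up examples of how the same section might be typed in.
samples = ["66A", "66-A", "66 (A)", "u/s 66a, 67", "66", "66B", "166A", "66 and 67"]
flags = {text: mentions_66a(text) for text in samples}
```

Every additional state or district tends to bring new spellings, so a rule like this has to be checked against the data from each new geography, which is why the effort grows with coverage.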
Our partners have also written about the data standardisation problem on e-Courts. You can find some reports here and here. This change, of standardising all fields for all case records on e-Courts, if implemented, shall take us a long way forward in improving the quality of information that is available from the portal.
Testing hypotheses and forming research questions
On e-Courts, every case record is supposed to have the following components:
- Metadata fields which capture the history of the case, important dates, details about the courts, judge, case type, etc.
- Judgements (only for disposed cases)
- Daily Orders (for pending and disposed cases)
We started our research by forming a set of hypotheses and related questions that went on to form the basis for us to collect and process the data.
What we learnt
- We did not find decent coverage of orders and judgements for cases on e-Courts. We had access to only 236 judgements from 1,189 disposed cases (roughly 20%). Daily orders were not present in most cases, and even where they were, there was hardly any useful information within them.
- Non-availability of data points for conducting holistic research
- As we did not have access to FIRs or to orders and judgements in most cases, it wasn’t possible to identify the stage of the matter at which 66A was first referenced. This is important for identifying the causes of signal-failure.
- In cases of conviction, sentencing orders were not present
- Among the disposed cases, we could not identify those in which the court took the Shreya Singhal judgement into account while acquitting the victim[s]
- We could not figure out the total number of convictions, if any. The field titled "Nature of disposal" describes whether a case resulted in conviction, acquittal, dismissal, etc., but we cannot use this field to identify the total number of persons convicted or acquitted in a case.
- There was no way for us to confirm whether our database of cases registered under 66A was complete.
- We could not find any official reports on the level of digitisation of cases on the e-Courts platform, which means that there could be cases that are registered under 66A but still not present on the e-Courts portal.
- We had to remove a lot of cases where we could not identify or confirm the section under which a particular case was registered. This is because the section is a non-standardised text field (it does not contain fixed values) and for such cases if we don’t have access to additional documents, like orders or judgements there is no other way to cross-check and verify the section details.
Priorities for the next phase of the project
The team has already started planning the next phase of the project. Here is a list of things that we would like to do going forward:
- Increasing the geographical scope of the Zombie Tracker. Currently, we have data from 11 states and we would like to include the remaining states iteratively.
- Simplifying the process of cleaning, standardising and visualising the datasets after we fetch the case records from the e-Courts portal. Modularising this process shall help us in involving more volunteers and keeping the tracker updated on a regular basis.
- Zombie Tracker is developed using open source technologies and we would like to follow open source guidelines and principles on building and maintaining the portal. To invite more developer contributions, we will need good documentation of our processes.
- We have identified a few areas that we need to work on to enhance the functionality of the tracker and make it more useful for advocacy. We will be writing a post to share data, tech and content related tasks where we will require support from the community.
- We’re working on releasing the processed datasets on the portal. We will also be working on a data sharing agreement that will enable us to share the raw data with researchers who can take this research ahead.
Our objective is to ensure that section 66A of the Information Technology Act is never invoked again. This case study offers an opportunity to identify the issues of signal-failure and then work towards fixing them, so that we learn from our mistakes and don’t repeat them.
We appreciate the support from Tech4Dev in getting this work off the ground. To carry it forward and to further develop and maintain this platform, we need your help. Your donations will help us keep upgrading our tracker, and help ensure that S.66A is not invoked again and that any pending charges are dismissed. Please check this link to know more about how you can support us in this endeavour.