Search Engine: Centralizes and integrates data on public contracts and contractors and makes it easy to search. Monitors procurement processes and flags potential corruption risks.
Learn: Studies public procurement processes, including the legal framework, current policies, their implementation, and their results. It also offers recommendations based on the findings of its evaluations.
To integrate public data, Sembrando Sentido has developed an information system that collects data from a series of public records, starting with the Puerto Rico Comptroller's Office Contract Query page. Using a set of scrapers written in Python, which rely on libraries such as BeautifulSoup for data extraction and RapidFuzz for fuzzy name matching, the system identifies contractors and extracts information about them from the Department of State's Corporations Registry, the Electoral Comptroller's Political Donor Registry, the Lobbyist Registry, and the Beneficiaries of Tax Decrees Registry.
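The sketch below illustrates only the extraction step: turning an HTML results table into records. It is an assumption-laden example, not the project's scraper. The real system uses BeautifulSoup, whereas this version uses Python's standard-library `html.parser` so it runs without dependencies, and the table layout and column names (`Contractor`, `Entity`, `Amount`) are hypothetical rather than taken from the Comptroller's actual page. Navigation, form submission, and pagination are left out.

```python
from html.parser import HTMLParser

# Hypothetical results page; the real Contract Query page layout may differ.
SAMPLE = """
<table>
  <tr><th>Contractor</th><th>Entity</th><th>Amount</th></tr>
  <tr><td>Acme Consulting Group LLC</td><td>Department of Health</td><td>$12,500.00</td></tr>
</table>
"""


class TableRowParser(HTMLParser):
    """Collect the text of each <th>/<td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None and self._row is not None:
            # Collapse internal whitespace so "  Acme\n LLC " becomes "Acme LLC".
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def parse_contracts(html: str) -> list[dict[str, str]]:
    """Treat the first row as headers and every later row as one record."""
    parser = TableRowParser()
    parser.feed(html)
    parser.close()
    if not parser.rows:
        return []
    header, *body = parser.rows
    return [dict(zip(header, row)) for row in body]


for record in parse_contracts(SAMPLE):
    print(record)
```

Keying each record by the header row keeps the code independent of column order, which helps if a source page reorders its columns.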
These scrapers automate navigating websites and extracting their data, then process the results and match them against information from the other records. The key step is identifying each contractor in the Corporations Registry. If the registry returns a single match, the corporation's general data is extracted and integrated into our database. If it returns several matches or none, the system applies fuzzy matching algorithms to try to identify the correct corporation. This step differs for individual contractors: because individuals are not listed in the Corporations Registry, less additional information is available about them.
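A minimal sketch of the matching logic described above, under several assumptions: the registry is modeled as an in-memory list of names, a "registry search" is simulated as a substring lookup on normalized names, and the legal-form suffix list and the 0.9 similarity threshold are illustrative choices. The project uses RapidFuzz; this sketch substitutes the standard library's `difflib.SequenceMatcher` so it runs without extra packages. Names it cannot resolve are handed off for manual review.

```python
import difflib
import re

# Illustrative legal-form suffixes ignored during comparison (an assumption).
SUFFIXES = {"INC", "LLC", "CORP", "CORPORATION", "CO", "PSC", "LLP"}

# Hypothetical sample registry; real entries come from the Corporations Registry.
REGISTRY = [
    "Constructora del Norte, Inc.",
    "Constructora del Norte Sur LLC",
    "Servicios Médicos del Este, Corp.",
    "Acme Consulting Group LLC",
]


def normalize(name: str) -> str:
    """Uppercase, replace punctuation with spaces, drop suffixes, collapse spaces."""
    tokens = re.sub(r"[^\w\s]", " ", name.upper()).split()
    return " ".join(t for t in tokens if t not in SUFFIXES)


def similarity(a: str, b: str) -> float:
    """Similarity in [0, 1]; difflib stands in for RapidFuzz in this sketch."""
    return difflib.SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def match_corporation(contractor: str, registry: list[str], threshold: float = 0.9):
    """Return (registry_name, method), or (None, "manual_review") if unresolved."""
    query = normalize(contractor)
    # Step 1: simulated registry search, i.e. entries whose normalized name contains the query.
    hits = [r for r in registry if query and query in normalize(r)]
    if len(hits) == 1:
        return hits[0], "unique"
    # Step 2: several hits or none, so rank candidates by fuzzy similarity.
    pool = hits or registry
    scored = sorted(((similarity(contractor, r), r) for r in pool), reverse=True)
    if scored and scored[0][0] >= threshold:
        # Accept only a clear winner; an exact tie at the top is ambiguous.
        if len(scored) == 1 or scored[1][0] < scored[0][0]:
            return scored[0][1], "fuzzy"
    return None, "manual_review"


print(match_corporation("Constructora del Norte Inc", REGISTRY))
# ('Constructora del Norte, Inc.', 'fuzzy')
```

Rejecting ties at the top score is a deliberate choice in this sketch: when two registry entries look equally plausible, passing the case to a person is safer than guessing.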
When the system cannot identify the corporation, the Sembrando Sentido team and volunteers conduct thorough manual research to match the corporate and individual records. The system also classifies entities by type (such as government agencies, corporations, and individuals) using parameters we have developed, to facilitate analysis. To match contractors against the Political Donors, Lobbyists, and Beneficiaries of Tax Decrees registries, the system compares the names of contractors or their officers with the names in each registry. If it finds an exact match (ignoring the middle name), it extracts the information and labels the contract as 'Donor,' 'Lobbyist,' or 'Beneficiary' in the database.
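The registry labeling rule can be sketched as follows. This assumes, hypothetically, that each registry stores a person's first name, middle name, and surnames as separate fields; real registries may store a single free-text name that would need parsing first. Names are compared exactly after uppercasing and collapsing whitespace, and the middle name is left out of the comparison key. The `Person` record and the sample data are illustrative only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Person:
    """Hypothetical structured name record: first, middle (may be ''), surnames."""
    first: str
    middle: str
    surnames: str


def _norm(text: str) -> str:
    """Uppercase and collapse whitespace; no other fuzziness is allowed."""
    return " ".join(text.upper().split())


def name_key(person: Person) -> tuple[str, str]:
    """Comparison key that deliberately leaves out the middle name."""
    return _norm(person.first), _norm(person.surnames)


def label_contract(people: list[Person], registries: dict[str, list[Person]]) -> list[str]:
    """Return the labels of every registry with an exact key match for any person."""
    keys = {name_key(p) for p in people}
    return sorted(
        label
        for label, entries in registries.items()
        if any(name_key(e) in keys for e in entries)
    )


# Officers of a hypothetical contractor and hypothetical registry entries.
officers = [Person("Juan", "Carlos", "Rivera Pérez"), Person("Ana", "", "López Cruz")]
registries = {
    "Donor": [Person("JUAN", "", "RIVERA PÉREZ")],         # middle name missing, still matches
    "Lobbyist": [Person("Ana", "María", "López Torres")],  # different surname, no match
    "Beneficiary": [],
}
print(label_contract(officers, registries))  # ['Donor']
```

Because the key excludes the middle name, "Juan Carlos Rivera Pérez" and "Juan Rivera Pérez" are treated as the same person, which is the trade-off the rule accepts to catch registries that omit middle names.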
The result is a database that integrates, centralizes, and enriches the data, making it easier to present and analyze.
The beta version of Contratos En Ley is only the first step toward clear, complete, accessible, and integrated public information. We are currently working to develop rigorous evaluations of public contracts, build a contract monitor, and expand the search engine to extract decentralized public data from more than 56 government portals.
We are striving to share more information, make it completely public, and standardize it using the Open Contracting Data Standard (OCDS). We are also working on an API for those who request direct access to our database.
At no time do our scrapers access confidential information, alter the published data, or seek to harm the Public Records mentioned above or the data published in them. The extracted data is kept exactly as it appears in those Public Records, without any changes. We are not the official or original source of this data, and it may contain undetected errors or omissions, as well as errors introduced during data matching.
For this reason, we cannot guarantee the quality of the data presented. We do provide a mechanism for reporting errors, and we will verify reports and publish corrections where possible. Even so, users consume the data at their own risk.
We are not, and will not be, responsible for any damages or losses of any kind resulting from the use of the public data presented. Additionally, although we have a robust security system, we cannot guarantee that our servers will never be affected by viruses or unauthorized interventions that could compromise the data or the continuity of the Search Engine.
Our goal is to make information about public contractors and public contracting processes more complete, integrated, and widely available. However, to keep providing the service for everyone's benefit, we must establish certain restrictions on the use of the site.
The absence of a robots.txt file does not mean we allow unrestricted scraping, which can place an excessive load on our servers. If our system detects suspicious activity, such as hacking attempts or damage caused by a user, we may, under certain circumstances, prohibit that user from using our Search Engine. If you wish to obtain direct access to all the data in our database, please contact us: imasses@sembrandosentido.org.
We obtain the information in our databases from the government and other sources through various means: directly from government websites and APIs, by integrating publicly available data, and through information requests. We invest significant time, effort, and even money in obtaining this data and turning it into a practical, highly usable resource. We do not claim any rights over the data we receive from government sources, and we attribute it to its sources whenever possible. However, we do claim rights over the evaluations we conduct and expect those rights to be respected.
We would appreciate a citation of our platform or evaluations in any work that relies directly on the work of Contratos En Ley. The work of Sembrando Sentido, through its project Contratos En Ley, is protected by Puerto Rico intellectual property laws and a Creative Commons license. For any questions or requests, write to us: imasses@sembrandosentido.org.