In a village school in Flores, a student opens a tablet to access an AI tutoring platform. She types a question in Bahasa Indonesia with the cadence of her local dialect. The platform answers. She learns something. And somewhere, on a server she will never see, operated by an organization she has never heard of, a record of that interaction — her words, her patterns of inquiry, her hesitations and corrections — is logged, stored, and potentially monetized.
She has no knowledge this is happening. Her parents have no knowledge. Her teacher has no knowledge. The school received the tablets through a government program. Nobody read the terms of service, because nobody explained they existed, because in most frontier communities in Indonesia, digital literacy infrastructure is insufficient to make such explanations meaningful even if they were given.
This is the data sovereignty problem at its most concrete. And it is, according to a growing body of interdisciplinary scholarship, one of the most consequential dimensions of digital equity — one that existing policy frameworks, including most EdTech regulation, have systematically failed to address.
Defining Data Sovereignty: More Than a Legal Concept
Data sovereignty is often reduced to a technical or legal question: who has custody of data? Where does it reside? Which jurisdiction's laws apply? These questions are relevant, but the research literature treats them as the surface layer of a much deeper set of concerns.
Hummel et al. (2021), in a comprehensive review of the field, identify three core values around which data sovereignty discourse consistently organizes: control and power over data flows; inclusion and deliberation in governance decisions; and privacy as a baseline right, not a compliance requirement. Abbas et al. (2024) extend this framework through a social contract lens, arguing that meaningful data sovereignty requires not just technical control but legitimate, community-endorsed governance authority over what data is collected, for what purpose, by whom, and under what conditions of accountability.
Rolan et al. (2020) push the concept further still, explicitly linking data sovereignty to digital equity — positing a future in which data is treated as sovereign to individuals, families, and communities, not as raw material to be extracted by platform operators and converted into commercial or institutional value. In this framing, data sovereignty is not a technical specification. It is a vision of human dignity in digital form.
"Data is not just information. In frontier and indigenous communities, data about local practices, land use, cultural knowledge, and educational patterns constitutes an extension of collective identity and self-determination. Treating it as a commercially extractable resource is not a neutral technical decision. It is an act of appropriation." — AFIRMASI Research Team, synthesizing Hummel et al. (2021) and Rolan et al. (2020)
Indigenous and Village Communities: The Highest-Stakes Case
The research literature on indigenous and village-level data governance documents a pattern that will be immediately recognizable to anyone who has worked in AFIRMASI's operating context. Reyes-García et al. (2022), studying community-based environmental monitoring programs across multiple continents, find that Indigenous and rural communities routinely possess sophisticated, place-based ecological knowledge accumulated over generations. When digital monitoring tools arrive — drones, sensors, mobile data collection apps — this knowledge is suddenly legible to, and extractable by, outside institutions in ways it never was before.
The communities that generate this knowledge rarely control the platforms that capture it. They rarely have input into how it is analyzed, shared, or published. And they rarely receive proportional benefit when it is used for scientific publications, policy decisions, or commercial applications. The result, Reyes-García et al. argue, is a form of digital colonialism: the technological amplification of a historical pattern in which external actors extract value from indigenous and frontier communities without adequate consent, compensation, or governance participation.
Islam et al. (2024) and Moudgalya & Swaminathan (2024) extend this analysis to racialized and marginalized communities in digital health and education contexts, arguing for governance frameworks that center self-determination, collective rights, and culturally grounded definitions of wellness and knowledge — as opposed to frameworks that default to individual consent models designed for atomized urban users with high digital literacy and strong legal recourse.
The distinction matters because individual consent models — the "I Agree" button at the bottom of a 40-page terms of service — are structurally inadequate for communities where the harms of data misuse are collective, where the relevant knowledge is communally held, and where the power asymmetry between the user and the platform operator is extreme.
Students, Classrooms, and the Surveillance Economy
The classroom is where the data sovereignty problem is most acute for AFIRMASI's operational context, and where the research literature is most alarming.
Hakimi et al. (2021), in a thematic review of 91 studies on the ethics of educational digital trace data, document pervasive concerns about consent, privacy, and surveillance in EdTech contexts. The specific failure modes they identify are consistent across geographies: students are rarely meaningfully informed about what data is collected; consent frameworks are designed around adult autonomy norms that do not translate to child and adolescent contexts; and the data collected — keystroke patterns, reading speed, attention tracking, emotional inference — is qualitatively more invasive than the data collected in earlier generations of educational technology.
Vetter & McDowell (2023), analyzing EdTech platform surveillance across the COVID-19 digital education expansion, describe a "spectrum of surveillance" in which platform design choices about data collection systematically produce epistemic inequality: students at well-resourced schools with privacy-conscious IT departments and informed parent communities are protected by technical and governance safeguards that students in under-resourced schools — including rural and frontier schools — simply do not have access to.
"A student in a frontier region deserves the exact same data protection as a student in an elite urban academy. This is not a technical aspiration. It is a rights claim — and current EdTech policy fails to honor it in the vast majority of frontier contexts." — AFIRMASI Research Team, synthesizing Sun (2023) and Vetter & McDowell (2023)
Sun (2023), examining the legal landscape of student privacy across jurisdictions, finds "gaps, guesswork, and ghosts" in regulatory coverage — a phrase precise enough to be worth quoting directly. In most frontier education contexts, including Indonesia's 3T regions, there is no enforceable regulatory framework specifically governing EdTech data collection from minors in low-digital-literacy environments. This is not a gap waiting to be filled. It is an active space of harm.
Putri et al. (2024), studying digital security vulnerabilities in Indonesian village communities (specifically Desa Pematang Jering), document that rural and frontier residents face compounded risk: insufficient digital literacy to recognize data risks, inadequate regulatory protection even when risks are identified, and limited legal recourse when personal data is misused through social media and platform apps.
Digital Equity Cannot Exist Without Data Sovereignty
The research literature increasingly treats data sovereignty not as a separate concern from digital equity but as its necessary condition. Rolan et al. (2020) frame it most explicitly: without equitable control over data, access to digital tools is not empowerment. It is enrollment into a system of extraction.
A student who can access an AI tutoring platform but whose learning patterns are harvested, sold, and used to train commercial models without her consent has not gained a digital right. She has been granted digital access on terms that benefit the platform operator more than they benefit her. The device is hers to hold. The value generated by her use of it is not.
Moudgalya & Swaminathan (2024), working specifically in the intersection of data sovereignty and AI education for justice-oriented communities, call for three operational commitments that map directly onto AFIRMASI's design principles: community-based design (systems built with communities, not for them); data literacy as part of the educational offering itself, not assumed as a prerequisite; and shared governance structures that give affected communities formal authority over data decisions, not merely consultative input.
AFIRMASI's Data Sovereignty Architecture
Because our AI systems run offline on local hardware in 3T schools, the most fundamental data sovereignty provision is built into the technical architecture: student interaction data never leaves the local network. There is no cloud endpoint. There is no central server in Jakarta. There is no API call to an external service that logs and retains session data. The interaction happens locally, the data is stored locally, and the teacher or school administrator is the custodian — not a platform company.
This is not merely a privacy feature. It is a governance statement: the community is the steward of its own data. AFIRMASI operates as a technical enabler and training partner, not as a data custodian or data broker. We do not harvest, we do not aggregate, and we do not monetize student interaction data from 3T communities — because that data is not ours.
Beyond the technical layer, AFIRMASI's Community Data Protocol establishes formal governance structures for any program that does involve data sharing — including research partnerships and impact assessments. Communities are informed partners in these processes, with explicit consent procedures, plain-language data use agreements translated into local languages where applicable, and formal rights to withdraw data from research datasets without consequence.
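The two commitments above — interaction data that stays on local storage, and a formal right to withdraw — can be made concrete in code. The following is a minimal illustrative sketch, not AFIRMASI's actual implementation; every class, method, and field name here is hypothetical. It shows a local, append-only session log with no network code at all, plus a withdrawal operation that removes all records for a given pseudonymous alias:

```python
import json
import time
from pathlib import Path


class LocalSessionLog:
    """Append-only interaction log kept on the school's local storage.

    There is deliberately no network code in this class: records are
    written to a file on the local device, and the school administrator
    is the custodian of the files.
    """

    def __init__(self, log_dir: str):
        self.log_dir = Path(log_dir)
        self.log_dir.mkdir(parents=True, exist_ok=True)
        self.log_file = self.log_dir / "sessions.jsonl"

    def record(self, student_alias: str, prompt: str, response: str) -> None:
        # Store a pseudonymous alias rather than a real name, so even
        # local records carry minimal identifying information.
        entry = {
            "ts": time.time(),
            "student": student_alias,
            "prompt": prompt,
            "response": response,
        }
        with self.log_file.open("a", encoding="utf-8") as f:
            f.write(json.dumps(entry, ensure_ascii=False) + "\n")

    def withdraw(self, student_alias: str) -> int:
        # Right to withdraw: delete every record for one alias and
        # report how many entries were removed.
        lines = self.log_file.read_text(encoding="utf-8").splitlines()
        entries = [json.loads(line) for line in lines]
        kept = [e for e in entries if e["student"] != student_alias]
        removed = len(entries) - len(kept)
        with self.log_file.open("w", encoding="utf-8") as f:
            for e in kept:
                f.write(json.dumps(e, ensure_ascii=False) + "\n")
        return removed
```

The design point the sketch illustrates is structural rather than contractual: because no upload path exists in the code, extraction is not merely prohibited by policy but made technically difficult, which is the property the literature cited above argues for.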
This is what data sovereignty looks like in practice. It is not a privacy policy buried in platform terms of service. It is a set of enforceable governance commitments embedded in program design from the outset — because retrofitting data sovereignty after a system is deployed is, as the literature consistently shows, insufficient.
The Larger Argument
The convergence across sociology, information science, education research, indigenous studies, and digital health is striking: meaningful digital equity requires equitable control over data. This control must be grounded in local values, enforced through governance structures that communities themselves participate in designing, and protected by technical architectures that make extraction structurally difficult — not merely contractually prohibited.
For Indonesia's 3T communities, this is not an abstract policy debate. It is a concrete design requirement for every digital tool, AI system, and EdTech platform that enters their schools and homes. The question is not whether these communities will be datafied. They already are. The question is whether that datafication will happen to them, or with them, and for whom its value will ultimately accrue.
AFIRMASI's answer is the latter. We build systems in which frontier communities are the owners of their data, the stewards of their knowledge, and the primary beneficiaries of the AI tools that learn from their context. Any other approach is not digital equity. It is digital extraction with a modern interface.