Data Sources
Our Lexical Foundation: Wiktionary
Taskusanakirja’s comprehensive Finnish-English dictionary is built upon data sourced from Wiktionary, the collaborative multilingual dictionary project. According to Wiktionary’s own copyright policy, the original text of its entries is dual-licensed to the public under both the Creative Commons Attribution-ShareAlike 4.0 International License (CC BY-SA 4.0) and the GNU Free Documentation License (GFDL), Version 1.1 or any later version.
This dual-license framework ensures that the data will remain free forever and can be used by anyone, subject to the conditions of those licenses. As a user of Taskusanakirja’s derived dictionary data, you must comply with the terms of both licenses, which primarily require that you attribute the source and share any derivative works under the same licenses.
Example Sentences: Tatoeba
To provide real-world context, Taskusanakirja embeds example sentences sourced from the Tatoeba project, an open, collaborative database of sentences and translations.
License for Sentences
The vast majority of text sentences on Tatoeba are licensed under the Creative Commons Attribution 2.0 France (CC BY 2.0 FR) license.
This license grants you the freedom to:
- Share: Copy and redistribute the sentences.
- Adapt: Remix, transform, and build upon the sentences.
- Use Commercially: Use the sentences for any purpose, including commercial ones.
Attribution for Sentences
The core requirement of the CC BY license is attribution. When you use a sentence from Tatoeba, you must give appropriate credit by citing the original author of that sentence. Taskusanakirja provides this attribution for every example sentence it displays.
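If you reuse a sentence outside the app, a credit line could be assembled along the following lines. This is a minimal Python sketch under stated assumptions, not Taskusanakirja's own code: the function name `format_attribution`, its parameters, and the sentence URL pattern are illustrative assumptions, and the exact wording of a credit line is your choice as long as it meets the license terms.

```python
# Hypothetical helper: builds a one-line credit for a Tatoeba sentence.
# The sentence URL pattern is an assumption; verify it against tatoeba.org.
LICENSE_NAME = "CC BY 2.0 FR"
LICENSE_URL = "https://creativecommons.org/licenses/by/2.0/fr/"


def format_attribution(sentence_id: int, author: str, text: str) -> str:
    """Return a credit line naming the author, the source, and the license."""
    source_url = f"https://tatoeba.org/en/sentences/show/{sentence_id}"
    return (
        f'"{text}" by {author} (Tatoeba #{sentence_id}, {source_url}), '
        f"licensed under {LICENSE_NAME} ({LICENSE_URL})"
    )
```

Calling `format_attribution(12345, "example_user", "Hyvää huomenta!")` with placeholder values like these yields a single string that names the author, links the sentence, and identifies the license.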
Disclaimer
Tatoeba is a community-driven project. As its Terms of Use state, the validity and accuracy of sentences and their translations are not guaranteed by any professional review. The sentences are provided for their linguistic utility and should be used with that understanding.
Open Data Commitment & Commercial Use
The licenses for both Wiktionary and Tatoeba data permit commercial use. The “free” in their licenses refers to freedom, not price.
Therefore, while the data itself is licensed for free use, we may charge a fee for the service of accessing our processed, curated, and conveniently packaged datasets. This fee covers our development, hosting, and maintenance costs. Any data you purchase and download from us remains under its original open licenses, granting you all the freedoms that come with them.
Available Datasets
- Inflection Database (Available with Taskusanakirja Pro): Our complete Finnish inflection database, derived from Wiktionary.
  - Format: SQLite database
  - License: CC BY-SA 4.0 & GFDL
  - Download: Available upon request with a purchase of Taskusanakirja Pro.
- Dictionary Trie Data (Available with Taskusanakirja Pro): The core dictionary data file, derived from Wiktionary.
  - Format: JSON
  - License: CC BY-SA 4.0 & GFDL
  - Download: Available upon request with a purchase of Taskusanakirja Pro.
- Tatoeba Example Sentences Database: Our curated collection of Finnish-English example sentences from Tatoeba.
  - Format: SQLite database
  - License: CC BY 2.0 FR
  - Download: Available upon request with a purchase of Taskusanakirja Pro.
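The table layout of the SQLite files is not described on this page, so a reasonable first step after downloading one is to inspect its schema. The following Python sketch uses only the standard library and assumes no table or column names; the function name `list_tables` is hypothetical.

```python
import sqlite3


def list_tables(db_path: str) -> dict[str, list[str]]:
    """Map each table name in the database to a list of its column names."""
    # Note: sqlite3.connect creates an empty file if db_path does not exist.
    conn = sqlite3.connect(db_path)
    try:
        names = [
            row[0]
            for row in conn.execute(
                "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
            )
        ]
        schema = {}
        for name in names:
            # PRAGMA table_info yields one row per column; index 1 is the column name.
            quoted = name.replace('"', '""')
            cols = conn.execute(f'PRAGMA table_info("{quoted}")').fetchall()
            schema[name] = [col[1] for col in cols]
        return schema
    finally:
        conn.close()
```

For example, `list_tables("inflections.sqlite")` (a placeholder file name) returns a dictionary you can print to see which tables and columns the download actually contains before writing queries against it.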
Attribution and Your Obligations
When using Taskusanakirja’s derived data, you must provide attribution and comply with the respective license terms for each data source (Wiktionary and/or Tatoeba).
Contributing Back
We strongly encourage users to contribute improvements back to the source projects. Strengthening the open linguistic data ecosystems of Wiktionary and Tatoeba is what makes tools like Taskusanakirja possible.
Questions?
If you have questions about data licensing, need the data in a different format, or want to discuss large-scale usage, please contact us at:
Email: andrew@siilikuin.com
Company Website: https://siilikuin.com
Company Registration: Siilikuin, Finnish Y-tunnus / business ID: 3372332-8
Location: Finland