News Release

Sep. 16, 2020

TSE to launch TSE Timely Disclosure Corpus Service

 

To help meet the growing need for English-language disclosure through machine translation and related technologies, Tokyo Stock Exchange, Inc. (TSE) will launch its "TSE Timely Disclosure Corpus Data Service" on September 16, 2020.
A corpus is a collection of digitized natural-language text used in research on natural language processing and related fields; in recent years, corpora have been used especially for machine translation.
The TSE Timely Disclosure Corpus Service is expected to strengthen the provision of information by enabling listed companies and translation companies to create English disclosure documents more efficiently and to a higher standard. It is also expected that more overseas investors will use timely disclosure documents created with the help of machine translation.

Service Overview

This service provides the following data as the "TSE Timely Disclosure Corpus":

Data Type: Timely Disclosure Monolingual Corpus (Jpn or Eng)
Data Overview: Japanese and English corpora, each constructed from text automatically extracted from timely disclosure documents, etc. (in PDF) from a specific time period.

Data Type: Timely Disclosure Parallel Corpus (Jpn and Eng)
Data Overview: A Japanese-English parallel corpus built from the above monolingual corpora of timely disclosure documents.

Data Type: Parallel corpus for 1 year (+ monolingual corpus pair file)
Data Overview: A set consisting of the above parallel corpus (Jpn-Eng) and the text-extracted Japanese and English files (monolingual corpus) constructed from the same timely disclosure documents.

Data provision environment

TSE uploads the TSE Timely Disclosure Corpus to buckets on the public cloud provided by Amazon Web Services. Customers are then issued credentials, such as an Access Key ID, which they can use to download the data they need from the buckets over the Internet.
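
As an illustration of the download step described above, the following is a minimal sketch in Python. It assumes the third-party boto3 library is used as the client and that the credentials are supplied through environment variables; neither is specified in this release. The bucket name and prefix in the usage note are hypothetical placeholders, since the actual values are given only to customers.

```python
import os


def download_corpus(s3_client, bucket, prefix, dest_dir):
    """Download every object under `prefix` in `bucket` into `dest_dir`.

    `s3_client` is expected to behave like a boto3 S3 client, i.e. to
    provide `get_paginator("list_objects_v2")` and `download_file`.
    Returns the list of local file paths that were written.
    """
    os.makedirs(dest_dir, exist_ok=True)
    downloaded = []
    paginator = s3_client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # "Contents" is absent when a page holds no matching objects.
        for obj in page.get("Contents", []):
            key = obj["Key"]
            if key.endswith("/"):
                continue  # skip folder placeholder objects
            # Files are flattened into dest_dir; keys sharing a base name
            # would overwrite each other.
            local_path = os.path.join(dest_dir, os.path.basename(key))
            s3_client.download_file(bucket, key, local_path)
            downloaded.append(local_path)
    return downloaded


def make_client_from_env():
    """Build a boto3 S3 client from the issued credentials (assumed to be
    stored in the standard AWS environment variables)."""
    import boto3  # third-party: pip install boto3

    return boto3.client(
        "s3",
        aws_access_key_id=os.environ["AWS_ACCESS_KEY_ID"],
        aws_secret_access_key=os.environ["AWS_SECRET_ACCESS_KEY"],
    )
```

A call might look like `download_corpus(make_client_from_env(), "example-tse-corpus-bucket", "parallel/", "corpus_data")`, where both the bucket name and the prefix are invented for this example.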

Fees and how to apply

(The service was terminated in 2022.)

 

Contact: For inquiries about the initiation procedure, contracts, and billing

Tokyo Stock Exchange, Inc. Information Services Department
E-mail: inf_dev@jpx.co.jp


Contact: For inquiries about TSE Timely Disclosure Corpus Data and the "TSE Timely Disclosure Corpus Service Guide"

JPX Market Innovation & Research, Inc. Frontier Development
E-mail: jpx-fintech@jpx.co.jp