Subject categorisation experiments with AI in MTMT (hdl:21.15109/ARP/VWQFD2)

Document Description

Citation

Title:

Subject categorisation experiments with AI in MTMT

Identification Number:

hdl:21.15109/ARP/VWQFD2

Distributor:

ARP

Date of Distribution:

2026-05-08

Version:

2

Bibliographic Citation:

Micsik, András; Tanácsi, Roland, 2026, "Subject categorisation experiments with AI in MTMT", https://hdl.handle.net/21.15109/ARP/VWQFD2, ARP, V2

Study Description

Citation

Title:

Subject categorisation experiments with AI in MTMT

Identification Number:

hdl:21.15109/ARP/VWQFD2

Authoring Entity:

Micsik, András (HUN-REN SZTAKI)

Tanácsi, Roland (HUN-REN SZTAKI)

Date of Production:

2025-11-15

Software used in Production:

Python

Grant Number:

RRF-2.3.1-21-2022-00004

Distributor:

ARP

Access Authority:

Micsik, András

Depositor:

Micsik, András

Date of Deposit:

2026-02-03

Holdings Information:

https://hdl.handle.net/21.15109/ARP/VWQFD2

Study Scope

Keywords:

Computer and Information Science, subject classification, scientific categorization, transformer models, Support Vector Classifier, data cleaning, large language models

Topic Classification:

artificial intelligence

Abstract:

Code, sample data, and results for AI-based subject categorisation experiments in MTMT, the Hungarian Scientific Bibliography.

Methodology and Processing

Sources Statement

Data Access

Notes:

CC BY-NC-ND 4.0 (http://creativecommons.org/licenses/by-nc-nd/4.0)

Other Study Description Materials

Related Publications

Citation

Title:

A Comparative Evaluation of AI Approaches to Large-Scale Scientific Subject Classification

Identification Number:

10.3390/bdcc10050151

Bibliographic Citation:

Tanácsi, R., & Micsik, A. (2026). A Comparative Evaluation of AI Approaches to Large-Scale Scientific Subject Classification. Big Data and Cognitive Computing, 10(5), 151.

Other Study-Related Materials

Label:

README.txt

Notes:

text/plain

Other Study-Related Materials

Label:

lvl4-mtmt-large-multiclass-svm-rbf.zip

Notes:

application/octet-stream

Other Study-Related Materials

Label:

svm_rbf_confusion_matrix_percent.csv

Notes:

text/csv

Other Study-Related Materials

Label:

annif.csv

Notes:

text/csv

Other Study-Related Materials

Label:

embedding_scikit.csv

Notes:

text/csv

Other Study-Related Materials

Label:

scibert_lvl3.csv

Notes:

text/csv

Other Study-Related Materials

Label:

scibert_lvl4.csv

Notes:

text/csv

Other Study-Related Materials

Label:

scibert_lvl4_subtopics.csv

Notes:

text/csv

Other Study-Related Materials

Label:

scibert_moe.csv

Notes:

text/csv

Other Study-Related Materials

Label:

frascati_mapping.json

Notes:

application/json

Other Study-Related Materials

Label:

sample_evaluation_data.csv

Notes:

text/csv

Other Study-Related Materials

Label:

sample_evaluation_data.npy

Notes:

application/octet-stream

Other Study-Related Materials

Label:

sample_training_data.csv

Notes:

text/csv

Other Study-Related Materials

Label:

sample_training_data.npy

Notes:

application/octet-stream

Other Study-Related Materials

Label:

eval_svm_rbf.py

Notes:

text/x-python

Other Study-Related Materials

Label:

requirements.txt

Notes:

text/plain

Other Study-Related Materials

Label:

train_svm_rbf.py

Notes:

text/x-python