Dataset Concepts

One of LightTag's core concepts is the dataset. We think it is important to group data together in some cohesive way, for example "this data is training and this data is test", or "this data came from source A and that data came from source B".

A dataset is that logical collection of data. We call a single piece of text to be annotated an Example, and a dataset is a collection of Examples.

When you make a job, it always annotates a dataset. The job is done when every example in the dataset has been annotated by as many annotators as you specified.
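The completion rule above can be sketched in a few lines of Python. This is only an illustration of the rule as described here, not LightTag's implementation; the function name `job_is_done` and the shape of its arguments are hypothetical.

```python
from collections import defaultdict


def job_is_done(example_ids, annotations, annotators_per_example):
    """Return True once every example has enough distinct annotators.

    `annotations` is an iterable of (example_id, annotator) pairs.
    All names here are hypothetical and exist only for illustration.
    """
    annotators_seen = defaultdict(set)
    for example_id, annotator in annotations:
        # A set ignores repeat annotations from the same annotator.
        annotators_seen[example_id].add(annotator)
    return all(
        len(annotators_seen[example_id]) >= annotators_per_example
        for example_id in example_ids
    )
```

Counting distinct annotators, rather than raw annotations, is an assumption made for this sketch.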

LightTag keeps your metadata

Often, the text you want to annotate comes with additional metadata, like its ID in your database or some comments. LightTag will keep that metadata intact and optionally display it to your annotators. Importantly, it will still be there when you download your annotations.

Adding Data From CSVs

To add a dataset from a CSV, simply select the CSV file in the data upload screen.

Then select the column in the CSV that you want to annotate.

My CSV Data looks wrong

CSV is an evil format, and sometimes it's hard to parse. Use the following Python code to fix CSVs that look wrong:

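import pandas as pd

# Let pandas parse the file, then write it back with consistent quoting.
bad_data = pd.read_csv('/path/to/my.csv')
# index=False keeps pandas from adding an extra row-number column.
bad_data.to_csv('/path/to/my/fixed.csv', index=False)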
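If the data still looks wrong, it can help to find out which rows are the problem. The sketch below uses only Python's standard `csv` module to list rows whose field count differs from the header, a common sign of unescaped commas or quotes. The function name `inconsistent_rows` is hypothetical, and this check is a suggestion rather than anything LightTag runs itself.

```python
import csv
import io


def inconsistent_rows(csv_text):
    """Return row numbers (header is row 1) whose field count differs from the header."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, None)
    if header is None:
        # Empty input: there is nothing to check.
        return []
    return [
        row_number
        for row_number, row in enumerate(reader, start=2)
        if len(row) != len(header)
    ]
```

To run it on a file, read the file with `open(path, newline='')` and pass the contents in. Note that row numbers count parsed records, so they can differ from line numbers when a quoted field contains a line break.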

Adding Data From JSONs

Adding a JSON file works exactly the same way. We expect your JSON to be an array of objects, something like this:

[
  {
    "Review Status": null,
    "QueryUsed": "3827 Aspen Creek Ave North Las Vegas, NV 89031",
    "Output": "Ave"
  },
  {
    "Review Status": null,
    "QueryUsed": "3827 Aspen Creek, Broomfield, CO Ave North Las Vegas, NV 89031",
    "Output": "Ave"
  },
  {
    "Review Status": null,
    "QueryUsed": "3911 South, Independence, MO third str",
    "Output": "third str"
  },
  {
    "Review Status": null,
    "QueryUsed": "3Apartments in Wichita, KS all bills paid 2 bed ",
    "Output": "3"
  },
  {
    "Review Status": null,
    "QueryUsed": "5840 Farrington oad",
    "Output": "5840 oad"
  },
  {
    "Review Status": null,
    "QueryUsed": "5840 Farrington road",
    "Output": "5840"
  }
]
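Before uploading, you may want to confirm that your file really is an array of objects. The sketch below does that with Python's standard `json` module. The function name `check_json_dataset` is hypothetical, and requiring the text field on every object is an assumption made for this example, not a documented LightTag rule.

```python
import json


def check_json_dataset(json_text, text_field):
    """Parse json_text and return its list of objects, or raise ValueError."""
    # json.loads raises json.JSONDecodeError (a ValueError subclass) on
    # invalid JSON, for example a trailing comma after the last object.
    data = json.loads(json_text)
    if not isinstance(data, list):
        raise ValueError("top level must be an array")
    for position, item in enumerate(data):
        if not isinstance(item, dict):
            raise ValueError(f"item {position} is not an object")
        if text_field not in item:
            raise ValueError(f"item {position} has no {text_field!r} field")
    return data
```

For the sample above, you would call it with the file's contents and `"QueryUsed"` as the field to annotate.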

Pasting Data

If you don't have your data in a JSON or CSV file, you can paste text in. LightTag will split it into sentences for you, and each sentence will become an example in the dataset.
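To get a rough idea of how pasted text might break into examples, here is a deliberately naive sentence splitter. It is not LightTag's splitter, whose rules are not described here; the function name `naive_split_sentences` and the splitting rule (break after `.`, `!`, or `?` followed by whitespace) are assumptions for illustration only.

```python
import re


def naive_split_sentences(text):
    """Split text after '.', '!' or '?' when followed by whitespace."""
    # The lookbehind keeps the punctuation attached to its sentence.
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    # Drop the empty string produced by empty input.
    return [part for part in parts if part]
```

A rule this simple will mis-split abbreviations such as "St." or "e.g.", so treat its output as a preview only.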
