Usage Guide
This guide explains how to use the tea-data-file-conversion package via the command-line interface (CLI) and programmatically through the processor module.
Command-Line Interface (CLI)
The CLI provides a simple way to convert fixed-width files into CSV format using a dynamic YAML schema.
Display Help
To see the available commands and options, run:
tea-data-file-conversion --help
Process a Fixed-Width File
To convert a fixed-width file to CSV using a schema, use the following command:
tea-data-file-conversion input_file.txt --output_file output.csv --schema_folder path/to/schema
input_file.txt: Path to the fixed-width input file.
--output_file output.csv (optional): Path for the converted CSV file. If omitted, _output.csv is appended to the input filename.
--schema_folder path/to/schema (optional): Path to the folder containing YAML schema files.
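When the CLI needs to be driven from a script, the same arguments can be assembled as a list and handed to subprocess. The following is a minimal sketch that uses only the flags described above; build_convert_command and run_conversion are hypothetical helper names, not part of the package.

```python
import subprocess


def build_convert_command(input_file, output_file=None, schema_folder=None):
    """Build the argument list for one tea-data-file-conversion run."""
    command = ["tea-data-file-conversion", input_file]
    # Both flags are optional, so they are added only when a value is given.
    if output_file is not None:
        command += ["--output_file", output_file]
    if schema_folder is not None:
        command += ["--schema_folder", schema_folder]
    return command


def run_conversion(input_file, output_file=None, schema_folder=None):
    """Run the CLI; requires the package to be installed and on PATH."""
    command = build_convert_command(input_file, output_file, schema_folder)
    # check=True raises CalledProcessError on a non-zero exit status.
    subprocess.run(command, check=True)


# Show the command that would be executed, without running it.
print(" ".join(build_convert_command(
    "input_file.txt",
    output_file="output.csv",
    schema_folder="path/to/schema",
)))
```

Passing the arguments as a list avoids shell quoting problems when file paths contain spaces.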
Exporting YAML Schema Templates
To export default YAML schema templates for customization, run:
tea-data-file-conversion --export_templates --schema_folder path/to/schema_folder
This will copy built-in templates into the specified folder.
Processor Module
If you need to integrate file conversion into a Python script, use the processor module.
Processing a Fixed-Width File Programmatically
You can use the process_file function to process a file programmatically:
from tea_data_file_conversion.processor import process_file
# Define file paths
input_file = "data/input_file.txt"
output_file = "data/output_file.csv"
schema_folder = "schemas"
# Process the file
df = process_file(input_file, output_file=output_file, schema_folder=schema_folder)
print("File converted successfully!")
Exporting YAML Schema Templates Programmatically
To export the built-in YAML schema templates via a script:
from tea_data_file_conversion.processor import export_templates
export_templates("schemas")
Generating a YAML Schema from a CSV File
If you have a CSV defining field positions, you can convert it to a YAML schema:
from tea_data_file_conversion.processor import csv_to_schema_yaml
csv_to_schema_yaml("fields.csv", yaml_output_file="schema.yaml")
YAML Schema Format
YAML configuration files define how the fixed-width file should be parsed. They follow this format:
fields:
  - start: 1
    end: 10
    output_field: "StudentID"
    keep: true
  - start: 11
    end: 20
    output_field: "Score"
    keep: false
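To make the schema concrete, here is an illustrative sketch of how such field definitions could be applied to one line of fixed-width text. It assumes that start and end are 1-based, inclusive column positions and that keep: false means the field is dropped from the output. Neither assumption is confirmed here, and the package's own parsing may differ. parse_line is a hypothetical helper, and the schema is written as a Python dict so that the example needs no YAML library.

```python
# The schema from the example above, expressed as a Python dict.
schema = {
    "fields": [
        {"start": 1, "end": 10, "output_field": "StudentID", "keep": True},
        {"start": 11, "end": 20, "output_field": "Score", "keep": False},
    ]
}


def parse_line(line, schema):
    """Slice one fixed-width line into a dict of the kept fields.

    Assumption: start/end are 1-based and inclusive, so columns 1-10
    correspond to the Python slice line[0:10].
    """
    row = {}
    for field in schema["fields"]:
        if not field.get("keep", True):
            continue  # assumed meaning of keep: false
        value = line[field["start"] - 1 : field["end"]]
        row[field["output_field"]] = value.strip()
    return row


# Columns 1-10 hold the ID, columns 11-20 hold a right-aligned score.
line = "0000012345" + "        87"
print(parse_line(line, schema))
```

Under these assumptions only StudentID appears in the result, because Score is marked keep: false.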
Schema Validation
The validate_yaml_config function ensures schemas follow this structure. If the schema is invalid, an error will be raised.
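The kind of checks a schema validator might perform can be sketched as follows. This is not the package's validate_yaml_config, whose signature and exact rules are not shown in this guide. check_schema is a hypothetical stand-in that assumes all four keys are required, that positions are positive integers with start no greater than end, and that ValueError is the error raised.

```python
REQUIRED_KEYS = {"start", "end", "output_field", "keep"}


def _is_position(value):
    """True for a positive integer (bool is excluded on purpose)."""
    return isinstance(value, int) and not isinstance(value, bool) and value >= 1


def check_schema(config):
    """Raise ValueError if config does not match the documented layout."""
    fields = config.get("fields") if isinstance(config, dict) else None
    if not isinstance(fields, list) or not fields:
        raise ValueError("schema must contain a non-empty 'fields' list")
    for index, field in enumerate(fields):
        if not isinstance(field, dict):
            raise ValueError(f"field {index} must be a mapping")
        missing = REQUIRED_KEYS - field.keys()
        if missing:
            raise ValueError(f"field {index} is missing keys: {sorted(missing)}")
        if not (_is_position(field["start"]) and _is_position(field["end"])):
            raise ValueError(f"field {index}: start and end must be positive integers")
        if field["end"] < field["start"]:
            raise ValueError(f"field {index}: end must not be less than start")
        if not isinstance(field["keep"], bool):
            raise ValueError(f"field {index}: keep must be true or false")


# A schema matching the documented format passes without raising.
check_schema({
    "fields": [
        {"start": 1, "end": 10, "output_field": "StudentID", "keep": True},
        {"start": 11, "end": 20, "output_field": "Score", "keep": False},
    ]
})
print("Schema looks valid")
```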
Error Handling
If an error occurs while processing a file, check that:
The schema file exists and is correctly formatted.
The input file follows the expected fixed-width format.
When using the processor module, wrap function calls in a try/except block:
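from tea_data_file_conversion.processor import process_file

try:
    process_file("input.txt", "output.csv", schema_folder="schemas")
except Exception as e:
    # Report the failure instead of letting the script crash
    print(f"Error: {e}")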
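The two checks listed above can also be done before calling process_file, which gives clearer messages than a generic exception. A minimal sketch using only the standard library follows. preflight_problems is a hypothetical helper, and it assumes schema files use a .yaml or .yml extension.

```python
from pathlib import Path


def preflight_problems(input_file, schema_folder):
    """Return a list of human-readable problems found before conversion."""
    problems = []
    if not Path(input_file).is_file():
        problems.append(f"input file not found: {input_file}")
    folder = Path(schema_folder)
    if not folder.is_dir():
        problems.append(f"schema folder not found: {schema_folder}")
    elif not any(folder.glob("*.yaml")) and not any(folder.glob("*.yml")):
        # Assumption: schema files end in .yaml or .yml.
        problems.append(f"no YAML schema files in: {schema_folder}")
    return problems


# Print every problem found; an empty list means the paths look usable.
for problem in preflight_problems("input.txt", "schemas"):
    print(f"Error: {problem}")
```

This only checks that the paths exist. Whether the input actually matches the schema's column layout still has to be confirmed by running the conversion.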