tea_data_file_conversion package
- tea_data_file_conversion.export_templates(schema_folder)
Export sample YAML template files to a specified folder.
The function copies files from the built-in default_schema directory (packaged with this module) into the target folder while preserving the original directory structure.
- Parameters:
schema_folder (str) – The destination folder for exporting the template YAML files.
Notes
The function exits after exporting the template files.
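A minimal sketch of the copy step, assuming a plain recursive copy is all that is needed. The name `export_templates_sketch` and its `source_dir` argument are hypothetical: the documented function takes only `schema_folder`, locates `default_schema` inside the installed package itself, and exits afterwards rather than returning a file list.

```python
import shutil
from pathlib import Path
from typing import List


def export_templates_sketch(source_dir: str, schema_folder: str) -> List[str]:
    """Hypothetical stand-in for export_templates: copy a template tree.

    source_dir plays the role of the packaged default_schema directory.
    """
    src = Path(source_dir)
    dst = Path(schema_folder)
    # copytree recreates every subdirectory under dst, so relative paths are preserved;
    # dirs_exist_ok=True (Python 3.8+) allows exporting into a folder that already exists.
    shutil.copytree(src, dst, dirs_exist_ok=True)
    # Report the exported files relative to the destination folder.
    return sorted(p.relative_to(dst).as_posix() for p in dst.rglob("*") if p.is_file())
```

Using `shutil.copytree` keeps the directory layout identical to the source, which matches the documented "preserving the original directory structure" behaviour.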
- tea_data_file_conversion.process_file(input_file, output_file=None, schema_folder=None, filter_columns=False)
Process an input fixed-width file and output a CSV file.
- The function:
  - Determines the appropriate YAML schema from information in the input file's header.
  - Loads and validates the schema.
  - Processes the input file and writes the resulting DataFrame to CSV.
- Parameters:
input_file (str) – The path to the fixed-width input file.
output_file (str, optional) – File path for the output CSV. Defaults to input file name with ‘_output.csv’ appended.
schema_folder (str, optional) – Folder where the YAML schema files are located; defaults to the current folder.
filter_columns (bool, optional) – If True, only load columns flagged with “keep”: true (default is False).
- Returns:
The processed DataFrame.
- Return type:
pd.DataFrame
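The final step (choosing the output path and writing the CSV) can be sketched as follows. This is only one reading of the default-name rule: it assumes the input file's extension is dropped before `_output.csv` is appended, which the documentation does not state, and that the pandas row index is not written. Both helper names are hypothetical.

```python
import os
from typing import Optional

import pandas as pd


def default_output_path(input_file: str) -> str:
    # Assumption: the extension is dropped before '_output.csv' is appended;
    # the documentation does not say whether it is kept.
    root, _ext = os.path.splitext(input_file)
    return root + "_output.csv"


def write_output_sketch(df: pd.DataFrame, input_file: str, output_file: Optional[str] = None) -> str:
    """Hypothetical last step of process_file: write df to the chosen CSV path."""
    path = output_file or default_output_path(input_file)
    # index=False (an assumption) keeps pandas' row index out of the CSV.
    df.to_csv(path, index=False)
    return path
```

Under this assumption, `data/students.txt` would be written to `data/students_output.csv`.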
- tea_data_file_conversion.validate_yaml_config(config, file_path)
Validate the structure of the YAML configuration.
The configuration must be a dictionary containing a key ‘fields’ mapping to a list. Each field in the list must contain ‘start’, ‘end’, and ‘output_field’ keys.
- Parameters:
config (dict) – The YAML configuration dictionary.
file_path (str) – File path used for reporting in error messages.
- Raises:
ValueError – If the configuration does not adhere to the expected schema.
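The rules above translate into a short series of checks. This is a sketch of those rules, not the package's code; the function name and the exact error messages are hypothetical.

```python
from typing import Any

# Keys every entry in 'fields' must define, per the documented schema.
REQUIRED_FIELD_KEYS = ("start", "end", "output_field")


def validate_yaml_config_sketch(config: Any, file_path: str) -> None:
    """Raise ValueError if config does not match the documented schema."""
    if not isinstance(config, dict):
        raise ValueError(f"{file_path}: top-level YAML must be a mapping")
    fields = config.get("fields")
    if not isinstance(fields, list):
        raise ValueError(f"{file_path}: 'fields' must be a list")
    for i, field in enumerate(fields):
        if not isinstance(field, dict):
            raise ValueError(f"{file_path}: field #{i} must be a mapping")
        missing = [key for key in REQUIRED_FIELD_KEYS if key not in field]
        if missing:
            raise ValueError(f"{file_path}: field #{i} is missing keys: {', '.join(missing)}")
```

Including `file_path` in every message mirrors the documented purpose of that parameter: it identifies which schema file failed.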
Submodules
tea_data_file_conversion.cli module
Command-line interface for fixed-width file processing.
This module provides an entry point to either process a fixed-width file into CSV format using a dynamic YAML schema or export default YAML templates.
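The CLI's actual arguments are not documented here, so every option name below is hypothetical. The sketch only shows how `argparse` could express the two mutually exclusive modes described above: convert a file, or export templates.

```python
import argparse
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    # All option names are hypothetical; the real CLI's flags are not documented here.
    parser = argparse.ArgumentParser(
        description="Convert a fixed-width file to CSV, or export default YAML templates."
    )
    parser.add_argument("input_file", nargs="?", help="fixed-width file to convert")
    parser.add_argument("-o", "--output-file", help="output CSV path")
    parser.add_argument("-s", "--schema-folder", help="folder containing YAML schemas")
    parser.add_argument("--filter-columns", action="store_true",
                        help='keep only fields flagged "keep": true')
    parser.add_argument("--export-templates", metavar="FOLDER",
                        help="export default YAML templates to FOLDER and exit")
    return parser


def parse_cli(argv: Optional[List[str]] = None) -> argparse.Namespace:
    parser = build_parser()
    args = parser.parse_args(argv)
    # Exactly one mode must be chosen: convert a file or export templates.
    if (args.input_file is None) == (args.export_templates is None):
        parser.error("give either INPUT_FILE or --export-templates, but not both")
    return args
```

A `main()` built on this would then call `export_templates(args.export_templates)` or `process_file(args.input_file, args.output_file, args.schema_folder, args.filter_columns)`, using the signatures documented above.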
tea_data_file_conversion.processor module
Processor module for fixed-width file conversion.
- This module provides functions to:
  - Load and validate YAML schema configurations.
  - Process fixed-width files into structured DataFrame objects.
  - Export template YAML schema files.
  - Convert CSV files into YAML schema files interactively.
- tea_data_file_conversion.processor.csv_to_schema_yaml(csv_file, yaml_output_file=None)
Convert a CSV file into a YAML schema file for fixed-width processing.
This function loads a CSV file, lists its available columns, and interactively prompts the user to choose the columns that supply the start, end, and output field values. It then writes a YAML file with the chosen configuration.
- Parameters:
csv_file (str) – Path to the input CSV file.
yaml_output_file (str, optional) – Output file path for the YAML schema. If omitted, a default name is generated.
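A non-interactive sketch of the conversion, assuming the CSV describes one fixed-width field per row. The interactive prompt is replaced by the hypothetical parameters `start_col`, `end_col`, and `output_col`, and the function name is also hypothetical. Writing the file is left out; with PyYAML, `yaml.safe_dump(config, fh, sort_keys=False)` would serialize the returned dictionary while keeping the field order.

```python
import csv
from typing import Dict, List


def layout_csv_to_schema(csv_file: str, start_col: str, end_col: str,
                         output_col: str) -> Dict[str, List[dict]]:
    """Build a {'fields': [...]} schema dict from a layout CSV (hypothetical helper)."""
    fields = []
    with open(csv_file, newline="", encoding="utf-8") as fh:
        for row in csv.DictReader(fh):
            # The chosen CSV columns become the documented 'start', 'end', 'output_field' keys.
            fields.append({
                "start": int(row[start_col]),
                "end": int(row[end_col]),
                "output_field": row[output_col].strip(),
            })
    return {"fields": fields}
```

The resulting dictionary has the shape that `validate_yaml_config` requires: a `fields` list whose entries each carry `start`, `end`, and `output_field`.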
- tea_data_file_conversion.processor.export_templates(schema_folder)
Export sample YAML template files to a specified folder.
The function copies files from the built-in default_schema directory (packaged with this module) into the target folder while preserving the original directory structure.
- Parameters:
schema_folder (str) – The destination folder for exporting the template YAML files.
Notes
The function exits after exporting the template files.
- tea_data_file_conversion.processor.load_yaml_config(file_path)
Load a YAML configuration file for processing.
- Parameters:
file_path (str) – The path to the YAML configuration file.
- Returns:
The parsed YAML configuration.
- Return type:
dict
- Raises:
ValueError – If there is an error parsing the YAML file.
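A minimal sketch of this loading step, assuming PyYAML is the parser (the documentation does not name one). It converts PyYAML's `yaml.YAMLError` into the documented `ValueError`. The function name is hypothetical. Note that `yaml.safe_load` returns `None` for an empty file, which a structural check such as `validate_yaml_config` would then reject.

```python
import yaml  # PyYAML; assumed to be the YAML library in use


def load_yaml_config_sketch(file_path: str) -> dict:
    """Parse a YAML schema file, reporting syntax errors as ValueError."""
    with open(file_path, "r", encoding="utf-8") as fh:
        try:
            # safe_load builds only plain Python types (dict, list, str, int, ...).
            return yaml.safe_load(fh)
        except yaml.YAMLError as exc:
            raise ValueError(f"Error parsing YAML file {file_path}: {exc}") from exc
```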
- tea_data_file_conversion.processor.process_file(input_file, output_file=None, schema_folder=None, filter_columns=False)
Process an input fixed-width file and output a CSV file.
- The function:
  - Determines the appropriate YAML schema from information in the input file's header.
  - Loads and validates the schema.
  - Processes the input file and writes the resulting DataFrame to CSV.
- Parameters:
input_file (str) – The path to the fixed-width input file.
output_file (str, optional) – File path for the output CSV. Defaults to input file name with ‘_output.csv’ appended.
schema_folder (str, optional) – Folder where the YAML schema files are located; defaults to the current folder.
filter_columns (bool, optional) – If True, only load columns flagged with “keep”: true (default is False).
- Returns:
The processed DataFrame.
- Return type:
pd.DataFrame
- tea_data_file_conversion.processor.process_fixed_width_file(input_file, schema_config, skip_header=False, filter_columns=False)
Process a fixed-width file using the provided YAML schema configuration.
The function determines column boundaries from the schema, reads the file with pandas, and, if requested, filters the result so that only columns marked to be kept are returned.
- Parameters:
input_file (str) – The path to the fixed-width text file.
schema_config (dict) – Schema configuration dictionary with field definitions.
skip_header (bool, optional) – Skip the header row if True (default is False).
filter_columns (bool, optional) – If True, return only DataFrame columns that are marked with “keep”: true.
- Returns:
DataFrame with the processed data.
- Return type:
pd.DataFrame
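A sketch of this step using `pandas.read_fwf`. It assumes that `start` is a 0-based inclusive offset and `end` is exclusive, matching the half-open `colspecs` convention that pandas uses; the documentation does not state the convention. Reading every column as a string (`dtype=str`) is also an assumption, chosen so that codes with leading zeros such as `001` survive. The function name is hypothetical.

```python
import pandas as pd


def process_fixed_width_sketch(input_file, schema_config, skip_header=False, filter_columns=False):
    """Read a fixed-width file into a DataFrame using a {'fields': [...]} schema."""
    fields = schema_config["fields"]
    # Assumption: 'start' is 0-based inclusive and 'end' is exclusive (pandas colspecs).
    colspecs = [(f["start"], f["end"]) for f in fields]
    names = [f["output_field"] for f in fields]
    df = pd.read_fwf(
        input_file,
        colspecs=colspecs,
        names=names,
        header=None,
        skiprows=1 if skip_header else 0,  # drop the header record when asked
        dtype=str,  # keep values such as '001' as text
    )
    if filter_columns:
        # Only fields flagged "keep": true survive, in schema order.
        df = df[[f["output_field"] for f in fields if f.get("keep", False)]]
    return df
```

For example, with fields `id` (0 to 3), `name` (3 to 10), and `grade` (10 to 12), the record `001Alice  09` yields `id="001"`, `name="Alice"` (`read_fwf` trims padding spaces), and `grade="09"`.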
- tea_data_file_conversion.processor.validate_yaml_config(config, file_path)
Validate the structure of the YAML configuration.
The configuration must be a dictionary containing a key ‘fields’ mapping to a list. Each field in the list must contain ‘start’, ‘end’, and ‘output_field’ keys.
- Parameters:
config (dict) – The YAML configuration dictionary.
file_path (str) – File path used for reporting in error messages.
- Raises:
ValueError – If the configuration does not adhere to the expected schema.