openfactcheck.evaluator.LLMEvaluator#
- class openfactcheck.evaluator.LLMEvaluator(ofc)[source]#
This class is used to evaluate the performance of a Language Model.
- Parameters:
model_name (str) – The name of the Language Model.
input_path (Union[str, pd.DataFrame]) – The path to the CSV file or the DataFrame containing the LLM responses. The CSV file should have two columns: index (the index of the response) and response (the response generated by the LLM). An example input is sketched after this parameter list.
output_path (str) – The path to store the output files.
dataset_path (str) – The path to the dataset file containing the questions.
datasets (list) – The list of datasets to evaluate the LLM on.
analyze (bool) – Whether to analyze the results.
save_plots (bool) – Whether to save the plots.
save_report (bool) – Whether to save the report.
ofc (OpenFactCheck)
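For reference, the expected responses input can be sketched as follows (the file name, indices, and response texts are purely illustrative; only the index and response columns come from the parameter description above):

```python
import pandas as pd

# Minimal example of the responses input: one row per LLM response,
# with an "index" column and a "response" column.
llm_responses = pd.DataFrame(
    {
        "index": [0, 1],
        "response": [
            "Paris is the capital of France.",
            "The 2020 Summer Olympics were held in Tokyo.",
        ],
    }
)

# Either pass the DataFrame directly as input_path, or write it to a CSV file first.
llm_responses.to_csv("llm_responses.csv", index=False)
```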
- model_name#
The name of the Language Model.
- Type:
str
- run_id#
The unique identifier for the run.
- Type:
str
- input_path#
The path to the CSV file or the DataFrame containing the LLM responses.
- Type:
Union[str, pd.DataFrame]
- output_path#
The path to store the output files.
- Type:
str
- dataset_path#
The path to the dataset file containing the questions.
- Type:
str
- datasets#
The list of datasets to evaluate the LLM on.
- Type:
list
- combined_result#
The combined evaluation results for all datasets.
- Type:
dict
- evaluate(model_name: str, input_path: Union[str, pd.DataFrame], output_path: str = "", dataset_path: str = "", datasets: list = ["snowballing"], analyze: bool = True, save_plots: bool = True, save_report: bool = True):
This function evaluates the performance of the Language Model.
- read_input():
This function reads the input file and dataset file and returns a DataFrame containing the combined data.
- filter_responses(df: pd.DataFrame, dataset: str):
Filter the responses based on the dataset.
- generate_plots(fig_path, save_plots=True):
Generate plots for the evaluation.
- __init__(ofc)[source]#
Initialize the LLMEvaluator object.
- Parameters:
ofc (OpenFactCheck)
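A minimal usage sketch follows. The import path of LLMEvaluator and the evaluate() arguments are taken from this page; the top-level OpenFactCheck and OpenFactCheckConfig imports, the default configuration, and all file names are assumptions to be checked against the package's quickstart.

```python
from openfactcheck import OpenFactCheck, OpenFactCheckConfig  # assumed top-level entry point
from openfactcheck.evaluator import LLMEvaluator

# The evaluator is initialized with an OpenFactCheck instance (the ofc parameter above).
ofc = OpenFactCheck(OpenFactCheckConfig())
evaluator = LLMEvaluator(ofc)

# Run the evaluation; keyword arguments mirror the evaluate() signature documented above.
results = evaluator.evaluate(
    model_name="my-llm",               # illustrative model name
    input_path="llm_responses.csv",    # or a pd.DataFrame with index/response columns
    output_path="results",
    datasets=["snowballing"],
    analyze=True,
    save_plots=True,
    save_report=True,
)
```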
Methods
- __init__(ofc): Initialize the LLMEvaluator object.
- assess_freetext(output_path): Assess the free-text experiment, i.e., the number and type of claims, using Exact Matching (EM).
- calculate_price(num_claims[, cost_openai, ...]): Calculate the cost (in USD) of the API calls for the free-text experiment.
- call_fresheval(prefix, question, response, ...): Call the FreshEval API to evaluate responses.
- call_openai_api(prompt, temperature, max_tokens): Call the OpenAI API to generate responses.
- cut_sentences(content): Cut the content into sentences.
- cut_sub_string(input_string[, window_size, ...]): Cut the input string into sub-strings of a fixed window size.
- evaluate(model_name, input_path[, ...]): Evaluate the performance of the Language Model.
- evaluate_freetext(llm_responses, model_name, ...): Evaluate the LLM responses on free-text datasets.
- evaluate_freshqa(llm_responses): Evaluate the responses generated by the LLM on FreshQA questions.
- evaluate_selfaware(llm_responses): Evaluate the LLM responses on the SelfAware dataset.
- evaluate_snowballing(llm_responses): Evaluate the LLM responses on the Snowballing dataset.
- extract_ratings(response): Extract the rating from the evaluation response.
- filter_responses(df, dataset): Filter the responses based on the dataset.
- freetext_barplot(results[, fig_path, save]): Create a barplot for the free-text evaluation results, ensuring full row utilization.
- freshqa_piechart(result[, fig_path, save]): Plot a pie chart of the true and false answers on FreshQA.
- generate_plots([fig_path, save_plots]): Generate plots for the evaluation.
- generate_report(report_path): Generate the evaluation report.
- get_boolean(response[, strict]): Get a boolean value from the response.
- get_unanswerable(response, model, tokenizer): Predict whether the response is unanswerable or not.
- group_cosine_similarity(model, tokenizer, ...): Calculate the cosine similarity between two groups of sentences.
- read_evaluations(): Read the evaluations from the output directory.
- read_input(): Read the input file and dataset file and return a DataFrame containing the combined data.
- read_results(evaluations): Read the results from the evaluations.
- remove_punctuation(input_string): Remove the punctuation from the input string.
- selfaware_barplot(result[, fig_path, save]): Create a bar plot of the performance on the SelfAware dataset.
- selfaware_cm(labels, preds[, fig_path, save]): Create a confusion matrix for the SelfAware dataset.
- snowballing_barplot(result[, fig_path, save]): Create a bar plot of the accuracy of the LLM responses on the Snowballing dataset for each topic and the overall accuracy.
- snowballing_cm(labels, preds[, fig_path, save]): Create a confusion matrix for the Snowballing dataset.
- sum_all_elements(obj): Sum all elements of an object.
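After a run, the plotting and reporting helpers listed above can also be invoked on their own (a sketch; the paths are illustrative and assume an earlier evaluation has already written its outputs to output_path):

```python
# Re-generate the figures and the report from an existing evaluation run.
evaluator.generate_plots(fig_path="results/figures", save_plots=True)
evaluator.generate_report(report_path="results/report")
```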