Regression & Correlation Tools

The regression module provides comprehensive tools for statistical modeling and correlation analysis.

Regression Analysis Tools for RMCP MCP Server.

This module provides regression modeling capabilities including:

- Linear regression with diagnostics
- Logistic regression for binary outcomes
- Correlation analysis with significance testing
- Model validation and statistics

Missing-value handling is configurable (na_action for linear_model, use for correlation_analysis), linear_model accepts observation weights, and each tool returns detailed statistical output suitable for research and business analysis.

Example usage:

>>> # Linear regression on sales data
>>> data = {"sales": [100, 120, 140], "advertising": [10, 15, 20]}
>>> result = await linear_model(context, {
...     "data": data,
...     "formula": "sales ~ advertising"
... })
>>> print(f"R-squared: {result['r_squared']}")
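The examples use top-level `await`, which works in an async REPL such as `python -m asyncio`. From ordinary synchronous code, one option is to wrap the call in a coroutine and drive it with `asyncio.run`. This sketch assumes only the call shape shown on this page (each tool is an async function taking `context` and `params`). The `run_tool` helper and the `fake_tool` stand-in are hypothetical and exist only to keep the snippet self-contained.

```python
import asyncio
from typing import Any, Awaitable, Callable

# Hypothetical alias for an async tool called as `await tool(context, params)`.
Tool = Callable[[Any, dict[str, Any]], Awaitable[dict[str, Any]]]


def run_tool(tool: Tool, context: Any, params: dict[str, Any]) -> dict[str, Any]:
    """Run an async tool from synchronous code (hypothetical helper)."""
    # asyncio.run creates an event loop, awaits the coroutine, then closes the loop.
    # It raises RuntimeError if an event loop is already running in this thread.
    return asyncio.run(tool(context, params))


async def fake_tool(context: Any, params: dict[str, Any]) -> dict[str, Any]:
    """Stand-in for a real tool such as linear_model; it does no statistics."""
    await asyncio.sleep(0)  # yield to the event loop once, as a real async call might
    first_column = next(iter(params["data"].values()))
    return {"formula": params["formula"], "n_obs": len(first_column)}


if __name__ == "__main__":
    data = {"sales": [100, 120, 140], "advertising": [10, 15, 20]}
    result = run_tool(fake_tool, None, {"data": data, "formula": "sales ~ advertising"})
    print(result)  # {'formula': 'sales ~ advertising', 'n_obs': 3}
```

With the real tool, the call would be `run_tool(linear_model, context, params)`, where `context` is whatever request context the server supplies.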

async rmcp.tools.regression.linear_model(context, params)

Fit an ordinary least squares (OLS) linear regression model.

This tool performs linear regression analysis using R's lm() function. It supports weighted regression (when weights are supplied, lm() fits weighted rather than ordinary least squares) and missing-value handling, and returns detailed model diagnostics including coefficients, significance tests, and goodness-of-fit statistics.

Parameters:

context: Request execution context for logging and progress

params: Dictionary containing:

data: Dataset as dict of column_name -> [values]

formula: R formula string (e.g., "y ~ x1 + x2")

weights: Optional list of non-negative observation weights, one per row

na_action: How to handle missing values ("na.omit", "na.exclude", "na.fail")
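Before calling `linear_model`, it can help to check the `params` dictionary client-side. The sketch below is hypothetical (`validate_linear_params` is not part of rmcp) and only mirrors constraints implied by the parameter list: equal-length columns, a formula containing `~`, one weight per row, and an `na_action` from the three listed values. It also rejects negative weights, which R's lm() refuses. Whether the server performs the same checks is not documented here.

```python
from typing import Any

ALLOWED_NA_ACTIONS = {"na.omit", "na.exclude", "na.fail"}


def validate_linear_params(params: dict[str, Any]) -> list[str]:
    """Return a list of problems in a linear_model params dict (hypothetical helper)."""
    problems: list[str] = []
    data = params.get("data")
    if not isinstance(data, dict) or not data:
        return ["'data' must be a non-empty dict of column_name -> [values]"]

    # Every column must have the same number of rows.
    lengths = {name: len(values) for name, values in data.items()}
    n_rows = next(iter(lengths.values()))
    if len(set(lengths.values())) != 1:
        problems.append(f"columns have different lengths: {lengths}")

    formula = params.get("formula")
    if not isinstance(formula, str) or "~" not in formula:
        problems.append("'formula' must be an R formula string such as 'y ~ x1 + x2'")

    weights = params.get("weights")
    if weights is not None:
        if len(weights) != n_rows:
            problems.append(f"expected {n_rows} weights, got {len(weights)}")
        elif any(w < 0 for w in weights):
            problems.append("weights must be non-negative")

    na_action = params.get("na_action")
    if na_action is not None and na_action not in ALLOWED_NA_ACTIONS:
        problems.append(f"unknown na_action {na_action!r}")

    return problems
```

An empty list means the dictionary passed these client-side checks; it does not guarantee that R can fit the model (for example, collinear predictors still produce NA coefficients).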

Returns:

Dictionary containing:

coefficients: Model coefficients by variable name
std_errors: Standard errors of coefficients
t_values: t-statistics for coefficient tests
p_values: p-values for coefficient significance
r_squared: Coefficient of determination
adj_r_squared: Adjusted R-squared
fstatistic: Overall F-statistic value
f_pvalue: p-value for overall model significance
residual_se: Residual standard error
fitted_values: Predicted values for each observation
residuals: Model residuals
n_obs: Number of observations used

Return type:

dict
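For a single predictor, several of the returned fields have closed forms. The pure-Python sketch below computes them for `y ~ x` so the meaning of each key is concrete. It is an illustration, not the tool's implementation (the tool delegates to R's lm()); the function name `simple_ols` is hypothetical, and only a subset of the keys is filled in. The `(Intercept)` label follows R's naming for the intercept coefficient; whether the tool keeps that label is an assumption.

```python
import math


def simple_ols(x: list[float], y: list[float], x_name: str = "x") -> dict:
    """Fit y = b0 + b1*x by ordinary least squares (illustrative sketch)."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need equal-length x and y with at least 3 observations")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    tss = sum((yi - y_bar) ** 2 for yi in y)  # total sum of squares
    if sxx == 0 or tss == 0:
        raise ValueError("x and y must both vary")
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx            # slope
    b0 = y_bar - b1 * x_bar   # intercept

    fitted = [b0 + b1 * xi for xi in x]
    residuals = [yi - fi for yi, fi in zip(y, fitted)]
    rss = sum(r ** 2 for r in residuals)  # residual sum of squares
    df_resid = n - 2                      # n observations minus 2 coefficients

    r_squared = 1 - rss / tss
    residual_se = math.sqrt(rss / df_resid)
    se_b1 = residual_se / math.sqrt(sxx)
    # A perfect fit gives a zero standard error, so the t value is infinite.
    t_b1 = b1 / se_b1 if se_b1 > 0 else math.copysign(math.inf, b1)
    return {
        "coefficients": {"(Intercept)": b0, x_name: b1},
        "std_errors": {x_name: se_b1},
        "t_values": {x_name: t_b1},
        "r_squared": r_squared,
        "adj_r_squared": 1 - (1 - r_squared) * (n - 1) / df_resid,
        "fstatistic": t_b1 ** 2,  # with one predictor, F equals t squared
        "residual_se": residual_se,
        "fitted_values": fitted,
        "residuals": residuals,
        "n_obs": n,
    }
```

For example, `simple_ols([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])` gives a slope of 0.6, an intercept of 2.2 and an R-squared of 0.6, values that lm() reports for the same data.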

Example

>>> # Simple linear regression
>>> data = {
...     "price": [100, 120, 140, 160, 180],
...     "size": [1000, 1200, 1400, 1600, 1800]
... }
>>> result = await linear_model(context, {
...     "data": data,
...     "formula": "price ~ size"
... })
>>> print(f"Price increases ${result['coefficients']['size']:.2f} per sq ft")
>>> print(f"Model explains {result['r_squared']:.1%} of variance")
>>> # Multiple regression with weights
>>> # (price must not be an exact linear function of advertising,
>>> # otherwise lm() cannot estimate both coefficients and reports NA)
>>> data = {
...     "sales": [100, 145, 210, 240],
...     "advertising": [10, 20, 30, 40],
...     "price": [50, 45, 42, 35]
... }
>>> result = await linear_model(context, {
...     "data": data,
...     "formula": "sales ~ advertising + price",
...     "weights": [1, 1, 2, 2]  # Weight later observations more
... })
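When `weights` is supplied, lm() minimizes the weighted residual sum of squares, sum of w_i * (y_i - fitted_i)^2, instead of the plain sum, so heavily weighted observations pull the fit toward themselves. The sketch below shows this for one predictor using weighted means. `weighted_simple_ls` is a hypothetical name, and this is an illustration of weighted least squares, not the tool's code.

```python
def weighted_simple_ls(x: list[float], y: list[float], w: list[float]) -> tuple[float, float]:
    """Weighted least squares for y = b0 + b1*x (illustrative sketch)."""
    if not (len(x) == len(y) == len(w)):
        raise ValueError("x, y and w must have the same length")
    if any(wi < 0 for wi in w) or sum(w) == 0:
        raise ValueError("weights must be non-negative and not all zero")
    sw = sum(w)
    # Weighted means replace the ordinary means used in OLS.
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    y_bar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
    if sxx == 0:
        raise ValueError("x must vary among positively weighted observations")
    sxy = sum(wi * (xi - x_bar) * (yi - y_bar) for wi, xi, yi in zip(w, x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1
```

With equal weights this reduces to ordinary least squares. A weight of zero effectively drops an observation, and multiplying every weight by the same constant leaves the coefficients unchanged: only relative weights matter for the point estimates.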

async rmcp.tools.regression.correlation_analysis(context, params)

Compute a correlation matrix with significance testing.

This tool calculates pairwise correlations between numeric variables using the Pearson, Spearman, or Kendall method. It runs a significance test for each correlation and handles missing values according to the use parameter.

Parameters:

context: Request execution context for logging and progress

params: Dictionary containing:

data: Dataset as dict of column_name -> [values]

variables: Optional list of variable names to include

method: Correlation method ("pearson", "spearman", "kendall")

use: Missing value handling strategy

Returns:

Dictionary containing:

correlation_matrix: Pairwise correlations as nested dict
significance_tests: p-values for each correlation
sample_sizes: Number of complete observations for each pair
variables_used: List of variables included in analysis
method_used: Correlation method applied

Return type:

dict
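The Returns list pairs each correlation with a p-value and a per-pair sample size, which suggests pairwise handling of missing values (each pair uses only the rows where both variables are present). Assuming that behaviour, and using `None` for a missing value, the sketch below computes the Pearson coefficient, the per-pair `n`, and the statistic t = r * sqrt((n - 2) / (1 - r^2)). R's cor.test() uses this statistic with n - 2 degrees of freedom for the Pearson test. Turning t into a p-value needs the Student t distribution, which the Python standard library lacks, so that step is omitted. `pairwise_pearson` is a hypothetical name.

```python
import math
from typing import Optional


def pairwise_pearson(a: list[Optional[float]], b: list[Optional[float]]) -> dict:
    """Pearson r over rows where both values are present (illustrative sketch)."""
    pairs = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    n = len(pairs)
    if n < 3:
        raise ValueError("need at least 3 complete pairs")
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    syy = sum((y - my) ** 2 for _, y in pairs)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant variable")
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    r = sxy / math.sqrt(sxx * syy)
    # t statistic with n - 2 degrees of freedom; infinite when |r| == 1.
    if abs(r) >= 1:
        t = math.copysign(math.inf, r)
    else:
        t = r * math.sqrt((n - 2) / (1 - r * r))
    return {"r": r, "n": n, "t": t}
```

For a single pair of variables this t equals the slope's t value from a simple regression fitted on the same rows, which makes a convenient cross-check.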

Example

>>> # Basic correlation analysis
>>> data = {
...     "sales": [100, 150, 200, 250, 300],
...     "advertising": [10, 20, 25, 35, 40],
...     "price": [50, 48, 45, 42, 40]
... }
>>> result = await correlation_analysis(context, {
...     "data": data,
...     "method": "pearson"
... })
>>> sales_ad_corr = result["correlation_matrix"]["sales"]["advertising"]
>>> print(f"Sales-Advertising correlation: {sales_ad_corr:.3f}")
>>> # Spearman rank correlation for monotonic (not necessarily linear) relationships
>>> result = await correlation_analysis(context, {
...     "data": data,
...     "method": "spearman",
...     "variables": ["sales", "advertising"]
... })
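Spearman's coefficient is the Pearson correlation computed on ranks, with tied values sharing their average rank. That is why it captures any monotonic relationship, not only a linear one. A sketch of that definition follows; the function names are hypothetical and missing values are not handled.

```python
import math


def average_ranks(values: list[float]) -> list[float]:
    """Rank values 1..n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # positions are 0-based, ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x: list[float], y: list[float]) -> float:
    """Plain Pearson correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman(x: list[float], y: list[float]) -> float:
    """Spearman's rho as the Pearson correlation of average ranks."""
    return pearson(average_ranks(x), average_ranks(y))
```

Applied to the sales and advertising columns in the example above, which both increase strictly, `spearman` returns 1.0, whereas their Pearson correlation is slightly below 1 because the relationship is not exactly linear.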

async rmcp.tools.regression.logistic_regression(context, params)

Fit a logistic regression model for binary outcomes.

Return type:

dict[str, Any]
