摘要

我们经常使用useGenerateKeys来返回自增主键，避免多一次查询。也会经常使用on duplicate key update，来进行批量的upsert。
用起来真的很方便，但是最近在用的过程中，不小心掉进了坑里，所以这里来记录一下～

问题

先描述一下这个坑：

<insert id="upsert" useGeneratedKeys="true" keyProperty="id" parameterType="java.util.List">
  insert into app_product_phase(id, name, created_by, created_at, updated_by, updated_at)
  VALUES
  <foreach collection="list" index="index" separator="," item="item">
    (#&#123;item.id&#125;, #&#123;item.name&#125;, #&#123;entity.createdBy&#125;, NOW(), #&#123;entity.updatedBy&#125;, NOW())
  </foreach>
  on duplicate key update updated_by = values(updated_by), updated_at = NOW()
</insert>

我需要将一个list批量插入到数据库中，起初，只是插入进去就可以了，所以这么写并没有任何问题～
但是后来由于业务原因，需要对这些数据做一些后续处理，所以需要拿到他们的自增id。
然后并没有修改代码，结果发现了很大的问题，除了第一条数据有id之后，其他数据的id均是0～
而且，即使第一条数据的id也不一定是对的～
这里分两种情况，如果第一条数据是新插入的行，那么这个id就是真实的～否则那就是后面数据中第一个新插入行的id～如果没有新插入行，那么这个id似乎是最后一条update那行数据的id（这里不怎么确定）。
但是，明明Mybatis的3.3.1之后的版本都是支持批量插入时返回自增主键的，就很困惑～
但是当我尝试把on duplicate key update去掉后，确实拿到的数据都是对的，那么问题就出现在这一句了～
接下来，我们就分析一下这到底是咋回事～

分析

Mybatis怎么获得自增主键的？

我们都知道，如果想在insert后得到数据的自增主键，那么我们需要使用useGeneratedKeys="true"。
所以很自然，我们需要去看看这个东西到底是怎么回事～
在Mybatis的官方文档中，有这么一句话：

useGeneratedKeys （仅适用于 insert 和 update）这会令 MyBatis 使用 JDBC 的 getGeneratedKeys 方法来取出由数据库内部生成的主键（比如：像 MySQL 和 SQL Server 这样的关系型数据库管理系统的自动递增字段），默认值：false。

所以，我们就知道了，Mybatis时使用JDBC的getGeneratedKeys方法做的这件事～

JDBC的getGeneratedKeys方法

我在项目中找到了对应的代码(mysql-connector-java:5.1.46:com/mysql/jdbc/StatementImpl.java:1576)

    /**
     * @throws SQLException
     */
    public java.sql.ResultSet getGeneratedKeys() throws SQLException &#123;
        synchronized (checkClosed().getConnectionMutex()) &#123;
            if (!this.retrieveGeneratedKeys) &#123;
                throw SQLError.createSQLException(Messages.getString("Statement.GeneratedKeysNotRequested"), SQLError.SQL_STATE_ILLEGAL_ARGUMENT,
                        getExceptionInterceptor());
            &#125;

            if (this.batchedGeneratedKeys == null) &#123;
                if (this.lastQueryIsOnDupKeyUpdate) &#123;
                    return this.generatedKeysResults = getGeneratedKeysInternal(1);
                &#125;
                return this.generatedKeysResults = getGeneratedKeysInternal();
            &#125;

            Field[] fields = new Field[1];
            fields[0] = new Field("", "GENERATED_KEY", Types.BIGINT, 20);
            fields[0].setConnection(this.connection);

            this.generatedKeysResults = com.mysql.jdbc.ResultSetImpl.getInstance(this.currentCatalog, fields, new RowDataStatic(this.batchedGeneratedKeys),
                    this.connection, this, false);

            return this.generatedKeysResults;
        &#125;
    &#125;

划重点啊～请看中间部分的if判断，这里是把是否使用on duplicate key update分别做了不同的处理，其中使用on duplicate key update的调用了getGeneratedKeysInternal()函数，并传入1作为参数；而其他的则调用getGeneratedKeysInternal()函数。

我们先看getGeneratedKeysInternal()函数：

    protected ResultSetInternalMethods getGeneratedKeysInternal() throws SQLException &#123;
        long numKeys = getLargeUpdateCount();
        return getGeneratedKeysInternal(numKeys);
    &#125;

很简单，他其实就是取了一下受影响的行数，之后同样调用了带参数的getGeneratedKeysInternal()方法。那么接下来看一下这个方法：

    protected ResultSetInternalMethods getGeneratedKeysInternal(long numKeys) throws SQLException &#123; // 1
        synchronized (checkClosed().getConnectionMutex()) &#123;
            Field[] fields = new Field[1];
            fields[0] = new Field("", "GENERATED_KEY", Types.BIGINT, 20);
            fields[0].setConnection(this.connection);
            fields[0].setUseOldNameMetadata(true);

            ArrayList<ResultSetRow> rowSet = new ArrayList<ResultSetRow>();

            long beginAt = getLastInsertID(); // 2

            if (beginAt < 0) &#123; // looking at an UNSIGNED BIGINT that has overflowed
                fields[0].setUnsigned();
            &#125;

            if (this.results != null) &#123;
                String serverInfo = this.results.getServerInfo();

                //
                // Only parse server info messages for 'REPLACE' queries
                //
                if ((numKeys > 0) && (this.results.getFirstCharOfQuery() == 'R') && (serverInfo != null) && (serverInfo.length() > 0)) &#123;
                    numKeys = getRecordCountFromInfo(serverInfo);
                &#125;

                if ((beginAt != 0 /* BIGINT UNSIGNED can wrap the protocol representation */) && (numKeys > 0)) &#123;
                    for (int i = 0; i < numKeys; i++) &#123; // 3
                        byte[][] row = new byte[1][];
                        if (beginAt > 0) &#123;
                            row[0] = StringUtils.getBytes(Long.toString(beginAt));
                        &#125; else &#123;
                            byte[] asBytes = new byte[8];
                            asBytes[7] = (byte) (beginAt & 0xff);
                            asBytes[6] = (byte) (beginAt >>> 8);
                            asBytes[5] = (byte) (beginAt >>> 16);
                            asBytes[4] = (byte) (beginAt >>> 24);
                            asBytes[3] = (byte) (beginAt >>> 32);
                            asBytes[2] = (byte) (beginAt >>> 40);
                            asBytes[1] = (byte) (beginAt >>> 48);
                            asBytes[0] = (byte) (beginAt >>> 56);

                            BigInteger val = new BigInteger(1, asBytes);

                            row[0] = val.toString().getBytes();
                        &#125;
                        rowSet.add(new ByteArrayRow(row, getExceptionInterceptor()));
                        beginAt += this.connection.getAutoIncrementIncrement(); // 4
                    &#125;
                &#125;
            &#125;

            com.mysql.jdbc.ResultSetImpl gkRs = com.mysql.jdbc.ResultSetImpl.getInstance(this.currentCatalog, fields, new RowDataStatic(rowSet),
                    this.connection, this, false);

            return gkRs;
        &#125;
    &#125;

这个函数稍有点长，不过我们不用太在意细节～我们要看的就几点：

参数：主键的数量。上文可知，使用on duplicate key update时，我们传入的这个参数是1，没使用时，传入的才是真正插入到数据库中的行数。这个是为啥呢，请看下文～
获取上一次insert的（第一个）主键
根据受影响的行数（也就是新插入的行数），依次处理主键
因为是自增主键，所以获取自增的增量，逐个累加来计算主键。

从这几段代码可以看出来，我们上面sql执行后得到的结果，完全是没有问题的，即使用了on duplicate key update语句时，只有第一条数据的主键是正确的。
因为，在jdbc中，的的确确就只给我们拿了第一行数据的主键。
不过，从这里也可以看出这么做的原因，那就是，如果只是插入的话，那么主键是可以被计算出来的；而如果是对已经存在的数据进行修改的话，那么主键是无法通过这种方式计算出来的。
这也是为啥使用on duplicate key update时，numKeys参数传入的是1。

此外，如果使用了on duplicate key update，那么通过getLargeUpdateCount()方法获取到的受影响行数也是不准确的，具体请见下文哈～

受影响行数

在Mysql的官方文档中是这么写的：

With ON DUPLICATE KEY UPDATE, the affected-rows value per row is 1 if the row is inserted as a new row, 2 if an existing row is updated, and 0 if an existing row is set to its current values. If you specify the CLIENT_FOUND_ROWS flag to the mysql_real_connect() C API function when connecting to mysqld, the affected-rows value is 1 (not 0) if an existing row is set to its current values.

也就是说，在on duplicate key update时，这个受影响行数分3种情况，第一种是插入，插入一行就返回1；第二种是修改，修改一行就返回2；第三种是修改了但是数据未变，这种情况根据不同的配置，返回0或1。
而如果我们的语句中不带on duplicate key update，那么受影响行数自然就是插入的行数了～

补充

为啥getLastInsertID()方法只返回第一个插入行（注意是新插入行，不是第一个受影响行）的主键呢，如果直接返回全部的不就好了吗？也不用再逐行再计算一边了，还能解决这个on duplicate key update的问题。
在Mysql的官方文档上有这么一句话哈～

For a multiple-row insert, LAST_INSERT_ID() and mysql_insert_id() actually return the AUTO_INCREMENT key from the first of the inserted rows. This enables multiple-row inserts to be reproduced correctly on other servers in a replication setup.

再深入的调查就没有了～如有大神看见，请指点～

小结

在Mybatis的3.3.1之后的版本中，可以在批量insert中得到所有数据的主键。
但是如果是批量upsert，即使用了on duplicate key update，不行～

关于Mybatis使用useGeneratedKeys获取自增主键

摘要

问题

分析

Mybatis怎么获得自增主键的？

JDBC的getGeneratedKeys方法

受影响行数

补充

小结